Dr.Jiw: Explainable AI (XAI)

Sunday, 7 June 2026 (B.E. 2569)

Explainable AI (XAI)

Turning opaque "black box" models into transparent "white boxes" is the core philosophy of Explainable AI (XAI).

In traditional machine learning, we often trade interpretability for accuracy. High-performing models such as deep neural networks, tree ensembles (XGBoost, Random Forest), and large language models operate as black boxes: they take an input and produce an output, but the internal computation, with millions or billions of parameters interacting at once, is too complex for a human to trace.

XAI bridges this gap by introducing tools and methodologies to peer inside or approximate that logic, effectively trying to turn those opaque systems into white boxes (or at least "gray boxes").

Here is a breakdown of how XAI attempts this transformation and where the boundaries lie.

1. Two Main Approaches to "Whiteness"

XAI generally tackles the black box problem in one of two ways:

A. Intrinsic Interpretability (Designing White Boxes From the Start)

Instead of building a massive black box and trying to explain it later, this approach uses models that are inherently transparent. Humans can directly look at the model's structure and understand its logic.

Linear/Logistic Regression: You can see the exact weight ($w_i$) assigned to each feature.
Decision Trees (Shallow): You can follow the exact if-then-else paths.
Generalized Additive Models (GAMs): They capture non-linear relationships but keep the impact of each variable isolated and readable.
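To make the first item concrete, here is a minimal sketch of reading a logistic regression directly. The feature names, weights, and bias are hypothetical values chosen for illustration, not a fitted model. Each feature's contribution to the logit is simply its weight times its value, so the explanation is the model itself.

```python
import math

# Hypothetical weights for a toy loan-approval model (illustrative, not fitted).
WEIGHTS = {"income_k": 0.04, "debt_ratio": -3.0, "late_payments": -0.8}
BIAS = -1.0


def explain(features):
    """Return (probability, per-feature contribution to the logit)."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, contributions


applicant = {"income_k": 60, "debt_ratio": 0.5, "late_payments": 1}
prob, contribs = explain(applicant)

# List features from most to least influential for this applicant.
for name, value in sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True):
    print(f"{name:>14}: {value:+.2f}")
print(f"approval probability: {prob:.3f}")
```

A positive contribution pushes the prediction toward approval and a negative one pushes it away, which is the kind of direct reading that a black-box model does not offer.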

B. Post-Hoc Explanation (Explaining the Black Box)

When you must use a complex model (like a deep neural network for computer vision) for its high accuracy, post-hoc XAI methods are applied after training to reverse-engineer or approximate how it made a decision.

Model-Agnostic Methods: Tools like LIME (Local Interpretable Model-agnostic Explanations) and SHAP (SHapley Additive exPlanations) perturb the input data and observe how the output changes in order to estimate each feature's importance for a specific prediction.
Pixel Attribution / Saliency Maps: For image models, saliency methods assign importance to individual pixels, while methods like Grad-CAM produce a coarser heatmap over the image regions that most influenced the classification (e.g., highlighting the ears and whiskers when the model predicts "cat").
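The perturbation idea behind the model-agnostic methods can be sketched with exact Shapley values on a toy model. This is an illustration of the underlying concept only, not the algorithm used by the SHAP library, which relies on approximations and model-specific shortcuts. The `black_box` function, the input, and the all-zeros baseline are assumptions made up for this example.

```python
from itertools import combinations
from math import factorial


def black_box(x):
    """Stand-in for an opaque model; note the interaction between x[1] and x[2]."""
    return 2.0 * x[0] + x[1] * x[2]


def shapley_values(model, x, baseline):
    """Exact Shapley values for a single prediction.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this only works for a handful of features.
    """
    n = len(x)

    def value(coalition):
        masked = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(masked)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Probability that a coalition of this size precedes feature i
            # in a uniformly random ordering of the features.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(black_box, x, baseline)
print("attributions:", phi)
print("sum of attributions:", sum(phi))
print("f(x) - f(baseline):", black_box(x) - black_box(baseline))
```

The attributions add up to the difference between the prediction and the baseline prediction, and the credit for the interaction term is split equally between the two features involved. The choice of baseline is itself an assumption and changes the resulting explanation.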

2. The Core Challenge: The Trade-off

While XAI tries to make models white-box, a fundamental tension exists: The Fidelity vs. Interpretability Trade-off.

$$\text{High Complexity} \implies \text{High Accuracy} \iff \text{Low Interpretability}$$

If a post-hoc explanation is too simple, it fails to capture the true, complex logic of the black box (low fidelity). If the explanation is too complex, it ceases to be understandable to humans, defeating the purpose of being a white box.

Because of this, post-hoc XAI rarely turns a black box into a perfect white box; instead, it usually provides a highly accurate local map or a simplified global summary.
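The difference between a local map and a global summary can be sketched by fitting the same simple surrogate (a straight line) to a non-linear function over a small neighbourhood and over its full range. The sine function standing in for the black box, the interval choices, and the use of R² as the fidelity measure are all assumptions made for illustration.

```python
import math


def black_box(x):
    """Stand-in for an opaque, non-linear model."""
    return math.sin(x)


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def r_squared(xs, ys, a, b):
    """Fidelity: share of the black box's variance the surrogate reproduces."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def surrogate_fidelity(lo, hi, steps=200):
    """Fit a linear surrogate on an evenly spaced grid over [lo, hi] and score it."""
    xs = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    ys = [black_box(x) for x in xs]
    a, b = fit_line(xs, ys)
    return r_squared(xs, ys, a, b)


local_fidelity = surrogate_fidelity(0.9, 1.1)            # small neighbourhood of x = 1
global_fidelity = surrogate_fidelity(0.0, 2 * math.pi)   # one full period
print(f"local R^2:  {local_fidelity:.4f}")
print(f"global R^2: {global_fidelity:.4f}")
```

The line is equally easy to read in both cases, but it tracks the black box closely only in the small neighbourhood. Over the full range the same level of simplicity costs a large share of fidelity, which is the trade-off described above.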

3. Why This Transformation Matters

Shifting from black box to white box isn't just an academic exercise. It is a critical requirement in high-stakes domains:

Trust and Safety: In healthcare, a doctor needs to know why an AI diagnosed a patient with a specific condition before prescribing a high-risk treatment.
Debugging and Optimization: If a model fails or exhibits bias, a white-box view helps data scientists trace which features or internal components contributed to the failure.
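Regulatory Compliance: Frameworks like the EU's GDPR restrict decisions based solely on automated processing when they significantly affect people (e.g., loan denials or employment screening) and require "meaningful information about the logic involved". This is often described as a "right to explanation", although the regulation's binding articles do not use that phrase.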