Thursday, August 15, 2024

Akaike Information Criterion (AIC)

The Akaike Information Criterion (AIC) is a widely used measure of the quality of a statistical model. It combines 1) the goodness of fit and 2) the simplicity/parsimony of the model into a single statistic. The lower the AIC, the better the model.

Cf. https://coolstatsblog.com/2013/08/14/using-aic-to-test-arima-models-2/

AIC = 2k - 2 ln(L)

  • k is the number of parameters in the model.
  • ln(L) is the natural logarithm of the model's maximized likelihood, i.e. the likelihood evaluated at the best-fitting parameter values.
The term 2k penalizes model complexity, while the -2 ln(L) term rewards goodness of fit.
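
As a concrete illustration, here is a minimal Python sketch that computes AIC by hand for two competing Gaussian fits; the data and both models are hypothetical, and only NumPy is assumed:

```python
import numpy as np

def aic(k: int, log_likelihood: float) -> float:
    """AIC = 2k - 2 ln(L); lower is better."""
    return 2 * k - 2 * log_likelihood

def gaussian_log_likelihood(x, mu, sigma):
    # Sum of log densities of N(mu, sigma^2) over all observations.
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu) ** 2 / (2 * sigma**2))

# Hypothetical data: 200 draws from N(5, 2^2).
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=200)

# Model A: fit both mean and standard deviation (k = 2).
aic_a = aic(k=2, log_likelihood=gaussian_log_likelihood(x, x.mean(), x.std()))

# Model B: fix sigma = 1 and fit only the mean (k = 1).
aic_b = aic(k=1, log_likelihood=gaussian_log_likelihood(x, x.mean(), 1.0))

print(f"AIC, mean and sd fitted: {aic_a:.1f}")
print(f"AIC, mean only:          {aic_b:.1f}")  # lower AIC wins
```

In practice, time-series libraries such as statsmodels report AIC directly on fitted results, which is how posts like the one linked above compare candidate ARIMA models.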

AIC is not generally used for Multi-Layer Perceptrons (MLPs).

The reasons are similar to why it isn't used for LLMs:

Complexity: MLPs are neural networks with a large number of weights and biases, even for a relatively small architecture. These parameters are not individually interpretable, and the 2k penalty term in the AIC formula becomes so large that it swamps the likelihood term and makes the score useless for comparison.
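
To see how quickly k grows, here is a minimal sketch, assuming PyTorch, that counts the trainable parameters of a deliberately small, hypothetical MLP:

```python
import torch.n as _  # noqa: avoid; correct import is below
import torch.nn as nn

# A small MLP: 20 inputs, two hidden layers of 64 units, 1 output.
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)

# k = total number of trainable weights and biases.
k = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(k)      # 5569 parameters even for this tiny network
print(2 * k)  # the AIC penalty term alone: 11138
```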

Different Optimization Philosophy: MLPs are trained by backpropagation to minimize a loss function (such as mean squared error or cross-entropy) on a training dataset. They are not typically fit through an explicit maximum likelihood procedure that yields a likelihood score L to plug into the formula.
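
A minimal training-loop sketch, again assuming PyTorch with hypothetical random data, shows the typical workflow: the objective is a loss value driven down by backpropagation, not a likelihood L for the AIC formula:

```python
import torch
import torch.nn as nn

# Hypothetical regression data: 100 samples, 20 features.
X = torch.randn(100, 20)
y = torch.randn(100, 1)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(200):
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(X), y)  # MSE between predictions and targets
    loss.backward()              # backpropagation
    optimizer.step()             # gradient-descent update

print(loss.item())  # a loss value, not a likelihood
```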

Alternative Metrics: The performance of MLPs and other neural networks is instead evaluated with metrics suited to the task, such as accuracy, precision, recall, F1-score, or mean squared error on a held-out validation set.
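
For example, a minimal sketch using scikit-learn's metric functions on hypothetical validation labels and predictions:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical labels and predictions on a held-out validation set.
y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]

print("accuracy :", accuracy_score(y_true, y_pred))   # 0.75
print("precision:", precision_score(y_true, y_pred))  # 0.8
print("recall   :", recall_score(y_true, y_pred))     # 0.8
print("f1       :", f1_score(y_true, y_pred))         # 0.8
```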