ckf

The column-wise k-fold (ckf) algorithm is a cross-validation (CV) algorithm for Principal Component Analysis (PCA). It is mainly useful for determining the best number of principal components (PCs) from a predictive perspective. The ckf is a variant of the element-wise k-fold (ekf) algorithm, which is probably the state of the art in PCA CV. In ckf, the observation-wise k-fold operation is removed from the CV loop, which makes the algorithm much faster while retaining most of the desirable features claimed for ekf. As a result, ckf is especially suited to very large data sets, where the number of observations makes ekf infeasible because of either the computation time required or memory allocation problems.
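To make the idea concrete, the following is a minimal sketch of a column-wise CV loop in Python with NumPy. It is not the reference implementation. It assumes that the PCA model is fitted once on all observations, that the variables are split into column folds, and that each fold in turn is treated as missing, replaced by zeros (the column means of the centered data) and predicted from the remaining columns through the model. The function name `ckf_press`, its parameters and the zero-imputation choice are illustrative assumptions, not details given in the description above.

```python
import numpy as np

def ckf_press(X, max_pcs, n_folds=None):
    """Sketch of a column-wise k-fold PRESS curve for PCA.

    Hypothetical helper: returns an array of length max_pcs + 1 whose
    entry a is the prediction error sum of squares with a PCs.
    Requires max_pcs <= min(n_obs, n_vars).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)              # mean-center each variable
    n_obs, n_vars = Xc.shape
    if n_folds is None:
        n_folds = n_vars                 # assumption: leave one column out

    # Loadings are computed once from ALL observations:
    # there is no observation-wise split inside the loop.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    folds = np.array_split(np.arange(n_vars), n_folds)

    press = np.zeros(max_pcs + 1)
    for a in range(max_pcs + 1):
        P = Vt[:a].T                     # loadings, shape (n_vars, a)
        for cols in folds:
            X_in = Xc.copy()
            X_in[:, cols] = 0.0          # assumed imputation: zeros
            scores = X_in @ P            # scores from remaining columns
            X_hat = scores @ P[cols].T   # prediction of held-out columns
            press[a] += np.sum((Xc[:, cols] - X_hat) ** 2)
    return press
```

Because the decomposition is computed only once, the cost of the loop depends on the number of column folds and PCs rather than on repeated model fits over subsets of observations, which is the property that makes this kind of scheme attractive for data sets with many rows.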

Apart from computational efficiency, we have recently shown that, while ckf tends to identify the same optimum number of PCs as ekf, it provides simpler curves that are much easier to interpret. In particular, ckf seems to be more suitable for the automatic selection of the number of PCs, because its curves do not present several local minima.
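As an illustration of why a curve with a single minimum simplifies automatic selection, the sketch below picks the number of PCs as the position of the global minimum and counts the interior local minima of a curve. The helper names and the example values are made up for illustration, and the code assumes that the first entry of the curve corresponds to 0 PCs.

```python
import numpy as np

def select_n_pcs(press):
    """Number of PCs at the global minimum (entry 0 is assumed to be 0 PCs)."""
    press = np.asarray(press, dtype=float)
    return int(np.argmin(press))

def count_local_minima(press):
    """Count interior points strictly lower than both neighbours."""
    press = np.asarray(press, dtype=float)
    interior = (press[1:-1] < press[:-2]) & (press[1:-1] < press[2:])
    return int(np.sum(interior))

# Made-up curves, for illustration only.
smooth_curve = [10.0, 6.0, 3.0, 4.0, 5.0]   # one minimum: unambiguous choice
bumpy_curve = [10.0, 6.0, 7.0, 3.0, 4.0]    # two local minima: ambiguous

n_pcs = select_n_pcs(smooth_curve)
```

With a single minimum, the global minimum and the first local minimum coincide, so a simple rule of this kind gives the same answer regardless of which criterion is used.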

The figure compares the cumulative percentage of captured variance in PCA with the ckf curve, which identifies 2 PCs for the model.