For support vector machine (SVM) learning, least squares SVM (LSSVM), derived by duality LSSVM (D-LSSVM), is a widely used model, because it has an explicit solution. One obvious limitation of the model is that the solution lacks sparseness, which limits it from training large-scale problems efficiently. In this paper, we derive an equivalent LSSVM model in primal space LSSVM (P-LSSVM) by the representer theorem and prove that P-LSSVM can be solved exactly at some sparse solutions for problems with low-rank kernel matrices. Two algorithms are proposed for finding the sparse (approximate) solution of P-LSSVM by Cholesky factorization. One is based on the decomposition of the kernel matrix K as PPT with the best low-rank matrix P approximately by pivoting Cholesky factorization.
The other is based on solving P-LSSVM by approximating the Cholesky factorization of the Hessian matrix with rank-one update scheme. For linear learning problems, theoretical analysis and experimental results support that P-LSSVM can give the sparsest solutions in all SVM learners. Experimental results on some large-scale nonlinear training problems show that our algorithms, based on P-LSSVM, can converge to acceptable test accuracies at very sparse solutions with a sparsity level <1%, and even as little as 0.01%. Hence, our algorithms are a better choice for large-scale training problems.