Sparse covariance estimation for Gaussian variables

% Joëlle Skaf - 04/24/08
% (a figure is generated)
%
% Suppose y \in\reals^n is a Gaussian random variable with zero mean and
% covariance matrix R = \Expect(yy^T), with sparse inverse S = R^{-1}
% (S_ij = 0 means that y_i and y_j are conditionally independent).
% We want to estimate the covariance matrix R based on N independent
% samples y1,...,yN drawn from the distribution, and using prior knowledge
% that S is sparse
% A good heuristic for estimating R is to solve the problem
%           maximize    logdet(S) - tr(SY) - lambda*sum(sum(abs(S)))
%           subject to  S >= 0
% where Y is the sample covariance of y1,...,yN, and lambda is a sparsity
% parameter to be chosen or tuned.
% A figure showing the sparsity (number of nonzeros) of S versus lambda
% is generated.

% Input data
randn('state',0);
n = 10;
N = 100;
Strue = sprandsym(n,0.5,0.01,1);
nnz_true = sum(Strue(:)>1e-4);
R = inv(full(Strue));
y_sample = sqrtm(R)*randn(n,N);
Y = cov(y_sample');
Nlambda = 20;
lambda = logspace(-2, 3, Nlambda);
nnz = zeros(1,Nlambda);


for i=1:Nlambda
    disp(['i = ' num2str(i) ', lambda(i) = ' num2str(lambda(i))]);
    % Maximum likelihood estimate of R^{-1}
    cvx_begin sdp
        cvx_quiet false
        variable S(n,n) symmetric
        maximize log_det(S) - trace(S*Y) - lambda(i)*sum(sum(abs(S)))
        S >= 0
    cvx_end
    nnz(i) = sum(S(:)>1e-4);
end

figure;
semilogx(lambda, nnz);
hold on;
semilogx(lambda, nnz_true*ones(1,Nlambda),'r');
xlabel('\lambda');
legend('nonzeros in S', 'nonzeros in R^{-1}');
i = 1, lambda(i) = 0.01
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  7.007e-01  Solved
1.221e-04  4.471e-04  Solved
1.221e-04  4.409e-06  Solved
1.221e-04  0.000e+00  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -31.3012
i = 2, lambda(i) = 0.01833
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  7.030e-01  Solved
1.221e-04  4.408e-04  Solved
1.221e-04  2.636e-06  Solved
1.221e-04  0.000e+00  Solved
1.490e-08  4.854e-11  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -31.3504
i = 3, lambda(i) = 0.033598
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  7.071e-01  Solved
1.221e-04  4.294e-04  Solved
1.221e-04  0.000e+00  Solved
1.490e-08  3.572e-08  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -31.4368
i = 4, lambda(i) = 0.061585
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  7.142e-01  Solved
1.221e-04  4.035e-04  Solved
1.221e-04  0.000e+00  Solved
1.490e-08  5.274e-08  Solved
1.490e-08  8.038e-11  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -31.5845
i = 5, lambda(i) = 0.11288
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  7.260e-01  Solved
1.221e-04  4.017e-04  Solved
1.221e-04  2.108e-06  Solved
1.221e-04  0.000e+00  Solved
1.490e-08  1.654e-09  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -31.8272
i = 6, lambda(i) = 0.20691
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  7.450e-01  Solved
1.221e-04  3.628e-04  Solved
1.221e-04  3.567e-07  Solved
1.221e-04  0.000e+00  Solved
1.490e-08  3.221e-09  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -32.2113
i = 7, lambda(i) = 0.37927
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  7.746e-01  Solved
1.221e-04  3.072e-04  Solved
1.221e-04  0.000e+00  Solved
1.490e-08  3.307e-08  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -32.8011
i = 8, lambda(i) = 0.69519
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  8.191e-01  Solved
1.221e-04  2.413e-04  Solved
1.221e-04  3.665e-07  Solved
1.221e-04  0.000e+00  Solved
1.490e-08  8.590e-10  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -33.6657
i = 9, lambda(i) = 1.2743
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  8.837e-01  Solved
1.221e-04  1.496e-04  Solved
1.221e-04  0.000e+00  Solved
1.490e-08  1.380e-08  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -34.876
i = 10, lambda(i) = 2.3357
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  9.749e-01  Solved
1.221e-04  4.715e-05  Solved
1.221e-04  0.000e+00  Solved
1.490e-08  6.232e-09  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -36.4981
i = 11, lambda(i) = 4.2813
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  1.098e+00  Solved
1.221e-04  0.000e+00  Solved
1.490e-08  1.464e-10  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -38.5618
i = 12, lambda(i) = 7.8476
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  1.262e+00  Solved
1.221e-04  8.950e-05  Solved
1.221e-04  0.000e+00  Solved
1.490e-08  3.844e-09  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -41.092
i = 13, lambda(i) = 14.3845
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  1.474e+00  Solved
1.221e-04  5.403e-04  Solved
1.221e-04  4.158e-06  Solved
1.221e-04  0.000e+00  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -44.1061
i = 14, lambda(i) = 26.3665
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  1.755e+00  Solved
1.221e-04  1.832e-03  Solved
1.221e-04  2.661e-05  Solved
1.221e-04  0.000e+00  Solved
1.490e-08  1.327e-08  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -47.7294
i = 15, lambda(i) = 48.3293
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  2.123e+00  Solved
1.221e-04  4.810e-03  Solved
1.221e-04  8.608e-05  Solved
1.221e-04  7.479e-07  Solved
1.221e-04  5.790e-07  Solved
1.221e-04S 5.898e-07  Solved
1.221e-04  0.000e+00  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -51.9812
i = 16, lambda(i) = 88.5867
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  2.591e+00  Solved
1.221e-04  1.118e-02  Solved
1.221e-04  2.137e-04  Solved
1.221e-04  2.297e-06  Solved
1.221e-04S 4.418e-06  Solved
1.221e-04  1.321e-06  Solved
1.221e-04S 1.321e-06  Solved
1.490e-08  6.281e-10  Solved
1.490e-08S 6.336e-10  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -56.7913
i = 17, lambda(i) = 162.3777
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  3.170e+00  Solved
1.221e-04  2.286e-02  Solved
1.221e-04  4.716e-04  Solved
1.221e-04  2.627e-05  Solved
1.221e-04S 3.282e-05  Solved
1.221e-04S 1.228e-04  Solved
1.490e-08  1.000e-08  Solved
1.490e-08S 1.006e-08  Solved
1.490e-08  6.810e-09  Solved
1.490e-08S 6.212e-09  Solved
-----------------------------------------------------------------
Status: Inaccurate/Solved
Optimal value (cvx_optval): -62.0108
i = 18, lambda(i) = 297.6351
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  3.865e+00  Solved
1.221e-04  4.331e-02  Solved
1.221e-04  9.697e-04  Solved
1.221e-04  1.744e-04  Solved
1.221e-04S 1.883e-04  Solved
1.221e-04S 2.066e-04  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -67.6118
i = 19, lambda(i) = 545.5595
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  4.681e+00  Solved
1.221e-04  2.661e-02  Solved
1.221e-04  5.282e-04  Solved
1.221e-04  1.033e-04  Solved
1.221e-04S 1.112e-04  Solved
1.221e-04S 1.091e-04  Solved
1.490e-08  2.328e-09  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -73.3839
i = 20, lambda(i) = 1000
 
Successive approximation method to be employed.
   SDPT3 will be called several times to refine the solution.
   Original size: 501 variables, 222 equality constraints
   For improved efficiency, SDPT3 is solving the dual problem.
   Approximation size: 510 variables, 227 equality constraints
-----------------------------------------------------------------
 Target     Conic    Solver
Precision   Error    Status
---------------------------
1.221e-04  5.623e+00  Solved
1.221e-04  2.838e-04  Solved
1.221e-04  4.058e-05  Solved
1.221e-04S 4.460e-05  Solved
1.221e-04  1.182e-05  Solved
1.221e-04S 1.182e-05  Solved
1.490e-08  6.540e-09  Solved
1.490e-08S 6.531e-09  Solved
1.490e-08  0.000e+00  Solved
-----------------------------------------------------------------
Status: Solved
Optimal value (cvx_optval): -79.2801