Snapshot of the kind of data collected by Netflix: only 100,480,507 ratings (about 1.2% of the entries of the 480K-by-18K matrix) are observed.
Netflix challenge: impute the unobserved ratings for personalized recommendation. http://en.wikipedia.org/wiki/Netflix_Prize
Matrix completion problem: we observe a very sparse matrix \(\mathbf{Y} = (y_{ij})\) and want to impute all the missing entries. This is possible only when the matrix is structured, e.g., of low rank.
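To see why low-rank structure makes imputation possible, consider a toy example: if the \(2 \times 2\) matrix \(\begin{pmatrix} 1 & 2 \\ 3 & ? \end{pmatrix}\) is known to have rank 1, then its determinant must vanish, \(1 \cdot ? - 2 \cdot 3 = 0\), forcing the missing entry to be \(? = 6\).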
Example: Load the 128×128 Lena picture with missing pixels.
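The code that produced the data matrix \(\mathbf{Y}\) is not shown here; a minimal sketch of one way to set it up, assuming a generic 128×128 grayscale image file (the file name, the 50% missing rate, and the convention that 0.0 marks a missing pixel are assumptions chosen to match the indexing used below):

using FileIO, Images, Random

Random.seed!(123)
# load a 128×128 grayscale image (file name is hypothetical)
img = load("lena128.png")
Y = Float64.(Gray.(img))
# knock out a random half of the pixels; 0.0 marks a missing entry
missidx = rand(size(Y)...) .< 0.5
Y[missidx] .= 0.0
colorview(Gray, Y)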
We fill out the missing pixels using a matrix completion technique developed by Candes and Tao \[
\begin{eqnarray*}
&\text{minimize}& \|\mathbf{X}\|_* \\
&\text{subject to}& x_{ij} = y_{ij} \text{ for all observed entries } (i, j).
\end{eqnarray*}
\] Here \(\|\mathbf{M}\|_* = \sum_i \sigma_i(\mathbf{M})\) is the nuclear norm, i.e., the sum of the singular values of \(\mathbf{M}\). In words, we seek the matrix with minimal nuclear norm that agrees with the observed entries. This is a semidefinite programming (SDP) problem readily solved by modern convex optimization software.
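Why is it an SDP? A standard reformulation introduces auxiliary symmetric matrices \(\mathbf{W}_1, \mathbf{W}_2\) and writes the problem as \[
\begin{eqnarray*}
&\text{minimize}& \frac{1}{2}\left[\text{tr}(\mathbf{W}_1) + \text{tr}(\mathbf{W}_2)\right] \\
&\text{subject to}& \begin{pmatrix} \mathbf{W}_1 & \mathbf{X} \\ \mathbf{X}^T & \mathbf{W}_2 \end{pmatrix} \succeq \mathbf{0}, \quad x_{ij} = y_{ij} \text{ for all observed entries } (i, j).
\end{eqnarray*}
\] For the 128×128 image this produces a 256×256 positive semidefinite cone, which is exactly the DensePsdConeTriangle of size 256×256 reported in the solver output below.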
We use the modeling package Convex.jl together with the convex optimization solver COSMO.jl to solve this semidefinite program.
# Use COSMO solver
using Convex, COSMO

solver = COSMO.Optimizer()

# Linear indices of obs. entries
obsidx = findall(Y[:] .≠ 0.0)
# Create optimization variables
X = Variable(size(Y))
# Set up optimization problem
problem = minimize(nuclearnorm(X))
problem.constraints += X[obsidx] == Y[obsidx]
# Solve the problem by calling solve!
@time solve!(problem, solver) # fast
------------------------------------------------------------------
COSMO v0.8.8 - A Quadratic Objective Conic Solver
Michael Garstka
University of Oxford, 2017 - 2022
------------------------------------------------------------------
Problem: x ∈ R^{49153},
constraints: A ∈ R^{73665x49153} (73793 nnz),
matrix size to factor: 122818x122818,
Floating-point precision: Float64
Sets: ZeroSet of dim: 40769
DensePsdConeTriangle of dim: 32896 (256x256)
Settings: ϵ_abs = 1.0e-05, ϵ_rel = 1.0e-05,
ϵ_prim_inf = 1.0e-04, ϵ_dual_inf = 1.0e-04,
ρ = 0.1, σ = 1e-06, α = 1.6,
max_iter = 5000,
scaling iter = 10 (on),
check termination every 25 iter,
check infeasibility every 40 iter,
KKT system solver: QDLDL
Acc: Anderson Type2{QRDecomp},
Memory size = 15, RestartedMemory,
Safeguarded: true, tol: 2.0
Setup Time: 35.32ms
Iter: Objective: Primal Res: Dual Res: Rho:
1 -1.4426e+03 1.1678e+01 5.9856e-01 1.0000e-01
25 1.4495e+02 5.5360e-02 1.1033e-03 1.0000e-01
50 1.4754e+02 1.2369e-02 1.6744e-03 7.4179e-01
75 1.4797e+02 4.9490e-04 5.5696e-05 7.4179e-01
100 1.4797e+02 1.4115e-05 1.3438e-06 7.4179e-01
------------------------------------------------------------------
>>> Results
Status: Solved
Iterations: 100
Optimal objective: 148
Runtime: 0.857s (856.75ms)
1.242270 seconds (1.73 M allocations: 675.980 MiB, 5.35% gc time, 26.32% compilation time: 97% of which was recompilation)
# display the completed image
colorview(Gray, X.value)
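A quick sanity check, not part of the original output: the nuclear norm objective encourages a low-rank solution, which we can verify by inspecting the leading singular values of the completed matrix.

using LinearAlgebra

# singular values of the completed image; rapid decay indicates low effective rank
svdvals(X.value)[1:20]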
2 Compressed sensing
Compressed sensing (Candes and Tao, 2006; Donoho, 2006) addresses a fundamental question: how can we compress and transmit a complex signal (e.g., a musical clip or a mega-pixel image) in a way that allows the original signal to be recovered on decoding?
Suppose a signal \(\mathbf{x} \in \mathbb{R}^n\) is sparse with \(s\) non-zeros. We under-sample the signal by multiplying it by a (flat) measurement matrix, \(\mathbf{y} = \mathbf{A} \mathbf{x}\), where \(\mathbf{A} \in \mathbb{R}^{m\times n}\) has iid normal entries. Candes, Romberg and Tao (2006) show that the solution to \[
\begin{eqnarray*}
&\text{minimize}& \|\mathbf{x}\|_1 \\
&\text{subject to}& \mathbf{A} \mathbf{x} = \mathbf{y}
\end{eqnarray*}
\] exactly recovers the true signal under certain conditions on \(\mathbf{A}\) when \(n \gg s\) and \(m \approx s \ln(n/s)\). Why is sparsity a reasonable assumption? Virtually all real-world images have low information content.
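For the simulation below, \(n = 1024\) and \(s = 10\), so \(s \ln(n/s) = 10 \ln(102.4) \approx 46\); taking \(m = 128\) measurements (an 8-fold undersampling) is comfortably above this threshold.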
Generate a sparse signal and subsample it:
using CairoMakie, Makie, Random

# random seed
Random.seed!(123)
# Size of signal
n = 1024
# Sparsity (# nonzeros) in the signal
s = 10
# Number of samples (undersample by a factor of 8)
m = 128

# Generate and display the signal
x0 = zeros(n)
x0[rand(1:n, s)] = randn(s)
# Generate the random sampling matrix
A = randn(m, n) / m
# Subsample by multiplexing
y = A * x0

# plot the true signal
f = Figure()
Makie.Axis(
    f[1, 1],
    title = "True Signal x0",
    xlabel = "x",
    ylabel = "y"
)
lines!(1:n, x0)
f
Solve the linear programming problem.
# Use COSMO solver
solver = COSMO.Optimizer()
# MOI.set(solver, MOI.RawOptimizerAttribute("max_iter"), 5000)

# Set up optimization problem
x = Variable(n)
problem = minimize(norm(x, 1))
problem.constraints += A * x == y

# Solve the problem
@time solve!(problem, solver)
# Display the solution
f = Figure()
Makie.Axis(
    f[1, 1],
    title = "Reconstructed signal overlayed with x0",
    xlabel = "x",
    ylabel = "y"
)
scatter!(1:n, x0, label = "truth")
lines!(1:n, vec(x.value), label = "recovery")
axislegend(position = :lt)
f
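A numeric check of recovery quality, not part of the original output, assuming the variables from the code blocks above are still in scope:

using LinearAlgebra

# relative ℓ2 error of the recovered signal
norm(vec(x.value) - x0) / norm(x0)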
3 Automatic differentiation (Auto-Diff)
Last week we scratched the surface of matrix/vector calculus and chain rules. The recent surge of machine learning has sparked rapid advancement of automatic differentiation, which applies the chain rule to computer code to obtain gradients that are exact up to machine precision.
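As a minimal illustration of the idea, Zygote.jl differentiates ordinary Julia code (the function f here is just for demonstration):

using Zygote

# f: R^n -> R; auto-diff applies the chain rule through the code for f
f(x) = sum(abs2, x) / 2
Zygote.gradient(f, [1.0, 2.0, 3.0]) # returns ([1.0, 2.0, 3.0],) since ∇f(x) = x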
3.1 Example: multivariate normal
Set up the data and define the log-likelihood function.
using Distributions, DistributionsAD, LinearAlgebra, PDMats, Random, Zygote

Random.seed!(216)

n, p = 100, 3
Y = randn(p, n) # each column of Y is a sample
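The definition itself does not appear above; a minimal sketch of what it could look like, parameterizing by a mean vector and covariance matrix and differentiating with Zygote (the names logl, μ, and Ω are illustrative, not from the original):

# log-likelihood of N(μ, Ω) summed over the columns of Y
logl(μ, Ω) = loglikelihood(MvNormal(μ, Ω), Y)

μ, Ω = zeros(p), Matrix{Float64}(I, p, p)
logl(μ, Ω)

# reverse-mode auto-diff through the log-likelihood
∇μ, ∇Ω = Zygote.gradient(logl, μ, Ω)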
Surprise! The gradients with respect to the covariance matrix do not match. Close inspection reveals that Julia calculates the gradient with respect to the upper triangular part of the covariance matrix, \(\text{vech}(\Omega)\).