Table of Contents¶

1 Introduction

1.1 Subject of linear algebra

1.2 Examples of vectors and matrices

1.2.1 Design matrix

1.2.2 Grayscale images

1.2.3 Color images

1.2.4 Text data

1.2.5 Networks

2.1 A view of statistics (or data science)

using Pkg
Pkg.activate(pwd())
Pkg.instantiate()

  Activating project at `~/Documents/github.com/ucla-biostat-216/2022fall/slides/01-intro`

using GraphPlot, Graphs, ImageCore, ImageIO, ImageMagick, ImageShow, 
    LinearAlgebra, MatrixDepot, MLDatasets, QuartzImageIO, 
    RDatasets, StatsModels, TextAnalysis

Introduction¶

Subject of linear algebra¶

Vector $\mathbf{x} \in \mathbb{R}^{n}$: $$ \mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}. $$
Matrix $\mathbf{X} = (x_{ij}) \in \mathbb{R}^{m \times n}$: $$ \mathbf{X} = \begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{pmatrix}. $$
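As a minimal sketch of how these objects look in Julia (the numerical values are made up purely for illustration), a vector is a one-dimensional array and a matrix is a two-dimensional array:

```julia
# a vector in R^3 and a matrix in R^{2×3} (toy values, for illustration only)
x_toy = [1.0, 2.0, 3.0]
X_toy = [1.0 2.0 3.0;
         4.0 5.0 6.0]

length(x_toy)   # number of entries n
size(X_toy)     # (m, n) = (number of rows, number of columns)
X_toy[2, 3]     # the entry x_{23}
```

The names `x_toy` and `X_toy` are chosen here only to avoid clashing with variables used later in the notebook.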

Examples of vectors and matrices¶

Design matrix¶

In statistics, tabular data is often summarized by a matrix, conventionally denoted $\mathbf{X}$ and variously called the predictor matrix, covariate matrix, design matrix, or feature matrix. Each row of $\mathbf{X}$ is an observation, and each column is a covariate (also called a measurement or feature).

The famous Fisher's Iris data:

# the famous Fisher's Iris data
# <https://en.wikipedia.org/wiki/Iris_flower_data_set>
iris = dataset("datasets", "iris")

We can turn a tabular data set into a feature matrix according to a model formula:

# use full dummy coding (one-hot coding) for categorical variable Species
iris_X = ModelMatrix(ModelFrame(
    @formula(1 ~ 1 + SepalLength + SepalWidth + PetalLength + PetalWidth + Species), 
    iris,
    contrasts = Dict(:Species => StatsModels.FullDummyCoding()))).m

150×8 Matrix{Float64}:
 1.0  5.1  3.5  1.4  0.2  1.0  0.0  0.0
 1.0  4.9  3.0  1.4  0.2  1.0  0.0  0.0
 1.0  4.7  3.2  1.3  0.2  1.0  0.0  0.0
 1.0  4.6  3.1  1.5  0.2  1.0  0.0  0.0
 1.0  5.0  3.6  1.4  0.2  1.0  0.0  0.0
 1.0  5.4  3.9  1.7  0.4  1.0  0.0  0.0
 1.0  4.6  3.4  1.4  0.3  1.0  0.0  0.0
 1.0  5.0  3.4  1.5  0.2  1.0  0.0  0.0
 1.0  4.4  2.9  1.4  0.2  1.0  0.0  0.0
 1.0  4.9  3.1  1.5  0.1  1.0  0.0  0.0
 1.0  5.4  3.7  1.5  0.2  1.0  0.0  0.0
 1.0  4.8  3.4  1.6  0.2  1.0  0.0  0.0
 1.0  4.8  3.0  1.4  0.1  1.0  0.0  0.0
 ⋮                        ⋮         
 1.0  6.0  3.0  4.8  1.8  0.0  0.0  1.0
 1.0  6.9  3.1  5.4  2.1  0.0  0.0  1.0
 1.0  6.7  3.1  5.6  2.4  0.0  0.0  1.0
 1.0  6.9  3.1  5.1  2.3  0.0  0.0  1.0
 1.0  5.8  2.7  5.1  1.9  0.0  0.0  1.0
 1.0  6.8  3.2  5.9  2.3  0.0  0.0  1.0
 1.0  6.7  3.3  5.7  2.5  0.0  0.0  1.0
 1.0  6.7  3.0  5.2  2.3  0.0  0.0  1.0
 1.0  6.3  2.5  5.0  1.9  0.0  0.0  1.0
 1.0  6.5  3.0  5.2  2.0  0.0  0.0  1.0
 1.0  6.2  3.4  5.4  2.3  0.0  0.0  1.0
 1.0  5.9  3.0  5.1  1.8  0.0  0.0  1.0
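To see what full dummy (one-hot) coding does without any modeling package, here is a small hand-rolled sketch in base Julia. The data and the names `len`, `species`, `lvls`, and `toy_X` are hypothetical; this is not how StatsModels builds the matrix internally, only an illustration of the resulting layout.

```julia
# hypothetical toy data: 4 observations, one numeric and one categorical feature
len     = [5.1, 7.0, 6.3, 4.9]
species = ["setosa", "versicolor", "virginica", "setosa"]

lvls   = unique(species)                           # levels in order of first appearance
onehot = Float64.(species .== permutedims(lvls))   # 4×3 indicator (dummy) columns
toy_X  = hcat(ones(length(len)), len, onehot)      # intercept, numeric column, dummies

# the intercept column equals the sum of the three dummy columns
toy_X[:, 1] == vec(sum(toy_X[:, 3:5], dims=2))
```

The last line points at a feature shared with the iris matrix above: with an intercept and full dummy coding, the columns are linearly dependent.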

Grayscale images¶

Neural networks can classify handwritten digits with high accuracy. Each handwritten digit is represented by a grayscale image. The famous MNIST data set contains 60,000 training images and 10,000 test images. Each image is a $28 \times 28$ matrix of pixel intensities:

# first training sample: image, digit label
# MNIST.traindata(1)
MNIST(split=:train)[1]

(features = Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; … ; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0], targets = 5)

# first training digit
X = MNIST(split=:train)[1][1]

28×28 Matrix{Float32}:
 0.0  0.0  0.0  0.0  0.0  0.0        …  0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0           0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0           0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0           0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0           0.215686  0.533333   0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0        …  0.67451   0.992157   0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0           0.886275  0.992157   0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0           0.992157  0.992157   0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0           0.992157  0.831373   0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0           0.992157  0.529412   0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0        …  0.992157  0.517647   0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0           0.956863  0.0627451  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0117647     0.521569  0.0        0.0  0.0  0.0
 ⋮                        ⋮          ⋱                       ⋮         
 0.0  0.0  0.0  0.0  0.0  0.494118      0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.533333      0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.686275      0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.101961      0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.65098    …  0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  1.0           0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.968627      0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.498039      0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0           0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0        …  0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0           0.0       0.0        0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0           0.0       0.0        0.0  0.0  0.0

# apparently it's digit 5
convert2image(MNIST, X)
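The same idea can be tried on a tiny made-up image. The sketch below uses a hypothetical $5 \times 5$ intensity matrix `img` (not MNIST data) and shows that an image can be viewed either as a matrix or, after stacking its columns, as a vector:

```julia
# hypothetical 5×5 grayscale image with intensities in [0, 1]
img = zeros(Float32, 5, 5)
img[2:4, 3] .= 1.0f0          # a short vertical stroke in the middle column

v    = vec(img)               # stack the columns into a vector of length 25
img2 = reshape(v, 5, 5)       # reshaping recovers the original matrix
negative = 1 .- img           # entrywise operation: invert the intensities
```

By the same reasoning, a $28 \times 28$ digit can be treated as a vector with 784 entries.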

Color images¶

CIFAR-10 is a collection of 50,000 training images and 10,000 test images, each belonging to 1 of 10 mutually exclusive classes (frog, truck, ...). Each color image is represented by three channels: R (red), G (green), B (blue). Each channel is a $32 \times 32$ intensity matrix.

# 2nd training image in CIFAR10
X = CIFAR10(split=:train)[2].features

32×32×3 Array{Float32, 3}:
[:, :, 1] =
 0.603922  0.54902   0.54902   0.533333  …  0.686275   0.647059   0.639216
 0.494118  0.568627  0.545098  0.537255     0.611765   0.611765   0.619608
 0.411765  0.490196  0.45098   0.478431     0.603922   0.623529   0.639216
 0.4       0.486275  0.576471  0.517647     0.576471   0.513726   0.568627
 0.490196  0.588235  0.541176  0.592157     0.607843   0.368627   0.168627
 0.607843  0.596078  0.517647  0.709804  …  0.631373   0.4        0.0745098
 0.67451   0.682353  0.666667  0.796078     0.627451   0.423529   0.0784314
 0.705882  0.698039  0.698039  0.815686     0.654902   0.501961   0.290196
 0.556863  0.52549   0.670588  0.815686     0.647059   0.603922   0.52549
 0.435294  0.431373  0.752941  0.796078     0.596078   0.611765   0.466667
 0.415686  0.521569  0.858824  0.701961  …  0.639216   0.713726   0.431373
 0.427451  0.639216  0.917647  0.662745     0.643137   0.701961   0.388235
 0.482353  0.752941  0.898039  0.643137     0.521569   0.490196   0.239216
 ⋮                                       ⋱             ⋮          
 0.454902  0.556863  0.694118  0.717647  …  0.243137   0.137255   0.0235294
 0.4       0.376471  0.396078  0.47451      0.2        0.0823529  0.0392157
 0.372549  0.388235  0.396078  0.356863     0.172549   0.054902   0.0980392
 0.352941  0.372549  0.345098  0.368627     0.152941   0.0431373  0.2
 0.282353  0.34902   0.403922  0.356863     0.168627   0.054902   0.266667
 0.235294  0.313726  0.368627  0.301961  …  0.4        0.231373   0.352941
 0.219608  0.254902  0.254902  0.270588     0.215686   0.192157   0.454902
 0.301961  0.329412  0.32549   0.380392     0.12549    0.211765   0.52549
 0.368627  0.360784  0.352941  0.345098     0.0901961  0.317647   0.54902
 0.356863  0.376471  0.309804  0.298039     0.164706   0.403922   0.560784
 0.341176  0.301961  0.266667  0.25098   …  0.239216   0.482353   0.560784
 0.309804  0.278431  0.262745  0.278431     0.364706   0.513726   0.560784

[:, :, 2] =
 0.694118  0.627451  0.607843  0.576471  …  0.654902  0.603922   0.580392
 0.537255  0.6       0.572549  0.556863     0.603922  0.596078   0.580392
 0.407843  0.490196  0.45098   0.47451      0.627451  0.631373   0.611765
 0.396078  0.505882  0.6       0.521569     0.6       0.509804   0.529412
 0.513726  0.631373  0.588235  0.615686     0.6       0.345098   0.12549
 0.65098   0.643137  0.568627  0.756863  …  0.603922  0.360784   0.0352941
 0.745098  0.737255  0.721569  0.870588     0.639216  0.419608   0.054902
 0.780392  0.741176  0.741176  0.890196     0.678431  0.505882   0.266667
 0.611765  0.545098  0.690196  0.87451      0.647059  0.584314   0.490196
 0.470588  0.435294  0.764706  0.858824     0.596078  0.592157   0.431373
 0.419608  0.498039  0.854902  0.760784  …  0.635294  0.694118   0.396078
 0.407843  0.611765  0.913725  0.721569     0.639216  0.686275   0.364706
 0.47451   0.752941  0.929412  0.729412     0.541176  0.505882   0.247059
 ⋮                                       ⋱            ⋮          
 0.458824  0.560784  0.694118  0.717647  …  0.25098   0.145098   0.0235294
 0.396078  0.380392  0.4       0.482353     0.207843  0.0862745  0.0352941
 0.372549  0.396078  0.403922  0.368627     0.168627  0.0470588  0.0862745
 0.34902   0.376471  0.34902   0.380392     0.141176  0.0235294  0.176471
 0.27451   0.34902   0.403922  0.368627     0.172549  0.0509804  0.25098
 0.235294  0.317647  0.372549  0.313726  …  0.423529  0.25098    0.352941
 0.223529  0.262745  0.262745  0.282353     0.219608  0.192157   0.443137
 0.305882  0.337255  0.337255  0.392157     0.101961  0.188235   0.498039
 0.376471  0.372549  0.364706  0.356863     0.054902  0.282353   0.509804
 0.372549  0.388235  0.321569  0.305882     0.133333  0.364706   0.521569
 0.352941  0.313726  0.27451   0.258824  …  0.207843  0.447059   0.52549
 0.317647  0.286275  0.270588  0.286275     0.32549   0.47451    0.521569

[:, :, 3] =
 0.733333  0.662745  0.643137  0.607843  …  0.65098   0.501961   0.470588
 0.533333  0.603922  0.584314  0.572549     0.627451  0.509804   0.478431
 0.372549  0.462745  0.439216  0.47451      0.666667  0.556863   0.521569
 0.388235  0.517647  0.623529  0.545098     0.639216  0.498039   0.490196
 0.545098  0.678431  0.635294  0.654902     0.647059  0.372549   0.12549
 0.705882  0.686275  0.603922  0.776471  …  0.670588  0.407843   0.0470588
 0.823529  0.784314  0.745098  0.878431     0.705882  0.470588   0.0745098
 0.839216  0.768627  0.752941  0.901961     0.729412  0.537255   0.27451
 0.611765  0.537255  0.686275  0.882353     0.682353  0.596078   0.478431
 0.431373  0.4       0.741176  0.85098      0.627451  0.603922   0.419608
 0.384314  0.470588  0.85098   0.776471  …  0.666667  0.701961   0.388235
 0.4       0.611765  0.933333  0.768627     0.67451   0.701961   0.356863
 0.458824  0.733333  0.921569  0.745098     0.596078  0.529412   0.243137
 ⋮                                       ⋱            ⋮          
 0.403922  0.533333  0.698039  0.752941  …  0.301961  0.172549   0.0431373
 0.32549   0.333333  0.380392  0.486275     0.270588  0.117647   0.0470588
 0.298039  0.329412  0.360784  0.341176     0.223529  0.0705882  0.0862745
 0.309804  0.341176  0.321569  0.360784     0.184314  0.0352941  0.164706
 0.270588  0.337255  0.388235  0.345098     0.223529  0.0784314  0.262745
 0.239216  0.301961  0.337255  0.258824  …  0.478431  0.301961   0.396078
 0.211765  0.235294  0.211765  0.215686     0.25098   0.227451   0.478431
 0.282353  0.298039  0.27451   0.313726     0.113725  0.203922   0.521569
 0.329412  0.313726  0.294118  0.27451      0.054902  0.286275   0.533333
 0.278431  0.305882  0.25098   0.25098      0.141176  0.376471   0.545098
 0.278431  0.243137  0.215686  0.207843  …  0.223529  0.470588   0.556863
 0.27451   0.239216  0.215686  0.231373     0.356863  0.513726   0.564706

# is this a truck?
convert2image(CIFAR10, X)
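A small sketch of how the three channels sit inside one array, using a hypothetical $2 \times 2$ color image `rgb` rather than CIFAR-10 data. The unweighted average used for the grayscale conversion is one simple choice made for illustration, not a standard taken from the source.

```julia
# hypothetical 2×2 color image stored as a height × width × channel array
rgb = zeros(Float32, 2, 2, 3)
rgb[:, :, 1] .= 1.0f0           # red channel fully on
rgb[1, 1, 3]  = 0.5f0           # some blue in the top-left pixel

# each channel is an ordinary intensity matrix
red, green, blue = rgb[:, :, 1], rgb[:, :, 2], rgb[:, :, 3]

gray = (red .+ green .+ blue) ./ 3   # unweighted average of the three channels
```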

Text data¶

Text data (webpages, blogs, Twitter) can be transformed into numeric matrices for statistical analysis as well. For example, the 29 State of the Union Addresses by U.S. presidents, from George H. W. Bush in 1989 to Donald Trump in 2017, can be represented by a $29 \times 9610$ document-term matrix, where each row stands for one speech and each column stands for a word that appears in at least one of these speeches. An entry $x_{ij}$ of the matrix counts the number of occurrences of word $j$ in speech $i$.
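Before turning to the real speeches, the construction can be sketched by hand on a toy corpus. The three "documents" and the names `docs`, `tokens`, `terms`, and `toy_D` are hypothetical, and the whitespace tokenization is a simplifying assumption; TextAnalysis.jl handles this more carefully.

```julia
# hypothetical toy corpus of three short documents
docs = ["the cat sat", "the dog sat on the mat", "cat and dog"]

tokens = split.(lowercase.(docs))          # naive whitespace tokenization
terms  = sort(unique(vcat(tokens...)))     # vocabulary, sorted alphabetically

# toy_D[i, j] = number of occurrences of term j in document i
toy_D = [count(==(t), doc) for doc in tokens, t in terms]
```

Each row sum of `toy_D` equals the number of words in the corresponding document, which is a quick sanity check on the counts.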

# State of the Union texts shipped with the TextAnalysis.jl test data
sotupath = joinpath(dirname(pathof(TextAnalysis)), "..", "test", "data", "sotu")
readdir(sotupath)

29-element Vector{String}:
 "Bush_1989.txt"
 "Bush_1990.txt"
 "Bush_1991.txt"
 "Bush_1992.txt"
 "Bush_2001.txt"
 "Bush_2002.txt"
 "Bush_2003.txt"
 "Bush_2004.txt"
 "Bush_2005.txt"
 "Bush_2006.txt"
 "Bush_2007.txt"
 "Bush_2008.txt"
 "Clinton_1993.txt"
 ⋮
 "Clinton_1998.txt"
 "Clinton_1999.txt"
 "Clinton_2000.txt"
 "Obama_2009.txt"
 "Obama_2010.txt"
 "Obama_2011.txt"
 "Obama_2012.txt"
 "Obama_2013.txt"
 "Obama_2014.txt"
 "Obama_2015.txt"
 "Obama_2016.txt"
 "Trump_2017.txt"

crps = DirectoryCorpus(sotupath)
# Donald Trump 2017 SOTU address
text(crps[29])

"Thank you very much. Mr. Speaker, Mr. Vice President, Members of Congress, the First Lady of the United States, and citizens of America: Tonight, as we mark the conclusion of our celebration of Black History Month, we are reminded of our Nation's path towards civil righ" ⋯ 28704 bytes ⋯ "n me in dreaming big and bold, and daring things for our country. I am asking everyone watching tonight to seize this moment. Believe in yourselves, believe in your future, and believe, once more, in America.\n\nThank you, God bless you, and God bless the United States.\n"

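# preprocess: common document type, lower case, no punctuation
standardize!(crps, StringDocument)
remove_case!(crps)
prepare!(crps, strip_punctuation)
# rebuild the lexicon and inverse index after preprocessing
update_lexicon!(crps)
update_inverse_index!(crps)
# document-term matrix: rows are speeches, columns are terms
m = DocumentTermMatrix(crps)
D = dtm(m, :dense)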

29×9610 Matrix{Int64}:
 3  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  1  0   0   0
 3  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  1  0   0   0
 1  0  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  1  1  0   0   0
 0  1  0  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  1  0  0   0   0
 2  8  0  1  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0   0   0
 0  0  0  0  0  0  0  0  0  0  0  0  5  …  0  1  0  0  0  0  0  0  0   0   0
 0  2  0  2  0  0  0  0  0  0  0  1  3     0  0  0  0  0  0  0  0  0   0   0
 0  2  1  0  0  0  0  0  0  0  0  0  5     0  0  0  0  0  0  0  0  0   0   0
 0  0  0  0  0  0  0  0  0  0  0  0  1     1  0  0  0  0  0  0  0  0   0   0
 0  1  0  0  0  0  0  0  0  0  0  0  2     1  0  1  0  1  0  0  0  0  67  31
 1  1  1  0  0  0  0  0  0  0  0  0  2  …  1  0  0  0  0  0  0  0  0   0   0
 1  0  1  0  0  1  0  0  0  0  0  0  0     0  0  0  0  1  0  0  0  0   0   0
 2  6  1  0  0  2  0  0  0  0  0  0  0     0  0  0  0  0  0  0  1  0   0   0
 ⋮              ⋮              ⋮        ⋱     ⋮              ⋮            
 0  3  1  3  0  2  0  0  0  0  1  0  1     0  1  0  1  0  0  0  1  0   0   0
 3  3  0  1  1  4  0  0  0  0  0  0  1     0  0  0  0  0  0  0  1  0   0   0
 1  7  0  2  1  3  0  0  0  0  0  0  0     0  0  0  0  0  0  0  1  0   0   0
 1  0  0  0  0  0  0  0  0  0  0  0  0  …  0  0  0  0  0  0  0  0  0   0   0
 3  3  1  0  2  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0  45   1
 1  0  1  1  1  2  0  0  0  0  0  0  1     0  0  0  0  0  0  0  0  0  47   1
 1  0  1  0  1  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0   3   0
 2  0  1  0  0  0  0  1  0  0  0  0  0     0  0  0  0  0  0  0  0  0  41   1
 2  0  1  0  0  0  2  0  0  0  0  0  0  …  0  1  0  0  0  0  0  0  0  62   7
 0  0  0  0  0  0  0  0  0  0  0  0  1     0  1  0  0  0  0  0  0  0   0   0
 0  2  0  0  1  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0   0   0
 2  0  2  0  0  0  0  0  0  0  0  0  0     0  0  0  0  0  0  0  0  0   0   0

m.terms

9610-element Vector{String}:
 "1"
 "10"
 "100"
 "1000"
 "10000"
 "100000"
 "1010"
 "102"
 "103"
 "104"
 "105"
 "108"
 "11"
 ⋮
 "zarfos"
 "zarqawi"
 "zero"
 "zeroemission"
 "zeros"
 "zimbabwe"
 "zion"
 "zone"
 "zones"
 "ј"
 "–"
 "…"

Networks¶

The world wide web (WWW) with $n$ webpages can be described by a connectivity matrix or adjacency matrix $\mathbf{A} \in \{0,1\}^{n \times n}$ with entry \begin{eqnarray*} a_{ij} = \begin{cases} 1 & \text{if page $i$ links to page $j$} \\ 0 & \text{otherwise} \end{cases}. \end{eqnarray*} According to Internet Live Stats, $n \approx 1.98$ billion at the time of writing. The smaller SNAP/web-Google data set contains a web of 916,428 pages.
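The definition of $a_{ij}$ can be sketched on a toy web of 4 pages. The edge list `links` and the names `toy_A`, `outdeg`, and `indeg` are hypothetical. Sparse storage is used because a matrix like the web-Google one would have roughly $8.4 \times 10^{11}$ entries if stored densely, while only about 5.1 million of them are nonzero.

```julia
using SparseArrays

# hypothetical web of 4 pages; each pair (i, j) means "page i links to page j"
links   = [(1, 2), (1, 3), (2, 3), (2, 4), (4, 3)]
n_pages = 4

# toy_A[i, j] = 1 if page i links to page j, 0 otherwise
toy_A = sparse(first.(links), last.(links), ones(Int, length(links)), n_pages, n_pages)

outdeg = vec(sum(toy_A, dims=2))   # row sums: number of links leaving each page
indeg  = vec(sum(toy_A, dims=1))   # column sums: number of links pointing to each page
```

Row sums and column sums of the adjacency matrix give the out-degrees and in-degrees of the pages.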

mdinfo("SNAP/web-Google") |> show

# SNAP/web-Google

###### MatrixMarket matrix coordinate pattern general

---

  * UF Sparse Matrix Collection, Tim Davis
  * http://www.cise.ufl.edu/research/sparse/matrices/SNAP/web-Google
  * name: SNAP/web-Google
  * [Web graph from Google]
  * id: 2301
  * date: 2002
  * author: Google
  * ed: J. Leskovec
  * fields: name title A id date author ed kind notes
  * kind: directed graph

---

  * notes:
  * Networks from SNAP (Stanford Network Analysis Platform) Network Data Sets,
  * Jure Leskovec http://snap.stanford.edu/data/index.html
  * email jure at cs.stanford.edu
  * 
  * Google web graph
  * 
  * Dataset information
  * 
  * Nodes represent web pages and directed edges represent hyperlinks between them.
  * The data was released in 2002 by Google as a part of Google Programming
  * Contest.
  * 
  * Dataset statistics
  * Nodes   875713
  * Edges   5105039
  * Nodes in largest WCC    855802 (0.977)
  * Edges in largest WCC    5066842 (0.993)
  * Nodes in largest SCC    434818 (0.497)
  * Edges in largest SCC    3419124 (0.670)
  * Average clustering coefficient  0.6047
  * Number of triangles     13391903
  * Fraction of closed triangles    0.05523
  * Diameter (longest shortest path)    22
  * 90-percentile effective diameter    8.1
  * 
  * Source (citation)
  * 
  * J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. Community Structure in Large
  * Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters.
  * arXiv.org:0810.1355, 2008.
  * 
  * Google programming contest, 2002
  * http://www.google.com/programming-contest/
  * 
  * Files
  * File    Description
  * web-Google.txt.gz   Webgraph from the Google programming contest, 2002

---

916428 916428 5105039

md = mdopen("SNAP/web-Google")
md.A

916428×916428 SparseArrays.SparseMatrixCSC{Bool, Int64} with 5105039 stored entries:
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿

Here is a visualization of the SNAP/web-Google network.

Such a directed graph can also be represented by an incidence matrix $\mathbf{B} \in \{-1,0,1\}^{m \times n}$ where $m$ is the number of vertices and $n$ is the number of edges. The entries of an incidence matrix are \begin{eqnarray*} b_{ij} = \begin{cases} -1 & \text{if edge $j$ starts at vertex $i$} \\ 1 & \text{if edge $j$ ends at vertex $i$} \\ 0 & \text{otherwise} \end{cases}. \end{eqnarray*}

Here is a directed graph with 4 nodes and 5 edges.

# a simple directed graph on GS p16
g = SimpleDiGraph(4)
add_edge!(g, 1, 2)
add_edge!(g, 1, 3)
add_edge!(g, 2, 3)
add_edge!(g, 2, 4)
add_edge!(g, 4, 3)
gplot(g, nodelabel=["x1", "x2", "x3", "x4"], edgelabel=["b1", "b2", "b3", "b4", "b5"])

# adjacency matrix A
convert(Matrix{Int64}, adjacency_matrix(g))

4×4 Matrix{Int64}:
 0  1  1  0
 0  0  1  1
 0  0  0  0
 0  0  1  0

# incidence matrix B
convert(Matrix{Int64}, incidence_matrix(g))

4×5 Matrix{Int64}:
 -1  -1   0   0   0
  1   0  -1  -1   0
  0   1   1   0   1
  0   0   0   1  -1
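The incidence matrix can also be assembled directly from the definition of $b_{ij}$, without Graphs.jl. The sketch below uses the same five edges as the graph above; the names `edge_list`, `n_vert`, `n_edge`, and `inc` are introduced here for illustration only.

```julia
# edge list of the 4-node graph above: (start vertex, end vertex)
edge_list = [(1, 2), (1, 3), (2, 3), (2, 4), (4, 3)]
n_vert, n_edge = 4, length(edge_list)

# inc[i, j] = -1 if edge j starts at vertex i, +1 if it ends there, 0 otherwise
inc = zeros(Int, n_vert, n_edge)
for (j, (s, t)) in enumerate(edge_list)
    inc[s, j] = -1
    inc[t, j] = 1
end

sum(inc, dims=1)   # each column holds one -1 and one +1, so it sums to zero
```

Since every column sums to zero, the vector of all ones multiplied on the left annihilates the incidence matrix, which is one reason its rows are linearly dependent.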

A view of statistics (or data science)¶

XKCD #1838

The first 30 rows of the `iris` data frame:

Row	SepalLength	SepalWidth	PetalLength	PetalWidth	Species
	Float64	Float64	Float64	Float64	Cat…
1	5.1	3.5	1.4	0.2	setosa
2	4.9	3.0	1.4	0.2	setosa
3	4.7	3.2	1.3	0.2	setosa
4	4.6	3.1	1.5	0.2	setosa
5	5.0	3.6	1.4	0.2	setosa
6	5.4	3.9	1.7	0.4	setosa
7	4.6	3.4	1.4	0.3	setosa
8	5.0	3.4	1.5	0.2	setosa
9	4.4	2.9	1.4	0.2	setosa
10	4.9	3.1	1.5	0.1	setosa
11	5.4	3.7	1.5	0.2	setosa
12	4.8	3.4	1.6	0.2	setosa
13	4.8	3.0	1.4	0.1	setosa
14	4.3	3.0	1.1	0.1	setosa
15	5.8	4.0	1.2	0.2	setosa
16	5.7	4.4	1.5	0.4	setosa
17	5.4	3.9	1.3	0.4	setosa
18	5.1	3.5	1.4	0.3	setosa
19	5.7	3.8	1.7	0.3	setosa
20	5.1	3.8	1.5	0.3	setosa
21	5.4	3.4	1.7	0.2	setosa
22	5.1	3.7	1.5	0.4	setosa
23	4.6	3.6	1.0	0.2	setosa
24	5.1	3.3	1.7	0.5	setosa
25	4.8	3.4	1.9	0.2	setosa
26	5.0	3.0	1.6	0.2	setosa
27	5.0	3.4	1.6	0.4	setosa
28	5.2	3.5	1.5	0.2	setosa
29	5.2	3.4	1.4	0.2	setosa
30	4.7	3.2	1.6	0.2	setosa
⋮	⋮	⋮	⋮	⋮	⋮