Chapter 6 Positive Definite Matrices
6.1 Minima, Maxima, and Saddle Points
$$F(x,y) = 7 + 2(x+y)^2 - y\sin y - x^3 \qquad\qquad f(x,y) = 2x^2 + 4xy + y^2$$
Does either $F(x,y)$ or $f(x,y)$ have a minimum at the point $x=y=0$?
Remark 1 The zero-order terms $F(0,0)=7$ and $f(0,0)=0$ have no effect on the answer.
Remark 2 The linear terms give a necessary condition: to have any chance of a minimum, the first derivatives must vanish at $x=y=0$:
$$\frac{\partial F}{\partial x} = 4(x+y) - 3x^2 = 0 \qquad \text{and} \qquad \frac{\partial F}{\partial y} = 4(x+y) - y\cos y - \sin y = 0$$
$$\frac{\partial f}{\partial x} = 4x + 4y = 0 \qquad \text{and} \qquad \frac{\partial f}{\partial y} = 4x + 2y = 0. \qquad \text{All zero.}$$
Remark 3 The second derivatives at $(0,0)$ are decisive:
$$\frac{\partial^2 F}{\partial x^2} = 4 - 6x = 4 \qquad\qquad \frac{\partial^2 f}{\partial x^2} = 4$$
$$\frac{\partial^2 F}{\partial x\,\partial y} = \frac{\partial^2 F}{\partial y\,\partial x} = 4 \qquad\qquad \frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x} = 4$$
$$\frac{\partial^2 F}{\partial y^2} = 4 + y\sin y - 2\cos y = 2 \qquad\qquad \frac{\partial^2 f}{\partial y^2} = 2$$
Remark 4 The higher-degree terms in $F$ have no effect on the question of a local minimum, but they can prevent it from being a global minimum.
$$\text{Express } f(x,y) \text{ using squares} \qquad f = ax^2 + 2bxy + cy^2 = a\left(x + \frac{b}{a}y\right)^2 + \left(c - \frac{b^2}{a}\right)y^2 \tag{2}$$
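Worked case (my addition, obtained directly from equation (2) with $a=2$, $b=2$, $c=1$ for the quadratic at the start of this section):
$$f = 2x^2 + 4xy + y^2 = 2(x+y)^2 - y^2.$$
Here $c - b^2/a = 1 - 2 = -1 < 0$, so along the line $x=-y$ the value $f = -y^2$ is negative: the origin is a saddle point rather than a minimum.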
6A The quadratic $ax^2+2bxy+cy^2$ is positive definite if and only if $a>0$ and $ac>b^2$. Any $F(x,y)$ has a minimum at a point where $\frac{\partial F}{\partial x} = \frac{\partial F}{\partial y} = 0$ with
$$\frac{\partial^2 F}{\partial x^2} > 0 \qquad \text{and} \qquad \left[\frac{\partial^2 F}{\partial x^2}\right]\left[\frac{\partial^2 F}{\partial y^2}\right] > \left[\frac{\partial^2 F}{\partial x\,\partial y}\right]^2 \tag{3}$$
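The derivatives above and test (3) can be checked symbolically. This is a minimal sketch of my own (not from the text), using the sympy library:

```python
import sympy as sp

x, y = sp.symbols('x y')
F = 7 + 2*(x + y)**2 - y*sp.sin(y) - x**3   # F(x, y) from the text
f = 2*x**2 + 4*x*y + y**2                   # f(x, y) from the text

for name, g in [('F', F), ('f', f)]:
    # second derivatives at the stationary point (0, 0)
    gxx = sp.diff(g, x, 2).subs({x: 0, y: 0})
    gyy = sp.diff(g, y, 2).subs({x: 0, y: 0})
    gxy = sp.diff(g, x, y).subs({x: 0, y: 0})
    # test (3): g_xx > 0 and g_xx * g_yy > g_xy^2
    is_min = (gxx > 0) and (gxx*gyy > gxy**2)
    print(name, gxx, gyy, gxy, 'local minimum' if is_min else 'not a minimum')
```

Both functions fail the second inequality ($4\cdot 2 < 4^2$), so neither has a minimum at the origin.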
Singular case: $ac = b^2$

Saddle point: $ac < b^2$
Higher Dimensions: Linear Algebra
Calculus would be enough to find our conditions $F_{xx}>0$ and $F_{xx}F_{yy}>F_{xy}^2$ for a minimum.
A quadratic $f(x,y)$ comes directly from a symmetric 2 by 2 matrix:
$$\text{$x^TAx$ in $R^2$} \qquad ax^2+2bxy+cy^2 = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \tag{4}$$
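For instance, with the entries $a=2$, $b=2$, $c=1$ from the quadratic $f = 2x^2 + 4xy + y^2$ above, a short numpy check (my own sketch) confirms that $x^TAx$ reproduces $f$:

```python
import numpy as np

# symmetric matrix for f = 2x^2 + 4xy + y^2  (a = 2, b = 2, c = 1)
A = np.array([[2.0, 2.0],
              [2.0, 1.0]])

rng = np.random.default_rng(0)
for _ in range(3):
    x_val, y_val = rng.standard_normal(2)
    v = np.array([x_val, y_val])
    quad_form = v @ A @ v                          # x^T A x
    direct = 2*x_val**2 + 4*x_val*y_val + y_val**2
    print(np.isclose(quad_form, direct))           # True each time
```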
For any symmetric matrix $A$, the product $x^TAx$ is a pure quadratic form $f(x_1, \dots, x_n)$:
$$\text{$x^TAx$ in $R^n$} \qquad \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \sum^n_{i=1} \sum^n_{j=1} a_{ij}x_ix_j \tag{5}$$
The result is the pure quadratic $f = a_{11}x_1^2 + 2a_{12}x_1x_2 + \cdots + a_{nn}x_n^2$.
Then $F$ has a minimum when the pure quadratic $x^TAx$ is positive definite.
$$\text{Taylor series} \qquad F(x) = F(0) + x^T(\text{grad } F) + \frac{1}{2}x^TAx + \text{higher order terms}$$
6.2 Tests for Positive Definiteness
6B Each of the following tests is a necessary and sufficient condition for the real symmetric matrix $A$ to be positive definite:

(I) $x^TAx > 0$ for all nonzero real vectors $x$.

(II) All the eigenvalues of $A$ satisfy $\lambda_i > 0$.

(III) All the upper left submatrices $A_k$ have positive determinants.

(IV) All the pivots (without row exchanges) satisfy $d_k > 0$.
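These tests are easy to try numerically. The sketch below is my own (the helper name `positive_definite_checks` is hypothetical); the Cholesky call stands in for the pivot test (IV), since for a symmetric matrix Cholesky succeeds exactly when all pivots are positive:

```python
import numpy as np

def positive_definite_checks(A):
    """Apply tests (II)-(IV) to a real symmetric matrix A."""
    n = A.shape[0]
    eig_test = np.all(np.linalg.eigvalsh(A) > 0)                            # (II)
    minor_test = all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1)) # (III)
    try:
        np.linalg.cholesky(A)                                               # (IV) all pivots positive
        pivot_test = True
    except np.linalg.LinAlgError:
        pivot_test = False
    return eig_test, minor_test, pivot_test

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])       # a classic positive definite example
print(positive_definite_checks(A))       # (True, True, True)
```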
For a rectangular matrix $R$ with $m$ equations and $n$ unknowns, $m \geq n$, the least-squares problem $Rx=b$ generally has no exact solution. The least-squares choice $\hat x$ is the solution of $R^TR\hat x = R^Tb$. That matrix $A = R^TR$ is not only symmetric but positive definite, provided the columns of $R$ are independent.
6C The symmetric matrix $A$ is positive definite if and only if

(V) There is a matrix $R$ with independent columns such that $A = R^TR$.
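A small numerical illustration (my own sketch): any $R$ with independent columns gives $x^TAx = \|Rx\|^2 > 0$ for $x \neq 0$:

```python
import numpy as np

rng = np.random.default_rng(1)
R = rng.standard_normal((5, 3))    # 5 by 3; random columns are independent with probability 1
A = R.T @ R                        # symmetric, and positive definite by test (V)

x = rng.standard_normal(3)
print(np.isclose(x @ A @ x, np.linalg.norm(R @ x)**2))   # True: x^T A x = ||Rx||^2
print(np.all(np.linalg.eigvalsh(A) > 0))                 # True: test (II) agrees
```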
Semidefinite Matrices
6D Each of the following tests is a necessary and sufficient condition for a symmetric matrix $A$ to be positive semidefinite:

(I′) $x^TAx \geq 0$ for all vectors $x$ (this defines positive semidefinite).

(II′) All the eigenvalues of $A$ satisfy $\lambda_i \geq 0$.

(III′) No principal submatrices have negative determinants.

(IV′) No pivots are negative.

(V′) There is a matrix $R$, possibly with dependent columns, such that $A = R^TR$.
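For contrast with the definite case (again a sketch of my own): a rank-deficient $R$ gives a semidefinite $A = R^TR$ with a zero eigenvalue, matching test (V′):

```python
import numpy as np

R = np.array([[1.0, 2.0],
              [2.0, 4.0]])       # dependent columns (second column = 2 * first)
A = R.T @ R                      # positive semidefinite by test (V')
print(np.linalg.eigvalsh(A))     # one eigenvalue is 0 (up to round-off), the other is positive
```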
6.3 Singular Value Decomposition
Singular Value Decomposition: Any $m$ by $n$ matrix $A$ can be factored into
$$A = U\Sigma V^T = (\text{orthogonal})(\text{diagonal})(\text{orthogonal})$$
The columns of $U$ ($m$ by $m$) are eigenvectors of $AA^T$, and the columns of $V$ ($n$ by $n$) are eigenvectors of $A^TA$. The $r$ singular values on the diagonal of $\Sigma$ ($m$ by $n$) are the square roots of the nonzero eigenvalues of both $AA^T$ and $A^TA$.
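A quick numerical check of these statements (a sketch of my own, using numpy's SVD routine):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))             # a 4 by 3 example

U, sigma, Vt = np.linalg.svd(A)             # A = U Sigma V^T

# singular values squared match the eigenvalues of A^T A (and the nonzero ones of A A^T)
eigs = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(sigma**2, eigs))          # True

# A v_j = sigma_j u_j, one column at a time (AV = U Sigma)
V = Vt.T
for j in range(3):
    print(np.allclose(A @ V[:, j], sigma[j] * U[:, j]))   # True
```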
Remark 1. For positive definite matrices, $\Sigma$ is $\Lambda$ and $U\Sigma V^T$ is identical to $Q\Lambda Q^T$. For other symmetric matrices, any negative eigenvalues in $\Lambda$ become positive in $\Sigma$. For complex matrices, $\Sigma$ remains real but $U$ and $V$ become unitary.
Remark 2. $U$ and $V$ give orthonormal bases for all four fundamental subspaces:
$$\begin{aligned}
&\text{first } r \text{ columns of } U: &&\text{column space of } A \\
&\text{last } m-r \text{ columns of } U: &&\text{left null space of } A \\
&\text{first } r \text{ columns of } V: &&\text{row space of } A \\
&\text{last } n-r \text{ columns of } V: &&\text{null space of } A
\end{aligned}$$
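In numpy terms (again my own sketch, not from the text), these bases are read off by slicing $U$ and $V$ once the rank $r$ is known:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))   # 4 by 3 with rank r = 2

U, sigma, Vt = np.linalg.svd(A)
r = np.sum(sigma > 1e-10 * sigma[0])     # numerical rank

col_space  = U[:, :r]      # first r columns of U:      column space of A
left_null  = U[:, r:]      # last m - r columns of U:   left null space of A
row_space  = Vt[:r, :].T   # first r columns of V:      row space of A
null_space = Vt[r:, :].T   # last n - r columns of V:   null space of A

print(np.allclose(A.T @ left_null, 0))   # True: these columns are orthogonal to the column space
print(np.allclose(A @ null_space, 0))    # True: A sends the null space basis to zero
```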
Remark 3. When $A$ multiplies a column $v_j$ of $V$, it produces $\sigma_j$ times a column of $U$. That comes directly from $AV = U\Sigma$, looked at a column at a time.
Remark 4. Eigenvectors of $AA^T$ and $A^TA$ must go into the columns of $U$ and $V$:
$$AA^T = (U\Sigma V^T)(V\Sigma^TU^T) = U\Sigma\Sigma^TU^T \qquad \text{and, similarly,} \qquad A^TA = V\Sigma^T\Sigma V^T \tag{1}$$
Remark 5. Here is the reason that $Av_j = \sigma_ju_j$. Start with $A^TAv_j = \sigma_j^2v_j$.
$$\text{Multiply by } A \qquad AA^TAv_j = \sigma_j^2\,Av_j \tag{2}$$
Thus $Av_j$ is an eigenvector of $AA^T$ with eigenvalue $\sigma_j^2$. Its length is $\sigma_j$, since $\|Av_j\|^2 = v_j^TA^TAv_j = \sigma_j^2$, so the unit eigenvector is $u_j = Av_j/\sigma_j$, which is $Av_j = \sigma_ju_j$.
4. Least Squares
For a rectangular system $Ax = b$, the least-squares solution comes from the normal equations $A^TA\hat x = A^Tb$. If $A$ has dependent columns, then $A^TA$ is not invertible and $\hat x$ is not determined.
$$\text{The optimal solution of } Ax = b \text{ is the minimum length solution of } A^TA\hat x = A^Tb.$$
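In numpy, this minimum-length least-squares solution is what the pseudoinverse produces (a minimal sketch of my own; `np.linalg.pinv` and `np.linalg.lstsq` are standard numpy routines):

```python
import numpy as np

# A with dependent columns: the normal equations A^T A x = A^T b have many solutions
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])          # second column = 2 * first column
b = np.array([1.0, 2.0, 2.0])

x_plus = np.linalg.pinv(A) @ b                        # minimum-length least-squares solution
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)       # lstsq returns the same minimum-norm choice
print(np.allclose(x_plus, x_lstsq))                   # True

# x_plus satisfies the normal equations; among all such solutions it has the smallest length
print(np.allclose(A.T @ A @ x_plus, A.T @ b))         # True
```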