1 Odds and Odds ratio
For a binomial distribution $X \sim Bin(N,p)$, let $Y = \frac{X}{N}$. Then
$\mu = E(Y) = \frac{E(X)}{N} = p = -\frac{c'(p)}{b'(p)} = -\frac{-N/(1-p)}{N/(p(1-p))}$
$g(p) = \mathrm{logit}(p) = \log\left(\frac{p}{1-p}\right)$ ----------- logit link function
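As a quick sanity check on $E(Y) = p$, a minimal NumPy simulation; the values of `N`, `p`, and the number of replicates are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 0.3                        # illustrative values, not from the text
X = rng.binomial(N, p, size=100_000)  # replicates of X ~ Bin(N, p)
Y = X / N                             # Y = X / N

print(Y.mean())             # ~= 0.3, matching mu = E(Y) = p
print(np.log(p / (1 - p)))  # logit(p), the link applied to this mean
```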
Link functions for binary data
- Logit link: $h(p) = \log\left(\frac{p}{1-p}\right)$
- Probit link: $h(p) = \Phi^{-1}(p)$, where $\Phi$ is the c.d.f. of $N(0,1)$
- Log-log link: $h(p) = -\log(-\log(p))$
- Complementary log-log link: $h(p) = -\log(-\log(1-p))$
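A minimal sketch of these four links in Python, using `scipy.stats.norm.ppf` for $\Phi^{-1}$; the function names here are my own, not a standard API:

```python
import numpy as np
from scipy.stats import norm

def logit(p):
    """Logit link: log(p / (1 - p))."""
    return np.log(p / (1 - p))

def probit(p):
    """Probit link: inverse c.d.f. of N(0, 1)."""
    return norm.ppf(p)

def loglog(p):
    """Log-log link: -log(-log(p))."""
    return -np.log(-np.log(p))

def cloglog(p):
    """Complementary log-log link: -log(-log(1 - p))."""
    return -np.log(-np.log(1 - p))

p = np.array([0.1, 0.5, 0.9])
for h in (logit, probit, loglog, cloglog):
    print(h.__name__, h(p))
```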
Odds Definition
$Odds = \frac{p}{1-p}$, where $p$ is the probability of the outcome of interest; equivalently, $p = \frac{Odds}{1+Odds}$.
In logistic regression,
log odds: $\log(Odds) = \log\left(\frac{p}{1-p}\right) = x^\mathsf{T}\beta$
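To make the odds–probability relationship concrete, a small sketch (the values of `beta` and `x` are made up for illustration):

```python
import numpy as np

p = 0.8
odds = p / (1 - p)          # Odds = p / (1 - p) = 4.0
p_back = odds / (1 + odds)  # recover p = Odds / (1 + Odds)

# In logistic regression the linear predictor x'beta is the log odds;
# inverting the logit gives the fitted probability.
beta = np.array([-1.0, 0.5])
x = np.array([1.0, 2.0])             # first entry is the intercept term
log_odds = x @ beta                  # log(p / (1 - p)) = x'beta = 0.0
p_hat = 1 / (1 + np.exp(-log_odds))  # inverse logit -> 0.5
print(odds, p_back, log_odds, p_hat)
```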
log odds ratio $\beta$: when comparing two levels of a factor,
$\beta = \log\left(\frac{p_1}{1-p_1}\right) - \log\left(\frac{p_2}{1-p_2}\right) = \log\left(\frac{Odds_1}{Odds_2}\right)$
odds ratio $\exp(\beta)$: since $Odds_1 = \exp(\beta)\,Odds_2$, we also call $\exp(\beta)$ the odds multiplier.
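As an illustration of $\exp(\beta)$ acting as an odds multiplier, with hypothetical probabilities for two groups:

```python
import numpy as np

p1, p2 = 0.6, 0.3             # hypothetical probabilities for two groups
odds1 = p1 / (1 - p1)
odds2 = p2 / (1 - p2)

beta = np.log(odds1 / odds2)  # log odds ratio
print(np.exp(beta))           # odds ratio: odds1 = exp(beta) * odds2
print(np.isclose(odds1, np.exp(beta) * odds2))  # True
```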
2 Is it a good fit?
For a GLM in general: the deviance $D$.
For logistic regression (binomial models):
- Deviance residuals
  deviance residual:
  $d_k = \mathrm{sign}(y_k - n_k\hat{p}_k) \times \left[2\left[y_k \log\left(\frac{y_k}{n_k \hat{p}_k}\right) + (n_k - y_k)\log\left(\frac{n_k - y_k}{n_k - n_k\hat{p}_k}\right)\right]\right]^{\frac{1}{2}}$
  standardised deviance residual:
  $r_{D_k} = \frac{d_k}{\sqrt{1 - h_k}}$
  where $h_k$ is the leverage from the hat matrix. Tip: these residuals are not informative if the response is binary or $n_k$ is small for most covariate patterns, so they are not useful when the outcome variable is binary and the predictor is continuous. (A sketch computing these residuals follows this list.)
- Pearson's chi-squared statistic
  $\chi^2 = \sum^{n}_{i=1} \frac{(y_i - n_i \hat{p}_i)^2}{n_i \hat{p}_i (1 - \hat{p}_i)}, \quad i = 1, \dots, n$
- Pearson residuals
  Pearson (chi-squared) residual:
  $X_k = \frac{y_k - n_k\hat{p}_k}{\sqrt{n_k \hat{p}_k (1 - \hat{p}_k)}}$
  standardised Pearson residual:
  $r_{P_k} = \frac{X_k}{\sqrt{1 - h_k}}$
  where $h_k$ is the leverage from the hat matrix.
- Likelihood ratio chi-squared statistic
  $C = 2\left[l(\hat{p}; y) - l(\tilde{p}; y)\right]$, where $\tilde{p} = \frac{\sum y_i}{\sum n_i}$ is the pooled estimate under the minimal model and $\hat{p}$ is the MLE under the fitted model.
- AIC
  $AIC = -2l(\hat{p}; y) + 2p$, where $p$ is the number of parameters; smaller is better.
- BIC
  $BIC = -2l(\hat{p}; y) + p \times \log(\text{number of observations})$; smaller is better.
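A minimal NumPy sketch of the deviance and Pearson residuals above for grouped binomial data; the counts `y`, group sizes `n`, fitted probabilities `p_hat`, and leverages `h` are made-up inputs, which in practice would come from a fitted model:

```python
import numpy as np

# Made-up grouped binomial data and fitted values for illustration.
y = np.array([3, 7, 12, 18])                 # successes per covariate pattern
n = np.array([10, 15, 20, 25])               # trials per covariate pattern
p_hat = np.array([0.25, 0.45, 0.60, 0.70])   # fitted probabilities
h = np.array([0.30, 0.20, 0.25, 0.25])       # leverages from the hat matrix

mu = n * p_hat
# Deviance residual d_k (assumes 0 < y_k < n_k so both logs are defined)
d = np.sign(y - mu) * np.sqrt(
    2 * (y * np.log(y / mu) + (n - y) * np.log((n - y) / (n - mu)))
)
# Pearson residual X_k
X = (y - mu) / np.sqrt(n * p_hat * (1 - p_hat))

# Standardised versions divide by sqrt(1 - h_k)
r_D = d / np.sqrt(1 - h)
r_P = X / np.sqrt(1 - h)
print(r_D, r_P)
```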
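And a sketch of the overall fit statistics, reusing the same hypothetical data; the parameter count `n_params` is arbitrary, and the "number of observations" in BIC is taken here as the number of covariate patterns:

```python
import numpy as np
from scipy.stats import binom

y = np.array([3, 7, 12, 18])                 # same made-up data as above
n = np.array([10, 15, 20, 25])
p_hat = np.array([0.25, 0.45, 0.60, 0.70])

def loglik(p, y, n):
    """Binomial log-likelihood of success probabilities p for counts y of n."""
    return binom.logpmf(y, n, p).sum()

# Pearson chi-squared statistic
chi2 = ((y - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat))).sum()

# Likelihood ratio statistic C: fitted model vs. the pooled estimate p~
p_tilde = y.sum() / n.sum()
C = 2 * (loglik(p_hat, y, n) - loglik(p_tilde, y, n))

# AIC and BIC; here p in the formulas means the number of parameters,
# represented as n_params to avoid clashing with the probabilities
n_params = 2
AIC = -2 * loglik(p_hat, y, n) + 2 * n_params
BIC = -2 * loglik(p_hat, y, n) + n_params * np.log(len(y))
print(chi2, C, AIC, BIC)
```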