Task

This entire note addresses a single task: given a network with labels on some nodes, how do we assign labels to all the other nodes in the network?

This note covers a probabilistic framework only; it does not involve deep learning.

Information used

The classification label of a node $v$ in the network may depend on:

Features of node $v$
Labels of the nodes in $v$'s neighborhood
Features of the nodes in $v$'s neighborhood

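Semi-supervised

Labels are given for some of the nodes in the graph, and the goal is to predict the labels of the remaining nodes.

Learning is semi-supervised when only part of the training data is labeled.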

Applications of the task

Document classification
Part of speech tagging
Link prediction
Optical character recognition
Image/3D data segmentation
Entity resolution in sensor networks
Spam and fraud detection

Markov Assumption

To simplify the problem, assume that the state (label) of a node depends only on the states of the nodes directly connected to it.
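
Written as a formula, with $N_v$ denoting the set of neighbors of $v$ (the same notation this note uses later for neighbor summaries), the assumption can be stated as:
$P(Y_v) = P(Y_v \mid N_v)$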

Collective classification

Collective classification is introduced here because all the concrete methods in this note belong to it and follow similar steps.

Simultaneous classification of interlinked nodes using correlations. In other words, collective classification means classifying all nodes at the same time.

Collective classification involves the following 3 steps:

Local Classifier

Purpose: used for initial label assignment

Information used: only node attributes/features; it does not use network information

Method: a standard classification task
Relational Classifier

Information used: the labels and features of a node and of its neighbors

Purpose: re-classify each node using the correlations between nodes

Method: also a classifier
Collective Inference

What it does: apply the relational classifier to each node iteratively and propagate the correlations (the information obtained from node correlations is passed from node to node until it spreads across the whole network)

Iterate until the inconsistency between neighboring labels is minimized

Methods

Intuition behind the methods

Correlation: nearby nodes have the same color (belonging to the same class)

Nodes are correlated for the following reasons:

Homophily: The tendency of individuals to associate and bond with similar others.

Examples: birds of a feather flock together; researchers who focus on the same research area are more likely to establish a connection; people with the same interest are more closely connected due to homophily.
Influence: Social connections can influence the individual characteristics of a person.

Example: I recommend my musical preferences to my friends, until one of them grows to like my favorite genres too.

On a graph, this intuition means that similar nodes are typically close together or directly connected in the network. Conversely, if two nodes are directly connected or close to each other, they are more likely to be similar.

Relational classifiers

This is a very simple method: it uses only the adjacency structure of the graph and the known labels, not node features.

Idea: Class probability $Y_v$ of node $v$ is a weighted average of class probabilities of its neighbors

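Steps:

For labeled nodes $v$, initialize label $Y_v$ with ground-truth label $Y_v^*$
For unlabeled nodes, initialize $Y_v = 0.5$ (the binary case)
Update all nodes in a random order until convergence or until maximum number of iterations is reached
Each node's label is updated as follows:
$P(Y_v = c)=\frac{\sum\limits_{(v,u) \in E} A_{v,u} P(Y_u = c)}{\sum\limits_{(v,u) \in E} A_{v,u}}$

Here the sum over $(v,u) \in E$ runs over all edges $(v, u)$ of node $v$, and $A$ is the adjacency matrix; $A_{v,u}$ can be 1 or 0 (connected or not connected), or it can be the weight of the edge.

This step sweeps over the nodes of the graph many times; one iteration computes the probability for every unlabeled node.
Open question: where do the edge weights come from?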

Convergence is not guaranteed
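
The update rule above can be sketched in a few lines of plain Python. This is only a minimal illustration for the binary case: the function name `relational_classify`, the dict-of-dicts graph format, the tolerance `tol` and the toy path graph are all assumptions made for this example, not part of the course material.

```python
import random

def relational_classify(adj, labels, max_iter=100, tol=1e-6, seed=0):
    """adj: dict node -> {neighbor: weight A_{v,u}}.
    labels: dict node -> 0 or 1 for the labeled nodes only.
    Returns dict node -> P(Y_v = 1)."""
    rng = random.Random(seed)
    # Labeled nodes start at their ground truth, unlabeled nodes at 0.5.
    prob = {v: float(labels[v]) if v in labels else 0.5 for v in adj}
    unlabeled = [v for v in adj if v not in labels]
    for _ in range(max_iter):
        rng.shuffle(unlabeled)  # update nodes in a random order
        max_change = 0.0
        for v in unlabeled:
            total = sum(adj[v].values())
            if total == 0:
                continue  # isolated node: nothing to average
            # Weighted average of the neighbors' current probabilities.
            new_p = sum(w * prob[u] for u, w in adj[v].items()) / total
            max_change = max(max_change, abs(new_p - prob[v]))
            prob[v] = new_p
        if max_change < tol:  # assumed convergence test
            break
    return prob

# Toy path graph 0 - 1 - 2 - 3 with node 0 labeled 1 and node 3 labeled 0.
adj = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}
probs = relational_classify(adj, {0: 1, 3: 0})
```

On this toy graph the two unlabeled nodes settle near 2/3 and 1/3, i.e. each leans toward the label of the closer labeled endpoint.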

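Implementation

In PyTorch Geometric, one iteration is computed as:
$\alpha \cdot D^{-\frac{1}{2}} A D^{-\frac{1}{2}} Y + (1-\alpha) Y$
where $D$ is the diagonal degree matrix of $A$, so $D^{-\frac{1}{2}} A D^{-\frac{1}{2}}$ is the symmetrically normalized adjacency matrix.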
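
To make the formula concrete, here is a dependency-free sketch of a single propagation step on dense Python lists. It is not PyTorch Geometric code: `propagate_once`, the toy matrices and the choice to read the $(1-\alpha) Y$ term literally (using the same $Y$ that is being propagated) are assumptions made for illustration.

```python
import math

def propagate_once(A, Y, alpha=0.9):
    """One step of alpha * D^{-1/2} A D^{-1/2} Y + (1 - alpha) * Y.
    A: n x n adjacency matrix (list of lists).
    Y: n x C matrix of class scores (list of lists)."""
    n = len(A)
    deg = [sum(row) for row in A]
    # D^{-1/2}; nodes with degree 0 get 0 to avoid division by zero.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    num_classes = len(Y[0])
    out = []
    for v in range(n):
        row = []
        for c in range(num_classes):
            agg = sum(inv_sqrt[v] * A[v][u] * inv_sqrt[u] * Y[u][c]
                      for u in range(n))
            row.append(alpha * agg + (1 - alpha) * Y[v][c])
        out.append(row)
    return out

# Toy path graph 0 - 1 - 2: node 0 is class 0, node 2 is class 1,
# node 1 is unlabeled (all-zero row).
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
Y = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
Y1 = propagate_once(A, Y, alpha=0.5)
```

After one step the middle node receives equal scores for both classes, because it sits symmetrically between the two labeled nodes.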

Iterative classification

It uses the labels of a node's neighbors as well as the node's own features/attributes

Two classifiers

$\phi_1 (f_v)$ = Predict node label based on node feature vector $f_v$

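Its purpose is to give each unlabeled node a reasonably reliable initial label; it is used only once.
$\phi_2 (f_v, z_v)$ = Predict label based on node feature vector $f_v$ and summary $z_v$ of labels of $v$'s neighbors.

$z_v$ is a summary of the labels of node $v$'s neighbors $N_v$. There are many ways to define $z_v$; the essential point is that it summarizes the neighbors' labels. For example:
1. Histogram of the number (or fraction) of each label in $N_v$ (a vector holding the count of each label in $N_v$)
2. Most common label in $N_v$
This classifier plays the same role as the neighbor-label probability computation in relational classifiers; the difference is that it also uses node features.

It is used in every iteration.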
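
The two example summaries for $z_v$ can be sketched as follows. The function names and the convention of skipping neighbors that have no label yet are assumptions made for this illustration.

```python
from collections import Counter

def label_histogram(neighbors, labels, classes, normalize=True):
    """Count (or fraction) of each class among the neighbors' labels.
    Neighbors without a label in `labels` are skipped (an assumption)."""
    counts = Counter(labels[u] for u in neighbors if u in labels)
    total = sum(counts.values())
    hist = [counts.get(c, 0) for c in classes]
    if normalize and total > 0:
        hist = [h / total for h in hist]
    return hist

def most_common_label(neighbors, labels):
    """Most frequent label among the neighbors, or None if none is labeled."""
    counts = Counter(labels[u] for u in neighbors if u in labels)
    return counts.most_common(1)[0][0] if counts else None

# Hypothetical node with three neighbors labeled a, a, b.
labels = {1: "a", 2: "a", 3: "b"}
z_counts = label_histogram([1, 2, 3], labels, ["a", "b"], normalize=False)
z_fracs = label_histogram([1, 2, 3], labels, ["a", "b"])
z_mode = most_common_label([1, 2, 3], labels)
```

The histogram keeps more information than the most common label, at the cost of a longer input vector for $\phi_2$.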

Procedure

1. Train classifier $\phi_1 (f_v)$ on the labeled nodes
2. Use $\phi_1 (f_v)$ to predict $Y_v$ for the unlabeled nodes
3. Compute $z_v$ for all nodes
This covers both labeled and unlabeled nodes; the neighbors of either kind of node may themselves be labeled or unlabeled
4. Train classifier $\phi_2 (f_v, z_v)$ on the $f_v$ and $z_v$ of the labeled nodes
5. Use $\phi_2 (f_v, z_v)$ to predict $Y_v$ for the unlabeled nodes
6. Update $z_v$ for all nodes
7. Repeat steps 4, 5 and 6 until class labels stabilize or max number of iterations is reached

Convergence is not guaranteed

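If the labels do not converge, set a maximum number of iterations and stop when it is reached. This number is usually not large; 10, 50 or 100 are typical choices.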
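
A compact sketch of the procedure above, in plain Python. To keep it self-contained, both $\phi_1$ and $\phi_2$ are replaced by a hypothetical nearest-centroid classifier and $z_v$ is the fraction histogram; any real classifier could be swapped in. All names and the toy graph are assumptions made for this example.

```python
def fit_centroids(X, y):
    """Stand-in classifier (assumption): one mean vector per class."""
    sums, counts = {}, {}
    for x, c in zip(X, y):
        acc = sums.setdefault(c, [0.0] * len(x))
        for i, xi in enumerate(x):
            acc[i] += xi
        counts[c] = counts.get(c, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def predict(centroids, x):
    """Return the class whose centroid is closest to x."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[c]))
    return min(sorted(centroids), key=dist2)

def neighbor_summary(v, adj, Y, classes):
    """z_v: fraction of each class among the current labels of v's neighbors."""
    n = len(adj[v])
    return [sum(Y[u] == c for u in adj[v]) / n if n else 0.0 for c in classes]

def iterative_classification(adj, feats, labels, max_iter=10):
    classes = sorted(set(labels.values()))
    labeled = sorted(labels)
    unlabeled = sorted(v for v in adj if v not in labels)
    y_train = [labels[v] for v in labeled]
    # Steps 1-2: train phi_1 on features only, predict unlabeled nodes.
    phi1 = fit_centroids([feats[v] for v in labeled], y_train)
    Y = dict(labels)
    for v in unlabeled:
        Y[v] = predict(phi1, feats[v])
    # Step 3: compute z_v for all nodes.
    z = {v: neighbor_summary(v, adj, Y, classes) for v in adj}
    for _ in range(max_iter):
        # Step 4: train phi_2 on [f_v, z_v] of the labeled nodes.
        phi2 = fit_centroids([feats[v] + z[v] for v in labeled], y_train)
        # Step 5: re-predict the unlabeled nodes.
        new_Y = dict(Y)
        for v in unlabeled:
            new_Y[v] = predict(phi2, feats[v] + z[v])
        changed = new_Y != Y
        Y = new_Y
        # Step 6: update z_v for all nodes.
        z = {v: neighbor_summary(v, adj, Y, classes) for v in adj}
        if not changed:  # labels have stabilized
            break
    return Y

# Two clusters joined by the edge 3 - 4; node 3 has a misleading feature.
adj = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1], 3: [0, 1, 4],
       4: [3, 6, 7], 5: [6, 7], 6: [4, 5, 7], 7: [4, 5, 6]}
feats = {0: [0.0], 1: [0.1], 2: [0.2], 3: [0.6],
         4: [0.9], 5: [0.8], 6: [1.0], 7: [0.9]}
result = iterative_classification(adj, feats, {0: "A", 1: "A", 6: "B", 7: "B"})
```

In this toy setup the feature-only classifier would put node 3 in class B, while the neighbor summary pulls it back to class A, which is the effect the second classifier is meant to capture.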

Loopy belief propagation

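The course covers this method only briefly and leaves many details unexplained. It is a probabilistic graphical model approach and does not involve deep learning. It may be a relatively early method that has since been superseded by better ones, so further details are skipped for now and can be revisited when needed.

"Loopy" means that the method may be applied to graphs that contain cycles

The method computes the probability that each node of the graph belongs to each class

Computation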

belief

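Once the iterations are complete, the belief of each node for each label can be computed as follows:

$Belief_b(Y_i)=\phi_b(Y_i)\prod\limits_{k\in N_b}m_{k \rightarrow b}(Y_i)$

$Belief_b(Y_i)$ is the belief that node b has label $Y_i$

Prior belief $\phi_b(Y_i)$ : the prior belief; its value expresses how likely node $b$ is to have label $Y_i$. It is not a probability, but it is proportional to one.

$m_{k \rightarrow b}(Y_i)$ : the message passed from node k to node b. It represents node k's estimate of how likely node b is to have label $Y_i$; the more likely, the larger the value.

The larger the neighbors' estimates that the node has a given label, the larger the belief
The larger the node's prior belief for a given label, the larger the belief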
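
As a small worked example with made-up numbers (two labels, two neighbors $k_1$ and $k_2$; every value here is an illustrative assumption), take $\phi_b = (0.6, 0.4)$, $m_{k_1 \rightarrow b} = (0.9, 0.1)$ and $m_{k_2 \rightarrow b} = (0.7, 0.3)$:
$Belief_b(Y_1) = 0.6 \times 0.9 \times 0.7 = 0.378$
$Belief_b(Y_2) = 0.4 \times 0.1 \times 0.3 = 0.012$
Dividing by the sum $0.390$ gives approximately $(0.969, 0.031)$, so node b would be assigned label $Y_1$. The normalization is a common convention added here for readability, not something the formula itself requires.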

message

To compute the message from node a to node b, evaluate the following for every $Y_j \in \mathcal{L}$ ( $\mathcal{L}$ is the set of all classes/labels):
$m_{a \rightarrow b}(Y_j)=\sum\limits_{Y_i \in \mathcal{L}}\psi(Y_i, Y_j) \phi_a(Y_i) \prod\limits_{k\in N_a \backslash b}m_{k \rightarrow a}(Y_i)$

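Label-label potential matrix $\psi$ :

A matrix of size (label_num, label_num) that represents the conditional probability between the labels of two neighboring nodes.
The value of $\psi(Y_i, Y_j)$ expresses how likely a node is to have label $Y_j$ given that its neighbor has label $Y_i$.
$\psi(Y_i, Y_j)$ is not a probability, but it is proportional to one: the larger the conditional probability, the larger the value.
$\psi$ does not depend on the specific nodes; it describes the conditional probability between two labels.
The diagonal entries of $\psi$ are large, because a diagonal entry corresponds to two neighboring nodes belonging to the same class: if a neighbor has a given class, the node itself is more likely to belong to that class too.

$k\in N_a \backslash b$ denotes the neighbors of node a other than b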

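Interpretation:

The three parts $\psi(Y_i, Y_j)$, $\phi_a(Y_i)$ and $\prod\limits_{k\in N_a \backslash b}m_{k \rightarrow a}(Y_i)$ are not probabilities, but each is proportional to one.
1. The larger $\prod\limits_{k\in N_a \backslash b}m_{k \rightarrow a}(Y_i)$ and $\phi_a(Y_i)$ are, the more likely node a is to have label $Y_i$
2. The larger $\psi(Y_i, Y_j)$ is, the more likely node b is to have label $Y_j$
3. Together, the three terms express how likely node a is to have label $Y_i$ and, as a consequence, how likely node b is to have label $Y_j$
$\sum\limits_{Y_i \in \mathcal{L}}$ runs over every possible label of node a and sums the resulting likelihoods that node b has label $Y_j$, giving node a's overall influence on node b having label $Y_j$
A message from one node to another consists of label_num values, one per label, each an estimate of the belief that the receiving node has that label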
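
The message and belief formulas can be put together in a short sketch. Several choices below are assumptions that the note does not specify: messages start at 1, all messages are updated synchronously in each round, messages and beliefs are normalized to sum to 1 for numerical stability, and the loop runs for a fixed `num_iter`. The function name and the toy triangle graph are also made up for this example.

```python
def loopy_bp(adj, prior, psi, num_iter=10):
    """adj: dict node -> list of neighbors (undirected graph).
    prior: dict node -> list with phi_v(Y_i) for each label i.
    psi: L x L list of lists, psi[i][j] = psi(Y_i, Y_j).
    Returns dict node -> normalized belief vector."""
    L = len(psi)
    # One message per directed edge (a -> b), initialized to all ones.
    msgs = {(a, b): [1.0] * L for a in adj for b in adj[a]}
    for _ in range(num_iter):
        new_msgs = {}
        for (a, b) in msgs:
            out = []
            for j in range(L):
                total = 0.0
                for i in range(L):
                    # Product of messages into a from neighbors other than b.
                    prod = 1.0
                    for k in adj[a]:
                        if k != b:
                            prod *= msgs[(k, a)][i]
                    total += psi[i][j] * prior[a][i] * prod
                out.append(total)
            s = sum(out)
            new_msgs[(a, b)] = [x / s for x in out]  # assumed normalization
        msgs = new_msgs
    beliefs = {}
    for b in adj:
        raw = []
        for i in range(L):
            val = prior[b][i]
            for k in adj[b]:
                val *= msgs[(k, b)][i]
            raw.append(val)
        s = sum(raw)
        beliefs[b] = [x / s for x in raw]
    return beliefs

# Toy triangle (a graph with a cycle): node 0 has a strong prior for label 0,
# nodes 1 and 2 have uninformative priors; psi favors equal neighbor labels.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
prior = {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.5, 0.5]}
psi = [[0.8, 0.2], [0.2, 0.8]]
beliefs = loopy_bp(adj, prior, psi)
```

Because $\psi$ has a dominant diagonal, node 0's strong prior spreads along the edges and all three nodes end up favoring label 0.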