ML: Machine Learning Explainability with Partial Dependence Plots: How Does Each Feature Affect Your Predictions?
Contents
Machine Learning Explainability with Partial Dependence Plots: How Does Each Feature Affect Your Predictions?
Partial Dependence Plots
How it Works
Code Example
Machine Learning Explainability with Partial Dependence Plots: How Does Each Feature Affect Your Predictions?
Partial Dependence Plots
While feature importance shows what variables most affect predictions, partial dependence plots show how a feature affects predictions. This is useful for answering questions like: (1) Controlling for all other house features, what impact do longitude and latitude have on home prices? To restate this, how would similarly sized houses be priced in different areas? (2) Are predicted health differences between two groups due to differences in their diets, or due to some other factor? If you are familiar with linear or logistic regression models, partial dependence plots can be interpreted similarly to the coefficients in those models. Partial dependence plots on sophisticated models, though, can capture more complex patterns than coefficients from simple models. If you aren't familiar with linear or logistic regression, don't worry about this comparison. We will show a couple of examples, explain how to interpret these plots, and then review the code to create them.
How it Works
Like permutation importance, partial dependence plots are calculated after a model has been fit. The model is fit on real data that has not been artificially manipulated in any way. In our soccer example, teams may differ in many ways. How many passes they made, shots they took, goals they scored, etc. At first glance, it seems difficult to disentangle the effect of these features. To see how partial plots separate out the effect of each feature, we start by considering a single row of data. For example, that row of data might represent a team that had the ball 50% of the time, made 100 passes, took 10 shots and scored 1 goal. We will use the fitted model to predict our outcome (probability their player won "man of the match"). But we repeatedly alter the value for one variable to make a series of predictions. We could predict the outcome if the team had the ball only 40% of the time. We then predict with them having the ball 50% of the time. Then predict again for 60%. And so on. We trace out predicted outcomes (on the vertical axis) as we move from small values of ball possession to large values (on the horizontal axis). In this description, we used only a single row of data. Interactions between features may cause the plot for a single row to be atypical. So, we repeat that mental experiment with multiple rows from the original dataset, and we plot the average predicted outcome on the vertical axis.
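To make that procedure concrete, here is a minimal sketch of computing a partial dependence curve by hand. Everything in it is an assumption for illustration: `fitted_model` stands for some already-fitted scikit-learn classifier, `X_valid` for a validation DataFrame, and 'Ball Possession %' for a possession column; none of these names come from the text above.

```python
# Hand-rolled partial dependence: sweep one feature over a grid, hold every other
# column at its observed values, and average the model's predictions.
# `fitted_model`, `X_valid`, and the column name are hypothetical placeholders.
import numpy as np
import matplotlib.pyplot as plt

def manual_partial_dependence(model, X, feature, grid):
    averaged = []
    for value in grid:
        X_mod = X.copy()
        X_mod[feature] = value                    # overwrite the feature for every row
        preds = model.predict_proba(X_mod)[:, 1]  # P(player wins "man of the match")
        averaged.append(preds.mean())             # averaging over rows smooths out atypical interactions
    return np.array(averaged)

grid = np.linspace(30, 70, 9)                     # e.g. ball possession from 30% to 70%
curve = manual_partial_dependence(fitted_model, X_valid, 'Ball Possession %', grid)

plt.plot(grid, curve)
plt.xlabel('Ball Possession %')
plt.ylabel('Average predicted probability')
plt.show()
```

Library functions such as those in scikit-learn's inspection module do essentially this sweep-and-average loop for you, with extra care around the grid choice.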
Code Example
Model building isn't our focus, so we won't dwell on the data exploration or model building code. Our first example uses a decision tree, which you can see below. In practice, you'll use more sophisticated models for real-world applications. As guidance for reading the tree: nodes with children show their splitting criterion at the top, and the pair of values at the bottom shows the count of False values and True values of the target, respectively, for the data points in that node of the tree.
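Since the model-building and plotting code isn't reproduced in this excerpt, here is a minimal sketch of the kind of setup this section assumes: a shallow decision tree fit on the FIFA 2018 match statistics, a rendering of the tree, and a partial dependence plot for goals scored. The file name, the column names, and the use of scikit-learn's PartialDependenceDisplay are assumptions; the original notebook may load the data from a different path and draw the plot with a different library.

```python
# Sketch only: file name and column names are assumptions about the FIFA 2018 dataset.
import pandas as pd
import matplotlib.pyplot as plt
import graphviz
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.inspection import PartialDependenceDisplay

data = pd.read_csv('FIFA 2018 Statistics.csv')             # assumed path
y = (data['Man of the Match'] == 'Yes')                    # binary target
feature_names = [c for c in data.columns if data[c].dtype == 'int64']
X = data[feature_names]
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

# A shallow tree keeps the rendered diagram readable
tree_model = DecisionTreeClassifier(random_state=0, max_depth=5,
                                    min_samples_split=5).fit(train_X, train_y)

# Each node of the rendered tree shows its splitting criterion and the
# [False, True] counts of the target among the rows reaching that node
# (the graphviz object displays inline in a notebook)
graphviz.Source(export_graphviz(tree_model, out_file=None,
                                feature_names=feature_names))

# 1D partial dependence of the prediction on goals scored
PartialDependenceDisplay.from_estimator(tree_model, val_X, features=['Goal Scored'])
plt.show()
```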
A few items are worth pointing out as you interpret this plot. The y axis is interpreted as the change in the prediction relative to what would be predicted at the baseline or leftmost value. A blue shaded area indicates the level of confidence. From this particular graph, we see that scoring a goal substantially increases your chances of winning "Man of the Match." But extra goals beyond that appear to have little impact on predictions. Here is another example plot:
2D Partial Dependence Plots
If you are curious about interactions between features, 2D partial dependence plots are also useful. An example may clarify this. We will again use the Decision Tree model for this graph. It will create an extremely simple plot, but you should be able to match what you see in the plot to the tree itself.
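As a hedged sketch of how such a 2D plot could be produced, the snippet below reuses `tree_model` and `val_X` from the earlier sketch and asks scikit-learn for the joint partial dependence of goals scored and distance covered. The column names are assumptions about the dataset, and the original notebook's plotting library may differ.

```python
# 2D partial dependence: passing a pair of features yields a plot over their joint grid.
# Column names are assumptions; tree_model and val_X come from the earlier sketch.
import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

PartialDependenceDisplay.from_estimator(
    tree_model, val_X,
    features=[('Goal Scored', 'Distance Covered (Kms)')],
)
plt.show()
```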
This graph shows predictions for any combination of Goals Scored and Distance Covered. For example, we see the highest predictions when a team scores at least 1 goal and runs a total distance close to 100km. If they score 0 goals, distance covered doesn't matter. Can you see this by tracing through the decision tree with 0 goals? But distance can impact predictions if they score goals. Make sure you can see this from the 2D partial dependence plot. Can you see this pattern in the decision tree too? This graph seems too simple to represent reality. But that's because the model is so simple. You should be able to see from the decision tree above that this is representing exactly the model's structure. You can easily compare the structure or implications of different models. Here is the same plot with a Random Forest model.
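A minimal sketch of that comparison, again assuming the variables from the earlier sketches; the random forest below uses default hyperparameters, which may differ from whatever produced the plot discussed next.

```python
# Fit a random forest on the same features and repeat the partial dependence plot
# so the two models' structure can be compared. Default hyperparameters (an assumption).
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay

rf_model = RandomForestClassifier(random_state=0).fit(train_X, train_y)

PartialDependenceDisplay.from_estimator(
    rf_model, val_X,
    features=[('Goal Scored', 'Distance Covered (Kms)')],
)
plt.show()
```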
This model thinks you are more likely to win Man of the Match if your players run a total of 100km over the course of the game, though running much more than that leads to lower predictions. In general, the smooth shape of this curve seems more plausible than the step function from the Decision Tree model, though this dataset is small enough that we should be careful in how we interpret any model.