Model Understanding#
Examining a model's performance metrics is not sufficient to select a model and promote it to production. While developing a machine learning algorithm, it is important to understand how the model behaves on the data, to examine the key factors influencing its predictions, and to consider where the model is deficient. Determining what "success" means for a machine learning project depends first and foremost on the user's domain expertise.
EvalML includes a variety of tools for understanding models, from graphing utilities to methods for explaining predictions.
** Using the graphing methods on Jupyter Notebook and Jupyter Lab requires ipywidgets to be installed.
** If graphing on Jupyter Lab, jupyterlab-plotly is also required. To download this, make sure you have npm installed.
Explaining Feature Influence#
The EvalML package offers several methods for understanding which features in your dataset have an impact on the output of the model. We can investigate this either with feature importance or with permutation importance, and leverage either of them to generate more readable explanations.
First, let's train a pipeline on some data.
[1]:
import evalml
from evalml.pipelines import BinaryClassificationPipeline
X, y = evalml.demos.load_breast_cancer()
X_train, X_holdout, y_train, y_holdout = evalml.preprocessing.split_data(
X, y, problem_type="binary", test_size=0.2, random_seed=0
)
pipeline_binary = BinaryClassificationPipeline(
component_graph={
"Label Encoder": ["Label Encoder", "X", "y"],
"Imputer": ["Imputer", "X", "Label Encoder.y"],
"Random Forest Classifier": [
"Random Forest Classifier",
"Imputer.x",
"Label Encoder.y",
],
}
)
pipeline_binary.fit(X_train, y_train)
print(pipeline_binary.score(X_holdout, y_holdout, objectives=["log loss binary"]))
Number of Features
Numeric 30
Number of training examples: 569
Targets
benign 62.74%
malignant 37.26%
Name: count, dtype: object
OrderedDict([('Log Loss Binary', 0.1686746297113362)])
Feature Importance#
We can get the importance associated with each feature of the resulting pipeline.
[2]:
pipeline_binary.feature_importance
[2]:
| | feature | importance |
| --- | --- | --- |
| 0 | mean concave points | 0.138857 |
| 1 | worst perimeter | 0.137780 |
| 2 | worst concave points | 0.117782 |
| 3 | worst radius | 0.100584 |
| 4 | mean concavity | 0.086402 |
| 5 | worst area | 0.072027 |
| 6 | mean perimeter | 0.046500 |
| 7 | worst concavity | 0.043408 |
| 8 | mean radius | 0.037664 |
| 9 | mean area | 0.033683 |
| 10 | radius error | 0.025036 |
| 11 | area error | 0.019324 |
| 12 | worst texture | 0.014754 |
| 13 | worst compactness | 0.014462 |
| 14 | mean texture | 0.013856 |
| 15 | worst smoothness | 0.013710 |
| 16 | worst symmetry | 0.011395 |
| 17 | perimeter error | 0.010284 |
| 18 | mean compactness | 0.008162 |
| 19 | mean smoothness | 0.008154 |
| 20 | worst fractal dimension | 0.007034 |
| 21 | fractal dimension error | 0.005502 |
| 22 | compactness error | 0.004953 |
| 23 | smoothness error | 0.004728 |
| 24 | texture error | 0.004384 |
| 25 | symmetry error | 0.004250 |
| 26 | mean fractal dimension | 0.004164 |
| 27 | concavity error | 0.004089 |
| 28 | mean symmetry | 0.003997 |
| 29 | concave points error | 0.003076 |
We can also create a bar plot of the feature importances.
[3]:
pipeline_binary.graph_feature_importance()
If we have a linear model, we can also view the feature importance by simply inspecting the coefficients of the model.
[4]:
from evalml.model_understanding import get_linear_coefficients
pipeline_linear = BinaryClassificationPipeline(
component_graph={
"Label Encoder": ["Label Encoder", "X", "y"],
"Imputer": ["Imputer", "X", "Label Encoder.y"],
"Logistic Regression Classifier": [
"Logistic Regression Classifier",
"Imputer.x",
"Label Encoder.y",
],
}
)
pipeline_linear.fit(X_train, y_train)
get_linear_coefficients(pipeline_linear.estimator, features=X.columns)
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-evalml/envs/stable/lib/python3.9/site-packages/sklearn/linear_model/_logistic.py:469: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
n_iter_i = _check_optimize_result(
[4]:
Intercept -0.339181
worst radius -1.777283
mean radius -1.674112
texture error -0.740383
perimeter error -0.288266
mean texture -0.081338
radius error -0.076170
mean perimeter -0.069128
mean area 0.002720
fractal dimension error 0.005759
smoothness error 0.006098
symmetry error 0.019005
mean fractal dimension 0.020053
worst area 0.021615
concave points error 0.022536
compactness error 0.058227
mean smoothness 0.073213
concavity error 0.084693
mean symmetry 0.086924
worst fractal dimension 0.098952
area error 0.115528
worst smoothness 0.126151
mean concave points 0.183110
worst texture 0.258570
worst symmetry 0.274830
worst perimeter 0.296383
mean compactness 0.308766
worst concave points 0.348138
mean concavity 0.423376
worst compactness 0.945473
worst concavity 1.189651
dtype: float64
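Since get_linear_coefficients returns a plain pandas Series (sorted by signed value above), a quick sketch for ranking the features by magnitude instead, regardless of sign:

```python
# Rank coefficients by absolute magnitude; drop the Intercept row first.
coefs = get_linear_coefficients(pipeline_linear.estimator, features=X.columns)
coefs.drop("Intercept").abs().sort_values(ascending=False).head()
```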
Permutation Importance#
We can also compute and plot the permutation importance of the pipeline.
[5]:
from evalml.model_understanding import calculate_permutation_importance
calculate_permutation_importance(
pipeline_binary, X_holdout, y_holdout, "log loss binary"
)
[5]:
| | feature | importance |
| --- | --- | --- |
| 0 | worst perimeter | 0.063657 |
| 1 | worst area | 0.045759 |
| 2 | worst radius | 0.041926 |
| 3 | mean concave points | 0.029325 |
| 4 | worst concave points | 0.021045 |
| 5 | worst concavity | 0.010105 |
| 6 | worst texture | 0.010044 |
| 7 | mean texture | 0.006178 |
| 8 | mean symmetry | 0.005857 |
| 9 | mean area | 0.004745 |
| 10 | worst smoothness | 0.003190 |
| 11 | area error | 0.003113 |
| 12 | mean perimeter | 0.002478 |
| 13 | mean fractal dimension | 0.001981 |
| 14 | compactness error | 0.001968 |
| 15 | concavity error | 0.001947 |
| 16 | texture error | 0.000291 |
| 17 | smoothness error | -0.000206 |
| 18 | mean smoothness | -0.000745 |
| 19 | fractal dimension error | -0.000835 |
| 20 | worst compactness | -0.002392 |
| 21 | mean concavity | -0.003188 |
| 22 | mean compactness | -0.005377 |
| 23 | radius error | -0.006229 |
| 24 | mean radius | -0.006870 |
| 25 | worst fractal dimension | -0.007415 |
| 26 | symmetry error | -0.008175 |
| 27 | perimeter error | -0.008980 |
| 28 | concave points error | -0.010415 |
| 29 | worst symmetry | -0.018645 |
[6]:
from evalml.model_understanding import graph_permutation_importance
graph_permutation_importance(pipeline_binary, X_holdout, y_holdout, "log loss binary")
Human Readable Importance#
By using readable_explanation(pipeline), we can generate a more human-readable explanation of either the feature or permutation importance. This filters the features down to the subset with the highest impact on the model's output and sorts them into "heavy" or "somewhat" influence on the model. These features are selected by feature importance or permutation importance against the given objective. If any features actively harm the pipeline's performance, this function highlights them and suggests removal.
Note that permutation importance runs on the original input features, while feature importance runs on the features passed to the final estimator after the various preprocessing steps. The two methods will highlight different important features, and the feature names may differ as well.
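As a small sketch of that caveat, reusing the two APIs shown above, placing the two rankings side by side makes it easy to spot where the names and orderings diverge:

```python
from evalml.model_understanding import calculate_permutation_importance

# Names as seen by the final estimator, after preprocessing.
print(pipeline_binary.feature_importance.head())
# Names of the original input columns.
print(
    calculate_permutation_importance(
        pipeline_binary, X_holdout, y_holdout, "log loss binary"
    ).head()
)
```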
[7]:
from evalml.model_understanding import readable_explanation
readable_explanation(
pipeline_binary,
X_holdout,
y_holdout,
objective="log loss binary",
importance_method="permutation",
)
Random Forest Classifier: The output as measured by log loss binary is heavily influenced by worst perimeter, and is somewhat influenced by worst area, worst radius, mean concave points, and worst concave points.
The features smoothness error, mean smoothness, fractal dimension error, worst compactness, mean concavity, mean compactness, radius error, mean radius, worst fractal dimension, symmetry error, perimeter error, concave points error, and worst symmetry detracted from model performance. We suggest removing these features.
[8]:
readable_explanation(
pipeline_binary, importance_method="feature"
) # feature importance doesn't require X and y
Random Forest Classifier: The output is somewhat influenced by mean concave points, worst perimeter, worst concave points, worst radius, and mean concavity.
We can adjust the number of most important features visible with the max_features argument, or modify the minimum threshold for "importance" with min_importance_threshold. However, these values will not affect the detrimental features displayed, as this function always displays all of them.
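For instance, a sketch of tightening both knobs; the specific values here are illustrative:

```python
readable_explanation(
    pipeline_binary,
    X_holdout,
    y_holdout,
    objective="log loss binary",
    importance_method="permutation",
    max_features=3,  # show at most the top 3 influential features
    min_importance_threshold=0.05,  # raise the bar for what counts as "important"
)
```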
Metrics for Model Understanding#
Confusion Matrix#
For binary or multiclass classification, we can view a confusion matrix of the classifier's predictions. In the DataFrame output of confusion_matrix(), the column headers represent the predicted labels while the row headers represent the actual labels.
[9]:
from evalml.model_understanding.metrics import confusion_matrix
y_pred = pipeline_binary.predict(X_holdout)
confusion_matrix(y_holdout, y_pred)
[9]:
| | benign | malignant |
| --- | --- | --- |
| benign | 0.930556 | 0.069444 |
| malignant | 0.023810 | 0.976190 |
[10]:
from evalml.model_understanding.metrics import graph_confusion_matrix
y_pred = pipeline_binary.predict(X_holdout)
graph_confusion_matrix(y_holdout, y_pred)
Precision-Recall Curve#
For binary classification, we can view the precision-recall curve of the pipeline.
[11]:
from evalml.model_understanding.metrics import graph_precision_recall_curve
# get the predicted probabilities associated with the "true" label
import woodwork as ww
y_encoded = y_holdout.ww.map({"benign": 0, "malignant": 1})
y_pred_proba = pipeline_binary.predict_proba(X_holdout)["malignant"]
graph_precision_recall_curve(y_encoded, y_pred_proba)
ROC Curve#
For binary and multiclass classification, we can view the Receiver Operating Characteristic (ROC) curve of the pipeline.
[12]:
from evalml.model_understanding.metrics import graph_roc_curve
# get the predicted probabilities associated with the "malignant" label
y_pred_proba = pipeline_binary.predict_proba(X_holdout)["malignant"]
graph_roc_curve(y_encoded, y_pred_proba)
The ROC curve can also be generated for multiclass classification problems. For multiclass problems, the graph will show a one-vs-many ROC curve for each class.
[13]:
from evalml.pipelines import MulticlassClassificationPipeline
X_multi, y_multi = evalml.demos.load_wine()
pipeline_multi = MulticlassClassificationPipeline(
["Simple Imputer", "Random Forest Classifier"]
)
pipeline_multi.fit(X_multi, y_multi)
y_pred_proba = pipeline_multi.predict_proba(X_multi)
graph_roc_curve(y_multi, y_pred_proba)
Number of Features
Numeric 13
Number of training examples: 178
Targets
class_1 39.89%
class_0 33.15%
class_2 26.97%
Name: count, dtype: object
Visualizations#
Binary Objective Score vs. Threshold Graph#
Some binary classification objectives (objectives with score_needs_proba set to False) are sensitive to the decision threshold. For those objectives, we can obtain and graph the scores for thresholds from zero to one, calculated at evenly-spaced intervals determined by steps.
[14]:
from evalml.model_understanding.visualizations import binary_objective_vs_threshold
binary_objective_vs_threshold(pipeline_binary, X_holdout, y_holdout, "f1", steps=10)
[14]:
| | threshold | score |
| --- | --- | --- |
| 0 | 0.0 | 0.538462 |
| 1 | 0.1 | 0.811881 |
| 2 | 0.2 | 0.891304 |
| 3 | 0.3 | 0.901099 |
| 4 | 0.4 | 0.931818 |
| 5 | 0.5 | 0.931818 |
| 6 | 0.6 | 0.941176 |
| 7 | 0.7 | 0.951220 |
| 8 | 0.8 | 0.936709 |
| 9 | 0.9 | 0.923077 |
| 10 | 1.0 | 0.000000 |
[15]:
from evalml.model_understanding.visualizations import (
graph_binary_objective_vs_threshold,
)
graph_binary_objective_vs_threshold(
pipeline_binary, X_holdout, y_holdout, "f1", steps=100
)
Predicted Vs Actual Values Graph for Regression Problems#
We can also create a scatterplot comparing predicted vs. actual values for regression problems. We can specify an outlier_threshold so that values are colored differently whenever the absolute difference between the actual and predicted values exceeds the given threshold.
[16]:
from evalml.model_understanding.visualizations import graph_prediction_vs_actual
from evalml.pipelines import RegressionPipeline
X_regress, y_regress = evalml.demos.load_diabetes()
X_train_reg, X_test_reg, y_train_reg, y_test_reg = evalml.preprocessing.split_data(
X_regress, y_regress, problem_type="regression"
)
pipeline_regress = RegressionPipeline(["One Hot Encoder", "Linear Regressor"])
pipeline_regress.fit(X_train_reg, y_train_reg)
y_pred = pipeline_regress.predict(X_test_reg)
graph_prediction_vs_actual(y_test_reg, y_pred, outlier_threshold=50)
Number of Features
Numeric 10
Number of training examples: 442
Targets
72 1.36%
200 1.36%
178 1.13%
71 1.13%
90 1.13%
...
136 0.23%
295 0.23%
79 0.23%
25 0.23%
195 0.23%
Name: count, Length: 214, dtype: object
Tree Visualization#
Now let's train a decision tree on some data. We can visualize the structure of the decision tree that was fit to that data, and save it if necessary.
[17]:
pipeline_dt = BinaryClassificationPipeline(
["Simple Imputer", "Decision Tree Classifier"]
)
pipeline_dt.fit(X_train, y_train)
[17]:
pipeline = BinaryClassificationPipeline(component_graph={'Simple Imputer': ['Simple Imputer', 'X', 'y'], 'Decision Tree Classifier': ['Decision Tree Classifier', 'Simple Imputer.x', 'y']}, parameters={'Simple Imputer':{'impute_strategy': 'most_frequent', 'fill_value': None}, 'Decision Tree Classifier':{'criterion': 'gini', 'max_features': 'sqrt', 'max_depth': 6, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0}}, random_seed=0)
[18]:
from evalml.model_understanding.visualizations import visualize_decision_tree
visualize_decision_tree(
pipeline_dt.estimator, max_depth=2, rotate=False, filled=True, filepath=None
)
[18]:
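To save the rendered tree, the same call accepts a filepath; the file name below is illustrative, and the output format is assumed to be inferred from the extension:

```python
# Save the visualization to disk instead of only displaying it.
visualize_decision_tree(
    pipeline_dt.estimator, max_depth=2, filled=True, filepath="decision_tree.png"
)
```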
Confusion Matrix and Thresholds for Binary Classification Pipelines#
For binary classification pipelines, EvalML also provides the ability to compare the histograms of the actual positives and actual negatives, as well as to obtain the confusion matrix and ideal threshold per objective.
[19]:
from evalml.model_understanding import find_confusion_matrix_per_thresholds
df, objective_thresholds = find_confusion_matrix_per_thresholds(
pipeline_binary, X, y, n_bins=10
)
df.head(10)
[19]:
| | true_pos_count | true_neg_count | true_positives | true_negatives | false_positives | false_negatives | data_in_bins |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0.1 | 1 | 309 | 211 | 309 | 48 | 1 | [19, 20, 21, 37, 46] |
| 0.2 | 0 | 35 | 211 | 344 | 13 | 1 | [68, 92, 123, 133, 147] |
| 0.3 | 0 | 5 | 211 | 349 | 8 | 1 | [112, 157, 484, 491, 505] |
| 0.4 | 0 | 3 | 211 | 352 | 5 | 1 | [208, 340, 465] |
| 0.5 | 0 | 0 | 211 | 352 | 5 | 1 | [] |
| 0.6 | 3 | 2 | 208 | 354 | 3 | 4 | [40, 89, 128, 263, 297] |
| 0.7 | 2 | 2 | 206 | 356 | 1 | 6 | [13, 81, 385, 421] |
| 0.8 | 9 | 1 | 197 | 357 | 0 | 15 | [38, 41, 54, 73, 86] |
| 0.9 | 15 | 0 | 182 | 357 | 0 | 30 | [39, 44, 91, 99, 100] |
| 1.0 | 182 | 0 | 0 | 357 | 0 | 212 | [0, 1, 2, 3, 4] |
[20]:
objective_thresholds
[20]:
{'accuracy': {'objective score': 0.9894551845342706, 'threshold value': 0.4},
'balanced_accuracy': {'objective score': 0.9906387083135141,
'threshold value': 0.4},
'precision': {'objective score': 1.0, 'threshold value': 0.8},
'f1': {'objective score': 0.9859813084112149, 'threshold value': 0.4}}
In the results above, the first dataframe contains the histograms of the actual positives and negatives, represented by true_pos_count and true_neg_count. The columns true_positives, true_negatives, false_positives, and false_negatives contain the confusion matrix information for the associated threshold, while data_in_bins holds a random subset of row indices (both positive and negative) that fall into each bin. The index of the dataframe represents the associated threshold. For instance, at index 0.1, there is 1 positive row and 309 negative rows whose predicted probabilities fall between [0.0, 0.1].
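As a quick sketch of reading that dataframe (the values are taken from the table above):

```python
# Confusion-matrix entries for the [0.0, 0.1] threshold bin.
bin_row = df.loc[0.1]
print(bin_row["true_positives"], bin_row["false_positives"])  # 211, 48
```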
The returned objective_thresholds dictionary is keyed by objective metric, and the associated value contains both the best objective score and the threshold that produces it.
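For example, pulling the ideal F1 threshold out of the dictionary shown above:

```python
best_f1 = objective_thresholds["f1"]
print(best_f1["threshold value"])  # 0.4
print(best_f1["objective score"])  # 0.9859...
```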
Partial Dependence Plots#
We can calculate the one-way partial dependence plot for a feature.
[23]:
from evalml.model_understanding import partial_dependence
partial_dependence(
pipeline_binary, X_holdout, features="mean radius", grid_resolution=5
)
[23]:
| | feature_values | partial_dependence | class_label |
| --- | --- | --- | --- |
| 0 | 9.69092 | 0.392453 | malignant |
| 1 | 12.40459 | 0.395962 | malignant |
| 2 | 15.11826 | 0.417396 | malignant |
| 3 | 17.83193 | 0.429542 | malignant |
| 4 | 20.54560 | 0.429717 | malignant |
[24]:
from evalml.model_understanding import graph_partial_dependence
graph_partial_dependence(
pipeline_binary, X_holdout, features="mean radius", grid_resolution=5
)
We can also compute partial dependence for categorical features. We will demonstrate this on the fraud dataset.
[25]:
X_fraud, y_fraud = evalml.demos.load_fraud(100, verbose=False)
X_fraud.ww.init(
logical_types={
"provider": "Categorical",
"region": "Categorical",
"currency": "Categorical",
"expiration_date": "Categorical",
}
)
fraud_pipeline = BinaryClassificationPipeline(
["DateTime Featurizer", "One Hot Encoder", "Random Forest Classifier"]
)
fraud_pipeline.fit(X_fraud, y_fraud)
graph_partial_dependence(fraud_pipeline, X_fraud, features="provider")
Two-way partial dependence plots are also possible and invoke the same API.
[26]:
partial_dependence(
pipeline_binary,
X_holdout,
features=("worst perimeter", "worst radius"),
grid_resolution=5,
)
[26]:
| | 10.6876 | 14.404924999999999 | 18.12225 | 21.839575 | 25.5569 | class_label |
| --- | --- | --- | --- | --- | --- | --- |
| 69.140700 | 0.279038 | 0.282898 | 0.435179 | 0.435355 | 0.435355 | malignant |
| 94.334275 | 0.304335 | 0.308194 | 0.458283 | 0.458458 | 0.458458 | malignant |
| 119.527850 | 0.464455 | 0.468314 | 0.612137 | 0.616932 | 0.616932 | malignant |
| 144.721425 | 0.483437 | 0.487297 | 0.631120 | 0.635915 | 0.635915 | malignant |
| 169.915000 | 0.483437 | 0.487297 | 0.631120 | 0.635915 | 0.635915 | malignant |
[27]:
graph_partial_dependence(
pipeline_binary,
X_holdout,
features=("worst perimeter", "worst radius"),
grid_resolution=5,
)
Explaining Predictions#
We can explain why the model made certain predictions with the explain_predictions function. It can use either the Shapley Additive Explanations (SHAP) algorithm or the Local Interpretable Model-agnostic Explanations (LIME) algorithm to identify the top features that explain the predicted value.
This function can explain both classification and regression models; all you need to do is provide the pipeline, the input features, and a list of rows, given as indices into the input features, that you want to explain. The function returns a table that you can print, summarizing the top 3 most positive and most negative contributing features to the predicted value.
In the example below, we explain the prediction for the third data point in the dataset. We see that the worst concave points feature increased the estimated probability that the tumor is malignant by 20%, while the worst radius feature decreased that probability by 5%.
[28]:
from evalml.model_understanding.prediction_explanations import explain_predictions
table = explain_predictions(
pipeline=pipeline_binary,
input_features=X_holdout,
y=None,
indices_to_explain=[3],
top_k_features=6,
include_explainer_values=True,
)
print(table)
Random Forest Classifier w/ Label Encoder + Imputer
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
1 of 1
Feature Name Feature Value Contribution to Prediction SHAP Value
=============================================================================
worst concavity 0.18 - -0.02
mean concavity 0.04 - -0.03
worst area 599.50 - -0.03
worst radius 14.04 - -0.05
mean concave points 0.03 - -0.05
worst perimeter 92.80 - -0.06
The interpretation of the table is the same for regression problems, but the SHAP value now corresponds to the change in the estimated value of the dependent variable rather than a change in probability. For multiclass classification problems, one table is output for each possible class.
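As a sketch of the multiclass case, explaining a single row with the pipeline_multi fit earlier would print one such table per class:

```python
from evalml.model_understanding.prediction_explanations import explain_predictions

multi_table = explain_predictions(
    pipeline=pipeline_multi,
    input_features=X_multi,
    y=None,
    indices_to_explain=[0],
    include_explainer_values=True,
)
print(multi_table)  # one explanation table each for class_0, class_1, and class_2
```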
Below is an example of explaining three predictions with explain_predictions.
[29]:
from evalml.model_understanding.prediction_explanations import explain_predictions
report = explain_predictions(
pipeline=pipeline_binary,
input_features=X_holdout,
y=y_holdout,
indices_to_explain=[0, 4, 9],
include_explainer_values=True,
output_format="text",
)
print(report)
Random Forest Classifier w/ Label Encoder + Imputer
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
1 of 3
Feature Name Feature Value Contribution to Prediction SHAP Value
==============================================================================
worst perimeter 101.20 - -0.04
worst concave points 0.06 - -0.05
mean concave points 0.01 - -0.05
2 of 3
Feature Name Feature Value Contribution to Prediction SHAP Value
=============================================================================
worst radius 11.94 - -0.05
worst perimeter 80.78 - -0.06
mean concave points 0.02 - -0.06
3 of 3
Feature Name Feature Value Contribution to Prediction SHAP Value
==============================================================================
worst concave points 0.10 - -0.05
worst perimeter 99.21 - -0.06
mean concave points 0.03 - -0.08
The examples above used the SHAP algorithm, since that is what explain_predictions uses by default. If you would like to use LIME instead, you can change it with the algorithm="lime" argument.
[30]:
from evalml.model_understanding.prediction_explanations import explain_predictions
table = explain_predictions(
pipeline=pipeline_binary,
input_features=X_holdout,
y=None,
indices_to_explain=[3],
top_k_features=6,
include_explainer_values=True,
algorithm="lime",
)
print(table)
Random Forest Classifier w/ Label Encoder + Imputer
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
1 of 1
Feature Name Feature Value Contribution to Prediction LIME Value
==============================================================================
worst perimeter 92.80 + 0.06
worst radius 14.04 + 0.06
worst area 599.50 + 0.05
mean concave points 0.03 + 0.04
worst concave points 0.12 + 0.04
worst concavity 0.18 + 0.03
[31]:
from evalml.model_understanding.prediction_explanations import explain_predictions
report = explain_predictions(
pipeline=pipeline_binary,
input_features=X_holdout,
y=None,
indices_to_explain=[0, 4, 9],
include_explainer_values=True,
output_format="text",
algorithm="lime",
)
print(report)
Random Forest Classifier w/ Label Encoder + Imputer
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
1 of 3
Feature Name Feature Value Contribution to Prediction LIME Value
=========================================================================
worst radius 15.14 + 0.06
worst perimeter 101.20 + 0.06
worst area 718.90 + 0.05
2 of 3
Feature Name Feature Value Contribution to Prediction LIME Value
=========================================================================
worst perimeter 80.78 + 0.06
worst radius 11.94 + 0.06
worst area 433.10 + 0.05
3 of 3
Feature Name Feature Value Contribution to Prediction LIME Value
=========================================================================
worst radius 14.42 + 0.06
worst perimeter 99.21 + 0.06
worst area 634.30 + 0.05
Explaining Best and Worst Predictions#
When debugging machine learning models, it is often useful to analyze the best and worst predictions the model made. The explain_predictions_best_worst function can help us with this.
This function will display the output of explain_predictions for the best 2 and worst 2 predictions. By default, the best and worst predictions are determined by absolute error for regression problems and by cross entropy for classification problems.
We can specify our own ranking function by passing it into the metric parameter. This function will be called on y_true and y_pred. By convention, lower scores are better.
At the top of each table, we can see the predicted probabilities, target value, error, and row index for that prediction. For a regression problem, we would see the predicted value instead of the predicted probabilities.
[32]:
from evalml.model_understanding.prediction_explanations import (
explain_predictions_best_worst,
)
shap_report = explain_predictions_best_worst(
pipeline=pipeline_binary,
input_features=X_holdout,
y_true=y_holdout,
include_explainer_values=True,
top_k_features=6,
num_to_explain=2,
)
print(shap_report)
Random Forest Classifier w/ Label Encoder + Imputer
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
Best 1 of 2
Predicted Probabilities: [benign: 1.0, malignant: 0.0]
Predicted Value: benign
Target Value: benign
Cross Entropy: 0.0
Index ID: 502
Feature Name Feature Value Contribution to Prediction SHAP Value
==============================================================================
mean concavity 0.06 - -0.03
worst area 552.00 - -0.03
worst concave points 0.08 - -0.05
worst radius 13.57 - -0.05
mean concave points 0.03 - -0.05
worst perimeter 86.67 - -0.06
Best 2 of 2
Predicted Probabilities: [benign: 1.0, malignant: 0.0]
Predicted Value: benign
Target Value: benign
Cross Entropy: 0.0
Index ID: 313
Feature Name Feature Value Contribution to Prediction SHAP Value
==============================================================================
worst concavity 0.08 - -0.02
worst area 467.80 - -0.03
worst radius 12.34 - -0.04
worst concave points 0.05 - -0.04
mean concave points 0.01 - -0.05
worst perimeter 81.23 - -0.05
Worst 1 of 2
Predicted Probabilities: [benign: 0.266, malignant: 0.734]
Predicted Value: malignant
Target Value: benign
Cross Entropy: 1.325
Index ID: 363
Feature Name Feature Value Contribution to Prediction SHAP Value
=========================================================================
worst perimeter 117.20 + 0.13
worst radius 18.13 + 0.12
worst area 1009.00 + 0.11
mean area 838.10 + 0.06
mean radius 16.50 + 0.05
worst concavity 0.17 - -0.05
Worst 2 of 2
Predicted Probabilities: [benign: 1.0, malignant: 0.0]
Predicted Value: benign
Target Value: malignant
Cross Entropy: 7.987
Index ID: 135
Feature Name Feature Value Contribution to Prediction SHAP Value
==============================================================================
mean concavity 0.05 - -0.03
worst area 653.60 - -0.04
worst concave points 0.09 - -0.05
worst radius 14.49 - -0.05
worst perimeter 92.04 - -0.06
mean concave points 0.03 - -0.06
[33]:
lime_report = explain_predictions_best_worst(
pipeline=pipeline_binary,
input_features=X_holdout,
y_true=y_holdout,
include_explainer_values=True,
top_k_features=6,
num_to_explain=2,
algorithm="lime",
)
print(lime_report)
Random Forest Classifier w/ Label Encoder + Imputer
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
Best 1 of 2
Predicted Probabilities: [benign: 1.0, malignant: 0.0]
Predicted Value: benign
Target Value: benign
Cross Entropy: 0.0
Index ID: 502
Feature Name Feature Value Contribution to Prediction LIME Value
==============================================================================
worst perimeter 86.67 + 0.06
worst radius 13.57 + 0.06
worst area 552.00 + 0.05
mean concave points 0.03 + 0.04
worst concave points 0.08 + 0.04
worst concavity 0.19 + 0.03
Best 2 of 2
Predicted Probabilities: [benign: 1.0, malignant: 0.0]
Predicted Value: benign
Target Value: benign
Cross Entropy: 0.0
Index ID: 313
Feature Name Feature Value Contribution to Prediction LIME Value
==============================================================================
worst radius 12.34 + 0.06
worst perimeter 81.23 + 0.06
worst area 467.80 + 0.05
mean concave points 0.01 + 0.04
worst concave points 0.05 + 0.04
worst concavity 0.08 + 0.02
Worst 1 of 2
Predicted Probabilities: [benign: 0.266, malignant: 0.734]
Predicted Value: malignant
Target Value: benign
Cross Entropy: 1.325
Index ID: 363
Feature Name Feature Value Contribution to Prediction LIME Value
==============================================================================
worst concavity 0.17 - -0.03
mean concave points 0.05 - -0.04
worst concave points 0.09 - -0.04
worst area 1009.00 - -0.05
worst radius 18.13 - -0.06
worst perimeter 117.20 - -0.06
Worst 2 of 2
Predicted Probabilities: [benign: 1.0, malignant: 0.0]
Predicted Value: benign
Target Value: malignant
Cross Entropy: 7.987
Index ID: 135
Feature Name Feature Value Contribution to Prediction LIME Value
==============================================================================
worst perimeter 92.04 + 0.06
worst radius 14.49 + 0.06
worst area 653.60 + 0.05
mean concave points 0.03 + 0.04
worst concave points 0.09 + 0.04
worst concavity 0.22 + 0.03
We can use a custom metric (hinge loss, in this case) for selecting the best and worst predictions, as in the following example.
[34]:
import numpy as np


def hinge_loss(y_true, y_pred_proba):
    # Clip the positive-class probabilities away from 0 and 1 so the
    # log-odds below stay finite.
    probabilities = np.clip(y_pred_proba.iloc[:, 1], 0.001, 0.999)
    # Hinge loss expects labels in {-1, 1}, so remap the 0 class to -1.
    y_true[y_true == 0] = -1
    return np.clip(
        1 - y_true * np.log(probabilities / (1 - probabilities)), a_min=0, a_max=None
    )
report = explain_predictions_best_worst(
pipeline=pipeline_binary,
input_features=X,
y_true=y,
include_explainer_values=True,
num_to_explain=5,
metric=hinge_loss,
)
print(report)
Random Forest Classifier w/ Label Encoder + Imputer
{'Label Encoder': {'positive_label': None}, 'Imputer': {'categorical_impute_strategy': 'most_frequent', 'numeric_impute_strategy': 'mean', 'boolean_impute_strategy': 'most_frequent', 'categorical_fill_value': None, 'numeric_fill_value': None, 'boolean_fill_value': None}, 'Random Forest Classifier': {'n_estimators': 100, 'max_depth': 6, 'n_jobs': -1}}
Best 1 of 5
Predicted Probabilities: [benign: 1.0, malignant: 0.0]
Predicted Value: benign
Target Value: benign
hinge_loss: 0.0
Index ID: 381
Feature Name Feature Value Contribution to Prediction SHAP Value
=============================================================================
worst radius 12.09 - -0.04
mean concave points 0.02 - -0.05
worst perimeter 79.73 - -0.06
Best 2 of 5
Predicted Probabilities: [benign: 0.0, malignant: 1.0]
Predicted Value: malignant
Target Value: malignant
hinge_loss: 0.0
Index ID: 373
Feature Name Feature Value Contribution to Prediction SHAP Value
==============================================================================
worst perimeter 166.80 + 0.10
worst concave points 0.21 + 0.08
mean concave points 0.09 + 0.08
Best 3 of 5
Predicted Probabilities: [benign: 0.999, malignant: 0.001]
Predicted Value: benign
Target Value: benign
hinge_loss: 0.0
Index ID: 374
Feature Name Feature Value Contribution to Prediction SHAP Value
==============================================================================
worst concave points 0.07 - -0.05
worst perimeter 99.16 - -0.05
mean concave points 0.02 - -0.05
Best 4 of 5
Predicted Probabilities: [benign: 0.888, malignant: 0.112]
Predicted Value: benign
Target Value: benign
hinge_loss: 0.0
Index ID: 375
Feature Name Feature Value Contribution to Prediction SHAP Value
=========================================================================
worst concavity 0.21 - -0.04
mean concavity 0.07 - -0.05
worst texture 19.14 - -0.07
Best 5 of 5
Predicted Probabilities: [benign: 0.915, malignant: 0.085]
Predicted Value: benign
Target Value: benign
hinge_loss: 0.0
Index ID: 376
Feature Name Feature Value Contribution to Prediction SHAP Value
=========================================================================
worst area 351.90 - -0.07
worst radius 10.85 - -0.07
worst perimeter 76.51 - -0.10
Worst 1 of 5
Predicted Probabilities: [benign: 0.409, malignant: 0.591]
Predicted Value: malignant
Target Value: benign
hinge_loss: 1.369
Index ID: 128
Feature Name Feature Value Contribution to Prediction SHAP Value
==============================================================================
mean concave points 0.09 + 0.10
worst concave points 0.14 + 0.09
mean concavity 0.11 + 0.08
Worst 2 of 5
Predicted Probabilities: [benign: 0.39, malignant: 0.61]
Predicted Value: malignant
Target Value: benign
hinge_loss: 1.446
Index ID: 421
Feature Name Feature Value Contribution to Prediction SHAP Value
=============================================================================
mean concave points 0.06 + 0.08
mean concavity 0.14 + 0.07
worst perimeter 114.10 + 0.07
Worst 3 of 5
Predicted Probabilities: [benign: 0.343, malignant: 0.657]
Predicted Value: malignant
Target Value: benign
hinge_loss: 1.652
Index ID: 81
Feature Name Feature Value Contribution to Prediction SHAP Value
==============================================================================
worst concave points 0.17 ++ 0.15
mean concave points 0.07 + 0.11
worst compactness 0.48 + 0.07
Worst 4 of 5
Predicted Probabilities: [benign: 0.266, malignant: 0.734]
Predicted Value: malignant
Target Value: benign
hinge_loss: 2.016
Index ID: 363
Feature Name Feature Value Contribution to Prediction SHAP Value
=========================================================================
worst perimeter 117.20 + 0.13
worst radius 18.13 + 0.12
worst area 1009.00 + 0.11
Worst 5 of 5
Predicted Probabilities: [benign: 1.0, malignant: 0.0]
Predicted Value: benign
Target Value: malignant
hinge_loss: 7.907
Index ID: 135
Feature Name Feature Value Contribution to Prediction SHAP Value
=============================================================================
worst radius 14.49 - -0.05
worst perimeter 92.04 - -0.06
mean concave points 0.03 - -0.06
Changing Output Formats#
Instead of getting the prediction explanations as text, you can get the report as a python dictionary or a pandas dataframe. All you have to do is pass output_format="dict" or output_format="dataframe" to explain_prediction, explain_predictions, or explain_predictions_best_worst.
Single prediction as a dictionary#
[35]:
import json
single_prediction_report = explain_predictions(
pipeline=pipeline_binary,
input_features=X_holdout,
indices_to_explain=[3],
y=y_holdout,
top_k_features=6,
include_explainer_values=True,
output_format="dict",
)
print(json.dumps(single_prediction_report, indent=2))
{
"explanations": [
{
"explanations": [
{
"feature_names": [
"worst concavity",
"mean concavity",
"worst area",
"worst radius",
"mean concave points",
"worst perimeter"
],
"feature_values": [
0.1791,
0.038,
599.5,
14.04,
0.034,
92.8
],
"qualitative_explanation": [
"-",
"-",
"-",
"-",
"-",
"-"
],
"quantitative_explanation": [
-0.023008481104309524,
-0.02621982146725469,
-0.033821592020020774,
-0.04666659740586632,
-0.0541511910494414,
-0.05523688273171911
],
"drill_down": {},
"class_name": "malignant",
"expected_value": 0.3711208791208791
}
]
}
]
}
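Since the report is a plain dictionary, its pieces can be consumed programmatically; here is a small sketch against the structure printed above:

```python
explanation = single_prediction_report["explanations"][0]["explanations"][0]
for name, value, shap_value in zip(
    explanation["feature_names"],
    explanation["feature_values"],
    explanation["quantitative_explanation"],
):
    print(f"{name}: value={value}, SHAP value={shap_value}")
```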
Single prediction as a dataframe#
[36]:
single_prediction_report = explain_predictions(
pipeline=pipeline_binary,
input_features=X_holdout,
indices_to_explain=[3],
y=y_holdout,
top_k_features=6,
include_explainer_values=True,
output_format="dataframe",
)
single_prediction_report
[36]:
| | feature_names | feature_values | qualitative_explanation | quantitative_explanation | class_name | prediction_number |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | worst concavity | 0.1791 | - | -0.023008 | malignant | 0 |
| 1 | mean concavity | 0.0380 | - | -0.026220 | malignant | 0 |
| 2 | worst area | 599.5000 | - | -0.033822 | malignant | 0 |
| 3 | worst radius | 14.0400 | - | -0.046667 | malignant | 0 |
| 4 | mean concave points | 0.0340 | - | -0.054151 | malignant | 0 |
| 5 | worst perimeter | 92.8000 | - | -0.055237 | malignant | 0 |
Best and worst predictions as a dictionary#
[37]:
report = explain_predictions_best_worst(
pipeline=pipeline_binary,
input_features=X,
y_true=y,
num_to_explain=1,
top_k_features=6,
include_explainer_values=True,
output_format="dict",
)
print(json.dumps(report, indent=2))
{
"explanations": [
{
"rank": {
"prefix": "best",
"index": 1
},
"predicted_values": {
"probabilities": {
"benign": 1.0,
"malignant": 0.0
},
"predicted_value": "benign",
"target_value": "benign",
"error_name": "Cross Entropy",
"error_value": 0.0001970443507070075,
"index_id": 52
},
"explanations": [
{
"feature_names": [
"mean concavity",
"worst area",
"worst radius",
"worst concave points",
"mean concave points",
"worst perimeter"
],
"feature_values": [
0.01972,
527.2,
13.1,
0.06296,
0.01349,
83.67
],
"qualitative_explanation": [
"-",
"-",
"-",
"-",
"-",
"-"
],
"quantitative_explanation": [
-0.024450176040601602,
-0.03373367604833195,
-0.042905917251496686,
-0.04393174846277656,
-0.050938583943217694,
-0.06002768963828602
],
"drill_down": {},
"class_name": "malignant",
"expected_value": 0.3711208791208791
}
]
},
{
"rank": {
"prefix": "worst",
"index": 1
},
"predicted_values": {
"probabilities": {
"benign": 1.0,
"malignant": 0.0
},
"predicted_value": "benign",
"target_value": "malignant",
"error_name": "Cross Entropy",
"error_value": 7.986911819330411,
"index_id": 135
},
"explanations": [
{
"feature_names": [
"mean concavity",
"worst area",
"worst concave points",
"worst radius",
"worst perimeter",
"mean concave points"
],
"feature_values": [
0.04711,
653.6,
0.09331,
14.49,
92.04,
0.02704
],
"qualitative_explanation": [
"-",
"-",
"-",
"-",
"-",
"-"
],
"quantitative_explanation": [
-0.029936744551331215,
-0.03748357654576422,
-0.04553126236476177,
-0.0483274199182721,
-0.06039220265366764,
-0.060441902449258976
],
"drill_down": {},
"class_name": "malignant",
"expected_value": 0.3711208791208791
}
]
}
]
}
Best and worst predictions as a dataframe#
[38]:
report = explain_predictions_best_worst(
pipeline=pipeline_binary,
input_features=X_holdout,
y_true=y_holdout,
num_to_explain=1,
top_k_features=6,
include_explainer_values=True,
output_format="dataframe",
)
report
[38]:
| | feature_names | feature_values | qualitative_explanation | quantitative_explanation | class_name | label_benign_probability | label_malignant_probability | predicted_value | target_value | error_name | error_value | index_id | rank | prefix |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | mean concavity | 0.05928 | - | -0.029022 | malignant | 1.0 | 0.0 | benign | benign | Cross Entropy | 0.000197 | 502 | 1 | best |
| 1 | worst area | 552.00000 | - | -0.034112 | malignant | 1.0 | 0.0 | benign | benign | Cross Entropy | 0.000197 | 502 | 1 | best |
| 2 | worst concave points | 0.08411 | - | -0.046896 | malignant | 1.0 | 0.0 | benign | benign | Cross Entropy | 0.000197 | 502 | 1 | best |
| 3 | worst radius | 13.57000 | - | -0.046928 | malignant | 1.0 | 0.0 | benign | benign | Cross Entropy | 0.000197 | 502 | 1 | best |
| 4 | mean concave points | 0.03279 | - | -0.052902 | malignant | 1.0 | 0.0 | benign | benign | Cross Entropy | 0.000197 | 502 | 1 | best |
| 5 | worst perimeter | 86.67000 | - | -0.064320 | malignant | 1.0 | 0.0 | benign | benign | Cross Entropy | 0.000197 | 502 | 1 | best |
| 6 | mean concavity | 0.04711 | - | -0.029937 | malignant | 1.0 | 0.0 | benign | malignant | Cross Entropy | 7.986912 | 135 | 1 | worst |
| 7 | worst area | 653.60000 | - | -0.037484 | malignant | 1.0 | 0.0 | benign | malignant | Cross Entropy | 7.986912 | 135 | 1 | worst |
| 8 | worst concave points | 0.09331 | - | -0.045531 | malignant | 1.0 | 0.0 | benign | malignant | Cross Entropy | 7.986912 | 135 | 1 | worst |
| 9 | worst radius | 14.49000 | - | -0.048327 | malignant | 1.0 | 0.0 | benign | malignant | Cross Entropy | 7.986912 | 135 | 1 | worst |
| 10 | worst perimeter | 92.04000 | - | -0.060392 | malignant | 1.0 | 0.0 | benign | malignant | Cross Entropy | 7.986912 | 135 | 1 | worst |
| 11 | mean concave points | 0.02704 | - | -0.060442 | malignant | 1.0 | 0.0 | benign | malignant | Cross Entropy | 7.986912 | 135 | 1 | worst |
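Because the report is a regular dataframe, standard pandas filtering applies; for instance, to isolate the rows belonging to the worst prediction:

```python
worst_rows = report[report["prefix"] == "worst"]
print(worst_rows[["feature_names", "quantitative_explanation", "error_value"]])
```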
Force Plots#
Force plots can be generated to predict single or multiple rows for binary, multiclass, and regression problem types. These plots use the SHAP algorithm. Here is an example of predicting a single row on a binary classification dataset. The force plot shows the predictive power of each feature in making the negative ("Class: 0") prediction and the positive ("Class: 1") prediction.
[39]:
import shap
from evalml.model_understanding.force_plots import graph_force_plot
rows_to_explain = [0] # Should be a list of integer indices of the rows to explain.
results = graph_force_plot(
pipeline_binary,
rows_to_explain=rows_to_explain,
training_data=X_holdout,
y=y_holdout,
)
for result in results:
    for cls in result:
        print("Class:", cls)
        display(result[cls]["plot"])
Class: malignant
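If the interactive plots do not render in your notebook, the SHAP JavaScript runtime likely has not been loaded; calling shap.initjs() first usually resolves this:

```python
import shap

shap.initjs()  # load SHAP's JavaScript runtime so force plots render inline
```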
Here is an example of a force plot explaining multiple predictions on a multiclass problem. These plots show a force plot for each row, arranged as consecutive columns that can be sorted with the dropdown above; clicking a column brings up the explanation for the corresponding row below.
[40]:
rows_to_explain = [
0,
1,
2,
3,
4,
] # Should be a list of integer indices of the rows to explain.
results = graph_force_plot(
pipeline_multi, rows_to_explain=rows_to_explain, training_data=X_multi, y=y_multi
)
for idx, result in enumerate(results):
    print("Row:", idx)
    for cls in result:
        print("Class:", cls)
        display(result[cls]["plot"])
Row: 0
Class: class_0
Class: class_1
Class: class_2
Row: 1
Class: class_0
Class: class_1
Class: class_2
Row: 2
Class: class_0
Class: class_1
Class: class_2
Row: 3
Class: class_0
Class: class_1
Class: class_2
Row: 4
Class: class_0
Class: class_1
Class: class_2