Objectives#

Overview#

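A key choice when training a machine learning model is which metric to use to measure how well the model learns the signal. Such metrics are useful for comparing how well trained models generalize to new, similar data.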

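This choice of metric is a key component of AutoML because it defines the cost function that the AutoML search will seek to optimize. In EvalML, these metrics are called objectives. AutoML will seek to minimize (or maximize) the objective score as it explores more pipelines and parameters, and it will use the feedback from scoring pipelines to tune the available hyperparameters and continue the search. It is therefore critical to have an objective function that represents how the model will be applied in the intended domain of use.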

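EvalML supports a variety of objectives from traditional supervised machine learning, including mean squared error for regression problems and cross entropy or area under the ROC curve for classification problems. EvalML also allows users to define a custom objective using their domain expertise, so that AutoML can search for models that provide the most value for the user's problem.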

Optimization Objectives vs Ranking Objectives#

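Many common objectives are used to evaluate model performance. However, not all of them should be used to optimize AutoMLSearch. Consider the common objective recall, which is the number of true positives divided by the sum of true positives and false negatives. A model with no false negatives gets a perfect recall score of 1. During automated optimization, a model can exploit this by predicting the positive label in every case, which produces a completely useless model that appears to perform very well. Nevertheless, this objective is still useful for evaluating performance after the model has been trained.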
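
The recall loophole described above can be checked by hand. The following is a minimal sketch in plain Python, not EvalML code: the helper functions `recall` and `precision` and the toy labels are made up for illustration.

```python
def recall(y_true, y_pred):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp / (tp + fn)


def precision(y_true, y_pred):
    """Precision = true positives / (true positives + false positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fp)


# Toy labels (illustrative only): 2 positives out of 10 rows.
y_true = [True, False, False, True, False, False, False, False, False, False]

# A "model" that predicts the positive label for every row.
always_positive = [True] * len(y_true)

print(recall(y_true, always_positive))     # 1.0
print(precision(y_true, always_positive))  # 0.2
```

Recall is perfect even though the predictions carry no information, while precision exposes the problem. This is why recall is suitable for ranking but not for optimization.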

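Because of this potential problem, we define two types of objectives: optimization objectives and ranking objectives. Optimization objectives can be used in AutoMLSearch to train high-performing models. Ranking objectives can be used after AutoMLSearch has run, either to rank model performance or for other evaluation. They include all of the optimization objectives plus other important metrics that are not used for optimization, such as recall.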

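Note that we also define a third type of objective, non-core objectives, which are domain-specific and require additional configuration before use.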

Optimization Objectives#

Use the get_optimization_objectives function to get a list of the objectives that can be used for optimization in AutoMLSearch for each problem type:

[1]:
from evalml.objectives import get_optimization_objectives
from evalml.problem_types import ProblemTypes

for objective in get_optimization_objectives(ProblemTypes.BINARY):
    print(objective.name)
MCC Binary
Log Loss Binary
Gini
AUC
Precision
F1
Balanced Accuracy Binary
Accuracy Binary

Ranking Objectives#

Use the get_ranking_objectives function to get a list of all the objectives included in EvalML for each problem type:

[2]:
from evalml.objectives import get_ranking_objectives

for objective in get_ranking_objectives(ProblemTypes.BINARY):
    print(objective.name)
MCC Binary
Log Loss Binary
Gini
AUC
Recall
Precision
F1
Balanced Accuracy Binary
Accuracy Binary

EvalML defines a base objective class for each problem type: RegressionObjective, BinaryClassificationObjective, and MulticlassClassificationObjective. Every EvalML objective is a subclass of one of these.

Binary Classification Objectives and Thresholds#

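All binary classification objectives have a threshold property. Some binary classification objectives, such as log loss and AUC, are unaffected by the choice of binary classification threshold, because they score based on predicted probabilities or examine a range of threshold values. These metrics are defined with score_needs_proba set to True. For all other binary classification objectives, we can compute the optimal binary classification threshold from the predicted probabilities and the target.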

[3]:
from evalml.pipelines import BinaryClassificationPipeline
from evalml.demos import load_fraud
from evalml.objectives import F1

X, y = load_fraud(n_rows=100)
X.ww.init(
    logical_types={
        "provider": "Categorical",
        "region": "Categorical",
        "currency": "Categorical",
        "expiration_date": "Categorical",
    }
)
objective = F1()
pipeline = BinaryClassificationPipeline(
    component_graph=[
        "Imputer",
        "DateTime Featurizer",
        "One Hot Encoder",
        "Random Forest Classifier",
    ]
)
pipeline.fit(X, y)
print(pipeline.threshold)
print(pipeline.score(X, y, objectives=[objective]))

y_pred_proba = pipeline.predict_proba(X)[True]
pipeline.threshold = objective.optimize_threshold(y_pred_proba, y)
print(pipeline.threshold)
print(pipeline.score(X, y, objectives=[objective]))
             Number of Features
Boolean                       1
Categorical                   6
Numeric                       5

Number of training examples: 100
Targets
False    91.00%
True      9.00%
Name: count, dtype: object
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-evalml/envs/stable/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
  pd.to_datetime(
None
OrderedDict([('F1', 1.0)])
0.37522383702653417
OrderedDict([('F1', 1.0)])
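
To make the idea of threshold optimization concrete, here is a minimal sketch in plain Python that sweeps a grid of candidate thresholds and keeps the one with the best F1 score. The helpers `f1_score`, `apply_threshold`, and `best_threshold` are hypothetical names written for this illustration, and the brute-force grid search and the strict `>` comparison are assumptions. EvalML's `optimize_threshold` may use a different search strategy.

```python
def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def apply_threshold(y_proba, threshold):
    """Turn predicted probabilities into labels (assumed rule: positive if proba > threshold)."""
    return [p > threshold for p in y_proba]


def best_threshold(y_true, y_proba, candidates):
    """Brute-force search: return the (threshold, score) pair with the highest F1."""
    scored = [(t, f1_score(y_true, apply_threshold(y_proba, t))) for t in candidates]
    return max(scored, key=lambda pair: pair[1])


# Toy data (illustrative only).
y_true = [False, False, True, False, True, True]
y_proba = [0.10, 0.35, 0.40, 0.55, 0.70, 0.90]
candidates = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

default_score = f1_score(y_true, apply_threshold(y_proba, 0.5))
threshold, score = best_threshold(y_true, y_proba, candidates)
print(default_score)     # F1 at the conventional 0.5 cutoff
print(threshold, score)  # 0.6 0.8
```

On this toy data the tuned threshold improves F1 over the 0.5 cutoff. In the EvalML cell above, the pipeline already scores an F1 of 1.0 on its own training data, so the optimized threshold leaves the score unchanged.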

Custom Objectives#

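Often, the objective function is very specific to the use case or business problem. Getting the right objective to optimize requires thinking through the decisions or actions that will be taken using the model, and assigning a cost or benefit to performing those actions correctly or incorrectly, based on known outcomes in the training data.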

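Once you have determined the objective for your business, you can provide it to EvalML for optimization by defining a custom objective function.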

Defining a Custom Objective Function#

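To create a custom objective class, we must define several elements:

  • name: The printable name of this objective.

  • objective_function: This function takes the predictions, the true labels, and an optional reference to the inputs, and returns a score of how well the model performed.

  • greater_is_better: True if a higher objective_function value represents a better solution, and False otherwise.

  • score_needs_proba: Only for classification objectives. True if the objective is intended to work with predicted probabilities rather than predicted values (example: cross entropy for classifiers).

  • decision_function: Only for binary classification objectives. This function takes the predicted probabilities output by the model and a binary classification threshold, and returns predicted values.

  • perfect_score: The score achieved by a perfect model on this objective.

  • expected_range: The range of values we want this objective to output, which is not necessarily equal to the possible range of values. For example, our expected R2 range is [-1, 1], although the actual range is (-inf, 1].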
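
As a rough illustration of how these attributes might be consumed, the sketch below uses a hypothetical `ObjectiveSpec` dataclass. It is not an EvalML class, and the helpers `is_better` and `in_expected_range` are assumptions made for this example. The R2 values come from the list above. The log loss range is illustrative.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ObjectiveSpec:
    """Hypothetical container for an objective's metadata (not part of EvalML)."""

    name: str
    greater_is_better: bool
    perfect_score: float
    expected_range: Tuple[float, float]


def is_better(spec, candidate, incumbent):
    """Return True if `candidate` beats `incumbent` under this objective."""
    if spec.greater_is_better:
        return candidate > incumbent
    return candidate < incumbent


def in_expected_range(spec, score):
    """Return True if `score` falls inside the objective's expected range."""
    low, high = spec.expected_range
    return low <= score <= high


r2 = ObjectiveSpec("R2", greater_is_better=True, perfect_score=1.0, expected_range=(-1.0, 1.0))
log_loss = ObjectiveSpec(
    "Log Loss Binary", greater_is_better=False, perfect_score=0.0, expected_range=(0.0, 1.0)
)

print(is_better(r2, 0.8, 0.6))        # True: higher R2 is better
print(is_better(log_loss, 0.8, 0.6))  # False: lower log loss is better
print(in_expected_range(r2, -3.5))    # False: a possible R2 value, but outside the expected range
```

The same pair of scores ranks in opposite order depending on greater_is_better, which is why every objective has to declare its direction.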

Example: Fraud Detection#

To give a concrete example, let's look at how the fraud detection objective function is built.

[4]:
from evalml.objectives.binary_classification_objective import (
    BinaryClassificationObjective,
)
import pandas as pd


class FraudCost(BinaryClassificationObjective):
    """Score the fraction of the total transaction amount processed that is lost to fraud"""

    name = "Fraud Cost"
    greater_is_better = False
    score_needs_proba = False
    perfect_score = 0.0

    def __init__(
        self,
        retry_percentage=0.5,
        interchange_fee=0.02,
        fraud_payout_percentage=1.0,
        amount_col="amount",
    ):
        """Create instance of FraudCost

        Args:
            retry_percentage (float): Fraction of customers who will retry a transaction if it
                is declined. Between 0 and 1. Defaults to .5

            interchange_fee (float): How much of each successful transaction you can collect.
                Between 0 and 1. Defaults to .02

            fraud_payout_percentage (float): Percentage of fraud you will not be able to collect.
                Between 0 and 1. Defaults to 1.0

            amount_col (str): Name of column in data that contains the amount. Defaults to "amount"
        """
        self.retry_percentage = retry_percentage
        self.interchange_fee = interchange_fee
        self.fraud_payout_percentage = fraud_payout_percentage
        self.amount_col = amount_col

    def decision_function(self, ypred_proba, threshold=0.0, X=None):
        """Determine if a transaction is fraud given predicted probabilities, threshold, and dataframe with transaction amount

        Args:
            ypred_proba (pd.Series): Predicted probabilities
            X (pd.DataFrame): Dataframe containing transaction amount
            threshold (float): Dollar threshold to determine if transaction is fraud

        Returns:
            pd.Series: Series of predicted fraud labels using X and threshold
        """
        if not isinstance(X, pd.DataFrame):
            X = pd.DataFrame(X)

        if not isinstance(ypred_proba, pd.Series):
            ypred_proba = pd.Series(ypred_proba)

        transformed_probs = ypred_proba.values * X[self.amount_col]
        return transformed_probs > threshold

    def objective_function(self, y_true, y_predicted, X):
        """Calculate the amount lost to fraud as a fraction of the total transaction amount, given predictions, true values, and dataframe with transaction amount

        Args:
            y_predicted (pd.Series): predicted fraud labels
            y_true (pd.Series): true fraud labels
            X (pd.DataFrame): dataframe with transaction amounts

        Returns:
            float: amount lost to fraud divided by the total transaction amount
        """
        if not isinstance(X, pd.DataFrame):
            X = pd.DataFrame(X)

        if not isinstance(y_predicted, pd.Series):
            y_predicted = pd.Series(y_predicted)

        if not isinstance(y_true, pd.Series):
            y_true = pd.Series(y_true)

        # extract the transaction amounts from the amount column in the user's data
        try:
            transaction_amount = X[self.amount_col]
        except KeyError:
            raise ValueError("`{}` is not a valid column in X.".format(self.amount_col))

        # amount paid if transaction is fraud
        fraud_cost = transaction_amount * self.fraud_payout_percentage

        # interchange fees forgone when a declined transaction is not retried
        interchange_cost = (
            transaction_amount * (1 - self.retry_percentage) * self.interchange_fee
        )

        # calculate cost of missing fraudulent transactions
        false_negatives = (y_true & ~y_predicted) * fraud_cost

        # calculate fees lost by declining legitimate transactions
        false_positives = (~y_true & y_predicted) * interchange_cost

        loss = false_negatives.sum() + false_positives.sum()

        loss_per_total_processed = loss / transaction_amount.sum()

        return loss_per_total_processed
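
To see the arithmetic in `objective_function` at work, the sketch below recomputes the same loss with plain Python lists instead of pandas. The function name `fraud_cost` and the four toy transactions are made up for illustration. The formulas and default parameters mirror the FraudCost class above.

```python
def fraud_cost(
    y_true,
    y_pred,
    amounts,
    retry_percentage=0.5,
    interchange_fee=0.02,
    fraud_payout_percentage=1.0,
):
    """Loss from missed fraud and wrongly declined transactions, divided by total amount."""
    loss = 0.0
    for is_fraud, flagged, amount in zip(y_true, y_pred, amounts):
        if is_fraud and not flagged:
            # False negative: the fraudulent amount is paid out.
            loss += amount * fraud_payout_percentage
        elif flagged and not is_fraud:
            # False positive: the fee is lost for customers who do not retry.
            loss += amount * (1 - retry_percentage) * interchange_fee
    return loss / sum(amounts)


# Four toy transactions (illustrative only).
amounts = [100.0, 200.0, 300.0, 400.0]
y_true = [True, False, False, True]   # transactions 1 and 4 are fraud
y_pred = [False, True, False, True]   # transaction 1 is missed, transaction 2 is wrongly flagged

# Missed fraud: 100 * 1.0 = 100. Wrongly declined: 200 * 0.5 * 0.02 = 2.
# Total loss 102 over 1000 processed, so the score is about 0.102.
cost = fraud_cost(y_true, y_pred, amounts)
print(round(cost, 3))  # 0.102
```

A single missed fraudulent transaction dominates the score here, which shows how this objective weighs false negatives far more heavily than false positives under the default parameters.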