Understanding Data Check Actions#
EvalML streamlines the creation and implementation of machine learning models for tabular data. One of the many features it offers is data checks, which help determine the "health" of our data before we train a model. These data checks have associated actions, which will be shown in this notebook. In our default data checks, we have the following checks:
NullDataCheck
: checks whether any rows or columns are null or highly null
IDColumnsDataCheck
: checks for columns that could be ID columns
TargetLeakageDataCheck
: checks whether any input features have a high association with the target
InvalidTargetDataCheck
: checks whether there are null or other invalid values in the target
NoVarianceDataCheck
: checks whether the target or any feature has no variance
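The gist of the null and variance checks can be sketched in plain pandas. The 95% null threshold and the single-unique-value rule below mirror the defaults reported later in this notebook, but the helper itself is a hypothetical illustration, not EvalML's implementation:

```python
import pandas as pd


def sketch_null_and_variance_checks(df, null_threshold=0.95):
    """Rough, illustrative approximation of NullDataCheck and NoVarianceDataCheck."""
    messages = []
    pct_null = df.isnull().mean()  # fraction of null values per column
    for col in pct_null[pct_null >= null_threshold].index:
        messages.append(f"Column(s) '{col}' are {null_threshold:.0%} or more null")
    for col in df.columns:
        if df[col].nunique(dropna=True) <= 1:
            messages.append(f"'{col}' has 1 unique value.")
    return messages


# A toy frame with one highly-null column and one zero-variance column.
demo = pd.DataFrame(
    {"mostly_nulls": [None] * 98 + [1, 2], "no_variance": [1] * 100}
)
print(sketch_null_and_variance_checks(demo))
```

The real checks return structured messages with codes and action options, as shown later in this notebook; this sketch only reproduces the detection logic.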
EvalML has additional data checks that can be seen here, with usage examples here. Below, we will walk through the usage of EvalML's default data checks and actions.
First, we import the necessary dependencies to demonstrate these checks.
[1]:
import woodwork as ww
import pandas as pd
from evalml import AutoMLSearch
from evalml.demos import load_fraud
from evalml.preprocessing import split_data
Let's look at the input feature data. EvalML uses the Woodwork library to represent this data. The demo data that EvalML returns is a Woodwork DataTable and DataColumn.
[2]:
X, y = load_fraud(n_rows=1500)
X.head()
Number of Features
Boolean 1
Categorical 6
Numeric 5
Number of training examples: 1500
Targets
False 86.60%
True 13.40%
Name: count, dtype: object
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-evalml/envs/stable/lib/python3.9/site-packages/woodwork/type_sys/utils.py:33: UserWarning: Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
pd.to_datetime(
[2]:
card_id | store_id | datetime | amount | currency | customer_present | expiration_date | provider | lat | lng | region | country | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
id | ||||||||||||
0 | 32261 | 8516 | 2019-01-01 00:12:26 | 24900 | CUC | True | 08/24 | Mastercard | 38.58894 | -89.99038 | Fairview Heights | US |
1 | 16434 | 8516 | 2019-01-01 09:42:03 | 15789 | MYR | False | 11/21 | Discover | 38.58894 | -89.99038 | Fairview Heights | US |
2 | 23468 | 8516 | 2019-04-17 08:17:01 | 1883 | AUD | False | 09/27 | Discover | 38.58894 | -89.99038 | Fairview Heights | US |
3 | 14364 | 8516 | 2019-01-30 11:54:30 | 82120 | KRW | True | 09/20 | JCB 16 digit | 38.58894 | -89.99038 | Fairview Heights | US |
4 | 29407 | 8516 | 2019-05-01 17:59:36 | 25745 | MUR | True | 09/22 | American Express | 38.58894 | -89.99038 | Fairview Heights | US |
Adding noise and unclean data#
This data is already clean and compatible with EvalML's AutoMLSearch
. In order to demonstrate EvalML's default data checks, we will add the following:
A column of mostly null values (<0.5% non-null)
A column with low/no variance
A row of null values
A missing target value
We will add the first two columns to the entire dataset, and we will only add the last two to the training data. Note: these only represent some of the scenarios that EvalML's default data checks can catch.
[3]:
# add a column with no variance in the data
X["no_variance"] = [1 for _ in range(X.shape[0])]
# add a column with >99.5% null values
X["mostly_nulls"] = [None] * (X.shape[0] - 5) + [i for i in range(5)]
# since we changed the data, let's reinitialize the woodwork datatable
X.ww.init()
# let's split some training and validation data
X_train, X_valid, y_train, y_valid = split_data(X, y, problem_type="binary")
[4]:
# make row 1 all nan values
X_train.iloc[1] = [None] * X_train.shape[1]
# make one of the target values null
y_train[990] = None
X_train.ww.init()
y_train = ww.init_series(y_train, logical_type="Categorical")
# Let's take another look at the new X_train data
X_train
[4]:
card_id | store_id | datetime | amount | currency | customer_present | expiration_date | provider | lat | lng | region | country | no_variance | mostly_nulls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | ||||||||||||||
872 | 15492 | 2868 | 2019-08-03 02:50:04 | 80719 | HNL | True | 08/27 | American Express | 5.47090 | 100.24529 | Batu Feringgi | MY | 1 | <NA> |
1477 | <NA> | <NA> | NaT | <NA> | NaN | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | <NA> | <NA> |
158 | 22440 | 6813 | 2019-07-12 11:07:25 | 1849 | SEK | True | 09/20 | American Express | 26.26490 | 81.54855 | Jais | IN | 1 | <NA> |
808 | 8096 | 8096 | 2019-06-11 21:33:36 | 41358 | MOP | True | 04/29 | VISA 13 digit | 59.37722 | 28.19028 | Narva | EE | 1 | <NA> |
336 | 33270 | 1529 | 2019-03-23 21:44:00 | 32594 | CUC | False | 04/22 | Mastercard | 51.39323 | 0.47713 | Strood | GB | 1 | <NA> |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
339 | 8484 | 5358 | 2019-01-10 07:47:28 | 89503 | GMD | False | 11/24 | Maestro | 47.30997 | 8.52462 | Adliswil | CH | 1 | <NA> |
1383 | 17565 | 3929 | 2019-01-15 01:11:02 | 14264 | DKK | True | 06/20 | VISA 13 digit | 50.72043 | 11.34046 | Rudolstadt | DE | 1 | <NA> |
893 | 108 | 44 | 2019-05-17 00:53:39 | 93218 | SLL | True | 12/24 | JCB 16 digit | 15.72892 | 120.57224 | Burgos | PH | 1 | <NA> |
385 | 29983 | 152 | 2019-06-09 06:50:29 | 41105 | RWF | False | 07/20 | JCB 16 digit | -6.80000 | 39.25000 | Magomeni | TZ | 1 | <NA> |
1074 | 26197 | 4927 | 2019-05-22 15:57:27 | 50481 | MNT | False | 05/26 | JCB 15 digit | 41.00510 | -73.78458 | Scarsdale | US | 1 | <NA> |
1200 rows × 14 columns
If we call AutoMLSearch.search()
on this data, the search will fail due to the columns and problems we've added above. Note: we use a try/except here to catch the ValueError that `AutoMLSearch` raises.
[5]:
automl = AutoMLSearch(X_train=X_train, y_train=y_train, problem_type="binary")
try:
automl.search()
except ValueError as e:
# to make the error message more distinct
print("=" * 80, "\n")
print("Search errored out! Message received is: {}".format(e))
print("=" * 80, "\n")
================================================================================
Search errored out! Message received is: Input y contains NaN.
================================================================================
We can use the search_iterative()
function that EvalML provides to determine what potential health issues our data has. This search_iterative function is a public method available through evalml.automl
, and is distinct from the search
function of the AutoMLSearch class. search_iterative()
runs the default data checks on the data and, if no errors are found, automatically runs AutoMLSearch.search()
.
[6]:
from evalml.automl import search_iterative
automl, messages = search_iterative(X_train, y_train, problem_type="binary")
automl, messages
/home/docs/checkouts/readthedocs.org/user_builds/feature-labs-inc-evalml/envs/stable/lib/python3.9/site-packages/woodwork/statistics_utils/_calculate_dependence_measure.py:60: SparseDataWarning: One or more pairs of columns did not share enough rows of non-null data to measure the relationship. The measurement for these columns will be NaN. Use 'extra_stats=True' to get the shared rows for each pair of columns.
warnings.warn(
[6]:
(None,
[{'message': '1 out of 1200 rows are 95.0% or more null',
'data_check_name': 'NullDataCheck',
'level': 'warning',
'details': {'columns': None,
'rows': [1477],
'pct_null_cols': id
1477 1.0
dtype: float64},
'code': 'HIGHLY_NULL_ROWS',
'action_options': [{'code': 'DROP_ROWS',
'data_check_name': 'NullDataCheck',
'metadata': {'columns': None, 'rows': [1477]},
'parameters': {}}]},
{'message': "Column(s) 'mostly_nulls' are 95.0% or more null",
'data_check_name': 'NullDataCheck',
'level': 'warning',
'details': {'columns': ['mostly_nulls'],
'rows': None,
'pct_null_rows': {'mostly_nulls': 0.9966666666666667}},
'code': 'HIGHLY_NULL_COLS',
'action_options': [{'code': 'DROP_COL',
'data_check_name': 'NullDataCheck',
'metadata': {'columns': ['mostly_nulls'], 'rows': None},
'parameters': {}}]},
{'message': '1 row(s) (0.08333333333333334%) of target values are null',
'data_check_name': 'InvalidTargetDataCheck',
'level': 'error',
'details': {'columns': None,
'rows': [990],
'num_null_rows': 1,
'pct_null_rows': 0.08333333333333334},
'code': 'TARGET_HAS_NULL',
'action_options': [{'code': 'DROP_ROWS',
'data_check_name': 'InvalidTargetDataCheck',
'metadata': {'columns': None, 'rows': [990], 'is_target': True},
'parameters': {}}]},
{'message': "'no_variance' has 1 unique value.",
'data_check_name': 'NoVarianceDataCheck',
'level': 'warning',
'details': {'columns': ['no_variance'], 'rows': None},
'code': 'NO_VARIANCE',
'action_options': [{'code': 'DROP_COL',
'data_check_name': 'NoVarianceDataCheck',
'metadata': {'columns': ['no_variance'], 'rows': None},
'parameters': {}}]}])
The return value of the search_iterative
function above is a tuple. The first element is the AutoMLSearch
object if it runs (and None
otherwise), and the second element is a list of dictionaries describing the potential warnings and errors that the default data checks found in the X
and y
data that was passed in. In this list, warnings are suggestions that the data checks surface; acting on them can improve search results, but they will not break AutoMLSearch
. Errors, on the other hand, indicate issues that will break AutoMLSearch
and need to be addressed by the user.
Above, we can see that there were errors, so the search did not run automatically.
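Since every message in the returned list carries a `level` key, separating advisory warnings from blocking errors is a one-line filter. A minimal sketch over a hypothetical messages list shaped like the output above:

```python
# Hypothetical messages shaped like the search_iterative output above.
messages = [
    {"message": "Column(s) 'mostly_nulls' are 95.0% or more null", "level": "warning"},
    {"message": "1 row(s) (0.08%) of target values are null", "level": "error"},
]

# Partition the messages by severity.
warnings = [m for m in messages if m["level"] == "warning"]
errors = [m for m in messages if m["level"] == "error"]

# Only the errors block AutoMLSearch; the warnings are advisory.
print(len(warnings), len(errors))  # → 1 1
```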
Addressing warnings and errors#
We can automatically address the warnings and errors returned by search_iterative
by using make_pipeline_from_data_check_output
, a utility method that creates a pipeline to automatically clean up our data. We just need to pass this method the messages produced by running DataCheck.validate()
and our problem type.
[7]:
from evalml.pipelines.utils import make_pipeline_from_data_check_output
actions_pipeline = make_pipeline_from_data_check_output("binary", messages)
actions_pipeline.fit(X_train, y_train)
X_train_cleaned, y_train_cleaned = actions_pipeline.transform(X_train, y_train)
print(
    "The new length of X_train is {} and y_train is {}".format(
        len(X_train_cleaned), len(y_train_cleaned)
    )
)
The new length of X_train is 1198 and y_train is 1198
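The 1198 above is 1200 minus the two dropped rows: the all-null feature row and the row with a null target. The `mostly_nulls` and `no_variance` columns are dropped as well. The same cleanup can be sketched in plain pandas, with a toy frame standing in for the real training data:

```python
import pandas as pd

# Toy stand-ins: row 1 is all null, targets at rows 1 and 3 are null.
X = pd.DataFrame(
    {
        "amount": [100, None, 250, 75],
        "no_variance": [1, None, 1, 1],
        "mostly_nulls": [None, None, None, 3],
    }
)
y = pd.Series([False, None, True, None])

# Drop rows that are >= 95% null and rows with a null target.
highly_null_rows = X.index[X.isnull().mean(axis=1) >= 0.95]
null_target_rows = y.index[y.isnull()]
to_drop = highly_null_rows.union(null_target_rows)
X_clean = X.drop(index=to_drop)
y_clean = y.drop(index=to_drop)

# Drop the highly-null and no-variance columns flagged by the checks.
X_clean = X_clean.drop(columns=["mostly_nulls", "no_variance"])
print(len(X_clean), len(y_clean))  # → 2 2
```

This mirrors the DROP_ROWS and DROP_COL action options in the messages above; the actual pipeline built by make_pipeline_from_data_check_output performs these steps as fitted components.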
Now, we can run search_iterative
to completion.
[8]:
results_cleaned = search_iterative(
X_train_cleaned, y_train_cleaned, problem_type="binary"
)
Note that this time we get an AutoMLSearch
object back as the first element of the tuple. We can use and inspect this AutoMLSearch
object as needed.
[9]:
automl_object = results_cleaned[0]
automl_object.rankings
[9]:
id | pipeline_name | search_order | ranking_score | mean_cv_score | standard_deviation_cv_score | percent_better_than_baseline | high_variance_cv | parameters | 
---|---|---|---|---|---|---|---|---|---|
0 | 1 | Random Forest Classifier w/ Label Encoder + Da... | 1 | 0.240358 | 0.240358 | 0.010962 | 95.037942 | False | {'Label Encoder': {'positive_label': None}, 'D...'} |
1 | 0 | Mode Baseline Binary Classification Pipeline | 0 | 4.843912 | 4.843912 | 0.049015 | 0.000000 | False | {'Label Encoder': {'positive_label': None}, 'B...'} |
If we check the second element in the tuple, we can see that no warnings or errors are detected anymore!
[10]:
data_check_results = results_cleaned[1]
data_check_results
[10]:
[]
Only addressing data check errors#
Previously, we used make_pipeline_from_data_check_output
to address all of the warnings and errors returned by search_iterative
. We will now show how we can manually address the errors so that `AutoMLSearch` can run, and how ignoring the warnings can come at the expense of performance.
We can print out the errors first to make them easier to read, and then we'll create new features and targets from the original training data.
[11]:
errors = [message for message in messages if message["level"] == "error"]
errors
[11]:
[{'message': '1 row(s) (0.08333333333333334%) of target values are null',
'data_check_name': 'InvalidTargetDataCheck',
'level': 'error',
'details': {'columns': None,
'rows': [990],
'num_null_rows': 1,
'pct_null_rows': 0.08333333333333334},
'code': 'TARGET_HAS_NULL',
'action_options': [{'code': 'DROP_ROWS',
'data_check_name': 'InvalidTargetDataCheck',
'metadata': {'columns': None, 'rows': [990], 'is_target': True},
'parameters': {}}]}]
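The action option above suggests DROP_ROWS as the fix: instead of imputing, we could remove row 990 from both the features and the target. A quick sketch with stand-in data (X_demo and y_demo are hypothetical, with the same null-target row index as above):

```python
import pandas as pd

# Hypothetical stand-ins for X_train / y_train with a null target at index 990.
y_demo = pd.Series([False, True, None], index=[988, 989, 990])
X_demo = pd.DataFrame({"amount": [10, 20, 30]}, index=[988, 989, 990])

# Apply the DROP_ROWS action manually: remove the rows listed in the action metadata.
rows_to_drop = [990]
X_no_null_target = X_demo.drop(index=rows_to_drop)
y_no_null_target = y_demo.drop(index=rows_to_drop)
print(len(X_no_null_target), len(y_no_null_target))  # → 2 2
```

The next cell takes a different route and fills the null target with False instead, which keeps the row but assumes a label for it.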
[12]:
# copy the DataTables to new variables
X_train_no_errors = X_train.copy()
y_train_no_errors = y_train.copy()
# address the errors surfaced in the dictionaries listed above
# let's address the `TARGET_HAS_NULL` error
y_train_no_errors.fillna(False, inplace=True)
# let's reinitialize the Woodwork DataTable
X_train_no_errors.ww.init()
X_train_no_errors.head()
[12]:
card_id | store_id | datetime | amount | currency | customer_present | expiration_date | provider | lat | lng | region | country | no_variance | mostly_nulls | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
id | ||||||||||||||
872 | 15492 | 2868 | 2019-08-03 02:50:04 | 80719 | HNL | True | 08/27 | American Express | 5.47090 | 100.24529 | Batu Feringgi | MY | 1 | <NA> |
1477 | <NA> | <NA> | NaT | <NA> | NaN | <NA> | NaN | NaN | NaN | NaN | NaN | NaN | <NA> | <NA> |
158 | 22440 | 6813 | 2019-07-12 11:07:25 | 1849 | SEK | True | 09/20 | American Express | 26.26490 | 81.54855 | Jais | IN | 1 | <NA> |
808 | 8096 | 8096 | 2019-06-11 21:33:36 | 41358 | MOP | True | 04/29 | VISA 13 digit | 59.37722 | 28.19028 | Narva | EE | 1 | <NA> |
336 | 33270 | 1529 | 2019-03-23 21:44:00 | 32594 | CUC | False | 04/22 | Mastercard | 51.39323 | 0.47713 | Strood | GB | 1 | <NA> |
Now, we can run search on X_train_no_errors
and y_train_no_errors
. Note that the search here doesn't fail since we addressed the errors, but warnings will still exist in the returned tuple. This search allows the mostly_nulls
column to remain in the features during the search.
[13]:
results_no_errors = search_iterative(
X_train_no_errors, y_train_no_errors, problem_type="binary"
)
results_no_errors
[13]:
(<evalml.automl.automl_search.AutoMLSearch at 0x7fa1282e54f0>,
[{'message': '1 out of 1200 rows are 95.0% or more null',
'data_check_name': 'NullDataCheck',
'level': 'warning',
'details': {'columns': None,
'rows': [1477],
'pct_null_cols': id
1477 1.0
dtype: float64},
'code': 'HIGHLY_NULL_ROWS',
'action_options': [{'code': 'DROP_ROWS',
'data_check_name': 'NullDataCheck',
'metadata': {'columns': None, 'rows': [1477]},
'parameters': {}}]},
{'message': "Column(s) 'mostly_nulls' are 95.0% or more null",
'data_check_name': 'NullDataCheck',
'level': 'warning',
'details': {'columns': ['mostly_nulls'],
'rows': None,
'pct_null_rows': {'mostly_nulls': 0.9966666666666667}},
'code': 'HIGHLY_NULL_COLS',
'action_options': [{'code': 'DROP_COL',
'data_check_name': 'NullDataCheck',
'metadata': {'columns': ['mostly_nulls'], 'rows': None},
'parameters': {}}]},
{'message': "'no_variance' has 1 unique value.",
'data_check_name': 'NoVarianceDataCheck',
'level': 'warning',
'details': {'columns': ['no_variance'], 'rows': None},
'code': 'NO_VARIANCE',
'action_options': [{'code': 'DROP_COL',
'data_check_name': 'NoVarianceDataCheck',
'metadata': {'columns': ['no_variance'], 'rows': None},
'parameters': {}}]}])