Optuna Hyperparameter Tuning Notes (Part 2)
Adding Prior Hyperparameters Manually
Optuna searches for hyperparameters automatically, but sometimes there are specific hyperparameter sets you want to try first, such as an initial learning rate or number of leaves. It is also possible that you already tried some sets before asking Optuna to find better ones.
Optuna provides two APIs for these scenarios:
enqueue_trial(): pass these hyperparameter sets to Optuna and let it evaluate them
add_trial(): register the results of these sets as completed trials
First scenario: have Optuna evaluate your hyperparameters
When you have candidate values you want to try, Optuna provides the API optuna.study.Study.enqueue_trial(), which lets you pass those hyperparameters to Optuna so that Optuna evaluates them.
```python
import lightgbm as lgb
import numpy as np
import optuna
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import train_test_split


def objective(trial):
    data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
    train_x, valid_x, train_y, valid_y = train_test_split(data, target, test_size=0.25)
    dtrain = lgb.Dataset(train_x, label=train_y)
    dvalid = lgb.Dataset(valid_x, label=valid_y)

    param = {
        "objective": "binary",
        "metric": "auc",
        "verbosity": -1,
        "boosting_type": "gbdt",
        "bagging_fraction": min(trial.suggest_float("bagging_fraction", 0.4, 1.0 + 1e-12), 1),
        "bagging_freq": trial.suggest_int("bagging_freq", 0, 7),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
    }

    # Add a callback for pruning.
    pruning_callback = optuna.integration.LightGBMPruningCallback(trial, "auc")
    gbm = lgb.train(
        param, dtrain, valid_sets=[dvalid], verbose_eval=False, callbacks=[pruning_callback]
    )

    preds = gbm.predict(valid_x)
    pred_labels = np.rint(preds)
    accuracy = sklearn.metrics.accuracy_score(valid_y, pred_labels)
    return accuracy


study = optuna.create_study(direction="maximize", pruner=optuna.pruners.MedianPruner())
```
Enqueue the candidate values
```python
study.enqueue_trial(
    {
        "bagging_fraction": 1.0,
        "bagging_freq": 0,
        "min_child_samples": 20,
    }
)
study.enqueue_trial(
    {
        "bagging_fraction": 0.75,
        "bagging_freq": 5,
        "min_child_samples": 20,
    }
)

import logging
import sys

# Add stream handler of stdout to show the messages to see Optuna works expectedly.
optuna.logging.get_logger("optuna").addHandler(logging.StreamHandler(sys.stdout))

study.optimize(objective, n_trials=100, timeout=600)
```
Second scenario: have Optuna use hyperparameters that were already evaluated
For hyperparameters you have already experimented with (for example, ones that turned out to perform poorly), Optuna provides the API optuna.study.Study.add_trial(), which lets you register those results with Optuna; Optuna will then take them into account when sampling new hyperparameters.
```python
study = optuna.create_study(direction="maximize", pruner=optuna.pruners.MedianPruner())
study.add_trial(
    optuna.trial.create_trial(
        params={
            "bagging_fraction": 1.0,
            "bagging_freq": 0,
            "min_child_samples": 20,
        },
        distributions={
            "bagging_fraction": optuna.distributions.UniformDistribution(0.4, 1.0 + 1e-12),
            "bagging_freq": optuna.distributions.IntUniformDistribution(0, 7),
            "min_child_samples": optuna.distributions.IntUniformDistribution(5, 100),
        },
        value=0.94,
    )
)
study.add_trial(
    optuna.trial.create_trial(
        params={
            "bagging_fraction": 0.75,
            "bagging_freq": 5,
            "min_child_samples": 20,
        },
        distributions={
            "bagging_fraction": optuna.distributions.UniformDistribution(0.4, 1.0 + 1e-12),
            "bagging_freq": optuna.distributions.IntUniformDistribution(0, 7),
            "min_child_samples": optuna.distributions.IntUniformDistribution(5, 100),
        },
        value=0.95,
    )
)
study.optimize(objective, n_trials=100, timeout=600)
```
Re-using the best hyperparameters
There are two common cases. First, you have found good hyperparameters with Optuna and want to run a similar objective function using those best hyperparameters to analyze the results further. Second, to save time, you optimized on a subset of the data; after tuning, you want to train a model on the whole dataset with the best hyperparameters found.
```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

import optuna


def objective(trial):
    X, y = make_classification(n_features=10, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    C = trial.suggest_loguniform("C", 1e-7, 10.0)

    clf = LogisticRegression(C=C)
    clf.fit(X_train, y_train)

    return clf.score(X_test, y_test)


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=10)

print(study.best_trial.value)  # Show the best value.
```
Suppose that after hyperparameter optimization you want to compute other metrics on the same dataset, such as recall, precision, and F1-score. You can define another objective function, highly similar to objective, to reproduce the model with the best hyperparameters.
```python
def detailed_objective(trial):
    # Use same code objective to reproduce the best model
    X, y = make_classification(n_features=10, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    C = trial.suggest_loguniform("C", 1e-7, 10.0)

    clf = LogisticRegression(C=C)
    clf.fit(X_train, y_train)

    # Calculate more evaluation metrics (scikit-learn expects y_true first).
    pred = clf.predict(X_test)
    acc = metrics.accuracy_score(y_test, pred)
    recall = metrics.recall_score(y_test, pred)
    precision = metrics.precision_score(y_test, pred)
    f1 = metrics.f1_score(y_test, pred)

    return acc, f1, recall, precision


# Pass the best trial (study.best_trial) to detailed_objective as its trial argument.
detailed_objective(study.best_trial)  # calculate acc, f1, recall, and precision
```