
2025 Innovation Cup ("创新杯", formerly the DingTalk Cup, "钉钉杯") Problem B: Detailed Modeling Approach



Problem B: Comprehensive Prediction of Road Pavement Maintenance Needs


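Task A: Predicting road maintenance need (binary classification)

1 Problem analysis

  • Mixed feature types: numerical (PCI, AADT) and categorical (road type, asphalt type).
  • Class imbalance: segments that need maintenance are the minority class.
  • Interpretability: the influence of the key features on maintenance need has to be quantified.
  • Approach: a random forest, which handles mixed data once the categorical features are encoded, is robust, and provides built-in feature importances.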
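One common way to address the class imbalance noted above is to weight each class inversely to its frequency. The sketch below is only an illustration: the 90/10 label split is made up, not taken from the competition data. It computes scikit-learn's "balanced" weights, defined as n_samples / (n_classes * count(class)).

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical labels: 90 segments without and 10 with a maintenance need.
y = np.array([0] * 90 + [1] * 10)

# "balanced" weight for class c = n_samples / (n_classes * count(c))
classes = np.array([0, 1])
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weights = dict(zip(classes.tolist(), weights.tolist()))

# The minority class (1) gets weight 5.0, the majority class (0) about 0.56.
print(class_weights)
```

Passing class_weight='balanced' to RandomForestClassifier applies the same weighting during training, so the weights do not have to be computed by hand.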

2 Python code

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score, f1_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Load the data
data = pd.read_csv('road_maintenance.csv')
cat_cols = ['Road_Type', 'Asphalt_Type']
num_cols = ['PCI', 'AADT', 'Last_Maintenance', 'Average_Rainfall', 'Rutting', 'IRI']
X = data[cat_cols + num_cols]
y = data['Needs_Maintenance']

# Split first (stratified, because the classes are imbalanced) so that the
# preprocessing is fitted on the training set only and nothing leaks from the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Preprocessing: one-hot encode the categorical columns, standardize the numerical ones
pre = ColumnTransformer([
    ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols),
    ('num', StandardScaler(), num_cols),
])
X_train_proc = pre.fit_transform(X_train)
X_test_proc = pre.transform(X_test)

# Model; class_weight='balanced' compensates for the minority class
clf = RandomForestClassifier(n_estimators=100, max_depth=10, min_samples_split=5,
                             class_weight='balanced', random_state=42)
clf.fit(X_train_proc, y_train)

# Evaluation
y_pred = clf.predict(X_test_proc)
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Recall:  ", recall_score(y_test, y_pred))
print("F1:      ", f1_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))

# Feature importance
names = pre.get_feature_names_out()
imp = clf.feature_importances_
plt.barh(names, imp)
plt.title('Feature Importance')
plt.tight_layout()
plt.show()

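3 MATLAB code

%% Load the data
data = readtable('road_maintenance.csv');
cat_vars = {'Road_Type','Asphalt_Type'};
num_vars = {'PCI','AADT','Last_Maintenance','Average_Rainfall','Rutting','IRI'};
for v = cat_vars
    data.(v{1}) = categorical(data.(v{1}));
end
X = data(:, [cat_vars, num_vars]);
y = data.Needs_Maintenance;

%% Encoding and standardization
% TreeBagger accepts categorical table variables directly, so no explicit
% one-hot encoding is applied; only the numeric variables are standardized.
Xnorm = normalize(X, 'DataVariables', num_vars);

%% Train/test split
rng(1)
cv = cvpartition(height(Xnorm), 'HoldOut', 0.2);
Xtr = Xnorm(training(cv),:);  ytr = y(training(cv));
Xte = Xnorm(test(cv),:);      yte = y(test(cv));

%% Model
% TreeBagger has no 'MaxDepth' option; 'MaxNumSplits' limits tree size instead
% (a depth-10 binary tree has at most 2^10 - 1 splits).
% 'MinParentSize' is a fitctree option; if your release of TreeBagger rejects it,
% 'MinLeafSize' is the closest supported control.
model = TreeBagger(100, Xtr, ytr, ...
    'Method','classification', ...
    'MaxNumSplits',2^10-1, 'MinParentSize',5, ...
    'OOBPredictorImportance','on');   % required for the importance estimate below

%% Evaluation
y_hat = str2double(predict(model, Xte));   % predict returns a cell array of labels
cm = confusionmat(yte, y_hat);             % rows: true class, columns: predicted class
acc  = sum(diag(cm)) / sum(cm(:));
rec  = cm(2,2) / sum(cm(2,:));
prec = cm(2,2) / sum(cm(:,2));
f1   = 2 * prec * rec / (prec + rec);
fprintf('Accuracy: %.4f\nRecall: %.4f\nF1: %.4f\n', acc, rec, f1);

%% Feature importance
imp = model.OOBPermutedPredictorDeltaError;
bar(imp)
xticks(1:numel(imp))
xticklabels(model.PredictorNames)
title('Feature Importance')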

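Task B: Maintenance urgency scoring and priority classification

1 Approach

  1. Continuous score: turn the Task A random forest into a regression model that outputs an urgency score R in the interval [0,1].
  2. Unsupervised clustering: use K-means to split R into three priority levels: high, medium, and low.
  3. Interpretability check: inspect core indicators such as PCI and IRI for the high-priority segments to confirm that the prioritization is reasonable.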

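2 Python code

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Load the data
data = pd.read_csv('road_maintenance.csv')
cat_cols = ['Road_Type', 'Asphalt_Type']
X = data.drop('Needs_Maintenance', axis=1)
y = data['Needs_Maintenance']

# One-hot encode the categorical columns (the forest cannot take strings);
# the remaining columns are assumed to be numeric and are passed through
pre = ColumnTransformer([('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols)],
                        remainder='passthrough')
X_enc = pre.fit_transform(X)

# Random forest regression on the 0/1 label gives a continuous urgency score.
# Scores are predicted in-sample here; out-of-bag predictions
# (oob_score=True, then rf.oob_prediction_) would be less optimistic.
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_enc, y)
R = rf.predict(X_enc)

# Rescale to [0, 1]
R_norm = MinMaxScaler().fit_transform(R.reshape(-1, 1)).ravel()

# K-means clustering (k=3) on the 1-D score
km = KMeans(n_clusters=3, n_init=10, random_state=42)
clusters = km.fit_predict(R_norm.reshape(-1, 1))

# Map cluster labels to priorities by ascending cluster center: 0=low, 1=medium, 2=high
order = np.argsort(km.cluster_centers_.ravel())
prio_map = {cluster: rank for rank, cluster in enumerate(order)}
priorities = np.array([prio_map[c] for c in clusters])

# Count per priority
print(pd.Series(priorities).value_counts().sort_index())

# Visualization
for p, color, label in [(0, 'green', 'Low'), (1, 'blue', 'Medium'), (2, 'red', 'High')]:
    plt.hist(R_norm[priorities == p], bins=30, alpha=.7, color=color, label=label)
plt.xlabel('Maintenance Urgency Score')
plt.ylabel('Number of Segments')
plt.title('Priority Distribution via K-means')
plt.legend()
plt.show()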
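The third step of the approach, checking that the high-priority segments really show worse pavement indicators, can be sketched as a per-priority summary. The values below are made up for illustration; in practice the priority labels from the clustering would be attached to the real data. The check relies on the usual conventions that a lower PCI and a higher IRI both indicate worse pavement condition.

```python
import pandas as pd

# Hypothetical segments with priority labels (0 = low, 1 = medium, 2 = high).
df = pd.DataFrame({
    "PCI":      [85, 90, 78, 60, 55, 65, 30, 25, 35],
    "IRI":      [1.0, 0.8, 1.2, 2.5, 2.8, 2.2, 4.5, 5.0, 4.0],
    "priority": [0, 0, 0, 1, 1, 1, 2, 2, 2],
})

# Mean of each core indicator per priority level
summary = df.groupby("priority")[["PCI", "IRI"]].mean()
summary.index = summary.index.map({0: "Low", 1: "Medium", 2: "High"})
print(summary)

# Sanity check: mean PCI should fall and mean IRI should rise with priority
pci_ok = summary["PCI"].is_monotonic_decreasing
iri_ok = summary["IRI"].is_monotonic_increasing
print("PCI trend consistent:", pci_ok, "| IRI trend consistent:", iri_ok)
```

If either trend fails on the real data, the score or the number of clusters probably needs revisiting before the priorities are used.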

3 MATLAB code

%% Load the regression model (the Task A forest retrained as a regressor)
load('rf_regression_model.mat');   % provides the saved variable `model`
data = readtable('road_maintenance.csv');
cat_vars = {'Road_Type','Asphalt_Type'};
for v = cat_vars
    data.(v{1}) = categorical(data.(v{1}));
end
% Keep X as a table: brace indexing ({}) cannot concatenate numeric and text columns.
% The predictors must match those the saved model was trained on.
X = data(:, {'PCI','Road_Type','AADT','Asphalt_Type', ...
             'Last_Maintenance','Average_Rainfall','Rutting','IRI'});

%% Predict the urgency score
R = predict(model, X);
R_norm = (R - min(R)) / (max(R) - min(R));

%% K-means clustering
rng(1)
[idx, centers] = kmeans(R_norm, 3);
[~, order] = sort(centers);       % ascending cluster centers
prio = zeros(size(idx));
prio(idx==order(1)) = 0;          % low
prio(idx==order(2)) = 1;          % medium
prio(idx==order(3)) = 2;          % high

%% Counts
fprintf('High: %d, Medium: %d, Low: %d\n', ...
    sum(prio==2), sum(prio==1), sum(prio==0));

%% Visualization
figure
histogram(R_norm(prio==0), 'BinWidth',0.05, 'FaceColor','g'); hold on
histogram(R_norm(prio==1), 'BinWidth',0.05, 'FaceColor','b');
histogram(R_norm(prio==2), 'BinWidth',0.05, 'FaceColor','r');
xlabel('Maintenance Urgency Score'); ylabel('Count');
title('Priority Distribution'); legend('Low','Medium','High');
