随机森林回归进行调参

随机森林回归进行调参


文章目录

  • 随机森林回归进行调参
  • 一、调参方法
  • 二、调参过程
  • 1、 调n_estimators参数 :
  • 2、 探索max_depth的最佳参数
  • 3、min_samples_split的最佳参数
  • 4、min_samples_leaf最佳参数
  • 5、max_features调参
  • 6、小范围修改
  • 7、在得到的最优参数附近进行小范围网格搜索
  • 8、随机森林回归预测


  • 提示:以下是本篇文章正文内容,下面案例可供参考

    一、调参方法

    调参过程首先进行依次寻找n_estimators、max_depth、min_samples_split、min_samples_leaf和max_features的最佳参数,然后在最优参数附近进行小范围网格搜索,最终得到最终参数。

    二、调参过程

    导入库并读取数据(ps:加载了很多没用的库-v-)

    	from turtle import shape  
    	import pandas as pd  
    	import  pprint  
    	import numpy as np  
    	import scipy.stats as stats  
    	from  sklearn import random_projection  
    	from sklearn.ensemble import RandomForestRegressor  
    	from sklearn.model_selection import RandomizedSearchCV  
    	from sklearn.model_selection import GridSearchCV  
    	import matplotlib.pyplot as plt  
    	from sklearn import metrics  
    	from sklearn.model_selection import train_test_split  
    	from openpyxl import load_workbook  
    	from sklearn.model_selection import train_test_split,GridSearchCV,cross_val_score  
    	from sklearn.metrics import accuracy_score  
    	#导入数据    
    	X_train=pd.read_excel("D:\科研\RandomForest\新建文件夹\X_train.xlsx")    
    	X_test=pd.read_excel("D:\科研\RandomForest\新建文件夹\X_test.xlsx")    
    	Y_train=pd.read_excel("D:\科研\RandomForest\新建文件夹\Y_train.xlsx")    
    	Y_test=pd.read_excel("D:\科研\RandomForest\新建文件夹\Y_test.xlsx")    
    	    
    	Y_train = Y_train.values.ravel()    
    	# X_train = X_train.values.ravel()    
        
    	print(X_test.shape)    
    	print(Y_test.shape)    
    	print(Y_train.shape)    
    	print(X_train.shape)    
    

    1、 调n_estimators参数 :

    	
    	###调n_estimators参数  
    	ScoreAll = []  
    	for i in range(10,200,1):   #criterion = 'entropy'  
    	    DT = RandomForestRegressor(n_estimators = i,random_state = 66)
    	    score = cross_val_score(DT,X_train,Y_train,cv=10).mean()  
    	    ScoreAll.append([i,score])  
    	ScoreAll = np.array(ScoreAll)  
    	  
    	max_score = np.where(ScoreAll==np.max(ScoreAll[:,1]))[0][0] ##这句话看似很长的,其实就是找出最高得分对应的索引  
    	print("最优参数以及最高得分:",ScoreAll[max_score])    
    	plt.figure(figsize=[20,5])  
    	plt.plot(ScoreAll[:,0],ScoreAll[:,1])  
    	plt.show()  
    
    


    可以看到,103为得分最高点,暂定n_estimators为103,接着调下边的参数。

    2、 探索max_depth的最佳参数

    	ScoreAll = []  
    	for i in range(10,30,3):  
    	    DT = RandomForestRegressor(n_estimators = 103,random_state = 66,max_depth =i ) #,criterion = 'entropy'  
    	    score = cross_val_score(DT,X_train,Y_train,cv=10).mean()  
    	    ScoreAll.append([i,score])  
    	ScoreAll = np.array(ScoreAll)  
    	  
    	max_score = np.where(ScoreAll==np.max(ScoreAll[:,1]))[0][0] 
    	print("最优参数以及最高得分:",ScoreAll[max_score])    
    	plt.figure(figsize=[20,5])  
    	plt.plot(ScoreAll[:,0],ScoreAll[:,1])  
    	plt.show()  
    


    根据曲线,暂定树深为14。

    3、min_samples_split的最佳参数

    	ScoreAll = []  
    	for i in range(2,10,1):  
    	    DT = RandomForestRegressor(n_estimators = 103,random_state = 66,max_depth =14,min_samples_split = i ) #,criterion = 'entropy'  
    	    score = cross_val_score(DT,X_train,Y_train,cv=10).mean()  
    	    ScoreAll.append([i,score])  
    	ScoreAll = np.array(ScoreAll)  
    
    
    


    从2后模型得分快速下降,后模型得分升高,说明出现了过拟合的情况。故最小样本数选择2

    4、min_samples_leaf最佳参数

    调参代码:

    	ScoreAll = []  
    	for i in range(2,10,1):  
    	    DT = RandomForestRegressor(n_estimators = 103,random_state = 66,max_depth =14,min_samples_leaf = i,min_samples_split = 2,) #,criterion = 'entropy'  
    	    score = cross_val_score(DT,X_train,Y_train,cv=10).mean()  
    	    ScoreAll.append([i,score])  
    	ScoreAll = np.array(ScoreAll)  
    


    根据曲线显示,最小叶子数选为7

    5、max_features调参

    	param_grid = {  
    	    'max_features':np.arange(0.1, 1)}  
    	rfc = RandomForestRegressor(random_state=66,n_estimators = 103,max_depth = 14,min_samples_leaf =7 ,min_samples_split =2 )  
    	GS = GridSearchCV(rfc,param_grid,cv=10)  
    	GS.fit(X_train,Y_train)  
    	print(GS.best_params_)  
    	print(GS.best_score_)  
    
    
    

    6、小范围修改

    由于 min_samples_leaf 和min_samples_split会相互影像,故把这三个参数一起网格搜索:

    	param_grid = {  
    	    'max_features':np.arange(0.1, 1),  
    	    'min_samples_leaf':np.arange(5,15),  
    	    'min_samples_split':np.arange(2,10),  
    	    }  
    	rfc = RandomForestRegressor(random_state=66,n_estimators = 103,max_depth = 14 )  
    	GS = GridSearchCV(rfc,param_grid,cv=10)  
    	GS.fit(X_train,Y_train)  
    	print(GS.best_params_)  
    	print(GS.best_score_)  
    
    

    结果:

    {'max_features': 0.1, 
    'min_samples_leaf': 5,
     'min_samples_split': 2}
    0.5697799544764119
    

    参数值:

    n_estimators 103
    max_depth 14
    max_features 0.1
    min_samples_leaf 5
    min_samples_split 2

    7、在得到的最优参数附近进行小范围网格搜索

    	import time  
    	start = time.time()  
    	param_grid = {  
    	  'n_estimators':np.arange(100, 110),  
    	  'max_depth':np.arange(12, 16),  
    	  'min_samples_leaf':np.arange(1, 8),  
    	  'min_samples_split':np.arange(2, 5),  
    	   'max_features':np.arange(0.1, 1)  
    }  
    	  
    	rfc = RandomForestRegressor(random_state=66)  
    	GS = GridSearchCV(rfc,param_grid,cv=10)  
    	GS.fit(X_train,Y_train)  
    	end = time.time()  
    	print("循环运行时间:%.2f秒"%(end-start))  
    	print(GS.best_params_)  
    	print(GS.best_score_)   
    
    
    

    结果:

    参数 旧值 新值
    n_estimators 103 106
    max_depth 14 14
    max_features 0.1 0.1
    min_samples_leaf 5 1
    min_samples_split 2 3

    8、随机森林回归预测

    def test_RandomForestRegressor(X_train,X_test,Y_train,Y_test):  
    	    regr=RandomForestRegressor(n_estimators=103,max_depth=14,min_samples_split=3,min_samples_leaf=1,max_features=0.1)  
    	    regr.fit(X_train,Y_train)  
    	    predictions = regr.predict(X_test)  
    	    data = pd.DataFrame(predictions)  
    	    print("树木个数:103,max_depth=14,split=2,min_samples_split=3,min_samples_leaf=1,max_feature=0.1","Traing Score:%f"%regr.score(X_train,Y_train),"Testing Score:%f"%regr.score(X_test,Y_test))  
    	    predictions=regr.predict(X_test)  
    	    # plt.plot(Y_test['date'], predictions['predictionspredictions'], 'ro', label = 'prediction')  
    	    # print(predictions)  
    	    writer = pd.ExcelWriter("D:\科研\RandomForest\新建文件夹\predict06.xlsx")      # 写入Excel文件  
    	    data.to_excel(writer,sheet_name='num',na_rep='nana',index=False)        # ‘page_1’是写入excel的sheet名  
    	    writer.save()  
    	    writer.close()  
    	  
    	test_RandomForestRegressor(X_train,X_test,Y_train,Y_test)  
    

    感谢:本文调参参考了疯狂学GIS的博文,从中学习颇多。https://blog.csdn.net/zhebushibiaoshifu/category_10827929.html?spm=1001.2014.3001.5515

    物联沃分享整理
    物联沃-IOTWORD物联网 » 随机森林回归进行调参

    发表评论