Design and Implementation of a Data Analysis and Display System for American Men's Professional Basketball Games (Python)

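Contents
Chinese Abstract
English Abstract
1 Introduction
1.1 Background and Significance of the Project
1.2 State of the Art
1.2.1 Research in China
1.2.2 Research Abroad
1.2.3 Research on Web Crawlers
1.3 Main Research Content
1.4 Thesis Structure
2 Technologies and Principles
2.1 Technology Selection
2.1.1 The Python Language
2.1.2 The Scrapy Framework
2.1.3 The Django Framework
2.1.4 The MySQL Database
2.1.5 AJAX
2.2 Related Principles
2.2.1 Web Crawlers
2.2.2 Data Analysis Techniques
2.2.3 Rating Criteria and Formulas
3 Requirements Analysis
3.1 Business Requirements
3.2 Business Processes
3.2.1 NBA Game Data Display
3.2.2 NBA Game Data Analysis
3.3 Role Analysis
3.4 Use Case Analysis
4 System Design
4.1 System Architecture and Principles
4.2 System Module Design
4.2.1 Crawler Module Design
4.2.2 Data Analysis and Display Module Design
4.2.3 Login and Registration Module Design
4.3 Database Design
5 System Implementation
5.1 System Framework Implementation
5.2 Crawler Collection Module Implementation
5.3 Anti-Crawler Countermeasures Module Implementation
5.4 Data Analysis Module
5.5 Login and Registration Module Implementation
5.5.1 Registration Page
5.5.2 Login Page
5.6 Data Display Module
5.6.1 Player Data Display
5.6.2 Team Data Display
5.6.3 Game Data Display
5.6.4 Game Prediction Page
6 System Testing
6.1 Deployment Machine Overview
6.2 Environment Configuration
6.3 Running the System
6.4 Test Cases
7 Conclusion and Outlook
7.1 Conclusion
7.2 Outlook
References
Acknowledgements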
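1.3 Main Research Content
A crawler scrapes the website http://www.stat-nba.com/ to obtain American men's professional basketball game data (hereafter NBA data). The data comprise game information, player information, and coach information. Game information includes the names of the competing teams, the game date, the final score, each team's points per quarter, the performance of every player on each team (points, field-goal percentage, rebounds, steals), whether each team played at home or away, and each team's head coach for that game. Player information includes the player's Chinese name, English name, date of birth, height, weight, birthplace, career statistics (career points, rebounds, steals, games played, and games started), and career honors. Scraped data are first saved in JSON format and finally loaded into the database in one batch.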
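The last step described above, saving crawled records as JSON and later loading them into the database in one batch, can be sketched with the standard library alone. This is an illustrative sketch, not the system's crawler code: the record fields, the one-JSON-object-per-line file layout, the `matches` table, and the use of `sqlite3` in place of MySQL are all assumptions.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical subset of a crawled game record; the real crawler also keeps
# per-quarter scores, player box scores, head coaches and home/away flags.
MATCH_FIELDS = ("date", "home", "away", "home_score", "away_score")


def append_json_line(path: Path, record: dict) -> None:
    """Append one crawled record as a single JSON line (UTF-8, Chinese names kept readable)."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_json_lines_into_db(path: Path, conn: sqlite3.Connection) -> int:
    """Load every JSON line into a `matches` table in one batch; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS matches ("
        "date TEXT, home TEXT, away TEXT, home_score INTEGER, away_score INTEGER)"
    )
    with path.open(encoding="utf-8") as f:
        rows = [tuple(json.loads(line)[k] for k in MATCH_FIELDS)
                for line in f if line.strip()]
    conn.executemany("INSERT INTO matches VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "matches.jsonl"
        # Made-up sample values, not a real result
        append_json_line(out, {"date": "2019-01-01", "home": "HOU", "away": "GSW",
                               "home_score": 100, "away_score": 98})
        conn = sqlite3.connect(":memory:")
        print(load_json_lines_into_db(out, conn))  # prints 1
        conn.close()
```

Writing one JSON object per line lets the crawler append records as they arrive without rewriting the file, and the batch load then uses a single parameterized `executemany` call.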
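Using machine learning and data mining, the player data are analyzed to produce a three-dimensional profile for each player (offense score, defense score, and playmaking score), from which the player's overall score is derived. The coach score, the player scores, whether each player started, and the team's historical record are then fed into a linear regression model to predict each team's probability of winning a given game.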
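The two steps in this paragraph can be illustrated with a small sketch. The formulas of Section 2.2.3 are not reproduced in this excerpt, so the equal weighting of the offense, defense, and playmaking scores and the feature layout (coach score, mean player score, share of starters, historical win rate) are assumptions. The regression shown is an ordinary least-squares linear probability model fitted with `numpy.linalg.lstsq`; its raw outputs are clipped and rescaled so the two teams' probabilities sum to 1.

```python
import numpy as np


def player_total_score(offense: float, defense: float, playmaking: float,
                       weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
    """Combine the three dimension scores into one total (equal weights are an assumption)."""
    return float(np.dot(weights, (offense, defense, playmaking)))


def fit_linear_win_model(X: np.ndarray, won: np.ndarray) -> np.ndarray:
    """Ordinary least squares of the 0/1 outcome on [1, features]: a linear probability model."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, won, rcond=None)
    return coef


def win_probabilities(coef: np.ndarray, team_a: np.ndarray, team_b: np.ndarray):
    """Score both teams, clip to [1e-6, 1], and rescale so the two probabilities sum to 1."""
    raw = np.array([coef[0] + coef[1:] @ team_a, coef[0] + coef[1:] @ team_b])
    raw = np.clip(raw, 1e-6, 1.0)
    p = raw / raw.sum()
    return float(p[0]), float(p[1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic rows: [coach score, mean player total, share of starters, historical win rate]
    X = rng.uniform(0, 1, size=(200, 4))
    won = (X @ np.array([0.5, 1.0, 0.3, 0.8]) + rng.normal(0, 0.2, 200) > 1.3).astype(float)
    coef = fit_linear_win_model(X, won)
    print(win_probabilities(coef, np.array([0.7, 0.8, 0.6, 0.65]),
                            np.array([0.5, 0.6, 0.6, 0.45])))
```

A plain linear fit can predict values outside [0, 1]; clipping and rescaling is one simple way to turn the two raw scores into a pair of probabilities, while logistic regression would be the usual alternative if calibrated probabilities matter.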
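For data display, the system uses the Django framework: the front end is built with HTML and JavaScript, which handle data transfer and presentation, while the Django back end processes the data it serves. The database will be MySQL or HBase.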
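The system has two roles: users and administrators. Users can query player statistics, player ratings, player three-dimensional profiles, coach statistics, coach three-dimensional profiles, coach ratings, team history, team historical ratings, and so on. Administrators manage users and data and maintain the website as a whole.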
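The division of rights between the two roles can be expressed as a simple permission table. This sketch is hypothetical: the role names, permission strings, and the `can()` helper are illustrative and do not come from the system's Django code, where the same idea would typically be mapped onto Django's built-in user and permission model.

```python
from enum import Enum


class Role(Enum):
    USER = "user"
    ADMIN = "admin"


# Hypothetical permission names mirroring the queries listed above
QUERY_PERMISSIONS = frozenset({
    "view_player_stats", "view_player_rating", "view_player_profile",
    "view_coach_stats", "view_coach_profile", "view_coach_rating",
    "view_team_history", "view_team_rating",
})
ADMIN_ONLY = frozenset({"manage_users", "manage_data", "maintain_site"})

ROLE_PERMISSIONS = {
    Role.USER: QUERY_PERMISSIONS,
    Role.ADMIN: QUERY_PERMISSIONS | ADMIN_ONLY,  # assumption: admins can also query
}


def can(role: Role, permission: str) -> bool:
    """Return True if `role` is allowed to perform `permission`."""
    return permission in ROLE_PERMISSIONS[role]


if __name__ == "__main__":
    print(can(Role.USER, "view_player_rating"))  # True
    print(can(Role.USER, "manage_users"))        # False
```

Keeping the mapping in one table makes it easy to check every view against the same rules instead of scattering role checks through the code.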
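In summary, the system collects historical NBA game data, analyzes it with machine learning and data-mining techniques, and applies a linear regression algorithm to the analyzed data to make reasonable predictions. The final results are presented on the front end through Django, HTML, and JavaScript.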
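1.4 Thesis Structure
This thesis is organized as follows:
Chapter 1, Introduction. Presents the background, significance, and goals of the project; briefly reviews existing research results in related fields in China and abroad; and describes the scope and expected results of the system.
Chapter 2, Technologies and Principles. Introduces the main technologies and theory used in the system.
Chapter 3, Requirements Analysis. Analyzes the system's requirements using use case extraction and use case specification.
Chapter 4, System Design. Describes the system architecture and principles, the design of each major module, and the database design.
Chapter 5, System Implementation. Describes how the system was implemented and the resulting functionality.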
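Chapter 6, System Testing. Describes the deployment machine and environment configuration, how the system is run, and the test cases.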
Chapter 7, Conclusion and Outlook. Summarizes the work done, raises open questions, and identifies parts of the system that could be improved.
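3 Requirements Analysis
3.1 Business Requirements
The system is used to analyze and display NBA game data. As a web-crawler-based system for collecting, analyzing, and displaying NBA game data, it must provide data crawling, data filtering, data screening, data display, and NBA game data analysis.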
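The difference between "filtering" and "screening" in this list can be made concrete with a short sketch. Here filtering is assumed to mean discarding incomplete or duplicate crawled records, and screening to mean selecting the records a page or analysis needs; the field names and the `filter_records`/`screen_records` helpers are hypothetical.

```python
from typing import Callable, Iterable, List

REQUIRED = ("date", "home", "away")  # hypothetical minimal fields of a valid record


def filter_records(records: Iterable[dict], required=REQUIRED) -> List[dict]:
    """Filtering: drop records with missing required fields and duplicates of the same game."""
    seen, kept = set(), []
    for r in records:
        if any(r.get(k) in (None, "") for k in required):
            continue  # incomplete record
        key = tuple(r[k] for k in required)
        if key in seen:
            continue  # the same game crawled twice
        seen.add(key)
        kept.append(r)
    return kept


def screen_records(records: Iterable[dict], predicate: Callable[[dict], bool]) -> List[dict]:
    """Screening: keep only the records selected for display or analysis."""
    return [r for r in records if predicate(r)]


if __name__ == "__main__":
    raw = [
        {"date": "2019-01-01", "home": "HOU", "away": "GSW"},
        {"date": "2019-01-01", "home": "HOU", "away": "GSW"},  # duplicate
        {"date": "", "home": "LAL", "away": "BOS"},            # missing date
        {"date": "2019-01-02", "home": "BOS", "away": "MIL"},
    ]
    clean = filter_records(raw)
    gsw_games = screen_records(clean, lambda r: "GSW" in (r["home"], r["away"]))
    print(len(clean), len(gsw_games))  # 2 1
```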
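3.2 Business Processes
3.2.1 NBA Game Data Display
The business process for displaying game data is shown in Figure 3-1:

Figure 3-1 Flowchart of game data display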
3.2.2 NBA Game Data Analysis
The business process for analyzing game data is shown in Figure 3-2:

Figure 3-2 Flowchart of game data analysis
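3.3 Role Analysis
The system serves the following types of users:
Data administrator: crawls, filters, and screens the data, analyzes NBA game data, and manages and maintains the system.
User: views NBA game data and the system's analyses of it on the web pages, and clicks through to the corresponding NBA game data display pages.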
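3.4 Use Case Analysis
View NBA data: This use case lets a user add or remove keywords for the NBA game data they want displayed, and confirm the keywords already displayed.
Basic flow: The use case begins when the user opens the NBA game data display page. The user enters query keywords; the back end matches them and returns the corresponding data to the front end.
Special requirements: none.
Precondition: The user must be logged in before this use case starts.
Postcondition: If the use case succeeds, the user's display page is updated.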
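The keyword handling in this use case (add, remove, then match on the back end) can be sketched as follows. The `KeywordWatchlist` class, the searched fields, and the case-insensitive substring match are assumptions for illustration; the thesis does not specify the matching rule.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class KeywordWatchlist:
    """Hypothetical per-user keyword list for the 'view NBA data' use case."""
    keywords: Set[str] = field(default_factory=set)

    def add(self, keyword: str) -> None:
        kw = keyword.strip().lower()
        if kw:  # ignore blank input, which would otherwise match everything
            self.keywords.add(kw)

    def remove(self, keyword: str) -> None:
        self.keywords.discard(keyword.strip().lower())

    def query(self, records: Iterable[dict], fields=("home", "away", "player")) -> List[dict]:
        """Return records whose chosen fields contain any keyword (case-insensitive substring)."""
        def matches(record: dict) -> bool:
            text = " ".join(str(record.get(f, "")) for f in fields).lower()
            return any(kw in text for kw in self.keywords)
        return [r for r in records if matches(r)]


if __name__ == "__main__":
    w = KeywordWatchlist()
    w.add("GSW")
    w.add("Curry")
    records = [{"home": "HOU", "away": "GSW"}, {"home": "BOS", "away": "MIL"}]
    print(w.query(records))  # [{'home': 'HOU', 'away': 'GSW'}]
    w.remove("gsw")
    print(w.query(records))  # []
```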
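Data analysis: This use case lets the data administrator process the crawled NBA game data.
Basic flow: The use case begins when the crawler collects NBA game data. The system compares the content of the data with the NBA game data keywords using a matching algorithm; the result is either "matched" or "not matched". If the result is "matched", the system calls the analysis interface to analyze the data, and the use case ends. If the result is "not matched", the data are not analyzed, and the use case ends.
Special requirements: none.
Precondition: The collected NBA game data must be valid before this use case starts.
Postcondition: If the use case succeeds, the data administrator obtains an analysis of the NBA game data.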
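The branch in this basic flow (analyze on "matched", stop on "not matched") can be sketched in a few lines. Plain substring matching stands in for the unspecified matching algorithm, and the `analyze` callback is a hypothetical placeholder for the system's analysis interface.

```python
from typing import Callable, Iterable, Optional

MATCHED, NOT_MATCHED = "matched", "not matched"


def match_status(record: dict, keywords: Iterable[str]) -> str:
    """Compare a crawled record with the keywords (substring matching is an assumption)."""
    text = " ".join(str(v) for v in record.values()).lower()
    return MATCHED if any(k.lower() in text for k in keywords) else NOT_MATCHED


def handle_crawled_record(record: dict, keywords: Iterable[str],
                          analyze: Callable[[dict], dict]) -> Optional[dict]:
    """Call the analysis step only for matched records; unmatched records end the use case."""
    if match_status(record, keywords) == MATCHED:
        return analyze(record)
    return None


def home_margin(record: dict) -> dict:
    """Hypothetical analysis step: the home team's point margin."""
    return {"margin": record["home_score"] - record["away_score"]}


if __name__ == "__main__":
    record = {"home": "HOU", "away": "GSW", "home_score": 100, "away_score": 98}  # made-up values
    print(handle_crawled_record(record, {"GSW"}, home_margin))  # {'margin': 2}
    print(handle_crawled_record(record, {"LAL"}, home_margin))  # None
```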

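# Helpers such as remove_whitespaces_in_df_columns, get_column_names, label_train_data,
# standart_scaler_all_data and onehotencoder_test_data come from the project's own preprocessing module.
from preprocessing import *
import pickle

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier, XGBRegressor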

def _preprocess_for_target(csv_url, target):
    """Shared cleaning/encoding pipeline; returns (encoded feature frame, target values)."""
    df = pd.read_csv(csv_url)  # read data
    df.columns = remove_whitespaces_in_df_columns(df)  # clean column names
    df.columns = get_column_names()  # get column names and assign
    df = old_to_new_team_abbrs(df)  # map old team abbreviations to current ones
    df = drop_some_columns(df)
    df = clean_nan_values(df)
    Y = np.array(df[target])  # target column, e.g. "rslt" (game result) or "teamPTS" (home points)
    encoded, _, _, _ = label_train_data(df)  # the three returned dicts are not used here
    scalered, _ = standart_scaler_all_data(encoded)  # the fitted scaler is not used here
    df = onehotencoder_all_data(scalered)
    return df, Y
def preprocess(csv_url="D:/NBA-Predict/matches_w_player_stats.csv"):
    return _preprocess_for_target(csv_url, "rslt")  # game result (classification target)
def preprocess_home_team_point(csv_url="D:/NBA-Predict/matches_w_player_stats.csv"):
    return _preprocess_for_target(csv_url, "teamPTS")  # home team total points
def preprocess_away_team_point(csv_url="D:/NBA-Predict/matches_w_player_stats.csv"):
    return _preprocess_for_target(csv_url, "opptPTS")  # away team total points
# Per-quarter targets: teamPTS1..teamPTS4 (home) and opptPTS1..opptPTS4 (away)
def preprocess_home_q1_point(csv_url):
    return _preprocess_for_target(csv_url, "teamPTS1")
def preprocess_home_q2_point(csv_url):
    return _preprocess_for_target(csv_url, "teamPTS2")
def preprocess_home_q3_point(csv_url):
    return _preprocess_for_target(csv_url, "teamPTS3")
def preprocess_home_q4_point(csv_url):
    return _preprocess_for_target(csv_url, "teamPTS4")
def preprocess_away_q1_point(csv_url):
    return _preprocess_for_target(csv_url, "opptPTS1")
def preprocess_away_q2_point(csv_url):
    return _preprocess_for_target(csv_url, "opptPTS2")
def preprocess_away_q3_point(csv_url):
    return _preprocess_for_target(csv_url, "opptPTS3")
def preprocess_away_q4_point(csv_url):
    return _preprocess_for_target(csv_url, "opptPTS4")
def split_data_by_result(df,result_index=60):
    X = df.drop(columns=result_index)  # features: every column except the target
    Y = df[result_index]  # target column (game result)
    return X,Y
def split_data_home_point(df, result_index=62):
    X = df.drop(columns=result_index)
    Y = df[result_index]
    return X, Y
def split_data_away_point(df, result_index=67):
    X = df.drop(columns=result_index)
    Y = df[result_index]
    return X, Y
def split_data_home_q1_point(df, result_index=63):
    X = df.drop(columns=result_index)
    Y = df[result_index]
    return X, Y
def split_data_home_q2_point(df, result_index=64):
    X = df.drop(columns=result_index)
    Y = df[result_index]
    return X, Y
def split_data_home_q3_point(df, result_index=65):
    X = df.drop(columns=result_index)
    Y = df[result_index]
    return X, Y
def split_data_home_q4_point(df, result_index=66):
    X = df.drop(columns=result_index)
    Y = df[result_index]
    return X, Y
def split_data_away_q1_point(df, result_index=68):
    X = df.drop(columns=result_index)
    Y = df[result_index]
    return X, Y
def split_data_away_q2_point(df, result_index=69):
    X = df.drop(columns=result_index)
    Y = df[result_index]
    return X, Y
def split_data_away_q3_point(df, result_index=70):
    X = df.drop(columns=result_index)
    Y = df[result_index]
    return X, Y
def split_data_away_q4_point(df, result_index=71):
    X = df.drop(columns=result_index)
    Y = df[result_index]
    return X, Y

def build_classifier_model(X_train, Y_train, X_test, Y_test):
    # eval_metric and early_stopping_rounds are constructor arguments since XGBoost 1.6
    # (passing them to fit() was deprecated and later removed).
    # tree_method="gpu_hist" needs a CUDA GPU; on XGBoost >= 2.0 the preferred form is
    # tree_method="hist", device="cuda".
    model = XGBClassifier(tree_method="gpu_hist", n_jobs=4, n_estimators=20000,
                          eval_metric="error", early_stopping_rounds=42)
    # Note: the test split doubles as the early-stopping set, so the score below is optimistic.
    model.fit(X_train, Y_train, eval_set=[(X_test, Y_test)])
    with open("D:/NBA-Predict/model/xgb_classifier.pkl", "wb") as f:
        pickle.dump(model, f)
    score = model.score(X_test, Y_test)  # accuracy on the held-out split
    print(score)
    print(model.classes_)
    return model, score

def build_class_model(csv_url="D:/NBA-Predict/matches_w_player_stats.csv"):
    df,Y=preprocess(csv_url)

    X,_ = split_data_by_result(df=df)
    X_train,X_test,Y_train,Y_test = data_split(X,Y)
    model_ , score__ = build_classifier_model(X_train,Y_train,X_test,Y_test)
    return model_,X_train,X_test,Y_train,Y_test

def build_regressor_model(X_train, Y_train, X_test, Y_test, name):
    # eval_metric and early_stopping_rounds are constructor arguments since XGBoost 1.6;
    # tree_method="gpu_hist" needs a CUDA GPU (tree_method="hist", device="cuda" on XGBoost >= 2.0).
    model = XGBRegressor(tree_method="gpu_hist", n_estimators=20000, n_jobs=4,
                         eval_metric="rmse", early_stopping_rounds=20)
    model.fit(X_train, Y_train, eval_set=[(X_test, Y_test)])
    with open(name, "wb") as f:  # save the trained model to the given path
        pickle.dump(model, f)
    score = model.score(X_test, Y_test)  # R^2 on the held-out split
    print(score)
    return model, score
def build_home_point_predict_model(csv_url="D:/NBA-Predict/matches_w_player_stats.csv"):
    df,Y=preprocess_home_team_point(csv_url)
    X,_ = split_data_home_point(df=df)
    X_train,X_test,Y_train,Y_test = data_split_regresyon(X,Y)
    model_ , score__ = build_regressor_model(X_train=X_train,Y_train=Y_train,X_test=X_test,Y_test=Y_test,
                                             name="D:/NBA-Predict/model/teams/home_point_model.pkl")
    return model_,X_train,X_test,Y_train,Y_test
def build_away_point_predict_model(csv_url="D:/NBA-Predict/matches_w_player_stats.csv"):
    df,Y=preprocess_away_team_point(csv_url)
    X,_ = split_data_away_point(df=df)
    X_train,X_test,Y_train,Y_test = data_split_regresyon(X,Y)
    model_ , score__ = build_regressor_model(X_train=X_train,Y_train=Y_train,X_test=X_test,Y_test=Y_test,name="D:/NBA-Predict/model/teams/away_point_model.pkl")
    return model_,X_train,X_test,Y_train,Y_test
def build_home_q1_point_predict_model(csv_url="D:/NBA-Predict/matches_w_player_stats.csv"):
    df,Y=preprocess_home_q1_point(csv_url)
    X,_ = split_data_home_q1_point(df=df)
    X_train,X_test,Y_train,Y_test = data_split_regresyon(X,Y)
    model_ , score__ = build_regressor_model(X_train,Y_train,X_test,Y_test,name="D:/NBA-Predict/model/teams/home_q1_point_model.pkl")
    return model_,X_train,X_test,Y_train,Y_test
def build_home_q2_point_predict_model(csv_url="D:/NBA-Predict/matches_w_player_stats.csv"):
    df,Y=preprocess_home_q2_point(csv_url)
    X,_ = split_data_home_q2_point(df=df)
    X_train,X_test,Y_train,Y_test = data_split_regresyon(X,Y)
    model_ , score__ = build_regressor_model(X_train,Y_train,X_test,Y_test,name="D:/NBA-Predict/model/teams/home_q2_point_model.pkl")
    return model_,X_train,X_test,Y_train,Y_test
def build_home_q3_point_predict_model(csv_url="D:/NBA-Predict/matches_w_player_stats.csv"):
    df,Y=preprocess_home_q3_point(csv_url)
    X,_ = split_data_home_q3_point(df=df)
    X_train,X_test,Y_train,Y_test = data_split_regresyon(X,Y)
    model_ , score__ = build_regressor_model(X_train,Y_train,X_test,Y_test,name="D:/NBA-Predict/model/teams/home_q3_point_model.pkl")
    return model_,X_train,X_test,Y_train,Y_test
def build_home_q4_point_predict_model(csv_url="D:/NBA-Predict/matches_w_player_stats.csv"):
    df,Y=preprocess_home_q4_point(csv_url)
    X,_ = split_data_home_q4_point(df=df)
    X_train,X_test,Y_train,Y_test = data_split_regresyon(X,Y)
    model_ , score__ = build_regressor_model(X_train,Y_train,X_test,Y_test,name="D:/NBA-Predict/model/teams/home_q4_point_model.pkl")
    return model_,X_train,X_test,Y_train,Y_test
def build_away_q1_point_predict_model(csv_url="D:/NBA-Predict/matches_w_player_stats.csv"):
    df,Y=preprocess_away_q1_point(csv_url)
    X,_ = split_data_away_q1_point(df=df)
    X_train,X_test,Y_train,Y_test = data_split_regresyon(X,Y)
    model_ , score__ = build_regressor_model(X_train,Y_train,X_test,Y_test,name="D:/NBA-Predict/model/teams/away_q1_point_model.pkl")
    return model_,X_train,X_test,Y_train,Y_test
def build_away_q2_point_predict_model(csv_url="D:/NBA-Predict/matches_w_player_stats.csv"):
    df,Y=preprocess_away_q2_point(csv_url)
    X,_ = split_data_away_q2_point(df=df)
    X_train,X_test,Y_train,Y_test = data_split_regresyon(X,Y)
    model_ , score__ = build_regressor_model(X_train,Y_train,X_test,Y_test,name="D:/NBA-Predict/model/teams/away_q2_point_model.pkl")
    return model_,X_train,X_test,Y_train,Y_test
def build_away_q3_point_predict_model(csv_url="D:/NBA-Predict/matches_w_player_stats.csv"):
    df,Y=preprocess_away_q3_point(csv_url)
    X,_ = split_data_away_q3_point(df=df)
    X_train,X_test,Y_train,Y_test = data_split_regresyon(X,Y)
    model_ , score__ = build_regressor_model(X_train,Y_train,X_test,Y_test,name="D:/NBA-Predict/model/teams/away_q3_point_model.pkl")
    return model_,X_train,X_test,Y_train,Y_test
def build_away_q4_point_predict_model(csv_url="D:/NBA-Predict/matches_w_player_stats.csv"):
    df,Y=preprocess_away_q4_point(csv_url)
    X,_ = split_data_away_q4_point(df=df)
    X_train,X_test,Y_train,Y_test = data_split_regresyon(X,Y)
    model_ , score__ = build_regressor_model(X_train,Y_train,X_test,Y_test,name="D:/NBA-Predict/model/teams/away_q4_point_model.pkl")
    return model_,X_train,X_test,Y_train,Y_test
def get_result(X, model):
    return model.predict_proba(X)
def get_point_result(X, model):
    return model.predict(X)
def get_points(X,model):
    return model.predict(X) ,model.predict_proba(X)
def data_split(X,Y,test_size=0.1):
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=42)
    Y_train = Y_train.reshape(-1, 1)
    Y_test = Y_test.reshape(-1, 1)
    print(X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)
    return X_train,X_test,Y_train,Y_test
def data_split_regresyon(X,Y,test_size=0.1):
    # Identical to data_split; kept under its own name for the regression models
    return data_split(X, Y, test_size=test_size)
def predict_match_result(data):
    model = pickle.load(open("D:/NBA-Predict/model/xgb_classifier.pkl","rb"))
    encoded= label_test_data(data)
    scalered = standart_scaler_test_data(encoded)
    df = onehotencoder_test_data(scalered)
    print(df.head())
    X, Y = split_data_by_result(df=df,result_index=60)
    return get_result(X, model),int(Y[0])
def predict_home_point(data):
    model = pickle.load(open("D:/NBA-Predict/model/teams/home_point_model.pkl","rb"))
    Y = np.array(data.teamPTS)
    encoded= label_test_data(data)
    scalered = standart_scaler_test_data(encoded)
    df = onehotencoder_test_data(scalered)
    X, _ = split_data_home_point(df=df)
    return get_point_result(X, model),int(Y[0])
def predict_away_point(data):
    model = pickle.load(open("D:/NBA-Predict/model/teams/away_point_model.pkl","rb"))
    Y = np.array(data.opptPTS)
    encoded= label_test_data(data)
    scalered = standart_scaler_test_data(encoded)
    df = onehotencoder_test_data(scalered)
    X, _ = split_data_away_point(df=df)
    return get_point_result(X, model),int(Y[0])
def predict_home_q1_point(data):
    model = pickle.load(open("D:/NBA-Predict/model/teams/home_q1_point_model.pkl","rb"))
    Y = np.array(data.teamPTS1)
    encoded= label_test_data(data)
    scalered = standart_scaler_test_data(encoded)
    df = onehotencoder_test_data(scalered)
    X, _ = split_data_home_q1_point(df=df)
    return get_point_result(X, model),int(Y[0])
def predict_home_q2_point(data):
    model = pickle.load(open("D:/NBA-Predict/model/teams/home_q2_point_model.pkl","rb"))
    Y = np.array(data.teamPTS2)
    encoded= label_test_data(data)
    scalered = standart_scaler_test_data(encoded)
    df = onehotencoder_test_data(scalered)
    X, _ = split_data_home_q2_point(df=df)
    return get_point_result(X, model),int(Y[0])
def predict_home_q3_point(data):
    model = pickle.load(open("D:/NBA-Predict/model/teams/home_q3_point_model.pkl","rb"))
    Y = np.array(data.teamPTS3)
    encoded= label_test_data(data)
    scalered = standart_scaler_test_data(encoded)
    df = onehotencoder_test_data(scalered)
    X, _ = split_data_home_q3_point(df=df)
    return get_point_result(X, model),int(Y[0])
def predict_home_q4_point(data):
    model = pickle.load(open("D:/NBA-Predict/model/teams/home_q4_point_model.pkl","rb"))
    Y = np.array(data.teamPTS4)
    encoded= label_test_data(data)
    scalered = standart_scaler_test_data(encoded)
    df = onehotencoder_test_data(scalered)
    X, _ = split_data_home_q4_point(df=df)
    return get_point_result(X, model),int(Y[0])
def predict_away_q1_point(data):
    model = pickle.load(open("D:/NBA-Predict/model/teams/away_q1_point_model.pkl","rb"))
    Y = np.array(data.opptPTS1)
    encoded= label_test_data(data)
    scalered = standart_scaler_test_data(encoded)
    df = onehotencoder_test_data(scalered)
    X, _ = split_data_away_q1_point(df=df)
    return get_point_result(X, model),int(Y[0])
def predict_away_q2_point(data):
    model = pickle.load(open("D:/NBA-Predict/model/teams/away_q2_point_model.pkl","rb"))
    Y = np.array(data.opptPTS2)
    encoded= label_test_data(data)
    scalered = standart_scaler_test_data(encoded)
    df = onehotencoder_test_data(scalered)
    X, _ = split_data_away_q2_point(df=df)
    return get_point_result(X, model),int(Y[0])
def predict_away_q3_point(data):
    model = pickle.load(open("D:/NBA-Predict/model/teams/away_q3_point_model.pkl","rb"))
    Y = np.array(data.opptPTS3)
    encoded= label_test_data(data)
    scalered = standart_scaler_test_data(encoded)
    df = onehotencoder_test_data(scalered)
    X, _ = split_data_away_q3_point(df=df)
    return get_point_result(X, model),int(Y[0])
def predict_away_q4_point(data):
    model = pickle.load(open("D:/NBA-Predict/model/teams/away_q4_point_model.pkl","rb"))
    Y = np.array(data.opptPTS4)
    encoded= label_test_data(data)
    scalered = standart_scaler_test_data(encoded)
    df = onehotencoder_test_data(scalered)
    X, _ = split_data_away_q4_point(df=df)
    return get_point_result(X, model),int(Y[0])
def build_all_in_one(csv_url="D:/NBA-Predict/matches_w_player_stats.csv"):
    # One prepared test game (GSW vs HOU) in the same column layout as the training data
    df = pd.read_csv("D:/NBA-Predict/test_data/GSWvsHOU_test_data.csv", index_col=0)
    df.drop("index", axis=1, inplace=True)
    df.fillna(0, inplace=True)

    # Train the win/loss classifier and predict the test game. A copy is passed
    # because the preprocessing helpers may modify the frame they receive.
    build_class_model(csv_url=csv_url)
    print(predict_match_result(df.copy()))

    # The score regressors (left disabled here) follow the same pattern, e.g.:
    # build_home_point_predict_model(csv_url=csv_url); print(predict_home_point(df.copy()))
    # build_away_point_predict_model(csv_url=csv_url); print(predict_away_point(df.copy()))
    # and likewise build_/predict_home_q1..q4_point and build_/predict_away_q1..q4_point.


if __name__ == '__main__':
    build_all_in_one()



