Predicting American Football Outcomes

Our project uses historical data from several sources to train a predictive model that determines the outcome of a football game from the plays made in the game.

Built at Hacklytics 2024

Created on 11th February 2024

The problem Predicting American Football Outcomes solves

With all the debate these days about who will win which football game, we want to offer a way to cut through the ambiguity. Who is going to win the next game? Was that play a good or a bad one? Will your favorite team make it to the Super Bowl? These are just a few of the questions we set out to answer.

Desired Outcomes of the Model

Using play-by-play data from the 2018-2022 NFL regular seasons for training and the 2023 regular season for testing, our model aims to accurately predict the outcome of a football game from the plays each team makes.
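The season-based split described above can be sketched as follows. This is a minimal illustration, not our actual notebook code: the column names (season, yards_gained, home_team_won) and the sample rows are hypothetical stand-ins for the real play-by-play data.

```python
import pandas as pd

# Hypothetical play-by-play rows; column names and values are assumptions
# made for illustration, not the project's real schema.
plays = pd.DataFrame({
    "season": [2018, 2019, 2020, 2021, 2022, 2023, 2023],
    "yards_gained": [5, -2, 12, 0, 7, 3, 9],
    "home_team_won": [1, 0, 1, 1, 0, 1, 0],
})


def split_by_season(df, train_seasons, test_seasons):
    """Split plays into train and test sets by season, so that no play
    from a test season leaks into training."""
    train = df[df["season"].isin(list(train_seasons))].reset_index(drop=True)
    test = df[df["season"].isin(list(test_seasons))].reset_index(drop=True)
    return train, test


# Train on 2018-2022, test on 2023.
train_df, test_df = split_by_season(plays, range(2018, 2023), [2023])
print(len(train_df), "training plays,", len(test_df), "test plays")
```

Splitting by season rather than at random keeps every play of a given game on the same side of the split, which avoids testing on games the model has partly seen.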

Use Case for our Model

Ideally, this model will be used to predict the outcome of a game in real time. As each play of the game unfolds, the model identifies the team it expects to win.
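A rough sketch of what play-by-play prediction could look like is shown below. Everything here is an assumption made for illustration: the classifier (scikit-learn's LogisticRegression), the two features (score differential and time remaining), and the synthetic training data are not necessarily what our model uses.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic training plays (assumption): home-team score differential and
# seconds remaining in the game.
n = 500
score_diff = rng.integers(-21, 22, size=n)
seconds_remaining = rng.integers(0, 3601, size=n)
X_train = np.column_stack([score_diff, seconds_remaining / 3600.0])

# Synthetic label: the team that is ahead tends to win, with some noise.
y_train = (score_diff + rng.normal(0, 7, size=n) > 0).astype(int)

model = LogisticRegression().fit(X_train, y_train)


def predict_live(fitted_model, game_plays):
    """Return the predicted home-win probability after each play, in order.

    game_plays is a list of (score_diff, seconds_remaining) tuples.
    """
    features = np.array([[sd, sec / 3600.0] for sd, sec in game_plays])
    return fitted_model.predict_proba(features)[:, 1]


# One hypothetical game, play by play.
game = [(0, 3600), (7, 2700), (7, 1800), (14, 900), (10, 60)]
probs = predict_live(model, game)
for (sd, sec), p in zip(game, probs):
    leader = "home" if p >= 0.5 else "away"
    print(f"diff={sd:+d}, {sec:4d}s left -> predicted winner: {leader} ({p:.2f})")
```

The same loop could be fed plays from a live source; here the plays are hard-coded so the example runs on its own.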

Challenges we ran into

Points of Struggle

Finding datasets that contained all of the data we wanted to capture for our problem.
- We identified multiple sources of data related to our problem. We then reviewed each file to see which had columns relevant to our question, and found a few sites that offered a large amount of data for training and testing our model.
Merging of datasets.
- This one threw us for a loop. We identified the common fields across the different datasets, but the same merge logic in Python worked for one pair of files and failed for another. We tracked down the problem columns, made sure they had the proper datatype, and fixed the bug.
Determining how to use PCA for feature selection.
- When it came time to identify the right features for training and testing our model, we worked to implement PCA. Getting the data into a workable state was difficult at first. We tried several approaches until we got the result we were looking for, and our project would not be the same without it.
Improving the accuracy of the model.
- We were a little unhappy with the accuracy of our first model. It was about 56% accurate, which had us asking how we could improve it. We began adjusting the number of features included in the model, and in doing so built a model that was 64.45% accurate, an improvement of roughly 8 percentage points over the initial iteration.
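Two of the fixes above can be sketched in code: aligning key datatypes before a merge, and comparing accuracy across different numbers of PCA components. This is an illustrative sketch under stated assumptions, not our exact code. The column names, the synthetic data, the LogisticRegression classifier, and the helper names merge_on_key and accuracy_by_components are all hypothetical. Note also that PCA builds new components from combinations of the original columns rather than picking a subset of them.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# --- Merge fix: the key is a string in one file and an integer in the other.
plays = pd.DataFrame({"game_id": ["1001", "1002", "1003"],
                      "total_yards": [350, 280, 410]})
results = pd.DataFrame({"game_id": [1001, 1002, 1003],
                        "home_team_won": [1, 0, 1]})


def merge_on_key(left, right, key, dtype="int64"):
    """Cast the key column to one dtype on both sides, then merge."""
    left = left.astype({key: dtype})
    right = right.astype({key: dtype})
    return left.merge(right, on=key, how="inner", validate="one_to_one")


merged = merge_on_key(plays, results, "game_id")

# --- PCA sweep: how does test accuracy change with the component count?
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))  # synthetic features (assumption)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)
X_train, X_test, y_train, y_test = X[:300], X[300:], y[:300], y[300:]


def accuracy_by_components(component_counts):
    """Fit scale -> PCA -> classifier for each count; return test accuracy."""
    scores = {}
    for k in component_counts:
        pipe = make_pipeline(StandardScaler(), PCA(n_components=k),
                             LogisticRegression())
        pipe.fit(X_train, y_train)
        scores[k] = pipe.score(X_test, y_test)
    return scores


scores = accuracy_by_components([2, 5, 10])
for k, acc in scores.items():
    print(f"{k} components: {acc:.2%} test accuracy")
```

Checking df.dtypes on both files before merging is a quick way to spot this kind of mismatch, and a sweep like the one above makes it easy to see which feature count is worth keeping.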

Tracks Applied (1)

Sports

We are using a machine learning model to predict the outcome of football games.

Technologies used

HTML

Visual Studio Code

NumPy

pandas

GitHub

sklearn

Excel

DataSpell

Python Notebooks

Discord (communications)
