Created on 27th June 2021
Flight ticket prices can be hard to guess: the price we see today may be quite different when we check the same flight tomorrow. Travelers often say that flight ticket prices are unpredictable. As data scientists, we are going to show that, given the right data, anything can be predicted. The dataset provides prices of flight tickets for various airlines, between various cities, for the months of March through June 2019. Size of training set: 10683 records.
I went through exploratory data analysis (EDA) and then built models with several machine learning algorithms, selecting the best model to fit based on predictive accuracy.
First, I transformed the categorical variables into dummy variables. I also split the data into train and test sets with a test size of 30%.
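A minimal sketch of this preprocessing step, assuming pandas and scikit-learn. The column names (Airline, Total_Stops, Price), the airline labels and the values are made up for illustration and are not taken from the actual dataset; the random_state is likewise an assumption.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the real flight records
df = pd.DataFrame({
    "Airline": ["AirA", "AirB", "AirA", "AirC", "AirB",
                "AirC", "AirA", "AirB", "AirC", "AirA"],
    "Total_Stops": [0, 1, 0, 2, 1, 1, 0, 2, 1, 0],
    "Price": [3900, 7600, 4100, 13800, 6200, 9100, 3700, 11000, 8800, 4300],
})

# One dummy (0/1) column per category of each categorical variable
X = pd.get_dummies(df.drop(columns="Price"), columns=["Airline"])
y = df["Price"]

# Hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(list(X.columns))
print(len(X_train), len(X_test))
```

For linear models it can be worth passing drop_first=True to get_dummies to avoid perfectly collinear dummy columns; whether that was done here is not stated.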
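I tried fourteen different models and evaluated them using Root Mean Squared Error (RMSE). I chose RMSE because it is relatively easy to interpret (it is in the same units as the ticket price) and because outliers aren't particularly problematic for this type of model. Keep in mind that RMSE weights large errors more heavily than Mean Absolute Error does.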
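To make the metric concrete, here is a small sketch of RMSE next to MAE in plain Python. The prices are invented for illustration; the point is only that a single large miss raises RMSE while leaving MAE unchanged.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mae(y_true, y_pred):
    """Mean absolute error: mean(|y - y_hat|)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up ticket prices and two sets of predictions
actual = [4000.0, 5000.0, 6000.0, 7000.0]
evenly_off = [4100.0, 4900.0, 6100.0, 6900.0]  # every prediction off by 100
one_big_miss = [4000.0, 5000.0, 6000.0, 6600.0]  # one prediction off by 400

print(rmse(actual, evenly_off), mae(actual, evenly_off))
print(rmse(actual, one_big_miss), mae(actual, one_big_miss))
```

Both prediction sets have the same MAE, but the second has twice the RMSE because the error is concentrated in one ticket.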
The models I tried, with their RMSE:
LinearRegression : 2779.0455708889144
ElasticNet : 3379.6819876610443
Lasso : 2759.449381312224
Ridge : 2710.8476127741037
KNeighborsRegressor : 3249.005561971264
DecisionTreeRegressor : 2017.530360334335
RandomForestRegressor : 1662.7359733973055
SVR : 4246.460099935076
AdaBoostRegressor : 3135.985374101527
GradientBoostingRegressor : 1904.7364927923986
ExtraTreeRegressor : 2432.1393735590073
HuberRegressor : 3108.870789540331
XGBRegressor : 1603.7426369307445
BayesianRidge : 2773.275561516677
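A comparison like the one above can be produced with a simple loop. The sketch below is an assumption about the workflow, not the exact code used: it runs on synthetic data from make_regression, covers only a handful of scikit-learn regressors with default settings, and leaves out XGBRegressor to avoid the extra xgboost dependency. Its numbers will not match the list above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the flight data (illustration only)
X, y = make_regression(n_samples=300, n_features=8, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "LinearRegression": LinearRegression(),
    "Ridge": Ridge(),
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
    "RandomForestRegressor": RandomForestRegressor(random_state=0),
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # RMSE is the square root of the mean squared error
    scores[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))

# Print the models from lowest to highest test RMSE
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name} : {score}")
```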
XGBRegressor, RandomForestRegressor and GradientBoostingRegressor gave the lowest RMSE, so I chose these models and performed hyperparameter tuning on them.
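The search method and parameter ranges used for tuning are not given here, so the following is only a sketch of one common approach: a cross-validated grid search with scikit-learn's GridSearchCV, shown for RandomForestRegressor on synthetic data. The grid values and cv=3 are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training data (illustration only)
X, y = make_regression(n_samples=200, n_features=6, noise=15.0, random_state=0)

# Hypothetical grid; real ranges would depend on the dataset
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
    cv=3,
)
search.fit(X, y)

best_rmse = -search.best_score_  # flip the sign back to a positive RMSE
print(search.best_params_)
print(best_rmse)
```

The same pattern applies to GradientBoostingRegressor and XGBRegressor with their own parameter grids; RandomizedSearchCV is a cheaper alternative when the grid is large.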