Welcome to this blog. Today’s topic covers the fundamental difference between forecasting and prediction.
We will also cover the theoretical aspects of linear regression, but it is more important to learn how it is applied in industry.
Introduction
In our previous blog, we covered linear regression in detail with Python code. I highly recommend going through it to understand the science behind linear regression.
Now let’s look at the industry relevance of this model. First, recall what we have seen so far.
Theory of Linear Regression
- Linear regression models the relationship between a dependent variable and one or more independent variables, also called predictor variables.
- Linear regression also explains how the dependent variable changes with a unit change in the value of a predictor, holding the other predictors constant. In multiple linear regression, you can vary several predictors at a time.
- Linear regression is used for both forecasting and prediction. The two overlap, but it is important to understand when and where to use each.
- Linear regression is reliable for interpolation of data, not extrapolation. You may be wondering what these terms mean. Interpolation is estimating data points that fall within the range of the data, i.e. between your existing data points. Extrapolation is estimating data points beyond the range of the dataset.
- Linear regression only shows a relationship, which is correlation, not causation (correlation does not imply causation). Many statistical tests only measure the correlation between variables, and when two variables are found to be correlated, it is tempting to assume that one causes the other.
- Linear regression is a parametric model, as opposed to a non-parametric one. In a parametric model, the number of parameters is fixed with respect to the sample size. In a non-parametric model, the effective number of parameters can grow with the sample size.
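To make the interpolation versus extrapolation point concrete, here is a minimal pure-Python sketch. The data is made up for illustration: we assume a response that follows a square-root curve and is observed only for x from 1 to 10. The names `fit_line`, `true_response`, and `abs_error` are our own helpers, not part of any library.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def true_response(x):
    # Assumed ground truth for this illustration only: a curve, not a line.
    return 10 * math.sqrt(x)


# The training data covers x = 1..10 only.
xs = list(range(1, 11))
ys = [true_response(x) for x in xs]
intercept, slope = fit_line(xs, ys)


def abs_error(x):
    """Absolute gap between the fitted line and the true curve at x."""
    return abs((intercept + slope * x) - true_response(x))


interp_error = abs_error(5.5)   # inside the observed range
extrap_error = abs_error(40.0)  # far outside the observed range
print(f"interpolation error at x=5.5: {interp_error:.2f}")
print(f"extrapolation error at x=40:  {extrap_error:.2f}")
```

Inside the observed range the straight line stays close to the curve, while far outside it the gap grows quickly, because nothing in the training data tells the model how the relationship behaves out there.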
Prediction vs Forecasting
Prediction
- Importance of Outcome: Identify the driver variables and measure their impact on the dependent variable.
For example: A company wants to understand why its sales are declining; it does not want to forecast next week’s sales.
So in this case we are identifying the cause of the sales decline: which variable is responsible for it, and how its impact can be minimized going forward.
- Assumption: No specific assumption is considered while building the prediction model.
- Complexity and Accuracy of the model: For prediction, a simple model is better than a complex one. Business people will take action based on the attributes the model identifies, so those attributes need to be precise. If the model is very complex, understanding the true behavior of each attribute becomes very difficult, and we end up with a wrong understanding of the business indicators.
Forecasting (projection)
- Importance of Outcome: Projection is focused on the final result, i.e. the forecasted value. The usefulness of the result does not depend on knowing which driver variables produced it; any set of variables will do, as long as the result is accurate.
For example: A loan company wants high accuracy in deciding which candidates should receive a loan.
High accuracy is very important because the loan amounts are large. The company is not concerned with which driving factors are important; the aim is to create a model with high accuracy.
- Assumption: Everything remains the same as it is today. The forecast will change if a new incident occurs (model drift).
For example: Suppose I am forecasting a country’s growth today. The growth depends on the policies that exist today. If the governing party changes tomorrow and introduces new policies, the forecasting model may no longer hold.
- Complexity and Accuracy of the model: Choose accuracy over explanation. Even if the model is complex, it is a good choice if it gives higher accuracy.
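The "everything remains the same" assumption can be illustrated with a small sketch. All numbers here are hypothetical: we assume 12 months of steady growth, fit a trend line, and then compare the forecast against two invented futures, one where conditions hold and one where a policy change reverses the growth. `fit_line` and `mean_abs_error` are our own helper names.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_abs_error(predicted, actual):
    """Average absolute gap between forecast and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical history: 12 months of steady growth of 2 units per month.
history_t = list(range(1, 13))
history_y = [100 + 2 * t for t in history_t]
intercept, slope = fit_line(history_t, history_y)

# Forecast the next six months by extending the fitted trend.
future_t = list(range(13, 19))
forecast = [intercept + slope * t for t in future_t]

# Scenario A: conditions stay the same, growth continues at +2 per month.
actual_same = [100 + 2 * t for t in future_t]

# Scenario B: an assumed policy change after month 12 turns growth into -1 per month.
last_value = 100 + 2 * 12
actual_changed = [last_value - (t - 12) for t in future_t]

mae_same = mean_abs_error(forecast, actual_same)
mae_changed = mean_abs_error(forecast, actual_changed)
print(f"MAE if conditions hold:  {mae_same:.2f}")
print(f"MAE after policy change: {mae_changed:.2f}")
```

The model itself is identical in both scenarios; only the world changed. That is why a forecast needs to be rebuilt or re-checked whenever the conditions it was trained under no longer apply.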
Summary
So when we are making a projection, we have to assume that the conditions under which the model was built continue to hold.
The accuracy of the final outcome is more important than identifying the most important driver variables. When making a projection, the aim is accuracy. Thus, a complex model containing a large number of variables with high accuracy is more valuable than a simple model with lower accuracy.
Benefits of a Simple Model:
- Interpretability: A simple model is easier to understand, allowing business people to see how changes in driver variables affect sales.
- Actionable Insights: By understanding the cause-and-effect relationships, businesses can focus on mitigating the impact of driver variables they control (e.g., adjusting marketing strategies).
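As a rough illustration of interpretability, the sketch below fits a two-driver linear model and reads the coefficients directly. The data is invented: we assume sales are generated as 50 + 3 * ad_spend - 2 * price, so the true effects are known in advance. To stay dependency-free it solves the normal equations by hand with the hypothetical helpers `solve_linear_system` and `fit_ols`; in practice you would more likely reach for a numerical library.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations touch both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * t for x, t in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical weekly data: (ad_spend, price) pairs and the resulting sales.
drivers = [(10, 20), (12, 18), (15, 22), (11, 25),
           (18, 19), (20, 24), (14, 21), (16, 17)]
sales = [50 + 3 * ad - 2 * price for ad, price in drivers]

intercept, ad_effect, price_effect = fit_ols(drivers, sales)
print(f"one extra unit of ad spend changes sales by {ad_effect:+.2f}")
print(f"one extra unit of price changes sales by {price_effect:+.2f}")
```

Each coefficient answers a direct business question: how much do sales move when this one driver moves by one unit and the other stays fixed. That readability is what a more complex model usually gives up.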
Additional Considerations:
- Model Validation: Ensure the model accurately reflects the relationship between variables.
- Domain Expertise: Combine data analysis with business knowledge to ensure actionable insights.
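One simple way to approach model validation is a holdout check: fit on part of the data and measure the error on points the model never saw. The sketch below is only an illustration under assumed data (a linear trend plus a small deterministic wiggle standing in for noise) and compares the fitted line against a naive baseline that always predicts the training mean. The helper names `fit_line` and `rmse` are ours.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Hypothetical data: linear trend plus a repeating wiggle in the range -2..2.
xs = list(range(20))
ys = [5 + 2 * x + ((2 * x) % 5 - 2) for x in xs]

# Hold out every fourth point; the model never sees these during fitting.
train = [(x, y) for x, y in zip(xs, ys) if x % 4 != 0]
test = [(x, y) for x, y in zip(xs, ys) if x % 4 == 0]
train_x, train_y = zip(*train)
test_x, test_y = zip(*test)

intercept, slope = fit_line(train_x, train_y)
model_preds = [intercept + slope * x for x in test_x]

# Naive baseline: always predict the average of the training targets.
baseline_value = sum(train_y) / len(train_y)
baseline_preds = [baseline_value] * len(test_y)

rmse_model = rmse(model_preds, test_y)
rmse_baseline = rmse(baseline_preds, test_y)
print(f"holdout RMSE, fitted line: {rmse_model:.2f}")
print(f"holdout RMSE, baseline:    {rmse_baseline:.2f}")
```

If the fitted model cannot clearly beat such a baseline on held-out data, its coefficients and forecasts should not be trusted yet. For forecasting problems, a time-ordered split (train on the past, test on the most recent period) is usually a better fit than the interleaved split used here.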
Additional Reading
- AI vs ML vs DL vs Data Science
- Logistic Regression for Machine Learning
- Cost Function in Logistic Regression
- Maximum Likelihood Estimation (MLE) for Machine Learning
OK, that’s it, we are done now. If you have any questions or suggestions, please feel free to comment. I’ll come up with more Machine Learning and Data Engineering topics soon. Please also comment and subscribe if you like my work; any suggestions are welcome and appreciated.