Introduction
Product stockouts can be a major headache for businesses. Running out of stock can result in lost sales, dissatisfied customers, and a damaged reputation. However, with the help of predictive analytics, businesses can anticipate potential stockouts and take preventative measures to avoid them. In this technical blog, we will explore the key steps involved in predicting product stockouts.
Note: Predicting stockouts is a non-trivial problem because many factors across the manufacturing, supply chain, and demand checkpoints can lead to a stockout. However, even by identifying a small number of stockouts, businesses have saved millions of dollars in short-term and long-term revenue.
Data Collection
The first step in predicting product stockouts is to collect relevant data. This includes historical sales data, inventory levels, lead times, and any other relevant data points that may impact stock levels. It is important to collect a sufficient amount of data to ensure that the predictive model has enough information to accurately predict stock levels.
In this example, we will use data from this Kaggle competition. The dataset includes daily sales of products across various stores of a retail chain. For this exercise, we will use a subset of SKUs from the whole dataset. The dataset structure is illustrated here.
You can access the full jupyter notebook
Data Preprocessing
Once the data has been collected, the next step is to preprocess it. The preprocessing phase involves several essential steps, such as cleaning the data to remove inconsistencies, handling outliers, and transforming the data into a format that the algorithm can use. Data preprocessing plays an important part in the overall performance of the model: poorly thought-out feature preprocessing can lead to inaccurate and biased predictions. At Evolve AI Labs, we place great emphasis on trying various feature preprocessing strategies to determine the optimal preprocessing method for each feature type.
This dataset is time-aware, meaning each record is a time slice (daily in this instance). We will therefore create time-aware lagged features, a machine learning technique that exposes past values of a series to the model so that it can pick up time-dependent patterns. The frequency of prediction is another important factor that determines the key data processing activities. In this exercise, we will make monthly predictions, so we will aggregate the daily data up to the monthly level.
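As a minimal sketch of the monthly aggregation and lagged features described above, the snippet below uses only the Python standard library. The record layout, the field names, and the rule that a day closing with zero on-hand units counts as a stockout day are assumptions made for illustration; they are not taken from the Kaggle dataset or the notebook.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily records: (day, store_id, sku, units_sold, on_hand_at_close)
daily_rows = [
    (date(2023, 1, 5), "S1", "A", 4, 10),
    (date(2023, 1, 20), "S1", "A", 6, 0),
    (date(2023, 2, 3), "S1", "A", 5, 7),
    (date(2023, 3, 9), "S1", "A", 2, 3),
    (date(2023, 3, 30), "S1", "A", 3, 0),
]


def aggregate_monthly(rows):
    """Roll daily rows up to one record per (store, sku, (year, month))."""
    monthly = defaultdict(lambda: {"units_sold": 0, "stockout_days": 0})
    for day, store, sku, units, on_hand in rows:
        key = (store, sku, (day.year, day.month))
        monthly[key]["units_sold"] += units
        # Assumption: a day that closes with zero on-hand units is a stockout day.
        monthly[key]["stockout_days"] += int(on_hand == 0)
    return dict(monthly)


def add_lags(monthly, n_lags=2):
    """Attach the sales of the previous n_lags records to each monthly record.

    Assumes every store/SKU pair has one record per consecutive month;
    gaps in the history would need to be filled before lagging.
    """
    series = defaultdict(list)
    for (store, sku, month), values in sorted(monthly.items()):
        series[(store, sku)].append((month, values))

    out = []
    for (store, sku), months in series.items():
        for i, (month, values) in enumerate(months):
            row = {"store": store, "sku": sku, "month": month, **values}
            for lag in range(1, n_lags + 1):
                # None marks months that do not have enough history yet.
                row[f"units_sold_lag_{lag}"] = (
                    months[i - lag][1]["units_sold"] if i - lag >= 0 else None
                )
            out.append(row)
    return out


features = add_lags(aggregate_monthly(daily_rows))
```

Each row in `features` only sees sales from earlier months, which keeps the lagged values free of information from the period being predicted.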
The target to be predicted also needs to be processed. Typically, the target is defined as “Stockout in the next month”. The exact definition is derived from the business needs and the lead time the business requires to intervene successfully and mitigate a stockout event. For this exercise, we will define our target as “A stockout will happen in the next month at a given store”.
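One way to build such a target, sketched under assumptions: each month is labelled with whether the following month saw any stockout. The `stockout_days` field and the "at least one stockout day" rule are hypothetical choices for illustration; the right rule depends on how the business defines a stockout.

```python
# Hypothetical monthly history for one store/SKU pair, oldest month first.
monthly_history = [
    {"month": (2023, 1), "stockout_days": 0},
    {"month": (2023, 2), "stockout_days": 0},
    {"month": (2023, 3), "stockout_days": 2},
    {"month": (2023, 4), "stockout_days": 0},
]


def label_next_month_stockout(history):
    """Label each month with 1 if the following month had a stockout, else 0.

    The last month has no following month yet, so it gets no label and is
    left out of the training rows.
    """
    labelled = []
    for current, following in zip(history, history[1:]):
        target = int(following["stockout_days"] > 0)
        labelled.append({**current, "stockout_next_month": target})
    return labelled


training_rows = label_next_month_stockout(monthly_history)
```

The features of a row describe the current month, while its label comes from the next one, so the model is always asked to look one month ahead.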
Feature Engineering
After preprocessing the data, the next step is to engineer features. This involves selecting the most relevant variables and creating new features that may help to improve the predictive power of the model. For example, variables such as seasonality, promotions, and weather conditions may be important factors to consider when predicting stock levels.
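Seasonality is often given to a model through simple calendar features. The sketch below is one common encoding, not necessarily the one used in the notebook: the sine and cosine terms place the months on a circle so that December and January end up close together. The function name and feature names are hypothetical.

```python
import math


def calendar_features(month):
    """Return simple seasonality features for a month number (1 to 12)."""
    angle = 2 * math.pi * (month - 1) / 12
    return {
        "month": month,
        "quarter": (month - 1) // 3 + 1,
        # Cyclical encoding: month 12 and month 1 are neighbours on the circle.
        "month_sin": math.sin(angle),
        "month_cos": math.cos(angle),
    }


january = calendar_features(1)
december = calendar_features(12)
```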
Stockouts can be predicted using two approaches:
- Regression-based, where you forecast sales and check whether the forecast exceeds on-hand inventory
- Classification-based, where you predict the probability of a future stockout
Classification approaches are typically the first choice because the target is less noisy and the output is easier for business users to consume.
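To make the difference between the two approaches concrete, here is a minimal sketch of how each one turns a model output into a stockout flag. The function names and the 0.5 default threshold are illustrative assumptions; in practice the threshold would be tuned to the business cost of missed stockouts versus false alarms.

```python
def stockout_from_forecast(forecast_units, on_hand_units):
    """Regression route: flag a stockout when forecast sales exceed inventory."""
    return forecast_units > on_hand_units


def stockout_from_probability(probability, threshold=0.5):
    """Classification route: flag a stockout when the predicted probability
    reaches the chosen threshold (0.5 is only a placeholder)."""
    return probability >= threshold


regression_flag = stockout_from_forecast(forecast_units=120, on_hand_units=100)
classification_flag = stockout_from_probability(0.3)
```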
Once the approach has determined the target, we need to engineer time-based and window-based features that capture patterns and behaviours over time and give the model a signal for predicting the target.
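A window-based feature summarises the recent past of a series. The following sketch computes a few statistics over the trailing three months, using only months strictly before the one being described. The window length and the choice of statistics are assumptions for illustration.

```python
from statistics import mean


def rolling_features(values, window=3):
    """Summarise the `window` values that precede each position.

    Positions without a full window of history get None.
    """
    rows = []
    for i in range(len(values)):
        past = values[max(0, i - window):i]  # strictly before position i
        if len(past) < window:
            rows.append(None)
            continue
        rows.append({
            "mean": mean(past),
            "max": max(past),
            "min": min(past),
            "trend": past[-1] - past[0],  # change across the window
        })
    return rows


monthly_units = [10, 12, 8, 15, 20]  # hypothetical monthly sales, oldest first
window_rows = rolling_features(monthly_units)
```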
Engineering and maintaining time series features manually is a complex process. At Evolve AI Labs, we leverage our experience in developing forecasting models to creatively generate hundreds of statistical and domain-specific features. Depending on the project, we manage these data pipelines for training and inference on enterprise tools such as Amazon SageMaker and DataRobot, or on open source tools such as Featuretools and tsfresh.
Modeling
The next step is to select an appropriate predictive model. There is a wide variety of models to choose from, including linear regression, decision trees, and neural networks. The choice of model will mostly depend on how the model will be consumed: inference time, downstream integration, deployment platform, prediction insights, and so on. But before we train our model, we have to split the data into training and validation sets to get a realistic estimate of model performance and guard against overfitting. As we are dealing with time series data, instead of cross-validation we will use out-of-time validation data, making sure the validation set does not overlap with the training period and is not used for any optimisation purposes.
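An out-of-time split can be sketched as a simple cutoff on the time column: everything before the cutoff is used for training, and everything from the cutoff onward is held out. The row layout and the cutoff month below are hypothetical.

```python
def out_of_time_split(rows, cutoff_month):
    """Split rows by time: train on months before the cutoff, validate on the rest.

    Months are (year, month) tuples, which compare chronologically.
    """
    train = [r for r in rows if r["month"] < cutoff_month]
    valid = [r for r in rows if r["month"] >= cutoff_month]
    return train, valid


# Hypothetical labelled rows covering October 2022 to March 2023.
rows = [
    {"month": (2022, 10), "stockout_next_month": 0},
    {"month": (2022, 11), "stockout_next_month": 1},
    {"month": (2022, 12), "stockout_next_month": 0},
    {"month": (2023, 1), "stockout_next_month": 0},
    {"month": (2023, 2), "stockout_next_month": 1},
    {"month": (2023, 3), "stockout_next_month": 0},
]

train_rows, valid_rows = out_of_time_split(rows, cutoff_month=(2023, 1))
```

Unlike a random split, this keeps every validation month later than every training month, which mirrors how the model will be used in production.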
Once the data partitioning is done, the next step is to train the model using the preprocessed data. This involves feeding the training data into the algorithm, adjusting its parameters, and checking its performance on the out-of-time validation data. Evaluating the model this way matters because the use case is very sensitive to time periods and to the features derived from them. It also helps us detect data or target leakage and understand seasonality in the dataset.
After the model has been trained, the next step is to evaluate its performance. This involves testing the model on a separate test set to measure how well it predicts and to identify any areas where it may be making errors. The performance of the model can be measured using a variety of classification metrics. It is best to use the F1 score, which combines precision and recall: recall tells us what share of actual stockouts the model identified, and precision tells us what share of predicted stockouts were real. Accuracy can be a misleading metric in these kinds of use cases, because stockouts are usually much rarer than non-stockouts and a model that never predicts a stockout can still score high accuracy. Our model has an F1 score above 0.5 and an accuracy of 75%. This could be improved by adding more calendar features to the dataset.
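The gap between accuracy and F1 is easy to see on a small, made-up example. The sketch below computes the metrics from scratch and applies them to a baseline that never predicts a stockout; the labels are invented for illustration and are unrelated to the model results reported above.

```python
def classification_metrics(y_true, y_pred):
    """Compute precision, recall, F1 and accuracy for binary labels (1 = stockout)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Made-up labels: one stockout in ten store-months.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
never_stockout = [0] * 10  # baseline that never raises a flag

baseline = classification_metrics(y_true, never_stockout)
```

Here the baseline reaches 90% accuracy while its recall and F1 are both zero, because it misses the only stockout in the data.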
Once the model has been evaluated and found to be reliably accurate, the final step is to deploy it in a production environment. This involves integrating the model into the existing inventory management system and using it to generate real-time predictions of stock levels. The model can also be used to generate alerts when stock levels are predicted to fall below a certain threshold, allowing businesses to take proactive measures to prevent stockouts.
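The alerting step can be sketched as a simple filter over the predictions. The dictionary layout, the `reorder_threshold` name, and the example values are assumptions for illustration; a real integration would read predictions from the deployed model and push alerts into the inventory management system.

```python
def stock_alerts(predicted_levels, reorder_threshold):
    """Return the (store, sku) pairs whose predicted stock level is below the threshold."""
    return sorted(
        key for key, level in predicted_levels.items() if level < reorder_threshold
    )


# Hypothetical predicted stock levels keyed by (store, sku).
predicted_levels = {
    ("S1", "A"): 3,
    ("S1", "B"): 40,
    ("S2", "A"): 0,
}

alerts = stock_alerts(predicted_levels, reorder_threshold=5)
```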
Conclusion
In conclusion, predicting product stockouts can be a complex task, but with the right approach and tools, it is possible to accurately forecast stock levels and avoid the negative consequences of running out of stock. By collecting and preprocessing relevant data, engineering features, selecting an appropriate model, training and evaluating it, and deploying it in a production environment, businesses can take proactive steps to ensure that they always have the right products in stock when their customers need them.
At Evolve AI Labs, our team of highly skilled applied data scientists has extensive experience in solving small-to-large-scale forecasting problems for a variety of industries. We understand that accurate inventory forecasting is crucial for maintaining efficient operations and delivering exceptional customer experiences. That’s why we offer customized and comprehensive predictive modelling solutions that are tailored to meet the unique needs of your business.
We invite you to reach out to us to discuss your specific use case and learn how we can help you unlock the full potential of your data.