As we kick off 2025, let’s build on our previous discussion of machine learning by looking at the most critical aspect of our work: the machine learning pipeline that takes raw data and turns it into the predictions and recommendations our system generates hourly.
If you missed the first part, where we explain the differences between AI and machine learning and how they can add value to your day-to-day hotel management, please check it out here.
If you are also interested in AI’s usage in the industry and exploring its potential in various sectors (improving planning, efficiency, and guest comfort), then check out our other blog post series by Emilia here.
Step 1: Data Collection & Cleaning
Every great machine-learning model begins with one essential ingredient: high-quality data. In the current fast-developing world of AI, data isn’t just fuel, it’s the foundation. Even the most advanced algorithms and machine learning teams can’t perform well if the data they’re fed is incomplete, inconsistent, or irrelevant.
This is why I consider the first step of our pipeline, data collection and cleaning, the most critical part that sets the tone for the entire pipeline. Our preparation begins by gathering a comprehensive set of data that provides us with the most knowledge about a hotel and how it behaves in the market. This includes:
- PMS information (occupancy, revenue, ADR, cancellations, arrivals, and further KPIs)
- Scraped prices, rates, and availability from several OTAs for the hotel, their competitors, and the market
- Reviews and search trends
- Weather reports
- Events and holidays
From each one of these sources, we extract key insights, known as features, which are then used to power customized and personal machine-learning models for each hotel. After the collection and scraping phase is done, the raw data goes through a specific preparation pipeline before usage:
- Cleansing: Removing noise and irrelevant information that could be misinterpreted downstream, so that only consistent, high-quality data is used.
- Interpolation: Cleansing usually leaves gaps in the data that need to be filled. We reconstruct the missing values from the existing data around them so that the series transitions smoothly.
- Feature Engineering: Once the data is clean, we derive new features from existing ones, using revenue management knowledge to give the model more context.
- Transformation: We then convert the data into a state or encoding that the machine learning model can easily understand.
- Storage: In the final stage of data preparation, the data is stored securely for easy access and future use. It is later picked up for further analysis and integration into the pipeline.
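The preparation steps above can be illustrated with a minimal sketch. It is not our production code: the function names, the valid occupancy range, the choice of linear interpolation, the Saturday/Sunday weekend flag, and the one-hot weekday encoding are all assumptions made for the example, and storage is left out.

```python
from datetime import date
from typing import Optional


def cleanse(values: list[Optional[float]], low: float, high: float) -> list[Optional[float]]:
    """Replace readings outside [low, high] with None so they can be refilled later."""
    return [v if v is not None and low <= v <= high else None for v in values]


def interpolate(values: list[Optional[float]]) -> list[Optional[float]]:
    """Fill interior gaps linearly between the nearest known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled  # leading and trailing gaps stay None


def engineer(day: date, occupancy: float) -> dict[str, float]:
    """Derive a weekend flag (assumed Sat/Sun) and a one-hot weekday encoding."""
    features = {"occupancy": occupancy, "is_weekend": float(day.weekday() >= 5)}
    for d in range(7):
        features[f"weekday_{d}"] = float(day.weekday() == d)
    return features


# Hypothetical daily occupancy with one missing value and one impossible reading.
raw = [0.60, None, 0.70, 1.80, 0.90]
prepared = interpolate(cleanse(raw, low=0.0, high=1.0))
row = engineer(date(2025, 1, 4), prepared[3])
```

Here the impossible reading of 1.80 is first turned into a gap and then refilled from its neighbours, which mirrors how cleansing creates the gaps that interpolation closes.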
Step 2: Data Analysis
Once we have collected, cleaned, and prepped the data, the next step is to analyze it. Through exploratory data analysis (EDA), we look for patterns in the hotel's data that help us form a hypothesis about how well the machine learning model will perform on your data.
This involves creating visualizations such as histograms, scatter plots, and heatmaps, which help us spot useful trends and detect new corner cases where our pipeline could fail. For example, EDA helps us identify patterns like seasonal booking behavior or peak check-in times. By studying competitors' patterns, we can also notice regional preferences for certain seasons.
Using that knowledge, we can then identify anomalies and outliers, for example a sudden spike in prices or bookings that does not align with historical data. Such a spike usually signals that the time frame should be investigated further. In some cases, the time frame could affect the training process negatively and should be removed; in others, it captures a rare trend that could yield considerable information gain if used correctly.
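One simple way to surface such spikes automatically is a robust outlier score. The sketch below uses the modified z-score based on the median and the median absolute deviation (MAD); this is a common textbook technique chosen for illustration, not necessarily the method used in our pipeline, and the function name and the threshold of 3.5 are assumptions.

```python
from statistics import median


def flag_outliers(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return the indices whose modified z-score exceeds the threshold."""
    centre = median(values)
    mad = median(abs(v - centre) for v in values)
    if mad == 0:
        return []  # simplification: no spread to compare against
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - centre) / mad > threshold
    ]


# Hypothetical nightly rates with one sudden spike.
rates = [100, 102, 98, 101, 99, 250, 100]
suspicious = flag_outliers(rates)  # -> [5]
```

The flagged indices are only candidates for review. Whether a time frame is removed or kept as a valuable rare trend remains a judgment call, as described above.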
Step 3: Model Training
Once the data analysis is complete and the data quality is confirmed, we move on to model training, the phase where our algorithms learn to make solid predictions from the engineered features.
The first step in model training is splitting the data into three distinct sections: training, validation, and testing. This approach ensures that the model is trained on one dataset and evaluated on entirely separate data to measure its true performance. Machine learning algorithms often seek the easiest way to maximize accuracy, which can sometimes lead to overfitting - memorizing the input data instead of learning patterns that generalize to new, unseen data. By reserving the validation and test datasets, we can assess how well the model generalizes beyond the training data. The validation set helps fine-tune the model during development; the test set - unseen during training - provides a final, unbiased evaluation of the model's ability to perform in real-world scenarios.
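As a rough sketch, a three-way split could look like the following. The 70/15/15 proportions, the function name, and the decision to split chronologically without shuffling are assumptions for this example. Splitting by time is a common choice for time-ordered data such as bookings, because it keeps future observations out of the training set.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def chronological_split(rows: Sequence[T], train: float = 0.70, val: float = 0.15):
    """Split time-ordered rows into train, validation, and test parts without shuffling."""
    if train <= 0 or val <= 0 or train + val >= 1:
        raise ValueError("train and val must be positive and sum to less than 1")
    n = len(rows)
    train_end = round(n * train)
    val_end = round(n * (train + val))
    # Oldest rows train the model; the most recent rows are held back for testing.
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]


# Hypothetical example: 100 consecutive days, oldest first.
days = list(range(100))
train_set, val_set, test_set = chronological_split(days)
```

With 100 rows this yields 70 training, 15 validation, and 15 test rows, and every test row is more recent than every training row.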
Beyond manually analyzing the data and extracting the features we consider most influential for capturing trends and patterns, we also use machine learning-based feature selection. We train several ML models on different sets of features to analyze which are the most influential for prediction. After each training run, we calculate an importance metric for each feature, and on the final run, we pick the features with the highest average metric for inclusion in the final model. This removes features that have no effect on a given hotel. For example, some hotels are strongly affected by weather and certain events, while demand at others does not depend on the region's current weather.
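The averaging step can be sketched as follows. The importance scores are made up, and the function name, the `top_k` parameter, and the rule of averaging only over the runs in which a feature took part are assumptions for illustration. How each run computes its importance metric is left open here.

```python
from collections import defaultdict


def select_features(runs: list[dict[str, float]], top_k: int) -> list[str]:
    """Average each feature's importance over the runs it appeared in and keep the top_k."""
    totals: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for run in runs:
        for name, score in run.items():
            totals[name] += score
            counts[name] += 1
    averages = {name: totals[name] / counts[name] for name in totals}
    # Highest average first; ties are broken alphabetically for reproducibility.
    ranked = sorted(averages, key=lambda name: (-averages[name], name))
    return ranked[:top_k]


# Hypothetical importance scores from three training runs on different feature sets.
runs = [
    {"competitor_price": 0.50, "weather": 0.10, "events": 0.40},
    {"competitor_price": 0.60, "weather": 0.05},
    {"competitor_price": 0.40, "weather": 0.15, "events": 0.30},
]
selected = select_features(runs, top_k=2)  # -> ["competitor_price", "events"]
```

For this made-up hotel, weather ends up with the lowest average importance and is dropped, while a hotel with weather-driven demand would rank it higher.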
After finishing the training process, we evaluate model performance both quantitatively and qualitatively to ensure that the result adheres to our standards and meets the given objectives. Quantitatively, depending on the task, we use metrics such as accuracy, precision, and mean squared error (MSE) to measure predictive power objectively.
Qualitatively, on the other hand, we analyze the model’s predictions in the context of the market, exploring whether the results make sense given domain knowledge about revenue management. This process usually involves internal revenue managers at Hotellistat, who bring deep knowledge of the field. This ensures that the model is not only mathematically robust but also practically useful.
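For reference, the three metrics named above can be written out in a few lines. This is a plain illustration of the standard definitions, with made-up labels and values: accuracy and precision apply to classification tasks, MSE to regression tasks. Returning 0.0 when nothing is predicted positive is a convention assumed here.

```python
from typing import Sequence


def _check(y_true: Sequence, y_pred: Sequence) -> None:
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Share of predictions that match the true label."""
    _check(y_true, y_pred)
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true: Sequence, y_pred: Sequence, positive=1) -> float:
    """Share of positive predictions that are actually positive."""
    _check(y_true, y_pred)
    hits = [t == positive for t, p in zip(y_true, y_pred) if p == positive]
    return sum(hits) / len(hits) if hits else 0.0


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between true and predicted values."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical classification labels (1 = high-demand day) and rate predictions.
labels_true, labels_pred = [1, 0, 1, 1, 0], [1, 1, 1, 0, 0]
rates_true, rates_pred = [100.0, 120.0, 90.0], [110.0, 115.0, 90.0]

acc = accuracy(labels_true, labels_pred)          # 3 of 5 correct
prec = precision(labels_true, labels_pred)        # 2 of 3 positive predictions correct
mse = mean_squared_error(rates_true, rates_pred)  # (100 + 25 + 0) / 3
```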
Step 4: Full Integration & Automation
After the successful completion of model training, our main onboarding section of the pipeline is now complete, paving the way for the final step: full integration and automation.
At this stage, the trained model is marked as ready for deployment. It then becomes part of the system, where it is used to produce new inference results on a daily and hourly basis. Our pricing recommendation models run hourly, so that they react to any change in the system or market, while our demand prediction runs daily so that it can take as many factors as possible into account and achieve high accuracy.
However, deployment is not the end of the journey. To maintain peak performance, we regularly retrain models for all hotels in the system. For rapidly evolving scenarios, such as new hotels or changing conditions, retraining occurs weekly. In other cases, it happens monthly to capture emerging trends and ensure that the models continue to adapt to shifting market dynamics.
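The retraining cadence described above could be expressed roughly as follows. The function names and the two boolean flags are hypothetical, and treating "monthly" as 30 days is an assumption made to keep the example simple. How a hotel is classified as new or as facing changing conditions is not shown.

```python
from datetime import date, timedelta


def retrain_interval(is_new_hotel: bool, conditions_changing: bool) -> timedelta:
    """Weekly for rapidly evolving hotels, otherwise roughly monthly (assumed 30 days)."""
    if is_new_hotel or conditions_changing:
        return timedelta(weeks=1)
    return timedelta(days=30)


def is_retrain_due(last_trained: date, today: date, interval: timedelta) -> bool:
    """True once at least one full interval has passed since the last training."""
    return today - last_trained >= interval


# Hypothetical check for a newly onboarded hotel last trained eight days ago.
interval = retrain_interval(is_new_hotel=True, conditions_changing=False)
due = is_retrain_due(date(2025, 1, 1), date(2025, 1, 9), interval)
```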
Conclusion
As you can see, utilizing machine learning is a complex process that requires time, expertise, and continuous refinement. The term "machine learning" itself implies that the model must learn and be trained. This process doesn't just depend on time - it also requires that the instructors guiding the model are experts in their field, capable of monitoring the learning process and adapting the "curriculum" as needed.
Much like every student is unique, bringing individual strengths, challenges, and requirements, every hotel is distinctive. Each property operates under unique conditions, with its own data patterns, market dynamics, and strategic goals. That’s why at Hotellistat, we emphasize a holistic approach to evaluating data and train each model step by step, tailoring it to the property’s individual needs.
Unlike one-size-fits-all algorithms or generic models, our strength - and the strength of machine learning - lies in this personalization. This individualized approach ensures that we can provide the most accurate and effective solutions for optimizing revenue management in hospitality.