Database Optimization for a Travel Metasearch Engine

The Case

A travel fare aggregator wanted to enrich its multi-supplier property database to offer website visitors a wider variety of choices. The company decided to implement a solution that could map thousands of entities from several external sources, identify duplicates in the portfolio, and combine the unique entries into a core database that provides customers with accurate, up-to-date information and a confident booking experience.

Technical Challenge

  • Optimize and verify over 3.5 billion SQL queries.
  • Achieve more than 90% correctness of unstructured data.
  • Harmonize and consolidate data from more than 70 different data sources.
  • Deliver the solution in under two calendar weeks.

The Database Optimization

When ScaleFocus came on board, our DWH team (architects and data analysts) performed a comprehensive analysis of the source databases and data (Microsoft SQL Server, MS Excel, MySQL, Oracle, etc.), checking the compatibility of attributes and entities across sources. The main challenge was data inconsistency: the same hotel listed under different names, misspelled hotel names, different addresses, different owners, and so on. We handled these cases by implementing complex mathematical and statistical methods and models. To implement the new logic, our team used Oracle DB, with MS Excel used to export the data.
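
To illustrate the kind of name comparison involved, here is a minimal Python sketch of a Levenshtein-based similarity score. It is not the customized algorithm used in the project, which ran in Oracle DB; the function names and the `normalize` cleanup step are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def normalize(name: str) -> str:
    """Hypothetical cleanup step: lowercase and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Return a score in [0, 1]; 1.0 means the normalized names are identical."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

For example, `name_similarity("Hotel Astoria", "Hotel Astorya")` differs by a single substitution across 13 characters and scores about 0.92. In practice, a threshold on this score would be combined with address and coordinate checks before two records are treated as the same property.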

ScaleFocus was involved in the following areas of the solution's implementation:

  • Create the comparison logic to identify duplicated data.
  • Develop and optimize database queries in Oracle DB (names, addresses, geographical coordinates, etc.).
  • Perform data subsetting and data cleansing.
  • Perform SQL database tuning and performance optimization.
  • Implement customized Levenshtein algorithms.
  • Implement a graph and depth-first search (DFS).
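
One plausible way a graph and DFS fit into deduplication is to treat each record as a node and each pairwise match as an edge, so that every connected component becomes one unique property. The Python sketch below assumes that reading; the record IDs and the `duplicate_groups` function are hypothetical and are not taken from the project.

```python
from collections import defaultdict


def duplicate_groups(record_ids, matched_pairs):
    """Group records into clusters of duplicates.

    record_ids:    iterable of all record identifiers.
    matched_pairs: iterable of (id_a, id_b) pairs judged to be the same property.
    Returns a list of sorted groups, one per connected component.
    """
    # Build an undirected adjacency list from the pairwise matches.
    graph = defaultdict(set)
    for a, b in matched_pairs:
        graph[a].add(b)
        graph[b].add(a)

    visited = set()
    groups = []
    for start in record_ids:
        if start in visited:
            continue
        # Iterative depth-first search using an explicit stack.
        stack = [start]
        visited.add(start)
        group = []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour in graph[node]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups
```

With records `["A1", "B7", "C3", "D4"]` and matches `[("A1", "B7"), ("B7", "C3")]`, the result is `[["A1", "B7", "C3"], ["D4"]]`: the first three records collapse into one property even though A1 and C3 were never matched directly. That transitive grouping is the main reason to use a graph here instead of comparing pairs in isolation.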

Technology Stack

  • Oracle DB
  • Microsoft SQL Server
  • MySQL
  • MS Excel

Achievements

  • In just 2 weeks, our client enriched its database with over 5,000 unique properties.
  • The solution resolved the data inconsistency issues, raising data correctness to 94% (from 60% initially).
  • A more confident reservation experience, free of property-matching errors and duplicates in the booking system.
  • Easy administration and effective management of a complex multi-supplier portfolio database.
  • Cost efficiency from outsourcing time-consuming manual research and maintenance.
  • The solution's architecture saves valuable time and effort in future maintenance and expansion of the database.

The Client

Our client is among the world’s leaders in the hospitality industry.