Energy & Utilities

Building an Event-Based Pipeline and Data Quality Framework in Energy Supply
& Sales


Success Story

Measuring your business operations is the best way to understand them and to improve where it matters most. Our client, a European energy trading company, wanted to make information rapidly available so that market analysts, traders, and internal systems could act on it, while also ensuring the data meets defined quality thresholds. Scalefocus created an event-driven pipeline that applies real-time triggers across the organization, so front- and mid-office teams can take proactive steps to mitigate risks and act on insights as they unfold.

Raw input turns into actionable data

Increased Resilience

Modern data pipeline architecture

The Client

An international energy company that operates numerous power plants and leverages advanced technologies and innovations to manage large-scale power systems and infrastructure. The company sources oil, natural gas, and global commodities and trades electricity, emissions certificates, natural gas, LNG, and coal.

The Challenge 

Our client’s trading and market analytics teams conduct a range of analytical, modeling, and optimization activities to leverage positions in international commodity markets. Market, pricing, weather, and other data types play a crucial role in quickly identifying opportunities and assessing corresponding risks.  

The backbone of their operations is a big data storage platform used by various departments for analysis and forecasting. Data can be acquired and consumed in several ways: directly via the UI, through REST API calls, via a dedicated Excel plugin, or by querying the Snowflake data storage. The legacy platform channels market data across systems, processes, and departments, ensuring rapid, high-quality insights.

The platform ingests data from multiple external sources, both automatically and on demand. In 2022, the Scalefocus .NET team, in collaboration with the client’s internal team, took a major step in the platform’s evolution. We developed a combination of Azure API services, Azure Functions, and purpose-built data triggers so that processes and workflows could start automatically in response to specific events, enabling a more rapid way of working. On top of this, we had to ensure that the quality of the data ingested into the time series meets the expected thresholds for the various use cases.

These enormous data sets contain tens of thousands of data series with more than a billion records, so the entire event-triggering and quality-check process had to be built to scale without jeopardizing the existing ingestion process.

The client also needed an experienced partner who could define the data quality goals of the analytics platform and implement a data quality framework that continuously profiles data for errors, completeness, consistency, and reliability.

The Solution

The solution was split into two major phases.

First, we established the event-based pipeline (EBP) architecture through Azure API Management, so that events can trigger various downstream tools. A rule check and its follow-on EBP can be designed to start once the data for the monitored objects is ingested. This allows end users to automate or semi-automate many of their processes as soon as the data is available or complete.
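The core idea can be sketched as an event dispatcher: downstream actions (such as rule checks) subscribe to an event, and all of them fire when ingestion for a monitored object completes. This is a minimal Python illustration of the concept only; the actual solution runs on Azure API Management and Azure Functions, and all names here are hypothetical.

```python
from collections import defaultdict

class EventBus:
    """Minimal event dispatcher illustrating the EBP concept (illustrative only)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        # Register a downstream action (e.g. a quality-rule check) for an event.
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Fire every handler registered for this event type, in order.
        return [handler(payload) for handler in self._handlers[event_type]]

bus = EventBus()
# Hypothetical subscriber: start quality checks when a series finishes ingesting.
bus.subscribe("ingestion_complete", lambda p: f"run quality rules for {p['series']}")
results = bus.publish("ingestion_complete", {"series": "power_prices_DE"})
```

Decoupling producers from consumers this way is what reduces dependencies between applications, as noted under the resilience benefit below.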

Second, we developed a data quality framework (DQF) that ensures the ingested data meets the required quality thresholds for the specific use cases, mainly categorized into two concepts – Availability and Completeness of the data:

Data Availability – checks whether enough data points were inserted into the system by counting the records received in the Snowflake database over a specified time range. For example, a user may want to verify that there are exactly 96 data points per day for the energy pricing curves they have followed over the past three months. Based on such quality criteria for a specific time series, the rule can automatically trigger a dashboard in Tableau, for example, or notify the user if the number of data points differs from the expectation. A rule can be configured to start at a certain date and time or whenever new data arrives through the ingestion pipeline. We designed the system to scale easily: users can combine multiple availability rules into a single rule – a so-called meta-rule.
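The availability check described above (96 points per day for a quarter-hourly curve) reduces to comparing a per-day record count against an expectation. A minimal Python sketch of that logic, with illustrative names only:

```python
from datetime import date

def check_availability(points_per_day, expected_per_day=96):
    """Return the days whose received point count differs from the expectation.

    points_per_day: dict mapping date -> number of records received that day.
    expected_per_day: e.g. 96 for quarter-hourly curves (4 points x 24 hours).
    """
    return {d: n for d, n in points_per_day.items() if n != expected_per_day}

# Hypothetical per-day counts, as would be aggregated from Snowflake.
counts = {date(2022, 5, 1): 96, date(2022, 5, 2): 90, date(2022, 5, 3): 96}
violations = check_availability(counts)
# 2022-05-02 received only 90 of the 96 expected points
```

In production this comparison would run as a triggered rule and feed a notification or a Tableau dashboard rather than return a dict.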

Values Threshold – the aforementioned Availability rule can also be designed to trigger downstream processes when a value meets specified criteria. For example, if any newly ingested data point exceeds 100 EUR/MWh, a report or notification is triggered.
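A threshold rule is a simple filter over incoming points. The sketch below, with assumed names and data, shows the 100 EUR/MWh example:

```python
def threshold_breaches(series, limit=100.0):
    """Return (timestamp, value) pairs whose value exceeds the limit.

    series: iterable of (timestamp, value) pairs from the ingestion pipeline.
    limit: e.g. 100.0 for a 100 EUR/MWh alert threshold.
    """
    return [(ts, v) for ts, v in series if v > limit]

# Hypothetical hourly price points.
prices = [("2022-08-01T00:00", 85.0), ("2022-08-01T01:00", 143.5)]
breaches = threshold_breaches(prices)
```

Each breach would then be routed through the event pipeline to trigger the report or notification.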

Data Completeness – checks whether a record was received for each expected time point in a given range. For example, for hourly data curves, the system can report exactly which hours within a 30-day range are missing data points. Similar rules can be defined for time series with other intervals – minutes, quarter hours, days, months, etc.
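Conceptually, a completeness check generates the expected grid of timestamps and reports the ones absent from what was received. A minimal Python sketch for the hourly case (names and data are illustrative):

```python
from datetime import datetime, timedelta

def missing_hours(received, start, days=30):
    """List the expected hourly timestamps absent from the received records."""
    expected = [start + timedelta(hours=h) for h in range(days * 24)]
    received = set(received)
    return [ts for ts in expected if ts not in received]

# Hypothetical example: one day of hourly data with two hours missing.
start = datetime(2022, 9, 1)
received = [start + timedelta(hours=h) for h in range(24) if h not in (5, 17)]
gaps = missing_hours(received, start, days=1)
```

The same expected-grid approach generalizes to quarter-hourly, daily, or monthly series by changing the step.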

The technological challenge was to cover all the data sources while maintaining the system’s performance: DQF rules can require reading data from hundreds of data sources simultaneously, processing hundreds of data objects and time series. Thanks to the platform’s maturity, we had a clear vision of how to develop it further and where the EBP and DQF could add real business value without jeopardizing existing performance.

The Benefits 

Scalefocus upgraded the client’s existing legacy platform with new functionality that significantly improved its performance. Throughout the project, we have kept the platform, its codebase, and its external libraries on the latest stable versions, regularly monitoring and documenting needed upgrades and executing them according to workload and severity. Daily interaction with the client lets us work in an agile way and implement urgent user requests quickly, and we continuously introduce new features and services that improve performance and user satisfaction.

Among them:

  • implementation of automatically triggered actions based on events and data quality rules
  • introduction of a new frontend library for listing and filtering data in a table-like format
  • expansion of the clustering indexes at the Snowflake level for improved partitioning and filtering of data
  • complete redesign and rework of the search functionality based on Azure Cognitive Search / Azure AI

We improved the data quality, and users can now rely on fast data ingestion and a data quality framework that provides complete visibility into all data series. If data is inconsistent or missing, users are notified – vital for the operations of the middle office and the traders. Previously, users were often unaware that they were missing important data points; now they receive up-to-date reports and can see within seconds which data is missing, which is invaluable when monitoring hundreds of data objects.

Increased Resilience – an event-driven architecture that increases resilience by reducing dependencies between applications.

Scalability – a modern data pipeline architecture designed to integrate new data sources seamlessly while maintaining scalability and business operations.

Predictability – the path of data is easy to follow, and any missing time series can be monitored.

Raw input turns into actionable information

The Technology

.NET
React
Snowflake
Azure SQL DB
Azure Blob Storage
Azure DevOps
Azure Function Apps (serverless)
Azure API Management

Our Work

We have a global client base that includes Fortune 500 companies, innovative startups and industry leaders in Information Technology, E-Commerce, Insurance, Healthcare, Finance and Energy & Utilities.

Ready to scale and meet the technology challenges of tomorrow?

Contact us