Creating a data analytics powerhouse...
This month’s blog delves into the world of data analytics.
What does our data tell us?
Data (noun)
Definition of data:
- Factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation: "the data is plentiful and easily available".
- Information in digital form that can be transmitted or processed.
- Information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful.
("Data." Merriam-Webster.com Dictionary, Merriam-Webster. Accessed February 2024.)
So much valuable knowledge about consumers, processes and more can be lost when meaningful data is not captured, stored or adequately analysed.
It’s clear that data analytics can offer invaluable insights, used in a variety of different ways to help improve performance and efficiency. Decisions can be backed up with a solid understanding of data and the journey it took before arriving and resting in your database.
For this month’s blog, let me introduce Kirith, Judopay’s Data/BI Engineer, who will take us through the decision on our data warehouse solution, how it has created a data analytics powerhouse, and what that means for Judopay and our merchants.
In the intricate world of payments, where data is the lifeblood of operations, the choice of a data warehouse can significantly impact analytics capabilities.
This blog explores the advantages of creating a data analytics powerhouse in BigQuery, tailored to meet the unique demands of the payments industry.
Motivation for change - Why BigQuery?
The motivation behind Judopay adopting BigQuery as our data analytics solution stems from the following advantages:
Scalability
BigQuery's architecture is designed for seamless scalability, ensuring we can handle large volumes of data without compromising analytics performance.
This scalability is crucial for gaining timely insights into payment trends.
Real-time Analytics
BigQuery's capability to analyse streaming data in real time allows us to swiftly identify anomalies, patterns and potential fraudulent activities as they occur.
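To make this concrete, here is a minimal sketch of the kind of anomaly check real-time data enables, using the google-cloud-bigquery Python client. The `payments.transactions` table and its columns are hypothetical placeholders for illustration, not our actual schema.

```python
# Sketch: flag merchants whose recent average transaction value deviates
# sharply from their trailing 24-hour average. Table/column names are
# hypothetical examples, not the real Judopay schema.
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

query = """
    WITH recent AS (
        SELECT merchant_id, AVG(amount) AS avg_recent
        FROM `payments.transactions`
        WHERE created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 15 MINUTE)
        GROUP BY merchant_id
    ),
    baseline AS (
        SELECT merchant_id, AVG(amount) AS avg_day
        FROM `payments.transactions`
        WHERE created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
        GROUP BY merchant_id
    )
    SELECT r.merchant_id, r.avg_recent, b.avg_day
    FROM recent AS r
    JOIN baseline AS b USING (merchant_id)
    WHERE r.avg_recent > 3 * b.avg_day
"""

for row in client.query(query).result():
    print(f"Possible anomaly: merchant {row.merchant_id} "
          f"recent avg {row.avg_recent:.2f} vs daily avg {row.avg_day:.2f}")
```

Because the data is streamed continuously, a check like this can run on a tight schedule and surface issues minutes after they begin, rather than in the next day's batch report.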
Machine Learning Integration
BigQuery integrates seamlessly with Google Cloud's machine learning services, letting us apply machine learning directly within the analytics environment. This empowers us to stay ahead of evolving threats and fine-tune risk mitigation strategies in real time.
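As an illustration of what "machine learning next to the data" can look like, below is a hedged sketch using BigQuery ML's SQL interface to train and apply a simple classifier. The model, table, feature and label names are assumptions for the example, not our production models or features.

```python
# Sketch: train a fraud classifier inside BigQuery with BigQuery ML, then
# score the last hour of transactions. All dataset/table/column names are
# illustrative placeholders.
from google.cloud import bigquery

client = bigquery.Client()

train_sql = """
    CREATE OR REPLACE MODEL `analytics.fraud_model`
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['is_fraud']) AS
    SELECT amount, card_country, channel, is_fraud
    FROM `payments.transactions`
    WHERE created_at < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
"""
client.query(train_sql).result()  # training runs inside BigQuery itself

score_sql = """
    SELECT transaction_id, predicted_is_fraud_probs
    FROM ML.PREDICT(
        MODEL `analytics.fraud_model`,
        (SELECT * FROM `payments.transactions`
         WHERE created_at >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR))
    )
"""
for row in client.query(score_sql).result():
    print(row.transaction_id, row.predicted_is_fraud_probs)
```

The appeal of this pattern is that no data leaves the warehouse: training and scoring both happen where the transactions already live.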
Cost Efficiency
BigQuery's serverless pricing model ensures cost efficiency, as we pay only for the resources we consume. Combined with on-demand pricing, this lets us optimise costs while scaling analytics resources up and down with data volumes.
Migrating to BigQuery
To migrate our data warehouse to BigQuery, we first needed to align and convert the schema to cater specifically for payments analytics.
We created a new reporting schema, optimised for analytics, that contains additional data points to provide more detailed analysis.
For example, if we compare the old 3D Secure data schema with the new one, we can now analyse a vast number of additional data points. We are already seeing the benefit: mandating a 3D Secure 2 challenge can increase the success rate of transactions.
In another example, comparing the old integration data schema with the new one, the additional data points enable us to identify merchants using older integration versions, helping us ensure that merchants move to the latest version.
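As a rough illustration of how such additional data points might be declared, the sketch below defines a reporting table with the BigQuery Python client. Every field and table name here is a hypothetical example, not the actual Judopay reporting schema.

```python
# Sketch: declare a reporting table carrying the kinds of extra 3D Secure
# and integration data points described above. Names are illustrative only.
from google.cloud import bigquery

client = bigquery.Client()

schema = [
    bigquery.SchemaField("transaction_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("merchant_id", "STRING", mode="REQUIRED"),
    # Hypothetical 3D Secure data points absent from the old schema:
    bigquery.SchemaField("three_ds_version", "STRING"),      # e.g. "2.2.0"
    bigquery.SchemaField("challenge_requested", "BOOLEAN"),
    bigquery.SchemaField("challenge_completed", "BOOLEAN"),
    bigquery.SchemaField("authentication_result", "STRING"),
    # Hypothetical integration data point for tracking SDK/API versions:
    bigquery.SchemaField("integration_version", "STRING"),
]

table = bigquery.Table("my-project.reporting.transactions_3ds", schema=schema)
client.create_table(table, exists_ok=True)
```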
Next, we streamlined the Extraction, Transformation and Loading (ETL) of payment data into BigQuery, with the aim of minimising latency, using the following tools (a sketch of the resulting pipeline follows this list):
Kafka:
An open-source distributed event streaming platform that enables high-performance data pipelines, streaming analytics and data integration.
Airflow:
A platform created to programmatically author, schedule and monitor workflows. Airflow provides us with the capability to transform data and load it into BigQuery.
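Below is a minimal sketch of what the transform-and-load step might look like as an Airflow DAG, assuming payment events have already landed in a staging area from Kafka. Task bodies, names and the schedule are placeholders, not our production pipeline.

```python
# Sketch: a small Airflow DAG that transforms staged payment events and
# loads them into BigQuery. DAG id, schedule and task logic are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def transform_payments(**context):
    # Placeholder: read staged events (e.g. landed from Kafka) and map them
    # onto the new reporting schema.
    ...


def load_to_bigquery(**context):
    # Placeholder: write transformed rows to the BigQuery reporting dataset,
    # e.g. via the google-cloud-bigquery client's load jobs.
    ...


with DAG(
    dag_id="payments_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval=timedelta(minutes=15),  # frequent runs keep latency low
    catchup=False,
) as dag:
    transform = PythonOperator(task_id="transform", python_callable=transform_payments)
    load = PythonOperator(task_id="load", python_callable=load_to_bigquery)

    transform >> load
```

Keeping the schedule short is one lever for the latency goal mentioned above; the trade-off is more, smaller load jobs.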
Following this, we conducted exhaustive testing to ensure the analytics infrastructure in BigQuery met our requirements, evaluating analytics queries (in particular, transaction trend analysis) for their:
- Speed
- Accuracy
- Responsiveness
Once we were satisfied, we reconfigured our analytics applications and redirected our payments analytics queries to the new data warehouse.
Now, in the post-migration phase, we are closely monitoring analytics performance and continuously optimising queries and configurations to ensure sustained efficiency.
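One concrete way to monitor query performance in BigQuery is its INFORMATION_SCHEMA.JOBS_BY_PROJECT view, which exposes per-job metadata such as bytes processed and duration. The sketch below is illustrative; the region qualifier and the choice of "top ten by bytes processed" are assumptions, not our actual monitoring setup.

```python
# Sketch: surface the most expensive queries of the last day so they can be
# reviewed and optimised. Region qualifier and limits are illustrative.
from google.cloud import bigquery

client = bigquery.Client()

query = """
    SELECT
        user_email,
        total_bytes_processed,
        TIMESTAMP_DIFF(end_time, start_time, SECOND) AS duration_s,
        query
    FROM `region-eu`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
      AND job_type = 'QUERY'
    ORDER BY total_bytes_processed DESC
    LIMIT 10
"""
for row in client.query(query).result():
    print(row.duration_s, row.total_bytes_processed, row.user_email)
```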
We have also used Airflow to centralise our data and provide a holistic view of data shared across the entire finance group.
Future Data Capabilities
Our new data model is compatible with Natural Language Processing (NLP), and we are exploring NLP and generative AI tools to enhance our data visualisation capability. We have already conducted a proof of concept (POC) with ThoughtSpot Sage as a candidate solution.
Using a tool like ThoughtSpot Sage would allow questions to be asked of the data in natural language, with answers returned as a table or chart, a key enabler for self-serve data analytics.