Porting Code from Splunk to PySpark

Project title: Porting Code from Splunk to PySpark
Client: global leader in corporate banking
Industry: Banking, Finance & Insurance
Expertise: Cloud & Infrastructure
Project scope: Infrastructure design
Tools: Databricks, PySpark, Python, Azure
Porting code means rewriting scripts from one programming language or platform to another. As technology evolves, some tools and methods become outdated. Modern programming languages, libraries, and platforms offer richer functionality, better performance, and lower maintenance costs. Consequently, technologies chosen at the start of a project may become insufficient over time, particularly in terms of computational speed, database capacity, or the ability to handle growing data volumes.
Why PySpark, Databricks, and Azure?
These technologies enable distributed computing, efficient processing of large data volumes, and flexible storage for both raw and transformed data. Delta tables in the Data Lake allow easy data recovery and recalculation of metrics, capabilities that traditional databases lack.
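As an illustration of the recovery capability, a Delta table keeps versioned snapshots that can be read back by version number (time travel). The sketch below is a hypothetical helper, not code from the project; the table path and version in the usage note are placeholders:

```python
def read_metrics_version(spark, path, version):
    """Read an earlier snapshot of a Delta table via time travel,
    e.g. to recalculate metrics against historical data."""
    return (
        spark.read.format("delta")
        .option("versionAsOf", version)  # select a past version of the table
        .load(path)
    )
```

In a Databricks notebook, where `spark` is already available, this could be called as `read_metrics_version(spark, "/mnt/lake/metrics", 3)` to recompute metrics exactly as they stood at version 3.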
The Challenge
In this case, the client needed to rewrite hundreds of scripts written in Splunk's query language. The existing database infrastructure was not designed to handle growing data volumes or simultaneous access by many users. ALTEN Polska took on the challenge of migrating these scripts to PySpark on the Databricks platform integrated with Azure.
Project Execution
The project began with analyzing the client's existing infrastructure and understanding their business needs. A general approach was developed, followed by a proof of concept (PoC) based on sample data. Subsequently, workspaces were set up on the Databricks platform, and connections to Azure Storage and Data Lake were established. Once the coding standards were approved, the team began rewriting scripts from Splunk to PySpark.
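To give a flavor of the rewriting work, here is a hypothetical translation of a simple Splunk search into the PySpark DataFrame API. The SPL query and field names are made up for illustration and are not taken from the client's code:

```python
# SPL: index=transactions status=FAILED | stats count by account_id
def failed_by_account(transactions_df):
    """PySpark equivalent of the SPL search above: keep failed rows,
    then count them per account (like SPL's `stats count by`)."""
    return (
        transactions_df
        .filter(transactions_df["status"] == "FAILED")
        .groupBy("account_id")
        .count()  # adds a `count` column with the per-group row count
    )
```

In practice each migrated script followed the same pattern: the SPL pipeline's search and `stats` stages map to `filter`, `groupBy`, and aggregation calls on a DataFrame.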
Final Outcome
- All scripts were rewritten and integrated into Databricks workflows.
- Computational times were significantly reduced.
- Scalable clusters helped lower operational costs.
- Unified code standards streamlined onboarding for new engineers.
The solution supports further development within the same environment and can be extended with additional languages such as Scala and SQL. With continuous support and development, the Databricks platform ensures long-term technological and operational benefits for the client.