Holon AWS Success Stories

Client

Spanish Telecom Provider

Timeframe

2017-2019

Project Information

Governance Application on a Hybrid Data Lake

Technologies

AWS EKS

AWS SQS

AWS RDS

Terraform (Infrastructure as Code)

On-prem Infrastructure

Project Request

The customer needed a GDPR-compliant application that gives company users access to all kinds of data stored in a distributed Data Lake. Users must justify each access request by creating a data processing procedure, and a domain owner is responsible for granting access. Each data processing procedure is then registered in a company-wide directory, where auditors can review all data needs.

The Solution

To make the application as convenient as possible for end users, it features a shopping cart mechanism: items (data objects) can be added to a cart. Each item carries tags that categorize its data types, and users "pay" with a data processing procedure that covers exactly those data types.
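The matching rule behind the "payment" step can be illustrated with a short Java sketch, since the backend was written in Java. All type and method names (`DataObject`, `ProcessingProcedure`, `DataCart`, `canCheckout`) are hypothetical, and "covers exactly" is interpreted here as set equality between the data types declared by the procedure and the union of the tags in the cart. This is an assumption for illustration, not the project's actual implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DataCartSketch {
    // Hypothetical model: a catalog data object tagged with the data types it contains.
    record DataObject(String name, Set<String> dataTypeTags) {}

    // Hypothetical model: a registered data processing procedure and the data types it declares.
    record ProcessingProcedure(String id, Set<String> coveredDataTypes) {}

    // Hypothetical cart holding the data objects a user wants to access.
    static class DataCart {
        private final List<DataObject> items = new ArrayList<>();

        void add(DataObject item) {
            items.add(item);
        }

        // Union of the data-type tags of all items currently in the cart.
        Set<String> requiredDataTypes() {
            Set<String> required = new HashSet<>();
            for (DataObject item : items) {
                required.addAll(item.dataTypeTags());
            }
            return required;
        }

        // "Paying" succeeds only for a non-empty cart whose data types the procedure
        // covers exactly: nothing missing and nothing declared beyond what is requested.
        boolean canCheckout(ProcessingProcedure procedure) {
            return !items.isEmpty()
                    && procedure.coveredDataTypes().equals(requiredDataTypes());
        }
    }

    public static void main(String[] args) {
        DataCart cart = new DataCart();
        cart.add(new DataObject("customer_contracts", Set.of("contact_data", "contract_data")));
        cart.add(new DataObject("call_detail_records", Set.of("traffic_data")));

        ProcessingProcedure exact = new ProcessingProcedure(
                "PP-001", Set.of("contact_data", "contract_data", "traffic_data"));
        ProcessingProcedure partial = new ProcessingProcedure("PP-002", Set.of("contact_data"));

        System.out.println(cart.canCheckout(exact));   // true
        System.out.println(cart.canCheckout(partial)); // false
    }
}
```

Reading "covers" more loosely, as "declares at least these data types", would only change the comparison to `containsAll`. The strict version keeps each procedure limited to the data actually requested, which is closer to GDPR's data minimization principle.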

A short market survey showed that off-the-shelf Data Governance tools could cover the company's requirements only partially, so Holon chose to build a solution rather than buy one. An additional driver was that, although the backbone of the Data Lake was built in AWS on Kafka and S3, other on-prem and cloud systems had to be integrated, not only for access provisioning but also for collecting object metadata in a centralized Data Catalog.

Platform Considerations

All systems responsible for provisioning user access, whether cloud-native or on-prem, were integrated with our application through a REST interface. To deliver metadata asynchronously, we chose AWS SQS and consolidated the system-specific queues into a single application queue. The application itself runs in containers on EKS, which gives full flexibility in scaling; this mattered because we were planning further auxiliary applications for maintenance and administration. The frontend was implemented in React, drawing on its wide variety of frameworks and visualization components. The backend was written in plain Java, with an Oracle database on RDS for persistence.

Both RDS, a full-featured relational database service, and EKS, the AWS managed Kubernetes service, proved to be reliable, resilient choices that required minimal administration effort.

For Single Sign-On (SSO) and identity management, we relied on the customer's centralized IT services: an on-prem Active Directory, a module for authentication and identity federation, and a governance service that handles the creation, deletion, and updating of user information.

Deployment Pipeline and Infrastructure as Code

Application deployments to the three environments (development, test, and production) run automatically through GitLab and GitLab Runner, while Terraform scripts are applied separately to build reliable, reproducible infrastructure for all environments.

Why Holon, why AWS?

The customer chose Holon to drive the project in an agile way and to provide the resources needed both for a detailed solution concept and for the implementation. During the project, reacting to changing requirements was crucial, and close stakeholder management was necessary. AWS services enabled quick prototyping of software components and reduced the overall time to delivery, especially for security topics and unpredictable infrastructure needs.

Results

Although the solution concepts had to be revisited and refined from the ground up while the customer verified its assumptions and data needs, the project was a huge success: AWS services and the customer's central services were integrated seamlessly. Together they enable the company's users to satisfy their data needs and to take advantage of modern approaches such as near-real-time data streaming and Big Data clusters, alongside reliable, proven Business Intelligence applications and other database services.

Room for more to follow

By building a modular base system, we enabled the customer not only to scale the environment but also to deliver new functions to end users:

  • Scalability through multiple systems
    Thanks to a middleware design based on RESTful interfaces, the customer can attach additional systems as data containers to the governance platform with minimal effort: as long as the interface specifications and connectivity requirements are met, the application can report new data objects wherever they reside.
  • Ingestion on Demand
    The platform detects not only data structures inside the Data Lake but also metadata from external, non-Data Lake systems. Presenting these to users as part of the company's data offering broadens the governance perspective and also lets users ingest data whenever they need it, in a self-service fashion and with standardized procedures.
  • Cloud Migration
    Building on these capabilities, the customer can migrate iteratively to AWS Redshift by sourcing and ingesting legacy applications, including the existing Data Warehouse, and building new data structures in AWS.
  • Full Cost Control
    Using cost allocation tags and AWS Cost and Usage Reports, the customer will be able to identify cost drivers, consolidate them per business unit, and define budgets and alert thresholds.
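The per-business-unit consolidation and threshold alerting described in the last item can be sketched in Java as follows. The sketch assumes that cost report rows have already been reduced to a resource ID, the value of a hypothetical `business-unit` tag, and a cost. The class and method names, the sample rows, the budgets, and the 80% alert fraction in `main` are illustrative only and do not come from the project.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CostConsolidationSketch {
    // Hypothetical, simplified shape of one cost report row; businessUnit holds the value
    // of an assumed "business-unit" tag and is null when the resource is untagged.
    record CostLineItem(String resourceId, String businessUnit, BigDecimal cost) {}

    static final String UNTAGGED = "(untagged)";

    // Sum costs per business unit; untagged rows are grouped separately so they stay visible.
    static Map<String, BigDecimal> costPerBusinessUnit(List<CostLineItem> items) {
        Map<String, BigDecimal> totals = new TreeMap<>();
        for (CostLineItem item : items) {
            String unit = item.businessUnit() == null ? UNTAGGED : item.businessUnit();
            totals.merge(unit, item.cost(), BigDecimal::add);
        }
        return totals;
    }

    // Business units whose spend has reached alertFraction of their budget
    // (e.g. 0.8 means alert at 80%). Units without a defined budget are skipped.
    static List<String> unitsOverThreshold(Map<String, BigDecimal> totals,
                                           Map<String, BigDecimal> budgets,
                                           BigDecimal alertFraction) {
        List<String> alerts = new ArrayList<>();
        for (Map.Entry<String, BigDecimal> entry : totals.entrySet()) {
            BigDecimal budget = budgets.get(entry.getKey());
            if (budget != null
                    && entry.getValue().compareTo(budget.multiply(alertFraction)) >= 0) {
                alerts.add(entry.getKey());
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        // Made-up sample rows and budgets, for illustration only.
        List<CostLineItem> items = List.of(
                new CostLineItem("eks-node-1", "marketing", new BigDecimal("120.00")),
                new CostLineItem("rds-db-1", "marketing", new BigDecimal("50.00")),
                new CostLineItem("sqs-queue-1", "network-ops", new BigDecimal("300.00")),
                new CostLineItem("s3-bucket-1", null, new BigDecimal("10.00")));
        Map<String, BigDecimal> budgets = Map.of(
                "marketing", new BigDecimal("200.00"),
                "network-ops", new BigDecimal("1000.00"));

        Map<String, BigDecimal> totals = costPerBusinessUnit(items);
        System.out.println(totals); // {(untagged)=10.00, marketing=170.00, network-ops=300.00}
        System.out.println(unitsOverThreshold(totals, budgets, new BigDecimal("0.8"))); // [marketing]
    }
}
```

Grouping untagged resources under their own key, instead of dropping them, keeps gaps in tagging coverage visible in the totals.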

About Holon

We support our customers by driving complex, hybrid projects and by finding the right approach to deliver insights quickly. We focus on Business Intelligence and Big Data and specialize in enabling migrations to the cloud.