To reduce costs, our financial customer needed to migrate all of their business and banking data from Teradata to Hadoop. The data to be migrated was extremely sensitive and essential for daily operations, so the migration had to be highly secure with no downtime. Adding to the complexity, the data spanned multiple banking domains, including loans, credit cards, and consumer banking, each with its own migration challenges. Our customer needed a technology partner that could handle these challenges and complete the data migration in just 18 months.
Beyondsoft was chosen to plan and implement the migration from Teradata to Hadoop. The customer had initially estimated the scope at 2,000 tables comprising 2,800 objects. Once Beyondsoft began the work, however, we found that the scope had been underestimated: there were actually 6,000 objects, making the 18-month timeline even more challenging.
Additionally, because the source and destination platforms differed significantly, Beyondsoft needed to redesign the data architecture to avoid data and functionality issues. By adapting the architecture to the destination format, Beyondsoft improved performance while meeting end-user requirements.
Furthermore, to avoid disrupting any services that depended on the data, Beyondsoft worked with the customer to create a prioritization plan for the migration, moving the most frequently used data first and the least frequently used data last. Throughout the project, Beyondsoft ensured that the more than 100 downstream data consumers experienced no downtime, even when occasional issues arose.
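The frequency-based prioritization described above can be sketched as follows. This is a minimal illustration only: the table names, query counts, and wave size are hypothetical and not taken from the actual project.

```python
def plan_migration_waves(usage_counts, wave_size=2):
    """Order tables by descending query frequency and split them into
    migration waves, so the most heavily used data moves first."""
    ranked = sorted(usage_counts, key=usage_counts.get, reverse=True)
    return [ranked[i:i + wave_size] for i in range(0, len(ranked), wave_size)]

# Hypothetical daily query counts per table (illustrative values)
usage = {
    "loans.balances": 950,
    "cards.transactions": 1200,
    "consumer.accounts": 400,
    "archive.statements": 15,   # rarely used, so it migrates last
}

waves = plan_migration_waves(usage)
print(waves[0])  # first wave holds the most frequently queried tables
```

In practice each wave would be validated against the Hadoop destination before the next wave begins, which is what allows dependent services to keep running throughout the migration.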
The redesigned Hadoop architecture
The data migration was a resounding success, delivering the following benefits to our customer:
- Zero-downtime migration from Teradata to Hadoop
- On-time project completion
- An 80% reduction in data costs
- A 30% performance improvement
Beyondsoft used the following technologies:
- Presto as a query engine
- Alluxio for data orchestration
- Apache Hive for data warehousing and SQL-based access to data in Hadoop
- Apache Hadoop YARN as a cluster resource manager
- Jenkins as an automation server
- Apache Superset for data exploration and data visualization
- Apache Spark for data processing