Executive Summary
This plan outlines the strategic migration of the on-premises Informatica PowerCenter estate to a modern, cloud-native data platform on Azure Databricks. The goal is to enhance scalability, improve data governance via Unity Catalog, and reduce total cost of ownership, with a target decommission date of June 30, 2026.
Current State Assessment
The current Informatica 10.5 estate is large and complex, with many dependencies between its components. Many transformations, particularly Lookups and Joiners, will require careful re-engineering in Spark. No dependency graph is currently available, which is an initial risk to be addressed in the assessment phase.
[Figures: Transformation Usage; Mapping Complexity Breakdown]
Target State Architecture
The proposed architecture is centered on the Databricks Lakehouse Platform, using a Medallion architecture (Bronze, Silver, Gold) on Azure Data Lake Storage (ADLS) Gen2. Unity Catalog will serve as the unified governance layer for data and AI assets. Databricks Workflows is recommended for orchestration, with connectivity to on-premises sources established via ExpressRoute.
Data Sources: On-Prem RDBMS, Files, APIs
Azure Databricks Lakehouse Platform
Bronze Layer: Raw Data Landing (Delta)
Silver Layer: Cleaned, Conformed
Gold Layer: Business Aggregates
BI & Reporting: Power BI, Tableau
Data Science & ML: Databricks ML
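To make the layer responsibilities concrete, here is a minimal sketch of the Bronze, Silver, and Gold steps. It uses plain Python lists as stand-ins for Delta tables so that it runs anywhere; the column names (`order_id`, `region`, `amount`) and the sample rows are hypothetical and do not come from the estate.

```python
from collections import defaultdict

# Bronze: raw records landed as-is (strings, duplicates, bad values included).
# All rows below are made-up sample data.
bronze = [
    {"order_id": "1", "region": " emea ", "amount": "100.50"},
    {"order_id": "1", "region": " emea ", "amount": "100.50"},  # duplicate delivery
    {"order_id": "2", "region": "APAC", "amount": "not-a-number"},
    {"order_id": "3", "region": "apac", "amount": "40"},
]


def to_silver(rows):
    """Clean and conform: cast types, normalize codes, drop duplicates and bad rows."""
    seen = set()
    silver = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            # A real pipeline would likely quarantine this row instead of dropping it.
            continue
        order_id = int(row["order_id"])
        if order_id in seen:
            continue
        seen.add(order_id)
        silver.append(
            {
                "order_id": order_id,
                "region": row["region"].strip().upper(),
                "amount": amount,
            }
        )
    return silver


def to_gold(rows):
    """Business aggregate: total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

On the target platform each function would more plausibly be a Spark job reading from and writing to a Delta table; the split of responsibilities between layers is the point of the sketch, not the implementation.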
Migration Roadmap
The migration will follow a structured, phased approach to minimize risk and ensure business continuity. Each phase has clear entry/exit criteria, starting with foundational setup and a pilot project before scaling out the migration in domain-focused waves.
Phase 1: Assess & Foundation (8-10 Weeks)
Discovery and Setup
Inventory analysis, dependency mapping, Azure environment setup, CI/CD foundation, and Unity Catalog configuration.
Phase 2: Pilot Domain (4-8 Weeks)
First Mover
Migrate a representative, low-to-medium complexity data domain. Validate architecture, establish patterns, and refine estimation models.
Phase 3: Scale-Out Waves (Ongoing)
Factory-Model Migration
Migrate remaining domains in parallel waves, grouped by business function and technical dependency. Leverage established patterns and automation.
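Once the dependency mapping from Phase 1 exists, grouping by technical dependency can be sketched as a layered topological sort: every workflow lands in the earliest wave in which all of its upstream workflows have already been migrated. The workflow names below are hypothetical, and a real plan would also weigh business function and team capacity, which this sketch ignores.

```python
from graphlib import TopologicalSorter


def plan_waves(dependencies):
    """Group workflows into waves so each wave depends only on earlier waves.

    `dependencies` maps a workflow name to the set of workflows it reads from.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        # Everything whose upstream workflows are all marked done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


# Hypothetical workflow names, for illustration only.
example = {
    "wf_load_customers": set(),
    "wf_load_orders": set(),
    "wf_conform_orders": {"wf_load_orders", "wf_load_customers"},
    "wf_sales_mart": {"wf_conform_orders"},
}

waves = plan_waves(example)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

A `CycleError` from `prepare()` is itself useful output: circular dependencies usually have to be untangled or migrated together as one unit.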
Phase 4: Decommission (By Q2 2026)
Final Cutover
Final parallel run validation, hypercare period, archival of Informatica repositories, and termination of licenses.
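One possible shape for the parallel run validation is to compare a row count and an order-independent checksum of the legacy output against the migrated output. The sketch below is plain Python under the assumption that both outputs can be read as tuples of already-normalized values; the function names are illustrative, and at volume the same idea would more likely be computed inside Spark.

```python
import hashlib

MODULUS = 1 << 256


def table_fingerprint(rows):
    """Return (row_count, checksum) for an iterable of row tuples.

    Per-row SHA-256 digests are summed modulo 2**256, so row order does not
    matter. Summing (rather than XOR) keeps duplicate rows from cancelling out.
    """
    count = 0
    total = 0
    for row in rows:
        # repr() keeps None, '' and 'None' distinct from one another.
        encoded = "\x1f".join(repr(value) for value in row)
        digest = hashlib.sha256(encoded.encode("utf-8")).digest()
        total = (total + int.from_bytes(digest, "big")) % MODULUS
        count += 1
    return count, total


def compare_outputs(legacy_rows, migrated_rows):
    """Compare two result sets by row count and checksum."""
    legacy_count, legacy_sum = table_fingerprint(legacy_rows)
    migrated_count, migrated_sum = table_fingerprint(migrated_rows)
    return {
        "counts_match": legacy_count == migrated_count,
        "checksums_match": legacy_sum == migrated_sum,
    }


# Made-up sample rows: same content, different order.
legacy = [(1, "A", 10.0), (2, "B", None)]
migrated = [(2, "B", None), (1, "A", 10.0)]
report = compare_outputs(legacy, migrated)
print(report)
```

Values must be normalized to the same types and precision on both sides before fingerprinting; otherwise a harmless representation difference (for example a decimal read as a float on one side) shows up as a mismatch.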
Transformation Mapping Cheatsheet
This section provides a quick reference for developers, mapping common Informatica transformations to their Databricks equivalents.
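As one illustration of such a mapping, a Lookup is commonly rewritten as a left join. A Lookup is typically configured to return a single match per input row, whereas a plain join repeats the input row for every match, so the lookup side is reduced to one row per key first. The sketch below runs the SQL through Python's built-in `sqlite3` only to stay self-contained; the table and column names are hypothetical, and picking the most recent `valid_from` is an assumed match policy, not one taken from the estate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and sample rows, for illustration only.
conn.executescript(
    """
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE customer_dim (customer_id INTEGER, segment TEXT, valid_from TEXT);
    INSERT INTO orders VALUES (1, 100), (2, 200), (3, 999);
    INSERT INTO customer_dim VALUES
        (100, 'Retail',    '2023-01-01'),
        (100, 'Wholesale', '2024-01-01'),
        (200, 'Retail',    '2022-06-01');
    """
)

LOOKUP_AS_JOIN = """
WITH ranked AS (
    -- Keep one row per lookup key (here: the most recent one).
    SELECT customer_id,
           segment,
           ROW_NUMBER() OVER (
               PARTITION BY customer_id ORDER BY valid_from DESC
           ) AS rn
    FROM customer_dim
)
SELECT o.order_id, o.customer_id, r.segment
FROM orders AS o
LEFT JOIN ranked AS r
       ON r.customer_id = o.customer_id AND r.rn = 1
ORDER BY o.order_id
"""

result = conn.execute(LOOKUP_AS_JOIN).fetchall()
for row in result:
    print(row)
```

The left join keeps order 3 with a NULL segment, mirroring a lookup that finds no match, and the `rn = 1` filter prevents customer 100 from producing two output rows.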
Key Decisions: Orchestration
Choosing the right orchestration tool is critical. While Azure Data Factory (ADF) offers broader legacy connectivity, Databricks Workflows provides tighter integration, unified monitoring, and a simpler operational model within the target platform. Based on a weighted scorecard, Databricks Workflows is the recommended choice.
Option 1: Databricks Workflows (Recommended)
Pros: Tight platform integration, unified security/monitoring. Cons: Fewer connectors for legacy systems.
Option 2: Azure Data Factory + Databricks
Pros: Excellent hybrid/on-prem connectivity. Cons: Two toolchains to manage, potential for higher latency.
Option 3: Airflow on AKS
Pros: Maximum flexibility and customization. Cons: Highest operational overhead and complexity.
[Figure: Orchestration Options Scorecard]
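For readers unfamiliar with the method, a weighted scorecard multiplies each option's rating on a criterion by that criterion's weight and divides the sum by the total weight. The sketch below shows only the arithmetic: the criteria names, weights, ratings, and option labels are placeholders and are not the values behind this plan's scorecard.

```python
def weighted_scores(weights, ratings):
    """Return the weighted average rating for each option.

    `weights` maps criterion -> weight.
    `ratings` maps option -> {criterion: rating}.
    """
    total_weight = sum(weights.values())
    return {
        option: sum(weights[c] * option_ratings[c] for c in weights) / total_weight
        for option, option_ratings in ratings.items()
    }


# Placeholder inputs, NOT the plan's actual weights or ratings.
weights = {"platform_integration": 3, "connectivity": 2, "operational_overhead": 1}
ratings = {
    "Option X": {"platform_integration": 5, "connectivity": 3, "operational_overhead": 4},
    "Option Y": {"platform_integration": 3, "connectivity": 5, "operational_overhead": 2},
}

scores = weighted_scores(weights, ratings)
for option, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{option}: {score:.2f}")
```

Because the ranking can flip when weights change, it is worth recording the weights alongside the ratings so the recommendation can be re-derived later.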
Risks & Mitigations
A proactive approach to risk management is essential. The following are the top identified risks, each with a corresponding mitigation strategy. This register will be actively maintained throughout the project lifecycle.