Migration Plan

Informatica to Databricks


Executive Summary

This plan outlines the strategic migration of the on-premises Informatica PowerCenter estate to a modern, cloud-native data platform on Azure Databricks. The goal is to enhance scalability, improve data governance via Unity Catalog, and reduce total cost of ownership, with a target decommission date of June 30, 2026.

Total Mappings: 620
Workflows: 180
Daily Data Volume: 750 GB
Regulatory Context: SOX & BCBS 239

Current State Assessment

The current Informatica PowerCenter 10.5 estate is extensive, with complex mappings and many interdependencies. A significant number of transformations, particularly Lookups and Joiners, will require careful re-engineering in Spark. No dependency graph of the estate is currently available, which is an initial risk to be addressed in the assessment phase.
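Once workflow dependencies have been inventoried, they can be turned into an ordered set of migration waves. The following is a minimal sketch using Python's standard-library `graphlib`; the workflow names and the dependency edges are hypothetical placeholders, not items from the actual estate.

```python
from graphlib import TopologicalSorter

# Hypothetical workflows mapped to the workflows they depend on.
# A real inventory would come from the assessment phase.
dependencies = {
    "wf_load_customers": set(),
    "wf_load_products": set(),
    "wf_load_accounts": {"wf_load_customers"},
    "wf_load_transactions": {"wf_load_accounts", "wf_load_products"},
    "wf_risk_report": {"wf_load_transactions"},
}


def migration_waves(deps):
    """Group workflows into waves; each wave depends only on earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose upstreams are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = migration_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

Workflows in the same wave have no dependencies on each other, so they are candidates for parallel migration. A circular dependency surfaces as an exception, which is itself a useful assessment finding.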

[Charts: Transformation Usage; Mapping Complexity Breakdown]

Target State Architecture

The proposed architecture is centered on the Databricks Lakehouse Platform, leveraging a Medallion architecture (Bronze, Silver, Gold) on ADLS Gen2. Unity Catalog will serve as the unified governance layer for data and AI assets. Databricks Workflows are recommended for orchestration, with connectivity to on-premises sources established via ExpressRoute.

Data Sources: On-Prem RDBMS, Files, APIs

↓

Azure Databricks Lakehouse Platform
    Bronze Layer: Raw Data Landing (Delta)
    Silver Layer: Cleaned, Conformed
    Gold Layer: Business Aggregates
    Storage: ADLS Gen2
    Governance: Unity Catalog

↓

BI & Reporting: Power BI, Tableau
Data Science & ML: Databricks ML
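Unity Catalog addresses tables through a three-level namespace, `catalog.schema.table`. One possible convention, assumed here and not prescribed by the plan, is one schema per Medallion layer. The helper below sketches that convention; the catalog name `finance_prod` and the table name are hypothetical.

```python
# Assumed convention: one Unity Catalog schema per Medallion layer.
LAYERS = ("bronze", "silver", "gold")


def table_fqn(catalog: str, layer: str, table: str) -> str:
    """Build a fully qualified catalog.schema.table name for a layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer {layer!r}; expected one of {LAYERS}")
    return f"{catalog}.{layer}.{table}"


# Hypothetical catalog and table names, for illustration only.
silver_accounts = table_fqn("finance_prod", "silver", "accounts")
print(silver_accounts)
```

Centralizing the naming rule in one function keeps pipelines from hard-coding layer names, which makes a later change of convention a single edit.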

Migration Roadmap

The migration will follow a structured, phased approach to minimize risk and ensure business continuity. Each phase has clear entry/exit criteria, starting with foundational setup and a pilot project before scaling out the migration in domain-focused waves.

Phase 1: Assess & Foundation (8-10 Weeks)

Discovery and Setup

Inventory analysis, dependency mapping, Azure environment setup, CI/CD foundation, and Unity Catalog configuration.

Phase 2: Pilot Domain (4-8 Weeks)

First Mover

Migrate a representative, low-to-medium complexity data domain. Validate architecture, establish patterns, and refine estimation models.

Phase 3: Scale-Out Waves (Ongoing)

Factory-Model Migration

Migrate remaining domains in parallel waves, grouped by business function and technical dependency. Leverage established patterns and automation.

Phase 4: Decommission (By Q2 2026)

Final Cutover

Final parallel run validation, hypercare period, archival of Informatica repositories, and termination of licenses.
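Parallel run validation generally means comparing the output of the legacy pipeline with the output of its migrated counterpart for the same input. A minimal sketch of one such check follows: it compares row counts and an order-independent checksum. The sample rows are made up, and a production check on large tables would run inside Spark instead of in plain Python.

```python
import hashlib


def dataset_fingerprint(rows):
    """Return (row_count, digest); the digest ignores row order."""
    row_hashes = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest()
        for row in rows
    )
    digest = hashlib.sha256("".join(row_hashes).encode("utf-8")).hexdigest()
    return len(row_hashes), digest


def outputs_match(legacy_rows, migrated_rows):
    """True when both outputs contain the same rows, in any order."""
    return dataset_fingerprint(legacy_rows) == dataset_fingerprint(migrated_rows)


# Made-up sample outputs: same rows, different order.
legacy = [(1, "A", 100.0), (2, "B", 250.5)]
migrated = [(2, "B", 250.5), (1, "A", 100.0)]
same = outputs_match(legacy, migrated)

# A single changed value is detected.
drifted = outputs_match(legacy, [(1, "A", 100.0), (2, "B", 250.6)])
```

Hashing each row and sorting the hashes keeps duplicates significant while making the comparison insensitive to ordering, which usually differs between the two engines.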

Transformation Mapping Cheatsheet

This section provides a quick reference for developers, mapping common Informatica transformations to their Databricks equivalents, with implementation patterns in PySpark and SQL.
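As a rough illustration of such a reference, the table below is expressed as a small Python dictionary. The PySpark and SQL constructs named are standard ones, but the selection and wording are an assumption made here, not the content of the plan's own cards.

```python
# Illustrative mapping of common PowerCenter transformations to their usual
# Spark counterparts. Entries are a sketch, not the plan's official list.
CHEATSHEET = {
    "Filter": {"pyspark": "df.filter(condition)", "sql": "WHERE condition"},
    "Expression": {"pyspark": "df.withColumn(name, expr)", "sql": "SELECT expr AS name"},
    "Aggregator": {"pyspark": "df.groupBy(keys).agg(...)", "sql": "GROUP BY keys"},
    "Joiner": {"pyspark": "left.join(right, on=keys, how='inner')", "sql": "JOIN ... ON"},
    "Lookup": {"pyspark": "df.join(broadcast(dim), on=keys, how='left')", "sql": "LEFT JOIN ... ON"},
    "Sorter": {"pyspark": "df.orderBy(cols)", "sql": "ORDER BY cols"},
    "Union": {"pyspark": "a.unionByName(b)", "sql": "UNION ALL"},
    "Update Strategy": {"pyspark": "DeltaTable.forName(spark, t).merge(src, cond)", "sql": "MERGE INTO"},
}


def spark_equivalent(transformation, flavor="pyspark"):
    """Look up the Spark counterpart of an Informatica transformation."""
    try:
        return CHEATSHEET[transformation][flavor]
    except KeyError:
        raise KeyError(
            f"no cheatsheet entry for {transformation!r} / {flavor!r}"
        ) from None


print(spark_equivalent("Lookup"))
print(spark_equivalent("Update Strategy", flavor="sql"))
```

One caveat worth keeping next to the Lookup entry: a Lookup returns a single match per input row according to its multiple-match policy, whereas a left join returns every match, so the lookup side may need deduplicating before the join.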

Key Decisions: Orchestration

Choosing the right orchestration tool is critical. While Azure Data Factory (ADF) offers broader legacy connectivity, Databricks Workflows provides tighter integration, unified monitoring, and a simpler operational model within the target platform. Based on a weighted scorecard, Databricks Workflows is the recommended choice.

Option 1: Databricks Workflows (Recommended)

Pros: Tight platform integration, unified security/monitoring. Cons: Fewer connectors for legacy systems.

Option 2: Azure Data Factory + Databricks

Pros: Excellent hybrid/on-prem connectivity. Cons: Two toolchains to manage, potential for higher latency.

Option 3: Airflow on AKS

Pros: Maximum flexibility and customization. Cons: Highest operational overhead and complexity.
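For the recommended option, the session ordering inside an Informatica workflow maps fairly naturally onto task dependencies in a Databricks job. The sketch below builds a job definition using the `name`, `tasks`, `task_key`, `notebook_task`, and `depends_on` fields of the Databricks Jobs API; the job name, task keys, and notebook paths are hypothetical, and compute settings are deliberately left out.

```python
import json


def build_job(name, tasks):
    """Build a job definition from (task_key, notebook_path, upstream_keys)."""
    return {
        "name": name,
        "tasks": [
            {
                "task_key": key,
                "notebook_task": {"notebook_path": path},
                "depends_on": [{"task_key": upstream} for upstream in upstreams],
            }
            for key, path, upstreams in tasks
        ],
    }


# Hypothetical three-step pipeline: Bronze -> Silver -> Gold.
job = build_job(
    "accounts_daily",
    [
        ("ingest_bronze", "/Repos/etl/accounts/ingest", []),
        ("conform_silver", "/Repos/etl/accounts/conform", ["ingest_bronze"]),
        ("aggregate_gold", "/Repos/etl/accounts/aggregate", ["conform_silver"]),
    ],
)
print(json.dumps(job, indent=2))
```

Generating job definitions from a table of tasks, as opposed to hand-writing each one, fits the factory-model approach described for the scale-out waves.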

[Chart: Orchestration Options Scorecard]
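A weighted scorecard multiplies each option's score on a criterion by that criterion's weight and sums the results. The sketch below shows the mechanics only: the criteria, weights, and 1-5 scores are placeholder values chosen for illustration and are not the figures behind the plan's recommendation.

```python
# Placeholder weights (must sum to 1.0) and 1-5 scores, for illustration only.
WEIGHTS = {
    "platform_integration": 0.35,
    "legacy_connectivity": 0.25,
    "operational_simplicity": 0.25,
    "flexibility": 0.15,
}

SCORES = {
    "Databricks Workflows": {
        "platform_integration": 5, "legacy_connectivity": 2,
        "operational_simplicity": 5, "flexibility": 3,
    },
    "Azure Data Factory + Databricks": {
        "platform_integration": 3, "legacy_connectivity": 5,
        "operational_simplicity": 3, "flexibility": 3,
    },
    "Airflow on AKS": {
        "platform_integration": 2, "legacy_connectivity": 4,
        "operational_simplicity": 1, "flexibility": 5,
    },
}


def weighted_total(scores, weights):
    """Sum of score * weight over all criteria."""
    return sum(scores[criterion] * weight for criterion, weight in weights.items())


def rank_options(all_scores, weights):
    """Return (option, total) pairs, highest total first."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    totals = {name: weighted_total(s, weights) for name, s in all_scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_options(SCORES, WEIGHTS)
for option, total in ranking:
    print(f"{option}: {total:.2f}")
```

Keeping weights and scores as data makes it easy to rerun the ranking when stakeholders challenge a weight, which is a common sensitivity check for this kind of decision.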

Risks & Mitigations

A proactive approach to risk management is essential. The following are the top identified risks, each with a corresponding mitigation strategy. This register will be actively maintained throughout the project lifecycle.

© 2025 Bayesian AI Solutions Consulting Partners
