Le Duy Khuong (Daniel)

Dev Productivity & Tools

Lakehouse BRD — Chapter 1: Overview

Strategic goals, project rationale, data ecosystem, Lakehouse scope, and success criteria for a unified data platform.

2026-03-17, 2 min read

1.1 Strategic goals

The Lakehouse project is a core initiative in the organization’s digital transformation plan. Overall goals:

  • Build a unified data platform serving multiple purposes: operations, reporting, risk management, partners, and AI/ML.
  • Meet growing requirements for access speed, accuracy, and data quality control.
  • Ensure legal compliance, in particular applicable personal data protection regulations.

Aligned with organization-level strategy, the Lakehouse system contributes to: better data-driven decisions; stronger credit and operational risk control; deeper digital partner integration and API adoption; and higher AI capability.

1.2 Project rationale

When data systems remain fragmented, common issues include:

  • Slow data access: Manual aggregation of reports from many sources.
  • Lack of standardization and lineage: Hard to trace data origin.
  • Limited AI/ML scale: Models depend on manually aggregated data.
  • Constrained partner integration: APIs cannot support real-time queries.
  • Personal data compliance risk: Insufficient masking and access control.

1.3 Data ecosystem overview

| Main component | Typical systems |
|---|---|
| CRM | In-house (may inherit existing) |
| Core Lending | Internal business applications |
| Payment | E-wallet, bank partner integration |
| Risk Engine | Python + SQL + dashboard |
| Reporting | Power BI + Excel rollups |
| DWH | MSSQL DWH, BigQuery (or equivalent) |
| API Layer | OpenAPI via Gateway |
| Logging / Audit | Distributed; needs consolidation |

1.4 Lakehouse application scope

| Subsystem / Business area | Application scope |
|---|---|
| Operations reporting | Multi-dimensional: by store, region, staff, product group |
| Financial analysis | Revenue, cost, profit, margin |
| Credit risk | Scoring, overdue debt, LTV analysis |
| Customer analysis | Lifecycle, behavior, target segments |
| AI & ML | Segmentation, risk prediction, up-sell |
| Partner integration | B2B API, controlled data sharing |
| Data control | Data masking, audit log, lineage, ownership |
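
To make the "Data masking" item under Data control more concrete, here is a minimal sketch of field-level masking in Python. It is an illustration only: the function names (`mask_email`, `mask_phone`, `pseudonymize`), the masking rules, and the salt handling are assumptions, not part of the BRD, and a real implementation would follow the organization's own security standards.

```python
import hashlib
import re


def mask_email(email: str) -> str:
    """Keep the first character and the domain; hide the rest of the local part."""
    local, sep, domain = email.partition("@")
    if not sep or not local:
        # Not a well-formed address: hide it completely.
        return "***"
    return f"{local[0]}***@{domain}"


def mask_phone(phone: str, visible: int = 3) -> str:
    """Strip non-digits, then show only the last `visible` digits."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) <= visible:
        return "*" * len(digits)
    return "*" * (len(digits) - visible) + digits[-visible:]


def pseudonymize(value: str, salt: str) -> str:
    """Return a stable token so masked records can still be joined.

    Assumption: the salt is a secret managed outside this code.
    """
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]
```

In this sketch, `mask_email` and `mask_phone` are meant for display in reports, while `pseudonymize` gives the same token for the same input, which keeps joins and lineage checks possible without exposing the raw value.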

1.5 Definition of success

| Criteria group | Success criteria |
|---|---|
| Technical | Process >10M records/day, ingest latency <5 min, query <3 s |
| Business | Real-time operations dashboard covering all service points |
| ML & AI | Clean, complete data for at least 3 AI models in operations |
| Governance | 100% of tables have a data steward and full metadata |
| Legal | Compliance with applicable data protection regulations and internal security standards |
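
The technical and governance criteria above are numeric, so they can be checked mechanically. The sketch below shows one possible way to do that in Python. The thresholds come from the success criteria; everything else (the `TechnicalMetrics` class, the function names, the shape of the table catalog) is a hypothetical illustration. Note that 10M records/day averages out to roughly 116 records per second, although peak load would likely be higher.

```python
from __future__ import annotations

from dataclasses import dataclass

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


@dataclass(frozen=True)
class TechnicalMetrics:
    """Measured values for one day (hypothetical structure)."""
    records_per_day: int
    ingest_latency_s: float
    query_time_s: float


def average_records_per_second(records_per_day: int) -> float:
    """Convert a daily volume into an average per-second rate."""
    return records_per_day / SECONDS_PER_DAY


def check_technical(m: TechnicalMetrics) -> dict[str, bool]:
    """Compare measured values against the thresholds in the Technical row."""
    return {
        "throughput": m.records_per_day > 10_000_000,
        "ingest_latency": m.ingest_latency_s < 5 * 60,
        "query_time": m.query_time_s < 3.0,
    }


def steward_coverage(stewards_by_table: dict[str, str | None]) -> float:
    """Fraction of tables with a non-empty data steward (target: 1.0)."""
    if not stewards_by_table:
        return 0.0
    assigned = sum(1 for steward in stewards_by_table.values() if steward)
    return assigned / len(stewards_by_table)
```

For example, `check_technical(TechnicalMetrics(12_000_000, 240.0, 2.1))` passes all three checks, while a catalog where one of four tables lacks a steward gives a coverage of 0.75, short of the 100% target.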
