Gonzalo Flores Kemec


Platform

Infrastructure as code for a regulated fintech: 12 modules, a single apply, least privilege

I built the infrastructure as code for a regulated fintech platform from the ground up: an OpenTofu/Terraform baseline of 12 reusable modules reproducible across three environments with a single apply, secure-by-design least-privilege IAM, and a CDC analytics pipeline from PostgreSQL to BigQuery with zero impact on the transactional database.

./case --summary
client: Regulated fintech platform (identity/KYC, wallets, transactions)
role: Senior Backend & Platform Engineer
sector: Regulated fintech
links: Software engineering · Data engineering
stack: GCP · OpenTofu / Terraform · Python / Django · PostgreSQL · BigQuery / Datastream · Cloud Build / GitHub Actions

This is a real case; the client is anonymized for confidentiality. The business problem, method and decisions are described; no code or sensitive data is published.

Client. A regulated fintech platform that operates identity and verification (KYC), wallets and user transactions: a domain where every excess permission, every untraceable infrastructure change and every improper access to data is a regulatory liability, not an engineering detail.

Approach

This case spans two links of the method, software engineering and data engineering, with no handoffs between them: the person who writes the infrastructure is the same person who designs the data’s path toward analysis, so the discipline of auditability is not lost at the seam. The first link, reading the organization, was already given by the regulatory framing of the business; the fourth, AI over the data, is outside the delivered scope: here intelligence is applied as the engineer’s working instrument, not as a product. The distinction is deliberate and part of the brand’s honesty.

The thesis that governs the work is that, in a regulated environment, auditability is not an annex: it is the architecture. A fintech system is not sustained by what it does when everything works, but by what it can demonstrate when someone asks who did what, with what permission and over what data. Infrastructure as code, least privilege and data traceability are not three unrelated best practices: they are the same property, the ability to account for oneself, seen from three layers.

The problem detected

The platform is built on a multi-repo codebase (Django backend, mobile clients, tooling) with a small team, where product velocity coexists with a control requirement that admits no shortcuts. The stated problem was operational: provisioning and reproducing the dev, staging and prod environments was a manual task of several days, prone to drift between them, and without a reliable record of which resources existed or why.

The real problem lay upstream: without infrastructure as code there is no auditability of the infrastructure. Every permission granted by hand, every bucket created from the console, every undocumented policy exception is a risk surface that no one can explain afterward. In a regulated fintech that is not technical debt: it is compliance debt. On top of that came a growing analytical need, understanding the business through its transactional data, that could not be met by querying the production database.

Functional assessment

The assessment was not of user processes but of the real state of the platform and its guarantees: which resources existed, under which identities, with which permissions, and what had to be demonstrable in an audit. From there came the layers that the infrastructure had to govern explicitly:

Layer            | What it governs                        | Required guarantee
Compute and data | Cloud SQL, Redis, storage, schedulers  | reproducibility across environments
Network          | networking, perimeter                  | isolation per environment
Identity         | IAM, service accounts, Org Policy      | demonstrable least privilege
Secrets          | Secret Manager                         | scoped, traceable access
Observability    | Cloud Logging                          | differentiated retention, trail
Analytics        | BigQuery, Datastream                   | zero impact on the transactional layer

The assessment included an explicit mapping of privilege escalation paths: where an identity could, by chaining permissions, end up with more power than its role justified. Documenting those paths, in order to close them later, was as much a part of the assessment as inventorying the resources.
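As an illustration of what mapping an escalation path can mean in practice, the sketch below treats "identity A can act as identity B" as an edge in a graph and searches for a chain from a low-privilege identity to a powerful one. It is not taken from the project: the identity names and the `CAN_ACT_AS` mapping are hypothetical, and a real inventory would be derived from the actual IAM bindings.

```python
from collections import deque

# Hypothetical edges: "identity A can act as identity B", for example through
# a token-creator binding. All names are illustrative, not the client's.
CAN_ACT_AS = {
    "ci-planner": ["deployer"],
    "deployer": ["org-admin"],
    "app-backend": [],
    "org-admin": [],
}


def escalation_path(graph, start, target):
    """Return the shortest chain of identities from start to target, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    for identity in CAN_ACT_AS:
        print(identity, "->", escalation_path(CAN_ACT_AS, identity, "org-admin"))
```

Any non-empty chain that starts at an identity whose role does not justify the target's power is a path to document and close.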

Building the technical solution

The central piece is an infrastructure-as-code baseline in OpenTofu/Terraform, of which I was the author and sole owner: 12 reusable modules (Cloud SQL, networking, IAM, Secret Manager, logging, BigQuery, Datastream, Redis, schedulers, storage, Cloud Build and Org Policy) and ~39 declared resources, reproducible across dev, staging and prod. Provisioning an environment went from a manual setup of several days to a single tofu apply.

On top of that baseline, the identity design was secure-by-design, not hardened after the fact:

  • Least-privilege IAM with single-permission custom roles, instead of broad predefined roles.
  • Scoped service-account token-creator bindings, so impersonation is possible only where the design justifies it.
  • Closure of the self-escalation paths detected in the assessment.
  • Surgical Org Policy exceptions (Domain Restricted Sharing) resolved via impersonation, instead of relaxing the policy for the whole organization.
  • Differentiated Cloud Logging retention, tuned to how long each type of record must be kept.
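A minimal sketch of how bindings could be checked against the first two rules in the list above. It is an illustration under stated assumptions, not the project's tooling: the binding dictionaries, the `scope` field and the example members are hypothetical. The role names `roles/owner`, `roles/editor`, `roles/viewer` and `roles/iam.serviceAccountTokenCreator` are real GCP roles.

```python
# GCP basic roles, treated here as always too broad for a workload identity.
BASIC_ROLES = {"roles/owner", "roles/editor", "roles/viewer"}
TOKEN_CREATOR = "roles/iam.serviceAccountTokenCreator"


def audit_bindings(bindings):
    """Return (member, role, reason) tuples for bindings that look too broad.

    Each binding is a dict with the hypothetical keys "member", "role" and
    "scope", where scope is "project" or "service_account".
    """
    findings = []
    for b in bindings:
        member, role, scope = b["member"], b["role"], b["scope"]
        if role in BASIC_ROLES:
            findings.append((member, role, "basic role"))
        elif role == TOKEN_CREATOR and scope != "service_account":
            findings.append((member, role, "token creator not scoped to one service account"))
    return findings


EXAMPLE_BINDINGS = [
    {"member": "sa-app", "role": "projects/example/roles/readOneSecret", "scope": "project"},
    {"member": "sa-ci", "role": TOKEN_CREATOR, "scope": "service_account"},
    {"member": "sa-legacy", "role": "roles/editor", "scope": "project"},
    {"member": "sa-ops", "role": TOKEN_CREATOR, "scope": "project"},
]

if __name__ == "__main__":
    for finding in audit_bindings(EXAMPLE_BINDINGS):
        print(finding)
```

The custom role and the service-account-scoped token creator pass; the basic role and the project-wide token creator are reported.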

In the Django backend, the KYC and organization screening workflows were modeled as finite state machines with defense in depth: database triggers that prevent invalid transitions even if the application code fails, signed URLs for user media and a signal-driven AuditLog model that leaves a trail of every relevant change. I added end-to-end tests and fixed the ordering of the authentication middleware.
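The application-level half of that defense in depth can be pictured as an explicit transition table. The sketch below is an assumption-laden illustration, not the project's model: the state names, the `ALLOWED` table and the in-memory audit list are hypothetical, and the database trigger that forms the second layer is not shown.

```python
from enum import Enum


class KycState(Enum):
    # Hypothetical states; the real workflow's states are not published.
    PENDING = "pending"
    IN_REVIEW = "in_review"
    APPROVED = "approved"
    REJECTED = "rejected"


# Every transition that is not listed here is invalid.
ALLOWED = {
    KycState.PENDING: {KycState.IN_REVIEW},
    KycState.IN_REVIEW: {KycState.APPROVED, KycState.REJECTED},
    KycState.APPROVED: set(),
    KycState.REJECTED: {KycState.PENDING},
}


class InvalidTransition(Exception):
    """Raised when a transition is not in the ALLOWED table."""


def transition(current, target, audit_log):
    """Validate the move, append it to the audit log and return the new state."""
    if target not in ALLOWED[current]:
        raise InvalidTransition(f"{current.value} -> {target.value}")
    audit_log.append((current.value, target.value))
    return target
```

Because the table is data, the same set of allowed pairs could in principle be mirrored in a database constraint or trigger, so that both layers enforce one definition.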

Information and data layer

Data access for analysis was part of the design, not an afterthought. The centerpiece is a CDC analytics pipeline: PostgreSQL → Datastream → BigQuery, using logical replication to expose analytical views with zero impact on the transactional database. The business can be read over the data without an analytics query ever touching the database that sustains user transactions.
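To make the idea of change data capture concrete, the sketch below folds an ordered stream of change events into the latest state per primary key, which is conceptually what a "current state" analytical view does on the replica side. The event shape (`op`, `pk`, `lsn`, `row`) is a hypothetical simplification and not Datastream's actual schema.

```python
def apply_changes(events):
    """Fold change events into the latest row per primary key.

    Events are applied in log order; a delete removes the row, while an
    insert or update replaces it.
    """
    rows = {}
    for event in sorted(events, key=lambda e: e["lsn"]):
        if event["op"] == "delete":
            rows.pop(event["pk"], None)
        else:
            rows[event["pk"]] = event["row"]
    return rows


# Illustrative events, deliberately out of order.
EXAMPLE_EVENTS = [
    {"lsn": 3, "op": "update", "pk": 1, "row": {"status": "settled"}},
    {"lsn": 1, "op": "insert", "pk": 1, "row": {"status": "pending"}},
    {"lsn": 2, "op": "insert", "pk": 2, "row": {"status": "pending"}},
    {"lsn": 4, "op": "delete", "pk": 2, "row": None},
]

if __name__ == "__main__":
    print(apply_changes(EXAMPLE_EVENTS))
```

The point of the design is that this work happens downstream of the replication log, so analytical reads never run against the transactional database.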

Two governance properties accompany the pipeline:

  • Workload separation. Logical replication decouples the analytical plane from the transactional one; the pressure of business queries is not carried over to the critical operation.
  • Scoped, traceable access. The data’s path toward BigQuery is declared in the same infrastructure as code, under the same identity guarantees: who can read what, and why, is auditable.

On top of that layer, an Apache Superset analytics stack on Cloud Run was also provisioned, likewise declared in infrastructure as code, so that data consumption has a governed and reproducible surface.

How the work was conducted

The work was executed with a code-agent harness governed by purpose-built instruments, not by improvised prompting. The concrete and verifiable parts:

  • Infrastructure as code with reusable modules. The baseline is not a set of scripts: it is 12 composed modules, so a change is reasoned and applied once and propagates to the three environments without manual drift.
  • Read-only Terraform CI on every PR. A Cloud Build pipeline, connected through a GitHub App, runs tofu plan on every pull request under a least-privilege service account, with secret access scoped to the plan stage. The infrastructure change is reviewed with its plan in view, before anything in production is touched: the agent and the reviewer work against the change’s contract, not against the live environment.
  • Engineering standards across 7 repositories: pre-commit, CI gates, ruff/black/mypy and secret scanning, so discipline belongs to the repository and not to the will of each commit.
  • An internal tooling suite that links ClickUp↔GitHub at the commit level, giving traceability between the business task and the code change.
  • The repetitive tasks (checking an IAM change against least privilege, recording a policy exception, verifying the shape of a module) become reproducible workflows over these tools, instead of manual steps that degrade with fatigue.
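One way to review a plan "as a contract" is to read its machine-readable form. The sketch below assumes the plan has been exported as JSON (for example with `tofu show -json` on a saved plan file) and lists the resources the plan would delete or replace. The policy of flagging deletions is an assumption made for illustration, not the project's actual gate; the `resource_changes`, `address` and `change.actions` fields follow the Terraform/OpenTofu JSON plan representation.

```python
import json


def destructive_changes(plan_json):
    """Return (address, actions) for every resource the plan would delete.

    A replacement appears as ["delete", "create"] or ["create", "delete"],
    so checking for "delete" covers both removals and replacements.
    """
    plan = json.loads(plan_json)
    flagged = []
    for change in plan.get("resource_changes", []):
        actions = change["change"]["actions"]
        if "delete" in actions:
            flagged.append((change["address"], actions))
    return flagged


# Illustrative plan fragment with hypothetical resource addresses.
EXAMPLE_PLAN = json.dumps({
    "resource_changes": [
        {"address": "module.storage.google_storage_bucket.media",
         "change": {"actions": ["update"]}},
        {"address": "module.sql.google_sql_database_instance.main",
         "change": {"actions": ["delete", "create"]}},
        {"address": "module.iam.google_service_account.planner",
         "change": {"actions": ["no-op"]}},
    ]
})

if __name__ == "__main__":
    for address, actions in destructive_changes(EXAMPLE_PLAN):
        print(address, actions)
```

A reviewer, human or agent, then only needs to justify the flagged addresses instead of reading the whole plan.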

The point: AI is not in the product delivered to the client; it is in how I build, as any serious engineer uses their instrumentation today. Showing it this way, factually and without adjectives, is what distinguishes real command of the tool from the fashionable discourse.

What this case proves

  • Real platform engineering: an infrastructure-as-code baseline of 12 modules and ~39 resources, reproducible across three environments with a single apply, not a collection of changes made by hand in the console.
  • Regulated-environment discipline: demonstrable least privilege, closed escalation paths and surgical policy exceptions. Security as architecture, not as a later patch.
  • Data engineering with respect for the critical: a CDC pipeline that opens the data to analysis without a single query touching the transactional database that sustains users’ money.