Infrastructure as Code: Principles and Practice

Baalvion Strategic Brief • June 11, 2026

Strategic Intelligence by Baalvion Engineering

Infrastructure as Code is a contract, not a convenience

Clicking through a cloud console to stand up a database, a load balancer and a few subnets is fast the first time and a liability every time after. The moment a second environment, a second region or a second engineer enters the picture, manually provisioned infrastructure becomes an undocumented system whose true state lives only in someone's memory. Infrastructure as Code (IaC) replaces that with a declarative source of truth: you describe the desired shape of your infrastructure in version-controlled files, a tool reconciles reality toward that description, and every change is a reviewed, reversible commit. At Baalvion Industries we run the Baalvion Operating System (BOS) across 198 markets and 180+ jurisdictions, with five layers — Infrastructure, Intelligence, Governance, Commerce and Finance — sharing one multi-tenant control plane. Provisioning that by hand would be impossible to operate, impossible to audit, and impossible to reproduce after a regional failure. IaC is what makes the platform reconstructable from source.

The principle that does the heavy lifting is declarative desired-state. Rather than scripting the steps to create a resource, you declare the resource you want and let the engine compute the diff between what exists and what you asked for. That diff, the plan, is the single most valuable artifact in the whole discipline, because it lets a human read exactly what will change before anything does. The corollary is idempotency: applying the same configuration twice produces the same result, so a re-run is safe rather than destructive. This article walks through the patterns that turn that principle into a reliable practice (modules, state, drift, policy-as-code and GitOps) with the two tools we rely on most, Terraform and Pulumi, and the trade-offs each carries.
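The plan/apply cycle and idempotency can be illustrated with a toy reconciler. This is a minimal sketch, not how Terraform or Pulumi are implemented: it assumes resources are plain dictionaries keyed by a hypothetical address, computes a plan as create/update/delete actions, and shows that planning again after an apply yields an empty diff.

```python
from typing import Dict, List, Tuple

# Hypothetical resource model: address -> declared attributes.
Resources = Dict[str, Dict[str, object]]
Action = Tuple[str, str]  # (verb, address)


def plan(desired: Resources, actual: Resources) -> List[Action]:
    """Compute the actions needed to move `actual` toward `desired`."""
    actions: List[Action] = []
    for address in sorted(desired):
        if address not in actual:
            actions.append(("create", address))
        elif actual[address] != desired[address]:
            actions.append(("update", address))
    for address in sorted(actual):
        if address not in desired:
            actions.append(("delete", address))
    return actions


def apply(desired: Resources, actual: Resources) -> Resources:
    """Execute the plan against a copy of `actual` and return the new reality."""
    result = {k: dict(v) for k, v in actual.items()}
    for verb, address in plan(desired, actual):
        if verb == "delete":
            del result[address]
        else:  # create and update both converge on the declared attributes
            result[address] = dict(desired[address])
    return result


desired = {
    "db.main": {"engine": "postgres", "encrypted": True},
    "lb.public": {"scheme": "internet-facing"},
}
actual = {
    "db.main": {"engine": "postgres", "encrypted": False},
    "sg.legacy": {"open": True},
}

print(plan(desired, actual))
# [('update', 'db.main'), ('create', 'lb.public'), ('delete', 'sg.legacy')]
state = apply(desired, actual)
print(plan(desired, state))  # [] : a second run has nothing to do
```

The empty second plan is the property that makes a re-run safe: the engine only acts on the difference, never on the steps.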

Terraform and Pulumi: declarative DSL versus general-purpose code

Terraform, from HashiCorp, expresses infrastructure in HCL, a purpose-built declarative configuration language. Its strength is constraint: HCL is hard to abuse, so configurations stay readable and reviewable, and the enormous provider ecosystem covers nearly every cloud and SaaS API through a consistent resource model. Its weakness is the same constraint: anything resembling real logic (loops, conditionals, data transformation) has to be expressed through the `count` and `for_each` meta-arguments, `for` expressions, conditional expressions and `dynamic` blocks, which grow awkward as complexity rises. Pulumi takes the opposite bet: you author infrastructure in a general-purpose language such as TypeScript, Go, Python or C#, and get real abstractions, types, testing frameworks and IDE tooling for free. The cost is discipline; the full power of a programming language makes it equally easy to write infrastructure no one else can follow.

The pragmatic position is that both are valid and the choice follows the team, not fashion. We standardise on Terraform for the broad, stable substrate (networks, clusters, managed databases, IAM), where the value is auditability and a small blast radius for any single change. We reach for Pulumi where infrastructure needs genuine programmability: generating per-tenant resource sets across the 180+ jurisdictions BOS serves, where data-residency rules differ by region and copy-pasting HCL would be error-prone. Whichever you pick, the underlying model is essentially the same (providers, a resource graph, a state file, and a plan/apply cycle, which Pulumi calls preview/up), so the patterns in the rest of this article transfer directly. Choosing between them is exactly the kind of decision our technology consulting team frames around the team's language fluency and the shape of the estate, rather than defaulting to either tool.
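To make the programmability argument concrete, the sketch below generates per-tenant resource definitions from a tenant list, much as a Pulumi program might loop over tenants. It deliberately avoids the Pulumi SDK so it runs on its own; the tenant names, the jurisdiction-to-region table and the naming scheme are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical residency table: jurisdiction -> regions where tenant data may live.
ALLOWED_REGIONS: Dict[str, List[str]] = {
    "EU": ["eu-west-1", "eu-central-1"],
    "IN": ["ap-south-1"],
    "US": ["us-east-1", "us-west-2"],
}


@dataclass(frozen=True)
class Tenant:
    name: str
    jurisdiction: str


def tenant_resources(tenant: Tenant) -> List[Dict[str, str]]:
    """Return one tenant's resource specs, pinned to its first approved region."""
    regions = ALLOWED_REGIONS.get(tenant.jurisdiction)
    if not regions:
        # Fail loudly rather than silently falling back to a default region.
        raise ValueError(f"no approved region for jurisdiction {tenant.jurisdiction!r}")
    region = regions[0]
    return [
        {"type": "bucket", "name": f"{tenant.name}-data", "region": region},
        {"type": "database", "name": f"{tenant.name}-db", "region": region},
    ]


tenants = [Tenant("acme", "EU"), Tenant("zenith", "IN")]
specs = [spec for t in tenants for spec in tenant_resources(t)]
for spec in specs:
    print(spec)
```

The same expansion is possible in HCL with `for_each` over a map; the difference is that validation and branching like the unknown-jurisdiction check tend to read more naturally in a general-purpose language.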

Modules: the unit of reuse and the boundary of blast radius

A module is a reusable, parameterised package of infrastructure — the IaC equivalent of a function. Instead of duplicating fifty lines of HCL to stand up a hardened Postgres instance in every service repository, you write one module with clear inputs (instance size, backup retention, network) and outputs (connection endpoint, security group), version it, and call it everywhere. The payoff is consistency: when a security baseline changes — say, enforcing encryption at rest and a stricter parameter group — you change the module once and roll the new version through environments deliberately. For a compliance-first platform applying AES-256 encryption and SOC 2 Type II controls, that single point of enforcement is worth more than any individual configuration.
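A module behaves like a function with a narrow set of inputs, baked-in defaults and declared outputs. The sketch below models that contract for the hardened Postgres example in plain Python; the parameter names, default values, retention bound and endpoint format are illustrative assumptions, not a real registry module.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PostgresInputs:
    """Only the parameters that genuinely vary between consumers (hypothetical)."""
    name: str
    network_id: str
    instance_size: str = "small"
    backup_retention_days: int = 7


@dataclass(frozen=True)
class PostgresOutputs:
    """Values other modules may depend on."""
    endpoint: str
    security_group: str


def postgres_module(inputs: PostgresInputs) -> Tuple[Dict[str, Dict[str, object]], PostgresOutputs]:
    """Expand the inputs into resource definitions plus outputs, like a module call."""
    # Illustrative bound; check the provider's real limits before relying on it.
    if not 1 <= inputs.backup_retention_days <= 35:
        raise ValueError("backup_retention_days must be between 1 and 35")
    sg_name = f"{inputs.name}-sg"
    resources: Dict[str, Dict[str, object]] = {
        sg_name: {"type": "security_group", "network": inputs.network_id},
        f"{inputs.name}-db": {
            "type": "postgres_instance",
            "size": inputs.instance_size,
            "backup_retention_days": inputs.backup_retention_days,
            "security_group": sg_name,
            # Baseline controls are not inputs at all, so no caller can switch them off.
            "storage_encrypted": True,
            "deletion_protection": True,
        },
    }
    outputs = PostgresOutputs(endpoint=f"{inputs.name}.db.internal:5432", security_group=sg_name)
    return resources, outputs


resources, outputs = postgres_module(PostgresInputs(name="orders", network_id="net-prod"))
print(outputs.endpoint)  # orders.db.internal:5432
```

Tightening the baseline then means editing this one function and publishing a new version, rather than hunting through every service repository.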

Compose, do not nest deeply — keep a shallow hierarchy of small, single-purpose modules (a network module, a database module, a service module) rather than one monolithic root that does everything.
Version and pin — publish modules to a registry and pin consumers to an explicit version, so an upstream change never silently alters a downstream environment.
Sensible inputs, opinionated defaults — expose only the parameters that genuinely vary, and bake the secure, compliant choices in as defaults so the easy path is the correct path.
Module boundaries are blast-radius boundaries — a tightly scoped module limits how much can break from one change, which matters most for shared state like networking and IAM.
Test the module, not just the deployment — validate with terraform validate, tflint, and example-based plan tests (or Pulumi's unit-testing harness) before publishing a version.
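The "version and pin" rule is easy to check mechanically in CI. The sketch below assumes module version constraints have already been collected into a simple name-to-constraint mapping (how you extract them is left open) and treats only exact versions, bare or prefixed with `=`, as pinned; range operators such as `~>` and `>=` mirror Terraform's constraint syntax and are reported as unpinned.

```python
import re
from typing import Dict, List

# An exact pin: "1.4.2" or "= 1.4.2". Ranges such as "~> 1.4" or ">= 1.0" do not count.
EXACT_PIN = re.compile(r"^=?\s*\d+\.\d+\.\d+$")


def unpinned_modules(constraints: Dict[str, str]) -> List[str]:
    """Return module names whose version constraint is missing or not an exact pin."""
    return sorted(
        name for name, constraint in constraints.items()
        if not EXACT_PIN.match(constraint.strip())
    )


# Hypothetical consumer configuration.
consumers = {
    "network": "2.1.0",
    "postgres": "= 3.0.4",
    "service": "~> 1.4",
    "iam": "",
}
print(unpinned_modules(consumers))  # ['iam', 'service']
```

Pre-release versions such as `1.0.0-beta` are rejected by this pattern too, which is arguably the right default for production consumers.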

Modules also encode tenant topology. In a multi-tenant platform you decide whether a tenant maps to a namespace, a node pool, a dedicated database or a whole account, and that decision lives in a module so it is applied identically every time. We tie those parameters to data residency and regulatory obligations so a regulated tenant in one jurisdiction is provisioned with the isolation its compliance regime requires — the same discipline that backs the multi-tenant identity platform case study, where the boundary between tenants is the boundary that auditors care about most.
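Keeping the isolation decision in one place can be sketched as a small function that maps a tenant's regulatory profile to one of the isolation tiers named above. The regime labels and the mapping rules here are hypothetical placeholders, not BOS's actual policy; the point is that the rule is written once and applied identically to every tenant.

```python
from enum import Enum
from typing import FrozenSet


class Isolation(Enum):
    NAMESPACE = 1
    NODE_POOL = 2
    DEDICATED_DATABASE = 3
    DEDICATED_ACCOUNT = 4


# Hypothetical regime labels that demand stronger separation.
ACCOUNT_LEVEL_REGIMES: FrozenSet[str] = frozenset({"banking-secrecy"})
DATABASE_LEVEL_REGIMES: FrozenSet[str] = frozenset({"health-data", "payments"})


def isolation_for(regimes: FrozenSet[str], noisy_workload: bool = False) -> Isolation:
    """Pick the lightest isolation tier that still satisfies every regime the tenant is under."""
    if regimes & ACCOUNT_LEVEL_REGIMES:
        return Isolation.DEDICATED_ACCOUNT
    if regimes & DATABASE_LEVEL_REGIMES:
        return Isolation.DEDICATED_DATABASE
    if noisy_workload:
        return Isolation.NODE_POOL  # performance isolation, not a compliance requirement
    return Isolation.NAMESPACE


print(isolation_for(frozenset({"health-data"})).name)  # DEDICATED_DATABASE
```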

State: the map between code and reality

Both Terraform and Pulumi maintain state, an authoritative record mapping the resources in your code to the real resources in the cloud. State is what lets the engine compute an accurate plan: without it, the tool could not tell the difference between a resource it created and an identically named one it did not. This makes state the most operationally sensitive object in the whole system. Terraform state frequently contains secrets (database passwords, generated keys) in plaintext, and Pulumi encrypts only the values marked as secret, so state must never live in a Git repository and must be stored encrypted at rest.

Remote backend — keep state in an encrypted remote backend (S3 with a DynamoDB lock, Terraform Cloud, or Pulumi's managed/self-hosted backend), never on a laptop or in version control.
State locking — enforce locking so two concurrent applies cannot corrupt state by racing each other; a corrupted state file is one of the hardest failures to recover from.
Split state by blast radius — separate state per environment and per concern (networking, data, application) so a change to one stack cannot endanger another and plans stay fast.
Treat imports carefully — bring existing resources under management with import rather than recreating them, and reconcile by reading the plan, never by hand-editing state.
Back up and restrict — version the state bucket and lock down access; whoever can read state can read its secrets.
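Backend locks, such as the DynamoDB lock mentioned above, rest on an atomic "create only if absent" write: whoever creates the lock record first wins, and everyone else is refused. The sketch below shows the same idea locally with a lock file opened using `O_CREAT | O_EXCL`; the lock path and holder format are assumptions, and a real backend would use its own remote primitive.

```python
import os
import tempfile


class StateLockError(RuntimeError):
    pass


class StateLock:
    """Exclusive lock backed by atomic file creation (a local stand-in for a backend lock)."""

    def __init__(self, path: str, holder: str) -> None:
        self.path = path
        self.holder = holder

    def acquire(self) -> None:
        try:
            # O_EXCL makes creation fail atomically if the file already exists.
            fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            with open(self.path) as f:
                raise StateLockError(f"state is locked by {f.read()!r}") from None
        # Record who holds the lock so a refused writer can report it.
        with os.fdopen(fd, "w") as f:
            f.write(self.holder)

    def release(self) -> None:
        os.remove(self.path)

    def __enter__(self) -> "StateLock":
        self.acquire()
        return self

    def __exit__(self, *exc) -> None:
        self.release()


lock_path = os.path.join(tempfile.mkdtemp(), "prod-network.lock")
with StateLock(lock_path, "ci-run-42"):
    try:
        StateLock(lock_path, "laptop-alice").acquire()
    except StateLockError as err:
        print(err)  # state is locked by 'ci-run-42'
```

The second writer fails fast instead of racing the first, which is exactly what keeps two concurrent applies from interleaving writes to the same state.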

State splitting is where large estates live or die. A single monolithic state file for an entire platform makes every plan slow, every apply risky, and every lock a queue. We slice state along the same boundaries as the modules — a stack per region per layer — so that an Infrastructure-layer change in one market never forces a recalculation of the Finance layer in another. The cross-stack references that stitch these together (remote state data sources, or stack references in Pulumi) become explicit, reviewable dependencies rather than hidden coupling, which is precisely what keeps the estate auditable as it grows toward 500K+ transactions and 125+ active partners.
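Once state is sliced into stacks, the explicit cross-stack references form a dependency graph. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to derive a safe apply order and to find which stacks a change can actually reach; the stack names follow the region-per-layer scheme described above but are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical stacks named <region>/<layer>; each maps to the stacks whose outputs it reads.
STACK_DEPENDENCIES: Dict[str, Set[str]] = {
    "eu-west-1/infrastructure": set(),
    "eu-west-1/governance": {"eu-west-1/infrastructure"},
    "eu-west-1/finance": {"eu-west-1/infrastructure", "eu-west-1/governance"},
    "ap-south-1/infrastructure": set(),
    "ap-south-1/finance": {"ap-south-1/infrastructure"},
}


def apply_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order stacks so each is applied after every stack it references."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # args[1] holds the nodes that form the cycle.
        raise ValueError(f"cross-stack references form a cycle: {err.args[1]}") from err


def affected_by(changed: str, deps: Dict[str, Set[str]]) -> Set[str]:
    """Stacks that transitively read outputs from `changed` and may need a re-plan."""
    affected: Set[str] = set()
    frontier = [changed]
    while frontier:
        current = frontier.pop()
        for stack, upstream in deps.items():
            if current in upstream and stack not in affected:
                affected.add(stack)
                frontier.append(stack)
    return affected


print(apply_order(STACK_DEPENDENCIES))
print(sorted(affected_by("ap-south-1/infrastructure", STACK_DEPENDENCIES)))  # ['ap-south-1/finance']
```

A change to the Infrastructure stack in `ap-south-1` reaches only that region's Finance stack; nothing in `eu-west-1` needs to be re-planned.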

Drift: when reality stops matching the code

Drift is the gap that opens when the real infrastructure diverges from what the code declares — an engineer hot-fixes a security group in the console during an incident, an auto-remediation tool flips a setting, or a cloud provider changes a default. Drift is corrosive precisely because IaC's value rests on the assumption that the code is the truth. Once that assumption quietly fails, the next apply may revert a critical fix or, worse, leave operators trusting a description that no longer matches production.

The defence is detection plus discipline. A scheduled `terraform plan` (or `pulumi preview`) with no intended changes should produce an empty diff; any non-empty result is drift and should raise an alert. Tooling like Terraform Cloud's drift detection, or open-source scanners, can run this continuously. The discipline half matters more than the tooling: the answer to drift is to make manual changes culturally and technically expensive — restrict console write access in production, require that even emergency fixes are codified afterward, and reconcile by re-applying the code rather than accepting the drifted state. We accept that incidents sometimes demand a fast manual touch; what we do not accept is leaving that touch unrecorded. The codify-after-the-fact loop keeps the governance and compliance posture honest, because an auditor can trust that the repository genuinely describes production.
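A scheduled drift check can lean on `terraform plan -detailed-exitcode`, which exits 0 when the diff is empty, 1 on error and 2 when the plan contains changes. The sketch below wraps that in Python; the working directory and the `alert` callback are assumptions, and it expects the directory to have been initialised with `terraform init` beforehand. Pulumi's `pulumi preview --expect-no-changes` can play a similar role.

```python
import subprocess
from enum import Enum
from typing import Callable


class DriftStatus(Enum):
    IN_SYNC = "in-sync"
    DRIFTED = "drifted"
    ERROR = "error"


def classify(exit_code: int) -> DriftStatus:
    """Map `terraform plan -detailed-exitcode` exit codes to a drift status."""
    if exit_code == 0:
        return DriftStatus.IN_SYNC  # empty diff: code and reality agree
    if exit_code == 2:
        return DriftStatus.DRIFTED  # non-empty diff although no change was intended
    return DriftStatus.ERROR        # 1, or anything unexpected: the check itself failed


def check_drift(workdir: str, alert: Callable[[str], None]) -> DriftStatus:
    """Run a plan (which changes no infrastructure) and alert on anything but an empty diff."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    status = classify(result.returncode)
    if status is not DriftStatus.IN_SYNC:
        # Keep the alert short: the tail of the plan output is usually enough to triage.
        alert(f"{workdir}: {status.value}\n{result.stdout[-2000:]}{result.stderr[-2000:]}")
    return status
```

A nightly job might call `check_drift("stacks/prod-network", alert=print)`, where the stack path is hypothetical; treating errors as alerts too matters, because a check that silently fails looks exactly like a clean estate.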

Policy-as-code: guardrails that run before apply

A reviewed plan is good; a plan that cannot violate a security or cost rule even if the reviewer misses it is better. Policy-as-code expresses organisational rules as machine-checkable code that gates the pipeline. The dominant engines are Open Policy Agent (OPA) with its Rego language, HashiCorp Sentinel (native to the Terraform ecosystem), and Pulumi's CrossGuard. They run against the plan — before any resource is touched — and fail the pipeline when a rule is broken. This shifts compliance left, from a quarterly audit finding to a build-time error a developer fixes in minutes.

Security invariants — block public S3 buckets, unencrypted volumes, security groups open to 0.0.0.0/0, or IAM policies with wildcard actions, automatically and every time.
Cost guardrails — reject instance types above a budgeted size, or flag a plan whose estimated monthly cost crosses a threshold, before it is applied.
Tagging and provenance — require owner, cost-centre and data-classification tags so every resource is attributable, which is the basis of any real audit trail.
Residency and isolation — enforce that regulated tenant resources land only in approved regions, encoding the per-jurisdiction rules BOS lives under directly in policy.
Soft versus hard policy — distinguish advisory warnings from mandatory failures so teams can iterate while non-negotiable controls stay non-negotiable.
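Production guardrails would normally be written in Rego, Sentinel or CrossGuard. To keep one language across these examples, the sketch below expresses a few of the rules above in Python instead, evaluated over the JSON that `terraform show -json` emits for a saved plan and reading `resource_changes[].change.after`, the planned attribute values. The checked resource types and attributes (`aws_security_group` ingress `cidr_blocks`, `aws_ebs_volume` `encrypted`, `tags`) follow the AWS provider but should be verified against the provider version in use; the tag keys and severities are illustrative.

```python
from typing import Any, Dict, List, NamedTuple

HARD, SOFT = "hard", "soft"
# Illustrative tagging standard; a real organisation defines its own keys.
REQUIRED_TAGS = ("owner", "cost-centre", "data-classification")


class Violation(NamedTuple):
    address: str
    rule: str
    severity: str


def evaluate(plan: Dict[str, Any]) -> List[Violation]:
    """Check planned creates and updates against a small set of guardrails."""
    found: List[Violation] = []
    for rc in plan.get("resource_changes", []):
        change = rc["change"]
        if not {"create", "update"} & set(change["actions"]):
            continue  # deletes, reads and no-ops are not checked here
        after = change.get("after") or {}
        addr, rtype = rc["address"], rc["type"]

        if rtype == "aws_security_group":
            for rule in after.get("ingress") or []:
                if "0.0.0.0/0" in (rule.get("cidr_blocks") or []):
                    found.append(Violation(addr, "ingress open to 0.0.0.0/0", HARD))
        if rtype == "aws_ebs_volume" and after.get("encrypted") is not True:
            found.append(Violation(addr, "volume not encrypted", HARD))
        if "tags" in after:
            tags = after.get("tags") or {}
            missing = [t for t in REQUIRED_TAGS if t not in tags]
            if missing:
                found.append(Violation(addr, f"missing tags: {', '.join(missing)}", SOFT))
    return found


def gate(plan: Dict[str, Any]) -> bool:
    """Return True if the pipeline may proceed; soft violations only warn."""
    violations = evaluate(plan)
    for v in violations:
        print(f"[{v.severity}] {v.address}: {v.rule}")
    return not any(v.severity == HARD for v in violations)
```

In a pipeline this would run after `terraform plan -out=tfplan` and `terraform show -json tfplan > plan.json`, with `gate(json.load(f))` deciding whether the apply stage is allowed to start.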

Policy-as-code is how the Governance layer of BOS stops being a document and becomes an executing control. The same Rego ruleset that enforces encryption and residency in the pipeline maps directly to the obligations behind our ISO 27001, SOC 2 Type II and GDPR commitments, and it is the engineering backbone of work like our AI compliance scoring platform, where rules have to be both transparent and machine-enforced. The discipline that builds these guardrails is the same one our DevOps practice applies across every pipeline we run.

GitOps: the repository as the operational interface

GitOps closes the loop by making Git the single source of truth for both application and infrastructure state, with an automated controller continuously reconciling the live system toward what the repository declares. A change to infrastructure becomes a pull request: it is reviewed, it triggers a plan, policy-as-code gates it, and on merge a pipeline (or a controller like Argo CD or Flux for Kubernetes resources) applies it. Nobody runs `apply` from a laptop against production. The benefits compound — every change is reviewed, every change is attributable to an author and a commit, rollback is a `git revert`, and the audit trail is the commit history itself rather than a separate ticketing system bolted on afterward.
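The claim that rollback is a `git revert` and that the history is the audit trail can be shown with a toy model: desired state is whatever you get by replaying every commit in order, and a revert is a new commit that restores earlier values rather than erasing history. This sketches the concept only; the `Commit` structure and key/value state are hypothetical, and real controllers such as Argo CD or Flux reconcile manifests from an actual repository.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    sha: str
    author: str
    message: str
    changes: Dict[str, Optional[str]]  # key -> new value; None removes the key


def desired_state(history: List[Commit]) -> Dict[str, str]:
    """Replay the full history to obtain what the repository currently declares."""
    state: Dict[str, str] = {}
    for commit in history:
        for key, value in commit.changes.items():
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
    return state


def revert(history: List[Commit], sha: str, author: str) -> Commit:
    """Build a commit undoing `sha` by restoring each touched key's earlier value.

    Assumes no later commit touched the same keys (git would report a conflict).
    """
    index = next(i for i, c in enumerate(history) if c.sha == sha)
    before = desired_state(history[:index])
    undo = {key: before.get(key) for key in history[index].changes}
    return Commit(f"revert-{sha}", author, f'Revert "{history[index].message}"', undo)


history = [
    Commit("a1", "alice", "add db", {"db.size": "medium"}),
    Commit("b2", "bob", "scale db", {"db.size": "large"}),
    Commit("c3", "carol", "add cache", {"cache.nodes": "3"}),
]
history.append(revert(history, "b2", "alice"))
print(desired_state(history))  # {'db.size': 'medium', 'cache.nodes': '3'}
print([(c.sha, c.author) for c in history])
# [('a1', 'alice'), ('b2', 'bob'), ('c3', 'carol'), ('revert-b2', 'alice')]
```

The rollback itself is attributable: the bad change and its reversal both remain in the log, which is what lets the commit history stand in for a separate change ticket.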

The pattern is not free of trade-offs. Reconciliation loops can fight a human who makes a manual change, which is the same drift problem viewed from the other side — and the correct resolution is the same: codify, do not patch. Secrets need careful handling, since plaintext credentials must never enter Git, which pushes teams toward sealed secrets, external secret stores or short-lived dynamic credentials. And a broken pipeline can become a single point of failure for all change, so the pipeline itself deserves the same resilience thinking as production. Done well, GitOps is what turns the abstract promise of IaC — reproducible, reviewable, auditable infrastructure — into the day-to-day operating reality of a platform. It is the practice that lets BOS unify commerce, finance, compliance, logistics and intelligence into one system that can be reconstructed from source after any failure, and it underpins the way Baalvion builds and operates enterprise software for a senior, compliance-first audience.
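Keeping plaintext credentials out of Git is usually backed by a pre-commit or CI check. The sketch below is a deliberately simple scanner for configuration text: it flags secret-like keys assigned a quoted literal, while ignoring unquoted references such as `var.` expressions and quoted `${...}` interpolations. The key list, the regular expression and the example resource and attribute names are assumptions; dedicated secret scanners use far richer detection.

```python
import re
from typing import List, Tuple

# Secret-like attribute names bound to a quoted literal, e.g.  password = "hunter2"
ASSIGNMENT = re.compile(
    r'^\s*(?P<key>[A-Za-z_]*(password|secret|token|api_key)[A-Za-z_]*)\s*=\s*"(?P<value>[^"]*)"',
    re.IGNORECASE,
)


def find_plaintext_secrets(text: str) -> List[Tuple[int, str]]:
    """Return (line number, key) for each secret-like key bound to a non-empty literal."""
    findings: List[Tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = ASSIGNMENT.match(line)
        value = match.group("value") if match else ""
        # Empty strings and interpolations like "${var.token}" are not literal secrets.
        if value and "${" not in value:
            findings.append((lineno, match.group("key")))
    return findings


# Hypothetical resource and attribute names.
config = '''resource "example_db" "main" {
  username     = "app"
  password     = "hunter2"
  password_ref = var.db_password
  api_token    = "${var.token}"
}
'''
print(find_plaintext_secrets(config))  # [(3, 'password')]
```

A check like this complements, rather than replaces, sealed secrets or an external secret store: it catches the mistake at commit time, before the credential is in history and has to be rotated.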

How it fits together at Baalvion

These five patterns are not a menu; they reinforce each other. Modules give you consistent, reusable building blocks. State gives the engine an accurate map so plans can be trusted. Drift detection guards the assumption that the code is the truth. Policy-as-code ensures that what gets applied cannot violate the rules even when a reviewer is tired. GitOps wires the whole thing to the repository so every change is reviewed, reversible and recorded. The result is the infrastructure-grade, compliance-first posture that lets a small team operate the Baalvion Operating System across 198 markets without anyone provisioning resources by hand, and the same discipline is what we bring to client engagements through our cloud solutions practice. If you are weighing your own move to Infrastructure as Code, start from these trade-offs rather than from a default of adopting every tool at once.

Frequently Asked Questions

What is the difference between Infrastructure as Code and just scripting my cloud setup?

Scripts are imperative — they list the steps to create resources, and re-running them can fail or duplicate work. IaC is declarative and idempotent: you describe the desired end state, the tool computes the diff and reconciles toward it, and applying the same configuration twice is safe. That declarative model is what makes environments reproducible and changes reviewable as a plan before anything happens.

Should I choose Terraform or Pulumi?

Both share the same provider/state/plan model, so the choice follows your team. Terraform's HCL is constrained and highly reviewable with the broadest provider ecosystem — ideal for the stable substrate of networks, clusters and IAM. Pulumi uses a general-purpose language (TypeScript, Go, Python), giving real abstractions, types and testing — better where infrastructure needs genuine programmability, such as generating per-tenant resources across many jurisdictions.

Why is the state file such a big deal?

State is the authoritative map between your code and the real cloud resources, and it often contains secrets in plaintext. It must live in an encrypted remote backend with locking, never in Git, and should be split by environment and concern to limit blast radius. A corrupted or leaked state file is one of the hardest failures to recover from, which is why it gets the strictest access controls in the whole system.

What causes configuration drift and how do you prevent it?

Drift happens when someone changes infrastructure outside the code — a console hot-fix during an incident, an auto-remediation tool, or a changed provider default. Prevent it by running a scheduled empty-diff plan to detect divergence, restricting production console write access, and enforcing a codify-after-the-fact discipline so even emergency changes get folded back into the repository instead of leaving the code lying about reality.

What does policy-as-code actually enforce?

It expresses security, cost, tagging and residency rules as machine-checkable code (using OPA/Rego, Sentinel or Pulumi CrossGuard) that runs against the plan before any resource is created. It can block public buckets, unencrypted volumes, oversized instances or non-compliant regions automatically — shifting compliance from a quarterly audit finding to a build-time error, and turning governance obligations into executing controls.

How does GitOps relate to Infrastructure as Code?

GitOps is the operating model that makes IaC practical at scale: Git becomes the single source of truth, every change is a reviewed pull request that triggers a plan and policy checks, and a pipeline or controller applies it on merge. Nobody runs `apply` from a laptop. Rollback is a `git revert` and the commit history is the audit trail, which is exactly the transparency and auditability a compliance-first platform requires.
