Deploying a terraform aws validator setup manually is how teams end up with snowflake environments: infrastructure that works but that nobody can reproduce, audit, or recover from when something goes wrong. One engineer knows how the validator was configured. That engineer leaves. You’re left staring at a running node with no documentation and no way to recreate it.
Infrastructure as Code solves this. This guide covers how to deploy production-grade blockchain validators on AWS using Terraform, from VPC architecture and IAM least privilege to encrypted state management, secrets handling, and the security controls that most tutorials skip.
Every pattern here is production-tested on Ethereum, Cosmos, and Substrate-based validator infrastructure.
Why Terraform Is the Standard for Validator Infrastructure
Before writing code, it’s worth being clear about what you’re optimising for in a terraform aws validator setup. A validator node has requirements that most cloud workloads don’t:
Immutability – you cannot afford configuration drift. A validator that behaves differently in staging versus production creates consensus failures. Terraform enforces identical configuration across environments.
Auditability – every infrastructure change should be reviewable, approvable, and reversible. Git history on your Terraform code is your audit log.
Recovery speed – if a validator host is compromised or fails, you need to reproduce the exact infrastructure in minutes, not hours. A terraform apply against a clean state file is your disaster recovery plan.
Secret separation – validator keys must never be baked into Terraform code or state files. Terraform integrates with AWS Secrets Manager and HashiCorp Vault to handle this cleanly.
The terraform aws validator pattern addresses all four requirements simultaneously.
Repository Structure
Before writing a single resource, establish the right repository structure for any terraform aws validator deployment. Flat Terraform files work for demos. They don’t work for production validator infrastructure.
validator-infra/
├── environments/
│ ├── testnet/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── terraform.tfvars
│ └── mainnet/
│ ├── main.tf
│ ├── variables.tf
│ └── terraform.tfvars
├── modules/
│ ├── vpc/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── validator/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── security/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ └── monitoring/
│ ├── main.tf
│ ├── variables.tf
│ └── outputs.tf
├── backend.tf
└── versions.tf

This structure separates environments (testnet and mainnet never share state), encapsulates concerns in modules (networking, compute, security, monitoring), and makes it impossible to accidentally apply testnet configuration to mainnet.
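One way to make that guarantee mechanical is a pre-apply guard that derives the backend state key from the environment directory itself. The helper below is a hypothetical sketch (the function name and the hard-coded environment list are mine), matching the `validators/<env>/terraform.tfstate` key convention used in the backend configuration:

```python
from pathlib import Path

def expected_state_key(env_dir: str) -> str:
    """Derive the backend state key from an environment directory.

    Hypothetical guard: because the key comes from the directory name,
    `environments/mainnet` can only ever init against the mainnet state
    file, never the testnet one.
    """
    env = Path(env_dir).name
    if env not in ("testnet", "mainnet"):
        raise ValueError(f"unknown environment: {env}")
    return f"validators/{env}/terraform.tfstate"
```

A CI step could compare this value against the key in the directory's backend configuration and fail the pipeline on any mismatch.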
Step 1 – Terraform Backend: Encrypted Remote State
Never store Terraform state locally for any production terraform aws validator deployment. State files contain resource IDs, IP addresses, and potentially sensitive outputs. Encrypt it, lock it, and version it.
# backend.tf
terraform {
backend "s3" {
bucket = "your-org-terraform-state"
key = "validators/mainnet/terraform.tfstate"
region = "eu-west-1"
encrypt = true
dynamodb_table = "terraform-state-locks"
kms_key_id = "arn:aws:kms:eu-west-1:ACCOUNT_ID:key/KEY_ID"
}
}

Create the S3 bucket and DynamoDB table with a bootstrap configuration before anything else:
# bootstrap/main.tf - run once with local state, then migrate
resource "aws_s3_bucket" "terraform_state" {
bucket = "your-org-terraform-state"
}
resource "aws_s3_bucket_versioning" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
versioning_configuration {
status = "Enabled"
}
}
resource "aws_kms_key" "terraform_state" {
description = "KMS key for Terraform state encryption"
enable_key_rotation = true
}
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "aws:kms"
kms_master_key_id = aws_kms_key.terraform_state.arn
}
}
}
resource "aws_s3_bucket_public_access_block" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
resource "aws_dynamodb_table" "terraform_state_locks" {
name = "terraform-state-locks"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}

Step 2 – VPC Module: Network Isolation for Validators
In a terraform aws validator deployment, a validator node should never be directly accessible from the public internet. The VPC architecture enforces this at the network level:
# modules/vpc/main.tf
resource "aws_vpc" "validator" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.environment}-validator-vpc"
Environment = var.environment
ManagedBy = "Terraform"
}
}
# Private subnet - validators live here, no internet access
resource "aws_subnet" "private" {
count = length(var.availability_zones)
vpc_id = aws_vpc.validator.id
cidr_block = cidrsubnet(var.vpc_cidr, 4, count.index)
availability_zone = var.availability_zones[count.index]
tags = {
Name = "${var.environment}-validator-private-${count.index}"
Type = "private"
}
}
# Public subnet - only for NAT gateway and bastion
resource "aws_subnet" "public" {
count = length(var.availability_zones)
vpc_id = aws_vpc.validator.id
cidr_block = cidrsubnet(var.vpc_cidr, 4, count.index + 10)
availability_zone = var.availability_zones[count.index]
map_public_ip_on_launch = false
tags = {
Name = "${var.environment}-validator-public-${count.index}"
Type = "public"
}
}
# NAT Gateway - outbound only internet access for validators
resource "aws_eip" "nat" {
count = length(var.availability_zones)
domain = "vpc"
}
resource "aws_nat_gateway" "main" {
count = length(var.availability_zones)
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
tags = {
Name = "${var.environment}-nat-${count.index}"
}
}
# Internet Gateway - only for the public subnets
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.validator.id
}
# Public route table - without it the NAT gateways have no path to the internet
resource "aws_route_table" "public" {
vpc_id = aws_vpc.validator.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
}
resource "aws_route_table_association" "public" {
count = length(var.availability_zones)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
# Private route table - traffic goes via NAT
resource "aws_route_table" "private" {
count = length(var.availability_zones)
vpc_id = aws_vpc.validator.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main[count.index].id
}
}
resource "aws_route_table_association" "private" {
count = length(var.availability_zones)
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private[count.index].id
}

This architecture means your validator has outbound internet access (needed for peer discovery and block propagation) but is unreachable from the outside. P2P traffic is handled via security group rules, not public IPs.
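The cidrsubnet() calls above carve non-overlapping /20s out of the /16, with the public subnets offset by 10 network numbers. You can sanity-check the layout offline by reproducing the arithmetic with Python's ipaddress module (the cidrsubnet helper below is a re-implementation for illustration, not Terraform's own):

```python
import ipaddress

def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    # Mirrors Terraform's cidrsubnet(): split `prefix` into 2^newbits
    # equal subnets and return the one at index `netnum`.
    net = ipaddress.ip_network(prefix)
    return str(list(net.subnets(prefixlen_diff=newbits))[netnum])

vpc = "10.0.0.0/16"
private = [cidrsubnet(vpc, 4, i) for i in range(3)]       # count.index
public = [cidrsubnet(vpc, 4, i + 10) for i in range(3)]   # count.index + 10
```

With three availability zones this yields 10.0.0.0/20 through 10.0.32.0/20 for the private subnets and 10.0.160.0/20 through 10.0.192.0/20 for the public ones, so the two tiers can never collide.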
Step 3 – Security Groups: Precise Port Control
Each blockchain has specific port requirements. In a terraform aws validator deployment, don’t open everything; open exactly what the validator needs:
# modules/security/main.tf
# Ethereum validator security group
resource "aws_security_group" "ethereum_validator" {
name = "${var.environment}-ethereum-validator"
description = "Security group for Ethereum validator node"
vpc_id = var.vpc_id
# Execution layer P2P (Geth/Nethermind)
ingress {
from_port = 30303
to_port = 30303
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
description = "Execution layer P2P TCP"
}
ingress {
from_port = 30303
to_port = 30303
protocol = "udp"
cidr_blocks = ["0.0.0.0/0"]
description = "Execution layer P2P UDP discovery"
}
# Consensus layer P2P (Lighthouse/Prysm)
ingress {
from_port = 9000
to_port = 9000
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
description = "Consensus layer P2P TCP"
}
ingress {
from_port = 9000
to_port = 9000
protocol = "udp"
cidr_blocks = ["0.0.0.0/0"]
description = "Consensus layer P2P UDP"
}
# Internal monitoring restricted to VPC only
ingress {
from_port = 9090
to_port = 9090
protocol = "tcp"
cidr_blocks = [var.vpc_cidr]
description = "Prometheus metrics VPC only"
}
# SSH via SSM no direct SSH port needed
# AWS Systems Manager Session Manager handles remote access
# This eliminates the need for port 22 entirely
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
description = "All outbound"
}
tags = {
Name = "${var.environment}-ethereum-validator-sg"
Environment = var.environment
}
}

The absence of port 22 in the ingress rules is intentional. AWS Systems Manager Session Manager provides shell access without exposing SSH to any network, including the VPC. This is the recommended pattern from the AWS Guidance for Ethereum Node Validator architecture.
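To keep that guarantee from regressing, a CI policy check can scan the rendered rules and fail if anything would admit SSH. The sketch below is a hypothetical helper (the rule dicts mirror the ingress blocks above in simplified form, not Terraform's actual plan JSON):

```python
def admits_ssh(ingress_rules) -> bool:
    # True if any rule's protocol and port range would cover TCP port 22.
    for rule in ingress_rules:
        if rule["protocol"] == "-1":
            return True  # all protocols/ports includes SSH
        if rule["protocol"] == "tcp" and rule["from_port"] <= 22 <= rule["to_port"]:
            return True
    return False

# Simplified mirror of the Ethereum security group's ingress rules
rules = [
    {"protocol": "tcp", "from_port": 30303, "to_port": 30303},
    {"protocol": "udp", "from_port": 30303, "to_port": 30303},
    {"protocol": "tcp", "from_port": 9000, "to_port": 9000},
    {"protocol": "udp", "from_port": 9000, "to_port": 9000},
    {"protocol": "tcp", "from_port": 9090, "to_port": 9090},
]
```

Running `admits_ssh(rules)` against the rule set above returns False; the check flips the moment someone adds a port-22 rule or an allow-all ingress.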
Step 4 – IAM: Least Privilege for the Validator Instance
The validator EC2 instance needs exactly three permissions: read validator keys from Secrets Manager, write metrics to CloudWatch, and allow SSM Session Manager access. Nothing else.
# modules/security/iam.tf
resource "aws_iam_role" "validator" {
name = "${var.environment}-validator-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}]
})
tags = {
Environment = var.environment
ManagedBy = "Terraform"
}
}
# SSM Session Manager access replaces SSH
resource "aws_iam_role_policy_attachment" "ssm" {
role = aws_iam_role.validator.name
policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
# CloudWatch metrics
resource "aws_iam_role_policy_attachment" "cloudwatch" {
role = aws_iam_role.validator.name
policy_arn = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"
}
# Secrets Manager read-only, scoped to validator secrets only
resource "aws_iam_role_policy" "secrets" {
name = "validator-secrets-read"
role = aws_iam_role.validator.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Action = [
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret"
]
Resource = "arn:aws:secretsmanager:${var.aws_region}:${var.account_id}:secret:${var.environment}/validator/*"
}]
})
}
resource "aws_iam_instance_profile" "validator" {
name = "${var.environment}-validator-profile"
role = aws_iam_role.validator.name
}

The Secrets Manager policy uses a resource ARN scoped to ${environment}/validator/*, so the instance can only read its own secrets, not any other secret in the account.
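To see what that scoping buys you, here is a toy evaluation of the resource pattern. fnmatch stands in for IAM's wildcard matching (close enough for plain `*` patterns), and the account ID and secret names are illustrative:

```python
from fnmatch import fnmatch

def policy_allows(secret_arn: str, environment: str = "mainnet",
                  region: str = "eu-west-1",
                  account: str = "123456789012") -> bool:
    # Same wildcard the role policy uses: only secrets under
    # <environment>/validator/ are readable by the instance role.
    pattern = (f"arn:aws:secretsmanager:{region}:{account}:"
               f"secret:{environment}/validator/*")
    return fnmatch(secret_arn, pattern)
```

A mainnet validator secret matches; a testnet secret or an unrelated secret in the same account does not, which is exactly the blast-radius limit you want if the instance role is ever compromised.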
Step 5 – Secrets Management: Validator Keys Never Touch State
This is the most critical part of any terraform aws validator deployment. Validator keys must never be stored in Terraform state, never in environment variables, and never in EC2 user data scripts.
The correct pattern: store keys in AWS Secrets Manager before running Terraform, then reference them by ARN in the instance configuration.
# Create the secret placeholder with Terraform
# Upload the actual key value manually or via a separate secure process
resource "aws_secretsmanager_secret" "validator_key" {
name = "${var.environment}/validator/signing-key"
description = "Validator signing key for ${var.environment}"
kms_key_id = aws_kms_key.validator.arn
recovery_window_in_days = 30
tags = {
Environment = var.environment
Sensitivity = "critical"
}
}
# KMS key for secret encryption
resource "aws_kms_key" "validator" {
description = "KMS key for ${var.environment} validator secrets"
enable_key_rotation = true
deletion_window_in_days = 30
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Sid = "Allow validator instance to decrypt"
Effect = "Allow"
Principal = {
AWS = aws_iam_role.validator.arn
}
Action = ["kms:Decrypt", "kms:DescribeKey"]
Resource = "*"
},
{
Sid = "Allow admin management"
Effect = "Allow"
Principal = {
AWS = "arn:aws:iam::${var.account_id}:root"
}
Action = "kms:*"
Resource = "*"
}
]
})
}

The validator process retrieves the key at runtime via the AWS SDK, using the instance role credentials. The key never appears in any Terraform state, any environment variable, or any log.
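At runtime that retrieval looks roughly like the sketch below. With boto3 the client would be `boto3.client("secretsmanager")`, authenticated automatically through the instance role; the function name is mine, and the secret name matches the one created above:

```python
def load_signing_key(secrets_client, environment: str) -> str:
    # Fetch the key directly into process memory; never write it to
    # disk, an environment variable, or a log line.
    resp = secrets_client.get_secret_value(
        SecretId=f"{environment}/validator/signing-key"
    )
    return resp["SecretString"]
```

Taking the client as a parameter also makes the function testable against a stub without any AWS credentials.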
Step 6 – Validator Module: EC2 Instance Configuration
# modules/validator/main.tf
data "aws_ami" "ubuntu" {
most_recent = true
owners = ["099720109477"] # Canonical
filter {
name = "name"
values = ["ubuntu/images/hvm-ssd/ubuntu-*-22.04-arm64-server-*"] # arm64 to match the Graviton (c6g) instance type used below
}
}
resource "aws_instance" "validator" {
ami = data.aws_ami.ubuntu.id
instance_type = var.instance_type
subnet_id = var.private_subnet_id
iam_instance_profile = var.instance_profile_name
vpc_security_group_ids = [var.security_group_id]
# EBS root volume encrypted
root_block_device {
volume_type = "gp3"
volume_size = 50
encrypted = true
kms_key_id = var.kms_key_arn
delete_on_termination = true
}
# Separate data volume for chain data
ebs_block_device {
device_name = "/dev/sdf"
volume_type = "gp3"
volume_size = var.chain_data_volume_size
encrypted = true
kms_key_id = var.kms_key_arn
delete_on_termination = false # Preserve chain data on instance replacement
iops = 3000
throughput = 125
}
# No public IP instance is in private subnet
associate_public_ip_address = false
# Disable instance metadata service v1 use IMDSv2 only
metadata_options {
http_endpoint = "enabled"
http_tokens = "required" # Force IMDSv2
http_put_response_hop_limit = 1
}
# User data minimal, no secrets
user_data = base64encode(templatefile("${path.module}/templates/user_data.sh", {
environment = var.environment
secret_arn = var.validator_key_secret_arn
aws_region = var.aws_region
chain = var.chain
consensus_client = var.consensus_client
execution_client = var.execution_client
}))
tags = {
Name = "${var.environment}-validator"
Environment = var.environment
Chain = var.chain
ManagedBy = "Terraform"
}
lifecycle {
ignore_changes = [user_data] # Prevent replacement on user_data changes
}
}
# Elastic IP for consistent P2P peer identity
# Note: an EIP only carries traffic if the instance's subnet routes through
# an internet gateway; with the private-subnet layout above, outbound peers
# see the NAT gateway's EIP instead.
resource "aws_eip" "validator" {
domain = "vpc"
tags = {
Name = "${var.environment}-validator-eip"
Environment = var.environment
}
}
resource "aws_eip_association" "validator" {
instance_id = aws_instance.validator.id
allocation_id = aws_eip.validator.id
}

Two important details here. The delete_on_termination = false on the chain data volume means that if you replace the instance, the chain data persists; you don’t resync from scratch. The http_tokens = "required" setting forces IMDSv2, which prevents SSRF-based metadata service attacks that have been used to steal EC2 credentials.
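For context on what http_tokens = "required" actually changes: every metadata read must first obtain a session token via an HTTP PUT, which the typical SSRF primitive (an attacker-controlled GET URL) cannot issue. A minimal sketch of the client side of that handshake, using only the standard library:

```python
import urllib.request

IMDS = "http://169.254.169.254"  # link-local instance metadata endpoint

def fetch_imds_token(base: str = IMDS, ttl: int = 21600) -> str:
    # IMDSv2 step 1: request a session token with an HTTP PUT. With
    # http_tokens = "required", token-less (IMDSv1) GETs are rejected.
    req = urllib.request.Request(
        f"{base}/latest/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": str(ttl)},
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        return resp.read().decode()
```

Subsequent metadata reads then send the token in the X-aws-ec2-metadata-token header; without it, the credentials endpoint returns nothing.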
Step 7 – Multi-Environment Composition
The testnet and mainnet environments should be identical in structure but isolated in state. Separate per-environment root modules, each with its own backend state key, handle this cleanly:
# environments/mainnet/main.tf
module "vpc" {
source = "../../modules/vpc"
environment = "mainnet"
vpc_cidr = "10.0.0.0/16"
availability_zones = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
}
module "security" {
source = "../../modules/security"
environment = "mainnet"
vpc_id = module.vpc.vpc_id
vpc_cidr = module.vpc.vpc_cidr
aws_region = var.aws_region
account_id = var.account_id
}
module "validator" {
source = "../../modules/validator"
environment = "mainnet"
instance_type = "c6g.2xlarge" # Graviton2 for cost/performance
private_subnet_id = module.vpc.private_subnet_ids[0]
security_group_id = module.security.validator_sg_id
instance_profile_name = module.security.instance_profile_name
kms_key_arn = module.security.kms_key_arn
validator_key_secret_arn = module.security.validator_key_secret_arn
chain_data_volume_size = 2000 # 2TB for full Ethereum node
chain = "ethereum"
consensus_client = "lighthouse"
execution_client = "geth"
aws_region = var.aws_region
}
module "monitoring" {
source = "../../modules/monitoring"
environment = "mainnet"
validator_id = module.validator.instance_id
vpc_id = module.vpc.vpc_id
alert_sns_arn = var.alert_sns_arn
}

Step 8 – Drift Detection and Policy Enforcement
A terraform aws validator deployment isn’t complete without drift detection. Infrastructure drift (changes made outside Terraform) is a validator security risk. Someone logs in and manually opens a port. That change is invisible until something goes wrong.
Set up scheduled drift detection:
# .github/workflows/drift-detection.yaml
name: Terraform Drift Detection
on:
schedule:
- cron: '0 6 * * *' # Daily at 6am
jobs:
drift-check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Setup Terraform
uses: hashicorp/setup-terraform@v3
with:
terraform_version: "~1.6"
- name: Configure AWS credentials via OIDC
uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: ${{ secrets.TERRAFORM_ROLE_ARN }}
aws-region: eu-west-1
- name: Terraform Init
run: terraform init
working-directory: environments/mainnet
- name: Terraform Plan (detect drift)
id: plan
run: |
set +e # don't abort on exit code 2; we want to capture it
terraform plan -detailed-exitcode -out=tfplan
echo "exit_code=$?" >> $GITHUB_OUTPUT
working-directory: environments/mainnet
continue-on-error: true
- name: Alert on drift
if: steps.plan.outputs.exit_code == '2'
uses: slackapi/slack-github-action@v1
with:
payload: |
{
"text": "⚠️ Infrastructure drift detected in mainnet validator. Review the plan: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
}
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

Exit code 2 from terraform plan means changes were detected: the live infrastructure differs from what Terraform expects. This fires a Slack alert before any human notices the discrepancy.
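The same exit-code convention works outside CI too. A small wrapper (hypothetical helper, not part of the workflow above) makes the three outcomes of terraform plan -detailed-exitcode explicit for cron jobs or local scripts:

```python
import subprocess

def plan_status(cmd) -> str:
    # terraform plan -detailed-exitcode convention:
    #   0 = no changes, 1 = error, 2 = drift (changes pending)
    code = subprocess.run(cmd).returncode
    return {0: "clean", 1: "error", 2: "drift"}.get(code, "error")
```

Anything that can distinguish "drift" from "error" can then page on the former and open a ticket on the latter, rather than treating every non-zero exit the same way.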
What to Build Next
With the core validator infrastructure in place, these are the logical next additions:
- Multi-region failover – a standby validator in a second region with Route 53 health check failover
- Validator key rotation automation – a Lambda function triggered by a Secrets Manager rotation schedule
- CloudTrail integration – every AWS API call against validator infrastructure logged and searchable
- Cosmovisor integration via user data – automatic binary upgrades for Cosmos validators (see our Cosmos validator slashing guide for the security context)
For the official Terraform AWS Provider documentation and all supported resources, see the Terraform Registry.
Conclusion
A terraform aws validator deployment done properly is not just automation; it’s a security architecture. VPC isolation, IAM least privilege, encrypted state, secrets separated from code, IMDSv2 enforcement, and drift detection work together to create an environment where your validator infrastructure is auditable, reproducible, and resistant to the most common attack vectors.
The patterns in this guide are directly applicable to Ethereum, Cosmos, Substrate, and any EVM-compatible chain. The modules are designed to be reused across testnet and mainnet with variable overrides, not copy-pasted configurations.
If you need this infrastructure designed, deployed, and owned by engineers who have done it in production, that’s exactly what we do at The Good Shell. See our Web3 infrastructure services or read our case studies to see production validator infrastructure in practice.
