Cosmos Validator Slashing: 7 Ways to Protect Your Node and Never Get Slashed Again

```toml
# Validator node config.toml
pex = false
persistent_peers = "sentry1_node_id@sentry1_private_ip:26656,sentry2_node_id@sentry2_private_ip:26656"
private_peer_ids = ""
addr_book_strict = false
```

```toml
# Sentry node config.toml
pex = true
persistent_peers = "validator_node_id@validator_private_ip:26656"
private_peer_ids = "validator_node_id"
unconditional_peer_ids = "validator_node_id"
```
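A stray `pex = true` on the validator quietly undoes the sentry layout, so it can be worth checking the file before each restart. The sketch below is a hypothetical helper (`toml_value` and `check_validator_p2p` are illustrative names, not part of gaiad). It assumes plain `key = value` lines without trailing comments and inspects only `pex` and `persistent_peers`.

```shell
#!/bin/sh
# Hypothetical helper, not part of gaiad: sanity-check a validator's config.toml.
# Assumes simple `key = value` lines without trailing comments.

# Print the value of a key with quotes stripped; prints nothing if the key is absent.
toml_value() {
  # $1 = key, $2 = path to config.toml
  sed -n "s/^[[:space:]]*$1[[:space:]]*=[[:space:]]*//p" "$2" | head -n 1 | tr -d '"'
}

# Return 0 if peer exchange is off and at least one sentry is listed.
check_validator_p2p() {
  cfg="$1"
  if [ "$(toml_value pex "$cfg")" != "false" ]; then
    echo "pex must be false on the validator" >&2
    return 1
  fi
  if [ -z "$(toml_value persistent_peers "$cfg")" ]; then
    echo "persistent_peers is empty: the validator has no sentries to dial" >&2
    return 1
  fi
  echo "validator p2p settings look consistent"
}
```

After sourcing the file, `check_validator_p2p /home/cosmos/.gaia/config/config.toml` exits non-zero when either setting is off, so it can gate a restart script.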
```bash
# Install dependencies
sudo apt install build-essential pkg-config libusb-1.0-0-dev

# Install TMKMS
cargo install tmkms --features=softsign
tmkms init /etc/tmkms
```

```toml
# /etc/tmkms/tmkms.toml
[[chain]]
id = "cosmoshub-4"
key_format = { type = "bech32", account_key_prefix = "cosmospub", consensus_key_prefix = "cosmosvalconspub" }
state_file = "/etc/tmkms/state/cosmoshub-4-consensus.json"

[[providers.softsign]]
chain_ids = ["cosmoshub-4"]
path = "/etc/tmkms/secrets/cosmoshub-4-consensus.key"

[[validator]]
addr = "tcp://validator_private_ip:26659"
chain_id = "cosmoshub-4"
reconnect = true
```

```toml
# Validator node config.toml (not tmkms.toml): listen for the remote signer.
# TMKMS dials this address, so the port must match `addr` above.
priv_validator_laddr = "tcp://0.0.0.0:26659"
```
```bash
git clone https://github.com/strangelove-ventures/horcrux
cd horcrux
make install
```

```bash
# On each Horcrux node
horcrux config init \
  --node "tcp://validator_ip:1234" \
  --cosigner "tcp://horcrux1_ip:2222|1" \
  --cosigner "tcp://horcrux2_ip:2222|2" \
  --cosigner "tcp://horcrux3_ip:2222|3" \
  --threshold 2 \
  --grpc-timeout 1000ms \
  --raft-timeout 1000ms
```
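With three cosigners, `--threshold 2` is a 2-of-3 scheme: any two cosigners can produce a signature, so one may be offline. The arithmetic can be sketched as below. `cosigner_tolerance` is a hypothetical helper for illustration, not a Horcrux command. It rejects any threshold that is not a strict majority, because two disjoint groups could otherwise each reach the threshold and sign conflicting blocks.

```shell
#!/bin/sh
# Illustrative arithmetic only; cosigner_tolerance is not a Horcrux command.
# Prints how many cosigners may be offline while signing continues.
# Fails if the threshold is impossible or is not a strict majority.
cosigner_tolerance() {
  # $1 = total cosigners, $2 = signing threshold
  total="$1"
  threshold="$2"
  if [ "$threshold" -gt "$total" ]; then
    echo "threshold $threshold exceeds $total cosigners" >&2
    return 1
  fi
  if [ $((threshold * 2)) -le "$total" ]; then
    echo "threshold $threshold of $total is not a strict majority" >&2
    return 1
  fi
  echo $((total - threshold))
}
```

For the configuration above, `cosigner_tolerance 3 2` prints `1`. A 3-of-5 layout tolerates two offline cosigners.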
```ini
# /etc/systemd/system/gaiad.service
[Unit]
Description=Cosmos Hub Node
After=network-online.target

[Service]
User=cosmos
ExecStart=/usr/local/bin/gaiad start --home /home/cosmos/.gaia
Restart=always
RestartSec=3
LimitNOFILE=65535

[Install]
WantedBy=multi-user.target
```
```bash
# Add to crontab - alert if disk usage above 80%
*/5 * * * * df -h / | awk 'NR==2{if(int($5)>80) system("curl -s -X POST https://hooks.slack.com/YOUR_WEBHOOK -d \"{\\\"text\\\":\\\"ALERT: Disk usage at "$5" on validator\\\"}\"")}'
```
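The one-liner is compact but hard to audit because of its nested quoting. As a sketch, the same check can live in a small script that cron calls instead. The function names are illustrative, `df -P` is assumed for its stable column layout, and the webhook URL stays a placeholder as in the original.

```shell
#!/bin/sh
# Sketch of the disk check as a script. Function names are illustrative.

# Read `df -P <mount>` output on stdin and print the usage percentage as an integer.
usage_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 (alert) when usage is strictly above the threshold.
disk_alert_needed() {
  # $1 = usage percent, $2 = threshold percent
  [ "$1" -gt "$2" ]
}

# Example wiring (commented out so that sourcing this file has no side effects):
# pct=$(df -P / | usage_percent)
# if disk_alert_needed "$pct" 80; then
#   curl -s -X POST "https://hooks.slack.com/YOUR_WEBHOOK" \
#     -d "{\"text\":\"ALERT: Disk usage at ${pct}% on validator\"}"
# fi
```

Keeping the parsing and the comparison in separate functions makes each piece easy to test by hand with canned `df` output.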
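```yaml
# prometheus-rules.yaml
groups:
  - name: validator.rules
    rules:
      - alert: ValidatorMissedBlocks
        expr: |
          increase(cosmos_validator_missed_blocks_total[10m]) > 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Validator missing blocks"
          description: "Validator has missed {{ $value }} blocks in the last 10 minutes."

      # 400 and 500 are example values. The real jail limit is chain-specific:
      # derive it from signed_blocks_window and min_signed_per_window
      # (see `gaiad query slashing params`) and adjust both numbers to match.
      - alert: ValidatorJailRisk
        expr: |
          cosmos_validator_missed_blocks_total > 400
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "CRITICAL: Validator at risk of jailing"
          description: "Validator has missed {{ $value }} blocks - approaching jail threshold of 500."
```

```bash
# kubectl only accepts Kubernetes manifests. With the Prometheus Operator, wrap the
# groups above in a PrometheusRule resource (under spec:) before applying.
kubectl apply -f prometheus-rules.yaml
```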
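The right alert threshold depends on the chain's slashing parameters rather than on a fixed number. As a sketch, the missed-block budget can be derived from `signed_blocks_window` and `min_signed_per_window`, both reported by `gaiad query slashing params`. `max_missed_blocks` is a hypothetical helper, and the inputs in the usage note are illustrative, so read the live values from your own chain.

```shell
#!/bin/sh
# Illustrative arithmetic; max_missed_blocks is a hypothetical helper.
# A validator is jailed for downtime once its missed blocks inside the sliding
# window exceed: window - round(min_signed_per_window * window).
max_missed_blocks() {
  # $1 = signed_blocks_window, $2 = min_signed_per_window (for example 0.05)
  awk -v w="$1" -v m="$2" 'BEGIN { printf "%d\n", w - int(w * m + 0.5) }'
}
```

With a window of 10000 blocks and a minimum of 0.05, `max_missed_blocks 10000 0.05` prints `9500`. Setting the critical alert well below the computed budget leaves time to react.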

```bash
# Install Cosmovisor
go install cosmossdk.io/tools/cosmovisor/cmd/cosmovisor@latest

# Set environment variables
export DAEMON_NAME=gaiad
export DAEMON_HOME=$HOME/.gaia
export DAEMON_ALLOW_DOWNLOAD_BINARIES=true
export DAEMON_RESTART_AFTER_UPGRADE=true

# Run with Cosmovisor instead of gaiad directly
cosmovisor run start
```

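Cosmovisor does not read governance proposals itself. It watches the node's `upgrade-info.json`, which the upgrade module writes when an approved upgrade reaches its height. Cosmovisor then stops the node, switches to the new binary (downloading it first if `DAEMON_ALLOW_DOWNLOAD_BINARIES=true` and the upgrade plan includes a download link), and restarts it. This removes the manual binary swap, a common source of upgrade-related downtime. The Cosmovisor documentation describes auto-download as intended for full nodes rather than validators, so consider placing the new binary under `$DAEMON_HOME/cosmovisor/upgrades/<name>/bin/` ahead of time.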
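Cosmovisor looks for an upgrade binary at `$DAEMON_HOME/cosmovisor/upgrades/<name>/bin/$DAEMON_NAME`, where `<name>` must match the upgrade name in the on-chain plan. The following is a minimal sketch of staging a binary there by hand. `stage_upgrade_binary` is a hypothetical helper, not a cosmovisor subcommand, and the upgrade name and binary path in the usage note are placeholders.

```shell
#!/bin/sh
# Sketch: pre-stage an upgrade binary where Cosmovisor looks for it.
# stage_upgrade_binary is a hypothetical helper, not a cosmovisor subcommand.
stage_upgrade_binary() {
  # $1 = DAEMON_HOME, $2 = DAEMON_NAME, $3 = upgrade name from the plan,
  # $4 = path to the new binary
  dest="$1/cosmovisor/upgrades/$3/bin"
  mkdir -p "$dest" || return 1
  cp "$4" "$dest/$2" || return 1
  chmod +x "$dest/$2" || return 1
  # Print the final location so the caller can verify it.
  echo "$dest/$2"
}
```

A call such as `stage_upgrade_binary "$DAEMON_HOME" gaiad <upgrade-name> ./build/gaiad` prints the staged path. Running the staged binary with `version` before the upgrade height confirms it is the expected build.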

---

## Protection 7 - Incident Runbooks for Every Failure Mode

The final layer of Cosmos validator slashing protection is operational: a documented runbook for every failure scenario. When an alert fires at 3 a.m., you should not be reconstructing the unjail command from memory.

**Minimum runbook set:**

**Runbook 1 - Validator jailed for downtime:**
```
1. SSH to validator node
2. Check gaiad process: systemctl status gaiad
3. If stopped: systemctl start gaiad
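4. Wait for node to sync (must print false): gaiad status 2>&1 | jq .SyncInfo.catching_up
5. Once synced and the downtime jail period has elapsed (downtime_jail_duration in gaiad query slashing params), unjail: gaiad tx slashing unjail --from validator-key --chain-id cosmoshub-4 --gas auto --fees 1000uatom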
6. Verify back in active set: gaiad query tendermint-validator-set | grep YOUR_VALIDATOR_ADDRESS
```
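The "wait for sync" step is easy to get wrong under pressure, so it can be scripted. The sketch below works on the raw status text and therefore does not depend on whether the key is nested under `SyncInfo` or `sync_info`, which has varied between releases. `is_synced` is a hypothetical helper name. It treats any output without `"catching_up": false` as "not synced", which includes an error from a node that is still down.

```shell
#!/bin/sh
# Sketch: decide from `gaiad status` output whether the node has caught up.
# is_synced is a hypothetical helper name.
is_synced() {
  # Reads status output on stdin; returns 0 only if catching_up is false.
  # Whitespace is stripped first so both compact and pretty-printed JSON match.
  tr -d ' \n\t' | grep -q '"catching_up":false'
}

# Example wiring (commented out; needs a running node):
# until gaiad status 2>&1 | is_synced; do sleep 10; done
# echo "node is synced"
```

Because a failed status call counts as not synced, the loop keeps waiting rather than falling through to the unjail step while the node is offline.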

**Runbook 2 - Disk space critical:**
```
1. SSH to validator node
2. Check disk usage: df -h
3. Find large files: du -sh /home/cosmos/.gaia/* | sort -rh | head -20
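4. Last resort only: gaiad tendermint unsafe-reset-all does NOT prune. It deletes all chain data and resets priv_validator_state.json, which risks double-signing. Stop gaiad, back up that file first, restore it afterwards, then resync from a snapshot or state sync. To slow future growth, tighten the pruning settings in app.toml instead.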
5. Clear old logs: journalctl --vacuum-size=2G
```
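Whatever cleanup you choose, the file that guards against double-signing after a restart is `data/priv_validator_state.json`, which records the last height, round, and step the validator signed. Here is a minimal sketch of copying it aside before any destructive step. `backup_signing_state` is a hypothetical helper, and it assumes the default layout with a local signer. With TMKMS or Horcrux the signing state lives with the signer instead.

```shell
#!/bin/sh
# Sketch: copy the validator's signing state aside before destructive cleanup.
# backup_signing_state is a hypothetical helper; the path assumes the default
# layout (<home>/data/priv_validator_state.json) with a local signer.
backup_signing_state() {
  # $1 = node home (for example /home/cosmos/.gaia), $2 = backup directory
  src="$1/data/priv_validator_state.json"
  if [ ! -f "$src" ]; then
    echo "no signing state at $src" >&2
    return 1
  fi
  mkdir -p "$2" || return 1
  # Timestamped name so repeated backups do not overwrite each other.
  dest="$2/priv_validator_state.$(date +%Y%m%d%H%M%S).json"
  cp -p "$src" "$dest" || return 1
  echo "$dest"
}
```

Run it only after gaiad has stopped, so the copy reflects the last signature the node produced.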

**Runbook 3 - Sentry node offline:**
```
1. Check sentry node status from monitoring dashboard
2. SSH to sentry node
3. Check connectivity: gaiad status
4. If node not syncing, restart: systemctl restart gaiad
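5. Verify the validator is still connected through the remaining sentries (on the validator, assuming the default RPC port: curl -s localhost:26657/net_info | jq .result.n_peers)
```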
