DevSecOps Pipeline: 30 Custom Checkov Rules, CDK-Nag, and SARIF Integration

TL;DR — Default IaC scanning catches the obvious. This pipeline enforces 30 custom Checkov rules across 11 domain-grouped Python files plus CDK-Nag against 4 compliance frameworks — with severity gating that warns in dev but blocks production on any CRITICAL or HIGH finding.


1. The "Solo-Preneur" Context

The Constraint

I manage 4 AWS projects with 12+ stacks across 3 environments. Each stack generates a CloudFormation template with hundreds of resources. Manual security review is impossible at this scale — and dangerous, because the patterns that matter most (IMDSv2 enforcement, SSH port 22 blocks, EBS encryption mandates) are exactly the ones humans forget when they are making "quick fixes" at 11 PM.

What I Needed

  • Automated security scanning on every CI run, with custom rules that go beyond generic CIS benchmarks
  • Non-blocking warnings in dev to keep iteration fast, but pipeline failures for any CRITICAL or HIGH finding in production
  • CDK-Nag compliance validation during synthesis as a separate check from Checkov
  • SARIF integration with the GitHub Security tab for historical tracking (no third-party SaaS)
  • Every suppression — whether global Checkov or per-resource CDK-Nag — must include a documented reason, making the skip itself an auditable artifact in Git

Screenshot: GitHub Actions security scan step summary

Showing passed/failed checks by severity with environment-specific blocking status


2. Architecture: Three-Layer Security Model


Why Three Layers?

Layer | When | What | Tool
Synthesis | cdk synth | Construct-level validation | CDK-Nag
Pre-Deploy | CI pipeline | Template-level policy enforcement | Checkov
Deploy | cdk deploy | Runtime provenance + tagging | CDK Aspects

Each layer catches different things. CDK-Nag validates construct intent (e.g., "this S3 bucket should have encryption"). Checkov validates template output (e.g., "this CloudFormation resource has encryption property set"). Provenance tags validate deployment lineage (e.g., "which commit deployed this stack"). The layers don't talk to each other — a CDK-Nag suppression does not suppress the same issue in Checkov. You have to address the finding at both levels or document why you're skipping it at each.


3. Decision Log: Why This Security Architecture

Checkov Over tfsec, cfn-guard, and OPA

Checkov was chosen for three reasons: (1) of the tools compared, it was the only one that supports custom CloudFormation checks written in Python with full template access (cfn-guard uses its own DSL, OPA uses Rego, and tfsec targets Terraform); (2) native SARIF output integrates directly with GitHub's Security tab; (3) external-checks-dir auto-discovers new Python files, so adding a check is a single file addition with zero config changes.

CDK-Nag Over Manual Construct Review

CDK-Nag provides synthesis-time validation — a non-compliant construct fails cdk synth immediately, before templates are generated. The 4 compliance packs (AWS Solutions, HIPAA, NIST 800-53, PCI DSS) are drop-in Aspects with required reason strings for suppressions.

Severity Gating Over Binary Pass/Fail

A binary approach was rejected. In dev, security findings shouldn't block iteration — developers see warnings and address them before promoting. In production, only CRITICAL and HIGH block the pipeline. The same Checkov config works for both environments, with CI overriding the soft-fail-on input per environment.

Inline CFN Metadata Over Global Skip Lists

Global skip lists suppress checks across all resources — dangerous for application resources. Inline metadata (cfnInstance.addMetadata('checkov', {...})) scopes the suppression to a single CloudFormation resource with a required comment field. Every suppression is version-controlled and auditable via git grep addMetadata.


4. Implementation

4.1: 30 Custom Checkov Rules Across 11 Domain-Grouped Files

The .checkov/custom_checks/ directory contains 30 Python check classes consolidated into 11 domain-grouped files. Related checks share helpers and live together — for example, sg_rules.py contains all 5 Security Group checks with shared _has_external_cidr() and _parse_ports() helpers, and compute_rules.py contains 3 UserData checks with a single shared _extract_userdata_strings() function (previously copy-pasted across 3 separate files):

Domain | File | Checks | Example
Security Groups | sg_rules.py | 5 | No SSH, restricted egress, no Grafana CIDR
IAM | iam_rules.py | 5 | Permissions boundary, no static names
Logging | logging_rules.py | 3 | KMS encryption, retention, DeletionPolicy
EC2 UserData | compute_rules.py | 3 | No hardcoded creds, IMDSv2, Docker ports
EBS Volumes | ebs_rules.py | 3 | CMK encryption, min size, backup strategy
KMS | kms_rules.py | 2 | No kms:* wildcard, DeletionPolicy Retain
ASG | asg_rules.py | 2 | ELB health check, MinSize ≥ 2
VPC | vpc_rules.py | 2 | No auto-public IPs, endpoint policies
Lambda | lambda_rules.py | 2 | Reserved concurrency, DLQ configured
SNS | sns_rules.py | 2 | KMS encryption, SSL enforcement
SQS | sqs_ssl_enabled.py | 1 | SSL enforcement via queue policy

Example: No SSH Rule (CKV_CUSTOM_SG_1)

This rule enforces the SSM-only access model — no Security Group should ever allow port 22. The rationale: EC2 instances use AWS Systems Manager Session Manager for shell access, which requires no inbound ports, provides IAM-based access control, and logs every session to CloudWatch. Even a /32 SSH rule is problematic because IPs change, key management becomes an operational burden, and SSH bypasses IAM entirely:

# .checkov/custom_checks/sg_rules.py — one of 5 SG checks in this file
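from checkov.cloudformation.checks.resource.base_resource_check import BaseResourceCheck
from checkov.common.models.enums import CheckCategories, CheckResult


class SecurityGroupNoSSH(BaseResourceCheck):
    def __init__(self):
        name = "Ensure security groups do not allow SSH ingress (use SSM Session Manager)"
        id = "CKV_CUSTOM_SG_1"
        supported_resources = ["AWS::EC2::SecurityGroup"]
        categories = [CheckCategories.NETWORKING]
        # Register the check; without this call Checkov never sees the metadata above
        super().__init__(name=name, id=id, categories=categories,
                         supported_resources=supported_resources)

    def scan_resource_conf(self, conf):
        properties = conf.get("Properties", {})
        for rule in properties.get("SecurityGroupIngress", []):
            from_port, to_port = _parse_ports(rule)  # shared helper

            # Skip rules whose ports could not be parsed
            if from_port is None or to_port is None:
                continue
            # Port 22 falls within the range
            if from_port <= 22 <= to_port:
                return CheckResult.FAILED

        return CheckResult.PASSED


check = SecurityGroupNoSSH()  # module-level instance registers the check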
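
The post does not show the shared _parse_ports helper, so the following is only a sketch of what such a helper might look like. It assumes the helper returns a (from_port, to_port) tuple, yields (None, None) when the ports cannot be read, and treats IpProtocol -1 (all traffic) as the full port range. The names parse_ports and allows_port are hypothetical.

```python
from typing import Optional, Tuple

ALL_PORTS = (0, 65535)


def parse_ports(rule: dict) -> Tuple[Optional[int], Optional[int]]:
    """Return (from_port, to_port) for an ingress rule, or (None, None) if unknown."""
    # IpProtocol -1 means "all traffic", which covers every port.
    if str(rule.get("IpProtocol")) == "-1":
        return ALL_PORTS
    try:
        from_port = int(rule["FromPort"])
        to_port = int(rule["ToPort"])
    except (KeyError, TypeError, ValueError):
        # Missing ports, or unresolved intrinsics such as {"Ref": "PortParam"}.
        return (None, None)
    return (from_port, to_port)


def allows_port(rule: dict, port: int) -> bool:
    """True when the rule's port range covers the given port."""
    from_port, to_port = parse_ports(rule)
    if from_port is None or to_port is None:
        return False
    return from_port <= port <= to_port
```

Treating the all-traffic protocol as a full range matters here: a rule with IpProtocol -1 carries no FromPort or ToPort at all, and a range check alone would let it through.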

Example: IMDSv2 UserData Validation (CKV_CUSTOM_COMPUTE_2)

This catches a critical runtime bug: LaunchTemplates enforce HttpTokens: required (IMDSv2), but UserData scripts that call the metadata endpoint with a bare curl get HTTP 401. The check recursively parses CloudFormation UserData through the Fn::Base64, Fn::Sub, and Fn::Join intrinsic functions to extract the actual shell script, then fails the resource when it finds a metadata curl call without the token header in a script that never requests an IMDSv2 token:

# .checkov/custom_checks/compute_rules.py — shared helper + 3 checks

# Shared helper (was copy-pasted in 3 files, now defined once)
def _extract_userdata_strings(userdata) -> list[str]:
    """Recursively extract string literals from CloudFormation UserData."""
    if isinstance(userdata, str):
        return [userdata]
    strings: list[str] = []
    if isinstance(userdata, dict):
        for key, value in userdata.items():
            if key in ("Fn::Base64", "Fn::Sub"):
                strings.extend(_extract_userdata_strings(value))
            elif key == "Fn::Join":
                ...  # Fn::Join handling elided in this excerpt
    return strings

# IMDSv2 check uses the shared helper
IMDSV1_PATTERN = re.compile(
    r"curl\s+(?:(?!-H\s+[\"']X-aws-ec2-metadata-token).)*"
    r"http://169\.254\.169\.254/latest/meta-data",
    re.DOTALL,
)

class UserDataIMDSv2Required(BaseResourceCheck):
    def scan_resource_conf(self, conf):
        full_script = _get_userdata_script(conf)  # uses shared helper
        if not full_script:
            return CheckResult.PASSED

        has_imdsv1_calls = bool(IMDSV1_PATTERN.search(full_script))
        has_imdsv2_token = bool(IMDSV2_TOKEN_PATTERN.search(full_script))

        if has_imdsv1_calls and not has_imdsv2_token:
            return CheckResult.FAILED
        return CheckResult.PASSED
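
The Fn::Join branch is elided in the excerpt above. The sketch below shows one way the recursion could cover it, assuming the standard intrinsic shapes: Fn::Join is [delimiter, [parts]] and Fn::Sub is either a template string or [template, variable map]. The function names are hypothetical, and joining fragments with an empty string assumes the empty-delimiter Fn::Join that CDK typically emits.

```python
from typing import Any, List


def extract_strings(node: Any) -> List[str]:
    """Collect string literals from a UserData value, descending through intrinsics."""
    if isinstance(node, str):
        return [node]
    if isinstance(node, list):
        found: List[str] = []
        for item in node:
            found.extend(extract_strings(item))
        return found
    if isinstance(node, dict):
        found = []
        for key, value in node.items():
            if key == "Fn::Base64":
                found.extend(extract_strings(value))
            elif key == "Fn::Sub":
                # Either a template string or [template, {variable map}].
                template = value[0] if isinstance(value, list) and value else value
                found.extend(extract_strings(template))
            elif key == "Fn::Join":
                # Shape: [delimiter, [part, part, ...]]; keep only the parts.
                if isinstance(value, list) and len(value) == 2:
                    found.extend(extract_strings(value[1]))
            # Other intrinsics (Ref, Fn::GetAtt, ...) carry no script text.
        return found
    return []


def userdata_script(userdata: Any) -> str:
    """Concatenate the extracted fragments into one script for regex scanning."""
    return "".join(extract_strings(userdata))
```

Dropping Ref and Fn::GetAtt values loses the resolved text, but the shell commands around them survive, which is what a pattern-based check needs.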

Terminal: Checkov scan results with custom checks

Showing CKV_CUSTOM_SG_1 and CKV_CUSTOM_COMPUTE_2 check names passing against CloudFormation templates

4.2: Environment-Specific Severity Gating

The Checkov config implements a graduated security posture. The base configuration in .checkov/config.yaml sets soft-fail-on: [LOW, MEDIUM], which means development and staging environments see findings as non-blocking warnings. The CI workflow overrides this for production by setting enforce-blocking: true, which causes any CRITICAL or HIGH finding to fail the pipeline:

# .checkov/config.yaml
framework: cloudformation

# Custom checks directory — auto-discovers all .py files
external-checks-dir:
  - custom_checks

# Development: LOW and MEDIUM don't block
soft-fail-on:
  - LOW
  - MEDIUM

# Documented skip rules (CDK-managed exceptions)
skip-check:
  - CKV_AWS_117 # Lambda VPC — CDK custom resources don't need VPC
  - CKV_AWS_116 # Lambda DLQ — CDK custom resources handle retries
  - CKV_AWS_115 # Lambda concurrency — deployment-only functions

The production override lives in the calling workflow:

# _deploy-nextjs.yml — production overrides
security-scan:
  uses: ./.github/workflows/_iac-security-scan.yml
  with:
    environment: production
    enforce-blocking: true # CRITICAL/HIGH → fail the pipeline
    security-scan-blocking: true
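
The blocking policy itself is not shown in the post. As a rough sketch, it can be thought of as a small pure function over per-severity counts. The name should_block and its parameters are hypothetical, and the sketch assumes that nothing blocks when enforcement is off.

```python
from typing import Mapping

BLOCKING_SEVERITIES = ("CRITICAL", "HIGH")


def should_block(counts: Mapping[str, int], enforce_blocking: bool) -> bool:
    """Return True when the pipeline should fail for these finding counts."""
    if not enforce_blocking:
        # Dev/staging: findings are surfaced as warnings only.
        return False
    return any(counts.get(sev, 0) > 0 for sev in BLOCKING_SEVERITIES)
```

Keeping the decision separate from the scan makes it easy to unit test the policy without running Checkov.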

4.3: CDK-Nag — Synthesis-Time Compliance

CDK-Nag validates constructs during cdk synth, catching issues before templates are even generated. The implementation supports 4 compliance frameworks through an enum-based configuration, each backed by a CDK-Nag rule pack: AWS Solutions (general best practices, enabled by default), HIPAA Security (healthcare compliance), NIST 800-53 R5 (federal security), and PCI DSS 3.2.1 (payment card security):

// lib/aspects/cdk-nag-aspect.ts
export enum CompliancePack {
  AWS_SOLUTIONS = "AwsSolutions", // General best practices
  HIPAA = "HIPAA", // Healthcare compliance
  NIST_800_53 = "NIST800-53", // Federal security
  PCI_DSS = "PCI-DSS", // Payment card security
}

export function applyCdkNag(
  scope: IConstruct,
  config: CdkNagConfig = {},
): void {
  const { packs, verbose, reports } = { ...DEFAULT_CONFIG, ...config };

  for (const pack of packs) {
    switch (pack) {
      case CompliancePack.AWS_SOLUTIONS:
        Aspects.of(scope).add(new AwsSolutionsChecks({ verbose, reports }));
        break;
      case CompliancePack.HIPAA:
        Aspects.of(scope).add(new HIPAASecurityChecks({ verbose, reports }));
        break;
      // ... NIST, PCI DSS
    }
  }
}

Common suppressions are centralized in COMMON_SUPPRESSIONS with documented reasons. The project has 7 common suppressions, each explaining why the finding is acceptable. Zero suppressions are allowed without a reason string — CDK-Nag enforces this at compile time:

// lib/aspects/cdk-nag-aspect.ts — 7 documented suppressions
export const COMMON_SUPPRESSIONS: NagPackSuppression[] = [
  {
    id: "AwsSolutions-EC23",
    reason:
      "Security group allows ingress from specific trusted CIDRs only, not 0.0.0.0/0",
  },
  {
    id: "AwsSolutions-IAM4",
    reason:
      "AWS managed policies used for SSM and CloudWatch — standard for EC2 monitoring",
  },
  {
    id: "AwsSolutions-IAM5",
    reason:
      "Wildcard permissions required for CloudWatch Logs and SSM document execution",
  },
  // ... remaining suppressions omitted from this excerpt
];

  1. Add CDK-Nag Aspect: call `applyCdkNag(scope, { packs: [CompliancePack.AWS_SOLUTIONS] })` in the app entry point to enable synthesis-time validation.
  2. Run cdk synth: non-compliant constructs fail synthesis immediately with a clear message identifying the violated rule and suggested fix.
  3. Suppress or Fix: either fix the construct or add an inline suppression with a required `reason` string, which CDK-Nag enforces at compile time.

4.4: EnforceReadOnlyDynamoDbAspect — Domain-Specific Governance

Beyond generic compliance, the project implements a custom CDK Aspect that enforces a domain-specific security invariant: ECS task roles must never have DynamoDB write permissions. The Next.js application reads directly from DynamoDB via the task role, but writes must go through the API Gateway → Lambda path for audit logging and rate limiting. The aspect inspects IAM CfnPolicy L1 constructs, resolves { Ref: 'LogicalId' } tokens to identify task role policies, and fails synthesis if any of 8 forbidden DynamoDB actions are found:

// lib/aspects/enforce-readonly-dynamodb-aspect.ts
export const DYNAMODB_WRITE_ACTIONS: readonly string[] = [
  "dynamodb:PutItem",
  "dynamodb:DeleteItem",
  "dynamodb:UpdateItem",
  "dynamodb:BatchWriteItem",
] as const;

export const DYNAMODB_ADMIN_ACTIONS: readonly string[] = [
  "dynamodb:CreateTable",
  "dynamodb:DeleteTable",
  "dynamodb:UpdateTable",
  "dynamodb:CreateGlobalTable",
] as const;

export class EnforceReadOnlyDynamoDbAspect implements cdk.IAspect {
  visit(node: IConstruct): void {
    if (!(node instanceof iam.CfnPolicy)) return;

    // roles is optional on CfnPolicy, so fall back to an empty list
    const resolved: unknown[] = cdk.Stack.of(node).resolve(node.roles) ?? [];
    const isTaskRolePolicy = resolved.some((role) => {
      const roleId =
        typeof role === "string"
          ? role
          : ((role as { Ref?: string })?.Ref ?? "");
      return roleId.toLowerCase().includes(this.roleNamePattern);
    });

    if (isTaskRolePolicy) {
      this.inspectPolicyDocument(node, node.policyDocument);
    }
  }
}

This catches drift before deployment. If someone accidentally grants dynamodb:PutItem to the task role (perhaps by using table.grantReadWriteData() instead of table.grantReadData()), CDK synthesis fails with a clear error message — not a production incident where the ECS task could bypass the write audit path.
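
The inspectPolicyDocument method is not shown in the excerpt. The Python sketch below illustrates the kind of matching it would need, under the assumption that wildcard grants such as dynamodb:* or dynamodb:Put* should count as violations too. The function names are hypothetical; the action list is the 8 actions from the two constants above.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List

FORBIDDEN_ACTIONS = (
    "dynamodb:PutItem", "dynamodb:DeleteItem",
    "dynamodb:UpdateItem", "dynamodb:BatchWriteItem",
    "dynamodb:CreateTable", "dynamodb:DeleteTable",
    "dynamodb:UpdateTable", "dynamodb:CreateGlobalTable",
)


def _as_list(value: Any) -> List[Any]:
    """IAM allows a single value or a list for Statement and Action."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def forbidden_grants(policy_document: Dict[str, Any]) -> List[str]:
    """Return forbidden actions that an Allow statement grants, wildcards included."""
    hits = set()
    for statement in _as_list(policy_document.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        for pattern in _as_list(statement.get("Action")):
            if not isinstance(pattern, str):
                continue  # unresolved token; nothing to match against
            for action in FORBIDDEN_ACTIONS:
                # IAM action names are case-insensitive; "*" and "?" are wildcards.
                if fnmatchcase(action.lower(), pattern.lower()):
                    hits.add(action)
    return sorted(hits)
```

An exact-string comparison would miss a policy that grants dynamodb:*, which is why the sketch matches each forbidden action against the granted pattern rather than the other way around.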

4.5: TaggingAspect — Resource Governance

Every taggable resource gets consistent governance tags via a CDK Aspect. The TaggingAspect applies up to 5 tags (Environment, Project, Owner, ManagedBy, and optionally CostCenter) to every taggable resource in the stack. Combined with the SLSA-inspired provenance tags from the CI/CD pipeline (DeployCommit, DeployRunId, DeployActor), every tagged resource carries both governance metadata and deployment lineage:

// lib/aspects/tagging-aspect.ts
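export class TaggingAspect implements cdk.IAspect {
  // Declared explicitly so the assignment in the constructor type-checks
  private readonly tags: Record<string, string>;

  constructor(config: TagConfig) {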
    this.tags = {
      Environment: config.environment,
      Project: config.project,
      Owner: config.owner,
      ManagedBy: "CDK",
      ...(config.costCenter && { CostCenter: config.costCenter }),
    };
  }

  public visit(node: IConstruct): void {
    if (cdk.TagManager.isTaggable(node)) {
      Object.entries(this.tags).forEach(([key, value]) => {
        node.tags.setTag(key, value);
      });
    }
  }
}
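
One way to confirm the aspect did its job is to inspect the synthesized template. The sketch below is a hypothetical helper, not part of the project: it assumes tags appear as a list of Key/Value objects under Properties.Tags, and it skips resources without such a list, so it cannot detect a taggable resource that received no tags at all.

```python
from typing import Any, Dict, List

REQUIRED_TAGS = ("Environment", "Project", "Owner", "ManagedBy")


def missing_tags(template: Dict[str, Any]) -> Dict[str, List[str]]:
    """Map logical IDs to required tag keys absent from their Tags property."""
    gaps: Dict[str, List[str]] = {}
    for logical_id, resource in template.get("Resources", {}).items():
        tags = resource.get("Properties", {}).get("Tags")
        if not isinstance(tags, list):
            continue  # no list-style Tags property on this resource; skip it
        present = {t.get("Key") for t in tags if isinstance(t, dict)}
        absent = [key for key in REQUIRED_TAGS if key not in present]
        if absent:
            gaps[logical_id] = absent
    return gaps
```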

4.6: SARIF Integration — GitHub Security Tab

SARIF → GitHub Security Tab

Checkov scan results are uploaded as SARIF files to GitHub's Security tab, providing historical tracking across deployments, per-environment categorization (dev vs staging vs production), and PR code annotations. The 30-day artifact retention creates a compliance audit trail without any third-party SaaS dependency.

# _iac-security-scan.yml
- name: Upload SARIF to GitHub Security
  if: always() && hashFiles('security-reports/results_sarif.sarif') != ''
  uses: github/codeql-action/upload-sarif@b5ebac6f4c00c8c...
  with:
    sarif_file: security-reports/results_sarif.sarif
    category: checkov-${{ inputs.environment }}

The category: checkov-${{ inputs.environment }} tag is the key — it allows the Security tab to show separate finding trends for development, staging, and production, making it easy to identify environment-specific regressions.
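
For readers unfamiliar with the format, a SARIF log is plain JSON: a top-level runs array, each run holding a results array whose entries carry a ruleId and an optional level. The sketch below tallies a log by rule and by level. The function name is hypothetical, and it is not part of the workflow described here.

```python
from collections import Counter
from typing import Any, Dict


def summarize_sarif(sarif: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Count results per rule ID and per level across all runs of a SARIF log."""
    by_rule: Counter = Counter()
    by_level: Counter = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            by_rule[result.get("ruleId", "unknown")] += 1
            # Treat a missing level as "warning", the SARIF 2.1.0 default.
            by_level[result.get("level", "warning")] += 1
    return {"by_rule": dict(by_rule), "by_level": dict(by_level)}
```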

Screenshot: GitHub Security tab — SARIF results

Showing Checkov findings with environment-specific categories and trend graphs over time

4.7: Inline Metadata Suppressions

Scoped Suppressions via Inline Metadata

For resource-specific exceptions that should not go into the global skip list, the CDK code attaches inline CloudFormation metadata. Checkov reads these blocks and skips the check for that specific resource only, and each entry carries a comment field stating the reason.

// Example: Monitoring EC2 instance — dev default password isn't a real secret
const cfnInstance = this.instance.node.defaultChild as cdk.CfnResource;
cfnInstance.addMetadata("checkov", {
  skip: [
    {
      id: "CKV_AWS_46",
      comment:
        "GrafanaPassword in user data is a non-sensitive dev default (admin) " +
        "— real secrets use SSM SecureString",
    },
  ],
});

This pattern means that every suppression is version-controlled, auditable, and scoped. A reviewer can search the codebase for addMetadata("checkov" to find every exception and verify that each has a valid reason.
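
The same audit can be run against a synthesized template rather than the source. The sketch below is a hypothetical helper that walks Resources, reads the checkov metadata block in the shape shown above (a skip list of id/comment entries), and reports entries whose comment is missing or blank.

```python
from typing import Any, Dict, List, Tuple


def undocumented_skips(template: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (logical_id, check_id) pairs whose skip entry has no usable comment."""
    problems: List[Tuple[str, str]] = []
    for logical_id, resource in template.get("Resources", {}).items():
        checkov_meta = resource.get("Metadata", {}).get("checkov", {})
        for entry in checkov_meta.get("skip", []):
            comment = str(entry.get("comment") or "").strip()
            if not comment:
                problems.append((logical_id, entry.get("id", "<missing id>")))
    return problems
```

Run in CI, a non-empty result could fail the build, turning the documented-reason convention into something a machine verifies.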


5. The "Oh No" Moment: IMDSv2 vs UserData

The Problem

The LaunchTemplate enforced HttpTokens: required (IMDSv2), but the UserData script used plain curl to the metadata endpoint. The result: every metadata call returned HTTP 401, causing the EBS volume attachment to fail silently. The instance booted, but without the data volume, Prometheus had no persistent storage — and no data meant no dashboards.


The Fix

I created CKV_CUSTOM_COMPUTE_2 to catch this at scan time. The check parses UserData through CloudFormation intrinsic functions (Fn::Base64, Fn::Sub, Fn::Join) and flags any IMDS curl call without an IMDSv2 token header. This turned a runtime failure (empty variables → failed EBS attach → missing Prometheus data) into a pre-deploy failure (Checkov blocks the pipeline with a clear message identifying the offending UserData line).

The Lesson: Runtime Behavior Matters

Infrastructure security checks must understand the runtime behavior of UserData scripts, not just the CloudFormation properties that declare them. Wrapping IMDSv2 enforcement in a LaunchTemplate property is necessary but not sufficient — the UserData script that runs inside the instance must also use the correct token-based metadata API.


6. Security Scan Workflow Architecture

The _iac-security-scan.yml reusable workflow handles the full scan lifecycle. It accepts environment, enforcement mode, and soft-fail configuration as inputs, downloads the CDK synthesis artifacts from the parent workflow, runs Checkov with custom config, parses severity counts from the JSON output, applies environment-specific blocking policy, and produces 3 report types.


The workflow outputs structured results for downstream consumption — the parent deployment workflow can read scan-passed, findings-count, critical-count, and high-count to decide whether to proceed with deployment or abort:

# _iac-security-scan.yml — structured outputs
outputs:
  scan-passed: ${{ jobs.checkov-scan.outputs.scan-passed }}
  findings-count: ${{ jobs.checkov-scan.outputs.findings-count }}
  critical-count: ${{ jobs.checkov-scan.outputs.critical-count }}
  high-count: ${{ jobs.checkov-scan.outputs.high-count }}
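
The severity-parsing step is not shown in the post. A minimal sketch follows, assuming the Checkov JSON report keeps failed checks under results.failed_checks, that each record carries a severity field that may be null, and that a multi-framework scan emits a list of such reports. The function names are hypothetical; the output keys match the workflow outputs above, written as the name=value lines that GitHub Actions reads from $GITHUB_OUTPUT.

```python
from collections import Counter
from typing import Any, Dict, List


def severity_counts(report: Any) -> Dict[str, int]:
    """Count failed checks per severity in a parsed Checkov JSON report."""
    # Assumed shape: one object for a single framework, a list for several.
    runs = report if isinstance(report, list) else [report]
    counts: Counter = Counter()
    for run in runs:
        for check in run.get("results", {}).get("failed_checks", []):
            # Checks without a severity are bucketed as UNKNOWN rather than dropped.
            counts[str(check.get("severity") or "UNKNOWN").upper()] += 1
    return dict(counts)


def format_outputs(counts: Dict[str, int], scan_passed: bool) -> List[str]:
    """Build name=value lines suitable for appending to $GITHUB_OUTPUT."""
    return [
        f"scan-passed={'true' if scan_passed else 'false'}",
        f"findings-count={sum(counts.values())}",
        f"critical-count={counts.get('CRITICAL', 0)}",
        f"high-count={counts.get('HIGH', 0)}",
    ]
```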

7. FinOps & Maintenance Impact

The entire security pipeline runs on open-source tooling at zero incremental cost. Checkov, CDK-Nag, GitHub Actions (within free-tier limits), and SARIF uploads are all free for public repositories.

Adding a new Checkov check means adding a class to the appropriate domain-grouped file in .checkov/custom_checks/ — Checkov auto-discovers it, so there are no config changes. Adding a CDK-Nag suppression requires inline metadata or an entry in COMMON_SUPPRESSIONS with a mandatory reason string. Enabling an additional compliance pack is a one-line change to the applyCdkNag() call.


8. What Needs Work — and What's Next

What's Working

Pattern | Status | Impact
30 Custom Checkov Checks (11 files) | ✅ Shipped | Project-specific policies catch real bugs
Severity Gating (dev/prod) | ✅ Shipped | Development stays fast, production is secure
SARIF → GitHub Security Tab | ✅ Shipped | Historical tracking without SaaS
Documented Suppressions (inline metadata) | ✅ Shipped | Every skip has a reason, auditable in Git
CDK-Nag (4 compliance packs) | ✅ Shipped | Synthesis-time construct validation
EnforceReadOnlyDynamoDb Aspect | ✅ Shipped | Domain-specific governance for ECS task roles
TaggingAspect + SLSA Provenance | ✅ Shipped | Governance metadata + deployment lineage

Remaining Gaps and Roadmap

Improvement | Effort | Impact | Status
Unit tests for custom Checkov rules | 1 day | Prevent false positives from bad regex | Planned
Enable NIST 800-53 pack by default | 1 hour | Full federal compliance validation | Planned
Continuous scanning via AWS Config | 2 days | Detect runtime drift from deployed state | Evaluating
OPA/Rego policies for cross-cloud portability | 3 days | Multi-cloud policy enforcement | Researching
Secret scanning integration with Checkov | 3 hours | Unified scanning in one pipeline step | Planned

The custom Checkov rules are the most impactful investment — particularly CKV_CUSTOM_COMPUTE_2 (IMDSv2 UserData), which caught a real runtime failure during development that would have been invisible until dashboards went blank. The recent consolidation from 26 single-purpose files into 11 domain-grouped files eliminated triple-duplicated helper code and removed 3 checks that produced no useful signal (one always returned PASSED, one false-positived on every IAM policy, one used an indirect proxy instead of checking what it claimed). The next priority is adding pytest fixtures with sample CloudFormation templates so that each check has regression tests — a regex change in the IMDSv2 pattern could silently break detection. Enabling the NIST 800-53 pack is the lowest-effort improvement with the highest compliance return. Longer-term, continuous scanning via AWS Config would close the gap between synthesis-time validation and runtime reality — catching manual console changes that bypass the CDK pipeline entirely.
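
As an illustration of what such regression tests could look like, the sketch below exercises the IMDSV1_PATTERN regex from the compute_rules.py excerpt against three inline shell fragments. The test names and sample strings are made up for this example; real fixtures would feed full CloudFormation templates through the check.

```python
import re

# Pattern copied from the compute_rules.py excerpt earlier in the post.
IMDSV1_PATTERN = re.compile(
    r"curl\s+(?:(?!-H\s+[\"']X-aws-ec2-metadata-token).)*"
    r"http://169\.254\.169\.254/latest/meta-data",
    re.DOTALL,
)

BARE_CALL = "INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)\n"
TOKEN_CALL = (
    'INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" '
    "http://169.254.169.254/latest/meta-data/instance-id)\n"
)


def test_bare_call_is_flagged() -> None:
    assert IMDSV1_PATTERN.search(BARE_CALL) is not None


def test_token_call_is_not_flagged() -> None:
    assert IMDSV1_PATTERN.search(TOKEN_CALL) is None


def test_unrelated_curl_is_not_flagged() -> None:
    assert IMDSV1_PATTERN.search("curl -fsSL https://example.com/install.sh\n") is None
```

pytest collects the three functions by name, so any future edit to the pattern that stops flagging the bare call, or starts flagging the token call, fails the suite.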

In practice, the three-layer setup means I don't worry about security regressions between deploys. CDK-Nag catches bad constructs at synthesis, Checkov catches bad templates before they deploy, and provenance tags tell me exactly which commit changed what. The 30 custom checks are where the real value is — they encode this project's specific opinions (no SSH, IMDSv2 in UserData, read-only DynamoDB for task roles) into code that runs on every push. The severity gating keeps dev fast while production stays locked down. And every skip has a reason attached, which means the audit trail lives in Git, not in someone's head.


9. Related Files

File | Description
.checkov/config.yaml | Checkov configuration (framework, skips, soft-fail)
.checkov/custom_checks/*.py | 30 custom Python security checks (11 domain-grouped files)
.github/workflows/_iac-security-scan.yml | Reusable security scan workflow
lib/aspects/cdk-nag-aspect.ts | CDK-Nag 4-pack compliance
lib/aspects/tagging-aspect.ts | Resource tagging governance
lib/aspects/enforce-readonly-dynamodb-aspect.ts | DynamoDB read-only IAM enforcement
lib/aspects/index.ts | Aspects barrel export

10. Tech Stack Summary

Category | Technology
IaC Scanner | Checkov (CloudFormation framework)
Custom Rules | 30 Python checks across 11 domain-grouped files (SG, IAM, logging, compute, EBS, VPC, Lambda, KMS, ASG, SNS, SQS)
Compliance | CDK-Nag (AWS Solutions, HIPAA, NIST 800-53, PCI DSS)
Resource Governance | CDK Aspects (TaggingAspect, EnforceReadOnlyDynamoDb)
CI Integration | GitHub Actions reusable workflow
Reporting | SARIF → GitHub Security, JSON artifacts (30-day)
Severity Policy | Soft-fail (dev), blocking (prod) for CRITICAL/HIGH
Provenance | SLSA-inspired CloudFormation tags
Suppressions | Inline CFN metadata with required comment field

Author: Nelson Lamounier