Deployment

Comprehensive deployment procedures and CI/CD workflows for SOLVEFORCE projects, ensuring reliable, secure, and automated delivery of applications and documentation.


πŸš€ Deployment Philosophy

🎯 Core Principles

Automation First:

  • Automated pipelines for consistent deployments
  • Infrastructure as Code (IaC) for reproducible environments
  • Automated testing at every stage
  • Zero-downtime deployment strategies
  • Rollback capabilities for quick recovery

Security & Compliance:

  • Secure deployment pipelines with secrets management
  • Compliance checks and security scanning
  • Access controls and audit trails
  • Environment isolation and protection
  • Regular security updates and patches

Reliability & Performance:

  • Blue-green and canary deployment strategies
  • Health checks and monitoring integration
  • Performance validation during deployment
  • Automated smoke testing post-deployment
  • Comprehensive logging and observability

πŸ—οΈ CI/CD Pipeline Architecture

πŸ”„ Pipeline Overview

graph LR
    A[Code Push] --> B[Build]
    B --> C[Test]
    C --> D[Security Scan]
    D --> E[Deploy to Staging]
    E --> F[Integration Tests]
    F --> G[Deploy to Production]
    G --> H[Post-deployment Tests]
    H --> I[Monitor & Alert]

Pipeline Stages:

  1. Source Control: Git-based version control
  2. Build: Compile, package, and prepare artifacts
  3. Test: Unit, integration, and security testing
  4. Deploy: Automated deployment to environments
  5. Validate: Post-deployment testing and verification
  6. Monitor: Ongoing health and performance monitoring
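
Conceptually, each stage gates the next: a failure anywhere stops the pipeline before anything reaches production. A minimal sketch of that fail-fast sequencing (illustrative only; the real pipeline is the GitHub Actions workflow in the next section):

# pipeline_sketch.py (conceptual illustration of fail-fast stages)
from typing import Callable, List

def run_pipeline(stages: List[Callable[[], bool]]) -> bool:
    """Run stages in order, stopping at the first failure."""
    for stage in stages:
        print(f"Running stage: {stage.__name__}")
        if not stage():
            print(f"Stage {stage.__name__} failed -- aborting pipeline")
            return False
    return True

def build() -> bool: return True      # compile, package, prepare artifacts
def test() -> bool: return True       # unit, integration, security tests
def deploy() -> bool: return True     # push artifacts to an environment
def validate() -> bool: return True   # post-deployment verification

if __name__ == "__main__":
    run_pipeline([build, test, deploy, validate])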

πŸ› οΈ GitHub Actions Workflows

Main Deployment Workflow:

# .github/workflows/deploy.yml
name: Deploy to Production

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

env:
  NODE_VERSION: '18'
  PYTHON_VERSION: '3.10'

jobs:
  build:
    runs-on: ubuntu-latest
    
    outputs:
      version: ${{ steps.version.outputs.version }}
      
    steps:
    - name: Checkout code
      uses: actions/checkout@v3
      with:
        fetch-depth: 0
    
    - name: Generate version
      id: version
      run: |
        VERSION=$(date +'%Y.%m.%d')-${GITHUB_SHA::8}
        echo "version=$VERSION" >> $GITHUB_OUTPUT
        echo "Generated version: $VERSION"
    
    - name: Setup Node.js
      uses: actions/setup-node@v3
      with:
        node-version: ${{ env.NODE_VERSION }}
        cache: 'npm'
    
    - name: Install dependencies
      run: |
        npm ci
        npm install -g markdownlint-cli
    
    - name: Setup mdBook
      uses: peaceiris/actions-mdbook@v1
      with:
        mdbook-version: '0.4.40'
    
    - name: Lint documentation
      run: |
        markdownlint 'src/**/*.md'
    
    - name: Build documentation
      run: |
        mdbook build
    
    - name: Upload build artifacts
      uses: actions/upload-artifact@v3
      with:
        name: documentation-${{ steps.version.outputs.version }}
        path: book/
        retention-days: 30

  test:
    runs-on: ubuntu-latest
    needs: build
    
    strategy:
      matrix:
        test-type: [unit, integration, e2e]
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v3
    
    - name: Setup Python
      uses: actions/setup-python@v3
      with:
        python-version: ${{ env.PYTHON_VERSION }}
    
    - name: Install test dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r requirements-test.txt
    
    - name: Run ${{ matrix.test-type }} tests
      run: |
        pytest tests/${{ matrix.test-type }}/ --junitxml=test-results-${{ matrix.test-type }}.xml
    
    - name: Upload test results
      uses: actions/upload-artifact@v3
      if: always()
      with:
        name: test-results-${{ matrix.test-type }}
        path: test-results-${{ matrix.test-type }}.xml

  security-scan:
    runs-on: ubuntu-latest
    needs: build
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v3
    
    - name: Run Trivy vulnerability scanner
      uses: aquasecurity/trivy-action@master
      with:
        scan-type: 'fs'
        scan-ref: '.'
        format: 'sarif'
        output: 'trivy-results.sarif'
    
    - name: Upload security scan results
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: 'trivy-results.sarif'
    
    - name: Run dependency check
      run: |
        pip install safety
        safety check --json > safety-report.json || true
    
    - name: Upload safety report
      uses: actions/upload-artifact@v3
      with:
        name: security-reports
        path: safety-report.json

  deploy-staging:
    runs-on: ubuntu-latest
    needs: [build, test, security-scan]
    if: github.ref == 'refs/heads/main'
    
    environment:
      name: staging
      url: https://staging-docs.solveforce.com
    
    steps:
    - name: Download build artifacts
      uses: actions/download-artifact@v3
      with:
        name: documentation-${{ needs.build.outputs.version }}
        path: ./book
    
    - name: Deploy to staging
      run: |
        # Deploy to staging environment
        aws s3 sync ./book s3://staging-docs-solveforce-com --delete
        aws cloudfront create-invalidation --distribution-id ${{ secrets.STAGING_CLOUDFRONT_ID }} --paths "/*"
      env:
        AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        AWS_DEFAULT_REGION: us-east-1
    
    - name: Run staging smoke tests
      run: |
        curl -f https://staging-docs.solveforce.com/ || exit 1
        curl -f https://staging-docs.solveforce.com/technology/connectivity/ || exit 1

  deploy-production:
    runs-on: ubuntu-latest
    needs: [build, deploy-staging]
    if: github.ref == 'refs/heads/main'
    
    environment:
      name: production
      url: https://docs.solveforce.com
    
    steps:
    - name: Download build artifacts
      uses: actions/download-artifact@v3
      with:
        name: documentation-${{ needs.build.outputs.version }}
        path: ./book
    
    - name: Deploy to production
      run: |
        # Blue-green deployment to production
        aws s3 sync ./book s3://docs-solveforce-com --delete
        aws cloudfront create-invalidation --distribution-id ${{ secrets.PROD_CLOUDFRONT_ID }} --paths "/*"
      env:
        AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
        AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
        AWS_DEFAULT_REGION: us-east-1
    
    - name: Run production health checks
      run: |
        curl -f https://docs.solveforce.com/ || exit 1
        curl -f https://docs.solveforce.com/technology/connectivity/ || exit 1
        curl -f https://docs.solveforce.com/api/overview/ || exit 1
    
    - name: Notify deployment success
      uses: 8398a7/action-slack@v3
      with:
        status: success
        text: 'Documentation successfully deployed to production'
      env:
        SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}

  post-deployment:
    runs-on: ubuntu-latest
    needs: deploy-production
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v3

    - name: Run post-deployment tests
      run: |
        # Run comprehensive post-deployment validation
        python scripts/validate_deployment.py --environment production
    
    - name: Update deployment metrics
      run: |
        # Update deployment dashboard
        curl -X POST "https://metrics.solveforce.com/deployments" \
          -H "Authorization: Bearer ${{ secrets.METRICS_API_TOKEN }}" \
          -d '{
            "service": "documentation",
            "version": "${{ needs.build.outputs.version }}",
            "environment": "production",
            "status": "success",
            "timestamp": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"
          }'
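
The post-deployment job above invokes scripts/validate_deployment.py, which is not shown in this document. A minimal sketch of the validation it implies follows; the script's shape is an assumption, while the URLs and paths are taken from the deploy jobs and smoke tests above:

#!/usr/bin/env python3
# scripts/validate_deployment.py (illustrative sketch, not the real script)

import argparse
import sys

import requests

# Base URLs assumed from the deploy jobs above
URLS = {
    "production": "https://docs.solveforce.com",
    "staging": "https://staging-docs.solveforce.com",
}

# Same critical paths the smoke tests exercise
CRITICAL_PATHS = ["/", "/technology/connectivity/", "/api/overview/"]

def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--environment", choices=URLS, required=True)
    args = parser.parse_args()
    base = URLS[args.environment]

    failures = []
    for path in CRITICAL_PATHS:
        try:
            resp = requests.get(f"{base}{path}", timeout=10)
            if resp.status_code != 200:
                failures.append((path, f"HTTP {resp.status_code}"))
        except requests.RequestException as exc:
            failures.append((path, str(exc)))

    for path, reason in failures:
        print(f"FAILED {path}: {reason}")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())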

πŸ” Secrets Management

GitHub Secrets Configuration:

# Required secrets in GitHub repository settings
AWS_ACCESS_KEY_ID: # AWS IAM access key for deployment
AWS_SECRET_ACCESS_KEY: # AWS IAM secret key for deployment
STAGING_CLOUDFRONT_ID: # CloudFront distribution ID for staging
PROD_CLOUDFRONT_ID: # CloudFront distribution ID for production
SLACK_WEBHOOK_URL: # Slack webhook for notifications
METRICS_API_TOKEN: # API token for deployment metrics
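
A pipeline job can fail fast when one of these is missing by running a small preflight check before deploying. The helper below is hypothetical (not a script in the repository); it only verifies that each secret has been exposed to the job as an environment variable:

#!/usr/bin/env python3
# scripts/check_secrets.py (hypothetical preflight helper)

import os
import sys

# Mirrors the required repository secrets listed above
REQUIRED = [
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "STAGING_CLOUDFRONT_ID",
    "PROD_CLOUDFRONT_ID",
    "SLACK_WEBHOOK_URL",
    "METRICS_API_TOKEN",
]

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    print(f"Missing required secrets: {', '.join(missing)}")
    sys.exit(1)
print("All required secrets are present.")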

Environment Variables:

# .env.production
NODE_ENV=production
API_BASE_URL=https://api.solveforce.com
CDN_URL=https://cdn.solveforce.com
MONITORING_ENDPOINT=https://monitoring.solveforce.com

# .env.staging
NODE_ENV=staging
API_BASE_URL=https://staging-api.solveforce.com
CDN_URL=https://staging-cdn.solveforce.com
MONITORING_ENDPOINT=https://staging-monitoring.solveforce.com
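
Applications can load the matching file at startup. A minimal sketch, assuming the third-party python-dotenv package and an ENVIRONMENT variable selecting the file (an assumed convention, not something defined above):

# config.py (illustrative sketch)
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Pick the env file based on the current environment
environment = os.environ.get("ENVIRONMENT", "staging")
load_dotenv(dotenv_path=f".env.{environment}")

API_BASE_URL = os.environ["API_BASE_URL"]
CDN_URL = os.environ["CDN_URL"]
MONITORING_ENDPOINT = os.environ["MONITORING_ENDPOINT"]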

🌍 Infrastructure as Code

☁️ AWS Infrastructure

Terraform Configuration:

# infrastructure/main.tf
terraform {
  required_version = ">= 1.0"
  
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  
  backend "s3" {
    bucket = "solveforce-terraform-state"
    key    = "documentation/terraform.tfstate"
    region = "us-east-1"
  }
}

provider "aws" {
  region = var.aws_region
  
  default_tags {
    tags = {
      Project     = "SolveForce Documentation"
      Environment = var.environment
      ManagedBy   = "Terraform"
    }
  }
}

# S3 bucket for static site hosting
resource "aws_s3_bucket" "docs_bucket" {
  # Matches the bucket names used by the deploy jobs above:
  # docs-solveforce-com (production), staging-docs-solveforce-com (staging)
  bucket = var.environment == "production" ? "docs-solveforce-com" : "${var.environment}-docs-solveforce-com"
}

resource "aws_s3_bucket_public_access_block" "docs_bucket_pab" {
  bucket = aws_s3_bucket.docs_bucket.id

  block_public_acls       = false
  block_public_policy     = false
  ignore_public_acls      = false
  restrict_public_buckets = false
}

resource "aws_s3_bucket_website_configuration" "docs_bucket_website" {
  bucket = aws_s3_bucket.docs_bucket.id

  index_document {
    suffix = "index.html"
  }

  error_document {
    key = "404.html"
  }
}

# CloudFront distribution
resource "aws_cloudfront_distribution" "docs_distribution" {
  origin {
    domain_name = aws_s3_bucket.docs_bucket.bucket_regional_domain_name
    origin_id   = "S3-${aws_s3_bucket.docs_bucket.id}"

    s3_origin_config {
      origin_access_identity = aws_cloudfront_origin_access_identity.docs_oai.cloudfront_access_identity_path
    }
  }

  enabled             = true
  is_ipv6_enabled     = true
  default_root_object = "index.html"

  aliases = var.environment == "production" ? ["docs.solveforce.com"] : ["staging-docs.solveforce.com"]

  default_cache_behavior {
    allowed_methods        = ["DELETE", "GET", "HEAD", "OPTIONS", "PATCH", "POST", "PUT"]
    cached_methods         = ["GET", "HEAD"]
    target_origin_id       = "S3-${aws_s3_bucket.docs_bucket.id}"
    compress               = true
    viewer_protocol_policy = "redirect-to-https"

    forwarded_values {
      query_string = false
      cookies {
        forward = "none"
      }
    }

    min_ttl     = 0
    default_ttl = 3600
    max_ttl     = 86400
  }

  restrictions {
    geo_restriction {
      restriction_type = "none"
    }
  }

  viewer_certificate {
    acm_certificate_arn      = var.ssl_certificate_arn
    ssl_support_method       = "sni-only"
    minimum_protocol_version = "TLSv1.2_2021"
  }

  custom_error_response {
    error_code         = 404
    response_code      = 404
    response_page_path = "/404.html"
  }

  tags = {
    Name = "${var.environment}-docs-distribution"
  }
}

# Variables
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Environment name"
  type        = string
}

variable "ssl_certificate_arn" {
  description = "SSL certificate ARN for CloudFront"
  type        = string
}

# Outputs
output "cloudfront_distribution_id" {
  value = aws_cloudfront_distribution.docs_distribution.id
}

output "s3_bucket_name" {
  value = aws_s3_bucket.docs_bucket.id
}

output "website_url" {
  value = "https://${aws_cloudfront_distribution.docs_distribution.domain_name}"
}

Deployment Script:

#!/bin/bash
# scripts/deploy-infrastructure.sh

set -e

ENVIRONMENT=${1:-staging}
AWS_REGION=${2:-us-east-1}

echo "Deploying infrastructure for environment: $ENVIRONMENT"

# Initialize Terraform
cd infrastructure
terraform init

# Validate configuration
terraform validate

# Plan deployment
terraform plan \
  -var="environment=$ENVIRONMENT" \
  -var="aws_region=$AWS_REGION" \
  -out=tfplan

# Apply if approved
read -p "Apply this plan? (y/N): " confirm
if [[ $confirm == [yY] || $confirm == [yY][eE][sS] ]]; then
  terraform apply tfplan
  echo "Infrastructure deployment completed successfully!"
else
  echo "Deployment cancelled."
  exit 1
fi

🐳 Containerized Applications

Docker Configuration:

# Dockerfile
FROM node:18-alpine AS builder

WORKDIR /app

# Install mdBook
RUN wget https://github.com/rust-lang/mdBook/releases/download/v0.4.40/mdbook-v0.4.40-x86_64-unknown-linux-musl.tar.gz \
    && tar -xzf mdbook-v0.4.40-x86_64-unknown-linux-musl.tar.gz \
    && chmod +x mdbook \
    && mv mdbook /usr/local/bin/

# Copy source files
COPY src/ ./src/
COPY book.toml ./

# Build documentation
RUN mdbook build

# Production stage
FROM nginx:alpine

# nginx:alpine does not ship curl; install it for the health checks below
RUN apk add --no-cache curl

# Copy built documentation
COPY --from=builder /app/book /usr/share/nginx/html

# Copy nginx configuration
COPY nginx.conf /etc/nginx/nginx.conf

# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost/ || exit 1

EXPOSE 80

CMD ["nginx", "-g", "daemon off;"]

Docker Compose for Local Development:

# docker-compose.yml
version: '3.8'

services:
  docs:
    build: .
    ports:
      - "3000:80"
    volumes:
      - ./src:/app/src:ro
      - ./book.toml:/app/book.toml:ro
    environment:
      - NODE_ENV=development
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost/"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  docs-dev:
    image: rust:1.70
    working_dir: /app
    volumes:
      - .:/app
    ports:
      - "3001:3001"
    command: |
      bash -c "
        cargo install mdbook &&
        mdbook serve --hostname 0.0.0.0 --port 3001
      "

networks:
  default:
    name: solveforce-docs

βš™οΈ Kubernetes Deployment

Kubernetes Manifests:

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: solveforce-docs
  namespace: production
  labels:
    app: solveforce-docs
    version: v1.0.0
spec:
  replicas: 3
  selector:
    matchLabels:
      app: solveforce-docs
  template:
    metadata:
      labels:
        app: solveforce-docs
        version: v1.0.0
    spec:
      containers:
      - name: docs
        image: solveforce/docs:latest
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "64Mi"
            cpu: "100m"
          limits:
            memory: "128Mi"
            cpu: "200m"
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
        env:
        - name: NODE_ENV
          value: "production"
---
apiVersion: v1
kind: Service
metadata:
  name: solveforce-docs-service
  namespace: production
spec:
  selector:
    app: solveforce-docs
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
  type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: solveforce-docs-ingress
  namespace: production
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
  - hosts:
    - docs.solveforce.com
    secretName: solveforce-docs-tls
  rules:
  - host: docs.solveforce.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: solveforce-docs-service
            port:
              number: 80

🚦 Deployment Strategies

πŸ”΅ Blue-Green Deployment

Implementation Script:

#!/bin/bash
# scripts/blue-green-deploy.sh

set -e

ENVIRONMENT=${1:-production}
VERSION=${2:-latest}
HEALTH_CHECK_URL=${3:-https://docs.solveforce.com}

echo "Starting blue-green deployment for $ENVIRONMENT"

# Determine the live slot from the service selector
CURRENT_SLOT=$(kubectl get service solveforce-docs-service -n $ENVIRONMENT -o jsonpath='{.spec.selector.slot}')

if [ "$CURRENT_SLOT" == "blue" ]; then
    NEW_SLOT="green"
    OLD_SLOT="blue"
else
    NEW_SLOT="blue"
    OLD_SLOT="green"
fi

echo "Deploying to $NEW_SLOT slot"

# Deploy to new slot
kubectl set image deployment/solveforce-docs-$NEW_SLOT docs=solveforce/docs:$VERSION -n $ENVIRONMENT

# Wait for deployment to be ready
kubectl rollout status deployment/solveforce-docs-$NEW_SLOT -n $ENVIRONMENT --timeout=300s

# Run health checks. Note: the public URL still routes to the old slot
# until traffic is switched below, so in practice this should target the
# new slot's own service endpoint.
echo "Running health checks on $NEW_SLOT slot"
for i in {1..10}; do
  if curl -f "$HEALTH_CHECK_URL" > /dev/null 2>&1; then
    echo "Health check passed"
    break
  else
    echo "Health check failed, attempt $i/10"
    sleep 30
  fi
  
  if [ $i -eq 10 ]; then
    echo "Health checks failed, aborting before traffic switch"
    exit 1
  fi
done

# Switch traffic to new slot
echo "Switching traffic to $NEW_SLOT slot"
kubectl patch service solveforce-docs-service -p '{"spec":{"selector":{"slot":"'$NEW_SLOT'"}}}' -n $ENVIRONMENT

# Verify traffic switch
sleep 30
if curl -f "$HEALTH_CHECK_URL" > /dev/null 2>&1; then
  echo "Traffic switch successful"
  
  # Scale down old slot
  kubectl scale deployment solveforce-docs-$OLD_SLOT --replicas=0 -n $ENVIRONMENT
  echo "Blue-green deployment completed successfully"
else
  echo "Traffic switch failed, rolling back"
  kubectl patch service solveforce-docs-service -p '{"spec":{"selector":{"slot":"'$OLD_SLOT'"}}}' -n $ENVIRONMENT
  exit 1
fi

🐦 Canary Deployment

Canary Deployment Configuration:

# k8s/canary-deployment.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: solveforce-docs-canary
  namespace: production
spec:
  replicas: 10
  strategy:
    canary:
      steps:
      - setWeight: 10
      - pause: {duration: 2m}
      - setWeight: 25
      - pause: {duration: 5m}
      - setWeight: 50
      - pause: {duration: 10m}
      - setWeight: 75
      - pause: {duration: 10m}
      - setWeight: 100
      canaryService: solveforce-docs-canary-service
      stableService: solveforce-docs-stable-service
      trafficRouting:
        nginx:
          stableIngress: solveforce-docs-stable-ingress
          additionalIngressAnnotations:
            canary-by-header: X-Canary
      analysis:
        templates:
        - templateName: success-rate
        args:
        - name: service-name
          value: solveforce-docs
        startingStep: 2
        # interval, count, successCondition (result[0] >= 0.95), and
        # failureCondition (result[0] < 0.90) belong on the metric inside
        # the referenced success-rate AnalysisTemplate, not on the Rollout.
  selector:
    matchLabels:
      app: solveforce-docs
  template:
    metadata:
      labels:
        app: solveforce-docs
    spec:
      containers:
      - name: docs
        image: solveforce/docs:latest
        ports:
        - containerPort: 80
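
The success-rate analysis above gates each traffic step on a metric query. As an illustration of what that check amounts to, the sketch below computes a 5xx-free request ratio from Prometheus and applies the same 0.95/0.90 thresholds; the Prometheus address and query are assumptions, not the contents of the (unshown) success-rate AnalysisTemplate:

# canary_success_rate.py (illustrative sketch)
import requests

PROM_URL = "http://prometheus.monitoring:9090"  # assumed in-cluster address
QUERY = (
    'sum(rate(http_requests_total{service="solveforce-docs",status!~"5.."}[2m]))'
    ' / sum(rate(http_requests_total{service="solveforce-docs"}[2m]))'
)

def success_rate() -> float:
    """Query Prometheus and return the current success ratio (0.0-1.0)."""
    resp = requests.get(
        f"{PROM_URL}/api/v1/query", params={"query": QUERY}, timeout=10
    )
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

if __name__ == "__main__":
    rate = success_rate()
    if rate >= 0.95:
        verdict = "PASS"        # mirrors successCondition
    elif rate < 0.90:
        verdict = "FAIL"        # mirrors failureCondition
    else:
        verdict = "INCONCLUSIVE"
    print(f"success rate: {rate:.3f} -> {verdict}")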

πŸ”„ Rolling Deployment

Rolling Update Strategy:

# k8s/rolling-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: solveforce-docs-rolling
  namespace: production
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 1
  selector:
    matchLabels:
      app: solveforce-docs
  template:
    metadata:
      labels:
        app: solveforce-docs
    spec:
      containers:
      - name: docs
        image: solveforce/docs:latest
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:
            path: /health
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /health
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 3
          failureThreshold: 3

πŸ“Š Monitoring & Observability

πŸ“ˆ Deployment Metrics

Prometheus Metrics Collection:

# monitoring/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "deployment_rules.yml"

scrape_configs:
  - job_name: 'solveforce-docs'
    static_configs:
      - targets: ['docs.solveforce.com:80']
    metrics_path: /metrics
    scrape_interval: 30s
    
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

Grafana Dashboard Configuration:

{
  "dashboard": {
    "title": "SOLVEFORCE Documentation Deployment",
    "panels": [
      {
        "title": "Deployment Success Rate",
        "type": "stat",
        "targets": [
          {
            "expr": "rate(deployment_success_total[5m]) / rate(deployment_total[5m])",
            "legendFormat": "Success Rate"
          }
        ]
      },
      {
        "title": "Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))",
            "legendFormat": "95th percentile"
          }
        ]
      },
      {
        "title": "Error Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total{status=~\"5..\"}[5m])",
            "legendFormat": "5xx errors"
          }
        ]
      }
    ]
  }
}

🚨 Alerting Rules

Deployment Alert Rules:

# monitoring/deployment_rules.yml
groups:
  - name: deployment.rules
    rules:
    - alert: DeploymentFailed
      expr: increase(deployment_failed_total[1h]) > 0
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: "Deployment failed for {{ $labels.service }}"
        description: "Deployment for {{ $labels.service }} has failed in the last hour"
    
    - alert: HighErrorRate
      expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "High error rate detected"
        description: "Error rate is {{ $value }} for the last 5 minutes"
    
    - alert: SlowResponseTime
      expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Slow response time detected"
        description: "95th percentile response time is {{ $value }}s"

πŸ“‹ Health Checks

Comprehensive Health Check Script:

#!/usr/bin/env python3
# scripts/health_check.py

import requests
import sys
import time
from typing import List, Dict, Any

class HealthChecker:
    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip('/')
        self.session = requests.Session()
        # requests.Session has no global timeout attribute;
        # store one and pass it on each request instead.
        self.timeout = 10
        
    def check_basic_connectivity(self) -> Dict[str, Any]:
        """Check basic site connectivity."""
        try:
            response = self.session.get(f"{self.base_url}/")
            return {
                'status': 'healthy' if response.status_code == 200 else 'unhealthy',
                'status_code': response.status_code,
                'response_time': response.elapsed.total_seconds()
            }
        except Exception as e:
            return {
                'status': 'unhealthy',
                'error': str(e),
                'response_time': None
            }
    
    def check_critical_pages(self) -> Dict[str, Any]:
        """Check critical documentation pages."""
        critical_pages = [
            '/',
            '/technology/connectivity/',
            '/api/overview/',
            '/development/contributing/',
            '/states/california/'
        ]
        
        results = {}
        for page in critical_pages:
            try:
                response = self.session.get(f"{self.base_url}{page}")
                results[page] = {
                    'status': 'healthy' if response.status_code == 200 else 'unhealthy',
                    'status_code': response.status_code,
                    'response_time': response.elapsed.total_seconds()
                }
            except Exception as e:
                results[page] = {
                    'status': 'unhealthy',
                    'error': str(e)
                }
        
        return results
    
    def check_performance(self) -> Dict[str, Any]:
        """Check site performance metrics."""
        response_times = []
        
        # Make multiple requests to get average response time
        for _ in range(10):
            try:
                response = self.session.get(f"{self.base_url}/")
                if response.status_code == 200:
                    response_times.append(response.elapsed.total_seconds())
            except Exception:
                pass
        
        if response_times:
            avg_response_time = sum(response_times) / len(response_times)
            max_response_time = max(response_times)
            min_response_time = min(response_times)
            
            return {
                'status': 'healthy' if avg_response_time < 2.0 else 'degraded',
                'avg_response_time': avg_response_time,
                'max_response_time': max_response_time,
                'min_response_time': min_response_time,
                'successful_requests': len(response_times)
            }
        else:
            return {
                'status': 'unhealthy',
                'error': 'No successful requests'
            }
    
    def run_all_checks(self) -> Dict[str, Any]:
        """Run all health checks."""
        return {
            'timestamp': time.time(),
            'basic_connectivity': self.check_basic_connectivity(),
            'critical_pages': self.check_critical_pages(),
            'performance': self.check_performance()
        }

def main():
    if len(sys.argv) != 2:
        print("Usage: python health_check.py <base_url>")
        sys.exit(1)
    
    base_url = sys.argv[1]
    checker = HealthChecker(base_url)
    results = checker.run_all_checks()
    
    # Print results
    print(f"Health check results for {base_url}:")
    print(f"Basic connectivity: {results['basic_connectivity']['status']}")
    
    critical_pages_healthy = all(
        page['status'] == 'healthy' 
        for page in results['critical_pages'].values()
    )
    print(f"Critical pages: {'healthy' if critical_pages_healthy else 'unhealthy'}")
    print(f"Performance: {results['performance']['status']}")
    
    # Exit with error code if any checks failed
    overall_healthy = (
        results['basic_connectivity']['status'] == 'healthy' and
        critical_pages_healthy and
        results['performance']['status'] in ['healthy', 'degraded']
    )
    
    sys.exit(0 if overall_healthy else 1)

if __name__ == '__main__':
    main()

πŸ”™ Rollback Procedures

βͺ Automated Rollback

Rollback Script:

#!/bin/bash
# scripts/rollback.sh

set -e

ENVIRONMENT=${1:-production}
TARGET_VERSION=${2:-previous}

echo "Starting rollback for environment: $ENVIRONMENT"

if [ "$TARGET_VERSION" == "previous" ]; then
    # Get previous version from deployment history
    TARGET_VERSION=$(kubectl rollout history deployment/solveforce-docs -n $ENVIRONMENT | tail -n 2 | head -n 1 | awk '{print $1}')
fi

echo "Rolling back to version: $TARGET_VERSION"

# Perform rollback
kubectl rollout undo deployment/solveforce-docs --to-revision=$TARGET_VERSION -n $ENVIRONMENT

# Wait for rollback to complete
kubectl rollout status deployment/solveforce-docs -n $ENVIRONMENT --timeout=300s

# Verify rollback success
echo "Verifying rollback..."
sleep 30

HEALTH_CHECK_URL="https://docs.solveforce.com"
if [ "$ENVIRONMENT" == "staging" ]; then
    HEALTH_CHECK_URL="https://staging-docs.solveforce.com"
fi

if curl -f "$HEALTH_CHECK_URL" > /dev/null 2>&1; then
    echo "Rollback completed successfully"
    
    # Notify team
    curl -X POST "$SLACK_WEBHOOK_URL" \
        -H 'Content-type: application/json' \
        --data "{
            \"text\": \"πŸ”„ Rollback completed for $ENVIRONMENT to version $TARGET_VERSION\",
            \"channel\": \"#deployments\"
        }"
else
    echo "Rollback verification failed"
    exit 1
fi

πŸ” Rollback Decision Matrix

Automated Rollback Triggers (see the evaluation sketch after these lists):

  • Error rate > 5% for 5 minutes
  • Response time > 5 seconds for 3 minutes
  • Availability < 99% for 2 minutes
  • Critical functionality failures

Manual Rollback Scenarios:

  • Security vulnerabilities discovered
  • Data corruption or loss
  • Business-critical feature failures
  • Customer-reported issues
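
A sketch of how the automated triggers above translate into code. The Metrics values would come from Prometheus in practice, and the sustained-duration windows ("for 5 minutes", etc.) are omitted for brevity:

# rollback_triggers.py (illustrative sketch of the decision matrix)
from dataclasses import dataclass

@dataclass
class Metrics:
    error_rate: float           # fraction of requests returning 5xx
    p95_latency_seconds: float  # 95th percentile response time
    availability: float         # fraction of successful health probes

def should_rollback(m: Metrics) -> bool:
    """Apply the automated trigger thresholds from the matrix above."""
    return (
        m.error_rate > 0.05             # error rate > 5%
        or m.p95_latency_seconds > 5.0  # response time > 5 seconds
        or m.availability < 0.99        # availability < 99%
    )

if __name__ == "__main__":
    sample = Metrics(error_rate=0.02, p95_latency_seconds=1.2, availability=0.999)
    print("rollback" if should_rollback(sample) else "healthy")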

πŸ“ž Deployment Support

πŸ“§ Deployment Resources

Documentation and Help:

  • Deployment Guide: This comprehensive document
  • Runbooks: Step-by-step deployment procedures
  • Troubleshooting: Common deployment issues and solutions
  • Architecture Diagrams: Infrastructure and pipeline documentation

Tools and Platforms:

  • GitHub Actions: Primary CI/CD platform
  • AWS: Cloud infrastructure provider
  • Terraform: Infrastructure as Code
  • Kubernetes: Container orchestration
  • Monitoring: Prometheus, Grafana, AlertManager

πŸ†˜ Emergency Procedures

Incident Response:

  1. Immediate Assessment: Determine scope and impact
  2. Communication: Notify stakeholders and team
  3. Mitigation: Implement quick fixes or rollbacks
  4. Investigation: Root cause analysis
  5. Resolution: Permanent fix implementation
  6. Post-Mortem: Documentation and process improvement

Emergency Contacts:

  • On-Call Engineer: Available 24/7 via PagerDuty
  • DevOps Team: deployment@solveforce.com
  • Security Team: security@solveforce.com
  • Management: Available during business hours

Reliable deployments ensure SOLVEFORCE delivers consistent, high-quality services to our customers worldwide.

Deploy with Confidence, Monitor with Precision – SOLVEFORCE Deployment Excellence.