Deployment
Comprehensive deployment procedures and CI/CD workflows for SOLVEFORCE projects, ensuring reliable, secure, and automated delivery of applications and documentation.
Deployment Philosophy
Core Principles
Automation First:
- Automated pipelines for consistent deployments
- Infrastructure as Code (IaC) for reproducible environments
- Automated testing at every stage
- Zero-downtime deployment strategies
- Rollback capabilities for quick recovery
Security & Compliance:
- Secure deployment pipelines with secrets management
- Compliance checks and security scanning
- Access controls and audit trails
- Environment isolation and protection
- Regular security updates and patches
Reliability & Performance:
- Blue-green and canary deployment strategies
- Health checks and monitoring integration
- Performance validation during deployment
- Automated smoke testing post-deployment
- Comprehensive logging and observability
CI/CD Pipeline Architecture
Pipeline Overview
graph LR
A[Code Push] --> B[Build]
B --> C[Test]
C --> D[Security Scan]
D --> E[Deploy to Staging]
E --> F[Integration Tests]
F --> G[Deploy to Production]
G --> H[Post-deployment Tests]
H --> I[Monitor & Alert]
Pipeline Stages:
- Source Control: Git-based version control
- Build: Compile, package, and prepare artifacts
- Test: Unit, integration, and security testing
- Deploy: Automated deployment to environments
- Validate: Post-deployment testing and verification
- Monitor: Ongoing health and performance monitoring
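The stage ordering above implies fail-fast behavior: a later stage runs only if every earlier stage succeeded. A minimal sketch of that control flow follows; the stage actions are hypothetical placeholders, not the project's real build or test commands.

```python
from typing import Callable, List, Tuple

# A stage is a name plus an action that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed = []
    for name, action in stages:
        if not action():
            break
        completed.append(name)
    return completed


# Hypothetical stage actions; real ones would invoke build tools, test runners, etc.
stages = [
    ("build", lambda: True),
    ("test", lambda: True),
    ("deploy", lambda: False),  # simulate a failed deployment
    ("validate", lambda: True),
]
completed = run_pipeline(stages)
print(completed)  # ['build', 'test']
```

Because the deploy stage fails, the validate stage never runs, which mirrors how the `needs` dependencies gate jobs in the workflow.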
GitHub Actions Workflows
Main Deployment Workflow:
# .github/workflows/deploy.yml
name: Deploy to Production
on:
push:
branches: [ main ]
pull_request:
branches: [ main ]
env:
NODE_VERSION: '18'
PYTHON_VERSION: '3.10'
jobs:
build:
runs-on: ubuntu-latest
outputs:
version: ${{ steps.version.outputs.version }}
steps:
- name: Checkout code
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Generate version
id: version
run: |
VERSION=$(date +'%Y.%m.%d')-${GITHUB_SHA::8}
echo "version=$VERSION" >> $GITHUB_OUTPUT
echo "Generated version: $VERSION"
- name: Setup Node.js
uses: actions/setup-node@v3
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'npm'
- name: Install dependencies
run: |
npm ci
npm install -g markdownlint-cli
- name: Setup mdBook
uses: peaceiris/actions-mdbook@v1
with:
mdbook-version: '0.4.40'
- name: Lint documentation
run: |
markdownlint "src/**/*.md"
- name: Build documentation
run: |
mdbook build
- name: Upload build artifacts
uses: actions/upload-artifact@v4
with:
name: documentation-${{ steps.version.outputs.version }}
path: book/
retention-days: 30
test:
runs-on: ubuntu-latest
needs: build
strategy:
matrix:
test-type: [unit, integration, e2e]
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Setup Python
uses: actions/setup-python@v3
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: Install test dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements-test.txt
- name: Run ${{ matrix.test-type }} tests
run: |
pytest tests/${{ matrix.test-type }}/ --junitxml=test-results-${{ matrix.test-type }}.xml
- name: Upload test results
uses: actions/upload-artifact@v4
if: always()
with:
name: test-results-${{ matrix.test-type }}
path: test-results-${{ matrix.test-type }}.xml
security-scan:
runs-on: ubuntu-latest
needs: build
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@master
with:
scan-type: 'fs'
scan-ref: '.'
format: 'sarif'
output: 'trivy-results.sarif'
- name: Upload security scan results
uses: github/codeql-action/upload-sarif@v3
with:
sarif_file: 'trivy-results.sarif'
- name: Run dependency check
run: |
pip install safety
safety check --json --output safety-report.json || true
- name: Upload safety report
uses: actions/upload-artifact@v4
with:
name: security-reports
path: safety-report.json
deploy-staging:
runs-on: ubuntu-latest
needs: [build, test, security-scan]
if: github.ref == 'refs/heads/main'
environment:
name: staging
url: https://staging-docs.solveforce.com
steps:
- name: Download build artifacts
uses: actions/download-artifact@v4
with:
name: documentation-${{ needs.build.outputs.version }}
path: ./book
- name: Deploy to staging
run: |
# Deploy to staging environment
aws s3 sync ./book s3://staging-docs-solveforce-com --delete
aws cloudfront create-invalidation --distribution-id ${{ secrets.STAGING_CLOUDFRONT_ID }} --paths "/*"
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: us-east-1
- name: Run staging smoke tests
run: |
curl -f https://staging-docs.solveforce.com/ || exit 1
curl -f https://staging-docs.solveforce.com/technology/connectivity/ || exit 1
deploy-production:
runs-on: ubuntu-latest
needs: [build, deploy-staging]
if: github.ref == 'refs/heads/main'
environment:
name: production
url: https://docs.solveforce.com
steps:
- name: Download build artifacts
uses: actions/download-artifact@v4
with:
name: documentation-${{ needs.build.outputs.version }}
path: ./book
- name: Deploy to production
run: |
# In-place sync to the production bucket, then CDN cache invalidation (this step is not a blue-green switch)
aws s3 sync ./book s3://docs-solveforce-com --delete
aws cloudfront create-invalidation --distribution-id ${{ secrets.PROD_CLOUDFRONT_ID }} --paths "/*"
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_DEFAULT_REGION: us-east-1
- name: Run production health checks
run: |
curl -f https://docs.solveforce.com/ || exit 1
curl -f https://docs.solveforce.com/technology/connectivity/ || exit 1
curl -f https://docs.solveforce.com/api/overview/ || exit 1
- name: Notify deployment success
uses: 8398a7/action-slack@v3
with:
status: success
text: 'Documentation successfully deployed to production'
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
post-deployment:
runs-on: ubuntu-latest
needs: [build, deploy-production]
steps:
- name: Checkout code
uses: actions/checkout@v3
- name: Run post-deployment tests
run: |
# Run comprehensive post-deployment validation
python scripts/validate_deployment.py --environment production
- name: Update deployment metrics
run: |
# Update deployment dashboard
curl -X POST "https://metrics.solveforce.com/deployments" \
-H "Authorization: Bearer ${{ secrets.METRICS_API_TOKEN }}" \
-H "Content-Type: application/json" \
-d '{
"service": "documentation",
"version": "${{ needs.build.outputs.version }}",
"environment": "production",
"status": "success",
"timestamp": "'$(date -u +%Y-%m-%dT%H:%M:%SZ)'"
}'
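The workflow calls `scripts/validate_deployment.py`, which is not shown in this document. The sketch below is one possible shape for it, under stated assumptions: the environment-to-URL mapping and page list are taken from the workflow's environment URLs and health-check steps, and the function names (`http_status`, `failing_pages`, `main`) are hypothetical.

```python
import argparse
import urllib.error
import urllib.request
from typing import Callable, Dict, List

# Assumed mapping, copied from the workflow's environment URLs.
BASE_URLS: Dict[str, str] = {
    "staging": "https://staging-docs.solveforce.com",
    "production": "https://docs.solveforce.com",
}
# Pages the workflow's health-check step already probes.
PAGES: List[str] = ["/", "/technology/connectivity/", "/api/overview/"]


def http_status(url: str) -> int:
    """Return the HTTP status code for url, or 0 if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except OSError:  # covers URLError, timeouts, and connection errors
        return 0


def failing_pages(environment: str, fetch: Callable[[str], int] = http_status) -> List[str]:
    """Return the pages that did not answer with HTTP 200."""
    base = BASE_URLS[environment]
    return [page for page in PAGES if fetch(base + page) != 200]


def main(argv=None) -> int:
    """CLI entry point; returns a process exit code (0 means all pages passed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--environment", choices=sorted(BASE_URLS), required=True)
    args = parser.parse_args(argv)
    failed = failing_pages(args.environment)
    for page in failed:
        print(f"FAILED: {page}")
    return 1 if failed else 0
```

When used as a script, `main()` would be called under a `__main__` guard and its return value passed to `sys.exit`. Passing a stub `fetch` keeps the page check testable without network access.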
Secrets Management
GitHub Secrets Configuration:
# Required secrets in GitHub repository settings
AWS_ACCESS_KEY_ID: # AWS IAM access key for deployment
AWS_SECRET_ACCESS_KEY: # AWS IAM secret key for deployment
STAGING_CLOUDFRONT_ID: # CloudFront distribution ID for staging
PROD_CLOUDFRONT_ID: # CloudFront distribution ID for production
SLACK_WEBHOOK_URL: # Slack webhook for notifications
METRICS_API_TOKEN: # API token for deployment metrics
Environment Variables:
# .env.production
NODE_ENV=production
API_BASE_URL=https://api.solveforce.com
CDN_URL=https://cdn.solveforce.com
MONITORING_ENDPOINT=https://monitoring.solveforce.com
# .env.staging
NODE_ENV=staging
API_BASE_URL=https://staging-api.solveforce.com
CDN_URL=https://staging-cdn.solveforce.com
MONITORING_ENDPOINT=https://staging-monitoring.solveforce.com
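A deployment step can fail early if an environment file is missing one of these keys. The sketch below parses `.env`-style text and checks for the four keys listed above; `parse_env` and `REQUIRED_KEYS` are illustrative names, and the parser handles only simple `KEY=value` lines (no quoting or variable expansion).

```python
from typing import Dict

# The four keys that appear in both environment files above.
REQUIRED_KEYS = ("NODE_ENV", "API_BASE_URL", "CDN_URL", "MONITORING_ENDPOINT")


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks and comments, and verify required keys."""
    config: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"Malformed line: {line!r}")
        config[key.strip()] = value.strip()
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"Missing keys: {', '.join(missing)}")
    return config


staging = parse_env("""
# .env.staging
NODE_ENV=staging
API_BASE_URL=https://staging-api.solveforce.com
CDN_URL=https://staging-cdn.solveforce.com
MONITORING_ENDPOINT=https://staging-monitoring.solveforce.com
""")
print(staging["API_BASE_URL"])  # https://staging-api.solveforce.com
```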
Infrastructure as Code
AWS Infrastructure
Terraform Configuration:
# infrastructure/main.tf
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
backend "s3" {
bucket = "solveforce-terraform-state"
key = "documentation/terraform.tfstate"
region = "us-east-1"
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Project = "SolveForce Documentation"
Environment = var.environment
ManagedBy = "Terraform"
}
}
}
# S3 bucket for static site hosting
resource "aws_s3_bucket" "docs_bucket" {
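# Bucket names match the ones the deployment workflow syncs to
bucket = var.environment == "production" ? "docs-solveforce-com" : "${var.environment}-docs-solveforce-com"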
}
# The bucket is read only through CloudFront, so all public access stays blocked
resource "aws_s3_bucket_public_access_block" "docs_bucket_pab" {
bucket = aws_s3_bucket.docs_bucket.id
block_public_acls = true
block_public_policy = true
ignore_public_acls = true
restrict_public_buckets = true
}
resource "aws_s3_bucket_website_configuration" "docs_bucket_website" {
bucket = aws_s3_bucket.docs_bucket.id
index_document {
suffix = "index.html"
}
error_document {
key = "404.html"
}
}
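# Origin access identity that CloudFront uses to read from the private bucket.
# The bucket also needs a policy granting this identity s3:GetObject.
resource "aws_cloudfront_origin_access_identity" "docs_oai" {
comment = "OAI for ${var.environment} docs bucket"
}
# CloudFront distribution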
resource "aws_cloudfront_distribution" "docs_distribution" {
origin {
domain_name = aws_s3_bucket.docs_bucket.bucket_regional_domain_name
origin_id = "S3-${aws_s3_bucket.docs_bucket.id}"
s3_origin_config {
origin_access_identity = aws_cloudfront_origin_access_identity.docs_oai.cloudfront_access_identity_path
}
}
enabled = true
is_ipv6_enabled = true
default_root_object = "index.html"
aliases = var.environment == "production" ? ["docs.solveforce.com"] : ["staging-docs.solveforce.com"]
default_cache_behavior {
allowed_methods = ["GET", "HEAD", "OPTIONS"]
cached_methods = ["GET", "HEAD"]
target_origin_id = "S3-${aws_s3_bucket.docs_bucket.id}"
compress = true
viewer_protocol_policy = "redirect-to-https"
forwarded_values {
query_string = false
cookies {
forward = "none"
}
}
min_ttl = 0
default_ttl = 3600
max_ttl = 86400
}
restrictions {
geo_restriction {
restriction_type = "none"
}
}
viewer_certificate {
acm_certificate_arn = var.ssl_certificate_arn
ssl_support_method = "sni-only"
minimum_protocol_version = "TLSv1.2_2021"
}
custom_error_response {
error_code = 404
response_code = 404
response_page_path = "/404.html"
}
tags = {
Name = "${var.environment}-docs-distribution"
}
}
# Variables
variable "aws_region" {
description = "AWS region"
type = string
default = "us-east-1"
}
variable "environment" {
description = "Environment name"
type = string
}
variable "ssl_certificate_arn" {
description = "SSL certificate ARN for CloudFront"
type = string
}
# Outputs
output "cloudfront_distribution_id" {
value = aws_cloudfront_distribution.docs_distribution.id
}
output "s3_bucket_name" {
value = aws_s3_bucket.docs_bucket.id
}
output "website_url" {
value = "https://${aws_cloudfront_distribution.docs_distribution.domain_name}"
}
Deployment Script:
#!/bin/bash
# scripts/deploy-infrastructure.sh
set -e
ENVIRONMENT=${1:-staging}
AWS_REGION=${2:-us-east-1}
echo "Deploying infrastructure for environment: $ENVIRONMENT"
# Initialize Terraform
cd infrastructure
terraform init
# Validate configuration
terraform validate
# Plan deployment
terraform plan \
-var="environment=$ENVIRONMENT" \
-var="aws_region=$AWS_REGION" \
-out=tfplan
# Apply if approved
read -p "Apply this plan? (y/N): " confirm
if [[ $confirm == [yY] || $confirm == [yY][eE][sS] ]]; then
terraform apply tfplan
echo "Infrastructure deployment completed successfully!"
else
echo "Deployment cancelled."
exit 1
fi
Containerized Applications
Docker Configuration:
# Dockerfile
FROM node:18-alpine AS builder
WORKDIR /app
# Install mdBook
RUN wget https://github.com/rust-lang/mdBook/releases/download/v0.4.40/mdbook-v0.4.40-x86_64-unknown-linux-musl.tar.gz \
&& tar -xzf mdbook-v0.4.40-x86_64-unknown-linux-musl.tar.gz \
&& chmod +x mdbook \
&& mv mdbook /usr/local/bin/
# Copy source files
COPY src/ ./src/
COPY book.toml ./
# Build documentation
RUN mdbook build
# Production stage
FROM nginx:alpine
# Copy built documentation
COPY --from=builder /app/book /usr/share/nginx/html
# Copy nginx configuration
COPY nginx.conf /etc/nginx/nginx.conf
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD curl -f http://localhost/ || exit 1
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
Docker Compose for Local Development:
# docker-compose.yml
version: '3.8'
services:
docs:
build: .
ports:
- "3000:80"
volumes:
- ./src:/app/src:ro
- ./book.toml:/app/book.toml:ro
environment:
- NODE_ENV=development
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost/"]
interval: 30s
timeout: 10s
retries: 3
start_period: 40s
docs-dev:
image: rust:1.70
working_dir: /app
volumes:
- .:/app
ports:
- "3001:3001"
command: |
bash -c "
cargo install mdbook &&
mdbook serve --hostname 0.0.0.0 --port 3001
"
networks:
default:
name: solveforce-docs
Kubernetes Deployment
Kubernetes Manifests:
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: solveforce-docs
namespace: production
labels:
app: solveforce-docs
version: v1.0.0
spec:
replicas: 3
selector:
matchLabels:
app: solveforce-docs
template:
metadata:
labels:
app: solveforce-docs
version: v1.0.0
spec:
containers:
- name: docs
image: solveforce/docs:latest
ports:
- containerPort: 80
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "128Mi"
cpu: "200m"
livenessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 5
periodSeconds: 5
env:
- name: NODE_ENV
value: "production"
---
apiVersion: v1
kind: Service
metadata:
name: solveforce-docs-service
namespace: production
spec:
selector:
app: solveforce-docs
ports:
- protocol: TCP
port: 80
targetPort: 80
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: solveforce-docs-ingress
namespace: production
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
tls:
- hosts:
- docs.solveforce.com
secretName: solveforce-docs-tls
rules:
- host: docs.solveforce.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: solveforce-docs-service
port:
number: 80
Deployment Strategies
Blue-Green Deployment
Implementation Script:
#!/bin/bash
# scripts/blue-green-deploy.sh
set -e
ENVIRONMENT=${1:-production}
VERSION=${2:-latest}
HEALTH_CHECK_URL=${3:-https://docs.solveforce.com}
echo "Starting blue-green deployment for $ENVIRONMENT"
# Determine current and new slots from the Service selector
CURRENT_SLOT=$(kubectl get service solveforce-docs-service -n "$ENVIRONMENT" -o jsonpath='{.spec.selector.slot}')
if [ "$CURRENT_SLOT" == "green" ]; then
NEW_SLOT="blue"
OLD_SLOT="green"
else
# Blue is live (or no slot is set yet), so deploy to green
NEW_SLOT="green"
OLD_SLOT="blue"
fi
echo "Deploying to $NEW_SLOT slot"
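# Scale the new slot back up; it is scaled to zero after each switch (3 replicas assumed, matching the Deployment manifest)
kubectl scale deployment solveforce-docs-$NEW_SLOT --replicas=3 -n $ENVIRONMENT
# Deploy to new slot
kubectl set image deployment/solveforce-docs-$NEW_SLOT docs=solveforce/docs:$VERSION -n $ENVIRONMENT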
# Wait for deployment to be ready
kubectl rollout status deployment/solveforce-docs-$NEW_SLOT -n $ENVIRONMENT --timeout=300s
# Run health checks
echo "Running health checks on $NEW_SLOT slot"
for i in {1..10}; do
if curl -f "$HEALTH_CHECK_URL" > /dev/null 2>&1; then
echo "Health check passed"
break
else
echo "Health check failed, attempt $i/10"
sleep 30
fi
if [ $i -eq 10 ]; then
echo "Health checks failed, aborting before the traffic switch"
exit 1
fi
done
# Switch traffic to new slot
echo "Switching traffic to $NEW_SLOT slot"
kubectl patch service solveforce-docs-service -p '{"spec":{"selector":{"slot":"'$NEW_SLOT'"}}}' -n $ENVIRONMENT
# Verify traffic switch
sleep 30
if curl -f "$HEALTH_CHECK_URL" > /dev/null 2>&1; then
echo "Traffic switch successful"
# Scale down old slot
kubectl scale deployment solveforce-docs-$OLD_SLOT --replicas=0 -n $ENVIRONMENT
echo "Blue-green deployment completed successfully"
else
echo "Traffic switch failed, rolling back"
kubectl patch service solveforce-docs-service -p '{"spec":{"selector":{"slot":"'$OLD_SLOT'"}}}' -n $ENVIRONMENT
exit 1
fi
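The slot logic in the script is easy to invert by accident, so it helps to state it as a pure function. The sketch below models only the decision, not the `kubectl` calls; `next_slot` and `plan_switch` are hypothetical helper names.

```python
from typing import Dict, Optional


def next_slot(current: str) -> str:
    """Return the idle slot that the new version should be deployed to."""
    opposite = {"blue": "green", "green": "blue"}
    if current not in opposite:
        raise ValueError(f"Unknown slot: {current!r}")
    return opposite[current]


def plan_switch(current: str, new_slot_healthy: bool) -> Dict[str, Optional[str]]:
    """Decide which slot serves traffic and which slot can be scaled down."""
    candidate = next_slot(current)
    if new_slot_healthy:
        # Traffic moves to the new slot; the old one becomes idle.
        return {"serve": candidate, "scale_down": current}
    # Keep serving from the current slot and leave everything running.
    return {"serve": current, "scale_down": None}


print(plan_switch("blue", True))  # {'serve': 'green', 'scale_down': 'blue'}
```

Keeping the old slot running when the new one is unhealthy is what makes the rollback path a selector change rather than a redeploy.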
Canary Deployment
Canary Deployment Configuration:
# k8s/canary-deployment.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: solveforce-docs-canary
namespace: production
spec:
replicas: 10
strategy:
canary:
steps:
- setWeight: 10
- pause: {duration: 2m}
- setWeight: 25
- pause: {duration: 5m}
- setWeight: 50
- pause: {duration: 10m}
- setWeight: 75
- pause: {duration: 10m}
- setWeight: 100
canaryService: solveforce-docs-canary-service
stableService: solveforce-docs-stable-service
trafficRouting:
nginx:
stableIngress: solveforce-docs-stable-ingress
additionalIngressAnnotations:
canary-by-header: X-Canary
analysis:
templates:
- templateName: success-rate
args:
- name: service-name
value: solveforce-docs
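startingStep: 2
# interval (60s), count (5), successCondition (result[0] >= 0.95) and
# failureCondition (result[0] < 0.90) are metric fields; define them in the
# success-rate AnalysisTemplate rather than on the Rollout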
selector:
matchLabels:
app: solveforce-docs
template:
metadata:
labels:
app: solveforce-docs
spec:
containers:
- name: docs
image: solveforce/docs:latest
ports:
- containerPort: 80
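The weight steps and analysis thresholds above can be read as a small state machine. The following is a simplified model for reasoning about the configuration, not how the Argo Rollouts controller is implemented: it evaluates one success-rate measurement per weight step, and treating the band between the two thresholds as a pause is an assumption of this sketch.

```python
from typing import List, Tuple

# Traffic weights from the canary steps above.
CANARY_WEIGHTS = [10, 25, 50, 75, 100]


def classify(success_rate: float) -> str:
    """Apply the thresholds: >= 0.95 succeeds, < 0.90 fails, otherwise inconclusive."""
    if success_rate >= 0.95:
        return "success"
    if success_rate < 0.90:
        return "failure"
    return "inconclusive"


def run_canary(success_rates: List[float]) -> Tuple[str, int]:
    """Walk the weight steps with one measured success rate per step.

    Returns (outcome, canary traffic weight in percent).
    """
    weight = 0
    for step_weight, rate in zip(CANARY_WEIGHTS, success_rates):
        weight = step_weight
        verdict = classify(rate)
        if verdict == "failure":
            return ("aborted", 0)  # all traffic returns to the stable version
        if verdict == "inconclusive":
            return ("paused", weight)  # hold at this weight for a manual decision
    return ("promoted", weight) if weight == 100 else ("in_progress", weight)


print(run_canary([0.99, 0.97, 0.92]))  # ('paused', 50)
```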
Rolling Deployment
Rolling Update Strategy:
# k8s/rolling-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: solveforce-docs-rolling
namespace: production
spec:
replicas: 6
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 2
maxUnavailable: 1
selector:
matchLabels:
app: solveforce-docs
template:
metadata:
labels:
app: solveforce-docs
spec:
containers:
- name: docs
image: solveforce/docs:latest
ports:
- containerPort: 80
readinessProbe:
httpGet:
path: /health
port: 80
initialDelaySeconds: 10
periodSeconds: 5
timeoutSeconds: 3
successThreshold: 1
failureThreshold: 3
livenessProbe:
httpGet:
path: /health
port: 80
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 3
failureThreshold: 3
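The `maxSurge` and `maxUnavailable` settings bound how many pods exist and how many stay available during the update. A short worked calculation for the values above, assuming absolute numbers rather than percentages (`rolling_update_bounds` is an illustrative name):

```python
from typing import Dict


def rolling_update_bounds(replicas: int, max_surge: int, max_unavailable: int) -> Dict[str, int]:
    """Compute the pod-count envelope of a rolling update with absolute values."""
    if max_surge == 0 and max_unavailable == 0:
        # Kubernetes rejects this combination because the update could never progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    return {
        "max_total_pods": replicas + max_surge,
        "min_available_pods": replicas - max_unavailable,
    }


bounds = rolling_update_bounds(replicas=6, max_surge=2, max_unavailable=1)
print(bounds)  # {'max_total_pods': 8, 'min_available_pods': 5}
```

With these settings the cluster needs headroom for two extra pods, and at least five of the six replicas keep serving traffic throughout the rollout.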
Monitoring & Observability
Deployment Metrics
Prometheus Metrics Collection:
# monitoring/prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
rule_files:
- "deployment_rules.yml"
scrape_configs:
- job_name: 'solveforce-docs'
static_configs:
- targets: ['docs.solveforce.com:80']
metrics_path: /metrics
scrape_interval: 30s
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
alerting:
alertmanagers:
- static_configs:
- targets:
- alertmanager:9093
Grafana Dashboard Configuration:
{
"dashboard": {
"title": "SOLVEFORCE Documentation Deployment",
"panels": [
{
"title": "Deployment Success Rate",
"type": "stat",
"targets": [
{
"expr": "rate(deployment_success_total[5m]) / rate(deployment_total[5m])",
"legendFormat": "Success Rate"
}
]
},
{
"title": "Response Time",
"type": "graph",
"targets": [
{
"expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))",
"legendFormat": "95th percentile"
}
]
},
{
"title": "Error Rate",
"type": "graph",
"targets": [
{
"expr": "rate(http_requests_total{status=~\"5..\"}[5m])",
"legendFormat": "5xx errors"
}
]
}
]
}
}
Alerting Rules
Deployment Alert Rules:
# monitoring/deployment_rules.yml
groups:
- name: deployment.rules
rules:
- alert: DeploymentFailed
expr: increase(deployment_failed_total[1h]) > 0
for: 0m
labels:
severity: critical
annotations:
summary: "Deployment failed for {{ $labels.service }}"
description: "Deployment for {{ $labels.service }} has failed in the last hour"
- alert: HighErrorRate
expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.1
for: 2m
labels:
severity: warning
annotations:
summary: "High error rate detected"
description: "Error rate is {{ $value }} for the last 5 minutes"
- alert: SlowResponseTime
expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2
for: 5m
labels:
severity: warning
annotations:
summary: "Slow response time detected"
description: "95th percentile response time is {{ $value }}s"
Health Checks
Comprehensive Health Check Script:
#!/usr/bin/env python3
# scripts/health_check.py
"""Health checks for the deployed documentation site."""
import sys
import time
from typing import Any, Dict

import requests

# requests ignores a `timeout` attribute set on a Session, so the timeout
# is passed explicitly on every request.
REQUEST_TIMEOUT_SECONDS = 10


class HealthChecker:
    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip('/')
        self.session = requests.Session()

    def _get(self, path: str) -> requests.Response:
        """GET a path relative to the base URL with the shared timeout."""
        return self.session.get(f"{self.base_url}{path}", timeout=REQUEST_TIMEOUT_SECONDS)

    def check_basic_connectivity(self) -> Dict[str, Any]:
        """Check basic site connectivity."""
        try:
            response = self._get("/")
            return {
                'status': 'healthy' if response.status_code == 200 else 'unhealthy',
                'status_code': response.status_code,
                'response_time': response.elapsed.total_seconds()
            }
        except requests.RequestException as e:
            return {
                'status': 'unhealthy',
                'error': str(e),
                'response_time': None
            }

    def check_critical_pages(self) -> Dict[str, Any]:
        """Check critical documentation pages."""
        critical_pages = [
            '/',
            '/technology/connectivity/',
            '/api/overview/',
            '/development/contributing/',
            '/states/california/'
        ]
        results = {}
        for page in critical_pages:
            try:
                response = self._get(page)
                results[page] = {
                    'status': 'healthy' if response.status_code == 200 else 'unhealthy',
                    'status_code': response.status_code,
                    'response_time': response.elapsed.total_seconds()
                }
            except requests.RequestException as e:
                results[page] = {
                    'status': 'unhealthy',
                    'error': str(e)
                }
        return results

    def check_performance(self) -> Dict[str, Any]:
        """Check site performance metrics."""
        response_times = []
        # Make multiple requests to get an average response time
        for _ in range(10):
            try:
                response = self._get("/")
                if response.status_code == 200:
                    response_times.append(response.elapsed.total_seconds())
            except requests.RequestException:
                pass  # failed requests are simply not counted
        if not response_times:
            return {
                'status': 'unhealthy',
                'error': 'No successful requests'
            }
        avg_response_time = sum(response_times) / len(response_times)
        return {
            'status': 'healthy' if avg_response_time < 2.0 else 'degraded',
            'avg_response_time': avg_response_time,
            'max_response_time': max(response_times),
            'min_response_time': min(response_times),
            'successful_requests': len(response_times)
        }

    def run_all_checks(self) -> Dict[str, Any]:
        """Run all health checks."""
        return {
            'timestamp': time.time(),
            'basic_connectivity': self.check_basic_connectivity(),
            'critical_pages': self.check_critical_pages(),
            'performance': self.check_performance()
        }


def main():
    if len(sys.argv) != 2:
        print("Usage: python health_check.py <base_url>")
        sys.exit(1)
    base_url = sys.argv[1]
    checker = HealthChecker(base_url)
    results = checker.run_all_checks()

    # Print results
    print(f"Health check results for {base_url}:")
    print(f"Basic connectivity: {results['basic_connectivity']['status']}")
    critical_pages_healthy = all(
        page['status'] == 'healthy'
        for page in results['critical_pages'].values()
    )
    print(f"Critical pages: {'healthy' if critical_pages_healthy else 'unhealthy'}")
    print(f"Performance: {results['performance']['status']}")

    # Exit with an error code if any check failed; 'degraded' performance still passes
    overall_healthy = (
        results['basic_connectivity']['status'] == 'healthy' and
        critical_pages_healthy and
        results['performance']['status'] in ['healthy', 'degraded']
    )
    sys.exit(0 if overall_healthy else 1)


if __name__ == '__main__':
    main()
Rollback Procedures
Automated Rollback
Rollback Script:
#!/bin/bash
# scripts/rollback.sh
set -e
ENVIRONMENT=${1:-production}
TARGET_VERSION=${2:-previous}
echo "Starting rollback for environment: $ENVIRONMENT"
if [ "$TARGET_VERSION" == "previous" ]; then
# Without --to-revision, kubectl rolls back to the previous revision
echo "Rolling back to the previous revision"
kubectl rollout undo deployment/solveforce-docs -n "$ENVIRONMENT"
else
echo "Rolling back to revision: $TARGET_VERSION"
kubectl rollout undo deployment/solveforce-docs --to-revision="$TARGET_VERSION" -n "$ENVIRONMENT"
fi
# Wait for rollback to complete
kubectl rollout status deployment/solveforce-docs -n $ENVIRONMENT --timeout=300s
# Verify rollback success
echo "Verifying rollback..."
sleep 30
HEALTH_CHECK_URL="https://docs.solveforce.com"
if [ "$ENVIRONMENT" == "staging" ]; then
HEALTH_CHECK_URL="https://staging-docs.solveforce.com"
fi
if curl -f "$HEALTH_CHECK_URL" > /dev/null 2>&1; then
echo "Rollback completed successfully"
# Notify team
curl -X POST "$SLACK_WEBHOOK_URL" \
-H 'Content-type: application/json' \
--data "{
\"text\": \"Rollback completed for $ENVIRONMENT to version $TARGET_VERSION\",
\"channel\": \"#deployments\"
}"
else
echo "Rollback verification failed"
exit 1
fi
Rollback Decision Matrix
Automated Rollback Triggers:
- Error rate > 5% for 5 minutes
- Response time > 5 seconds for 3 minutes
- Availability < 99% for 2 minutes
- Critical functionality failures
Manual Rollback Scenarios:
- Security vulnerabilities discovered
- Data corruption or loss
- Business-critical feature failures
- Customer-reported issues
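The three numeric triggers above share one pattern: a threshold that must be breached continuously for a given number of minutes. A minimal sketch of that evaluation follows, assuming one metrics sample per minute with the hypothetical field names `error_rate`, `response_time_s`, and `availability`; the fourth trigger (critical functionality failures) needs its own checks and is not modeled here.

```python
from typing import Callable, Dict, List

Sample = Dict[str, float]  # one metrics sample per minute (assumed cadence)


def sustained(samples: List[Sample], minutes: int, breached: Callable[[Sample], bool]) -> bool:
    """True if the most recent `minutes` samples all breach the threshold."""
    window = samples[-minutes:]
    return len(window) == minutes and all(breached(sample) for sample in window)


def rollback_reasons(samples: List[Sample]) -> List[str]:
    """Return the automated triggers that currently fire (empty list means no rollback)."""
    reasons = []
    if sustained(samples, 5, lambda s: s["error_rate"] > 0.05):
        reasons.append("error rate > 5% for 5 minutes")
    if sustained(samples, 3, lambda s: s["response_time_s"] > 5.0):
        reasons.append("response time > 5 seconds for 3 minutes")
    if sustained(samples, 2, lambda s: s["availability"] < 0.99):
        reasons.append("availability < 99% for 2 minutes")
    return reasons


healthy = {"error_rate": 0.01, "response_time_s": 0.3, "availability": 1.0}
degraded = {**healthy, "availability": 0.97}
print(rollback_reasons([healthy] * 3 + [degraded] * 2))
# ['availability < 99% for 2 minutes']
```

Requiring the full window to breach avoids rolling back on a single bad sample.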
Deployment Support
Deployment Resources
Documentation and Help:
- Deployment Guide: This comprehensive document
- Runbooks: Step-by-step deployment procedures
- Troubleshooting: Common deployment issues and solutions
- Architecture Diagrams: Infrastructure and pipeline documentation
Tools and Platforms:
- GitHub Actions: Primary CI/CD platform
- AWS: Cloud infrastructure provider
- Terraform: Infrastructure as Code
- Kubernetes: Container orchestration
- Monitoring: Prometheus, Grafana, AlertManager
Emergency Procedures
Incident Response:
- Immediate Assessment: Determine scope and impact
- Communication: Notify stakeholders and team
- Mitigation: Implement quick fixes or rollbacks
- Investigation: Root cause analysis
- Resolution: Permanent fix implementation
- Post-Mortem: Documentation and process improvement
Emergency Contacts:
- On-Call Engineer: Available 24/7 via PagerDuty
- DevOps Team: deployment@solveforce.com
- Security Team: security@solveforce.com
- Management: Available during business hours
Reliable deployments ensure SOLVEFORCE delivers consistent, high-quality services to our customers worldwide.
Deploy with Confidence, Monitor with Precision - SOLVEFORCE Deployment Excellence.