feat: add production operations scripts and monitoring guide
Add comprehensive tooling for production deployment:

Scripts (scripts/):
- backup-db.sh: Automated database backups with 7-day retention
- restore-db.sh: Safe database restore with confirmation prompts
- health-check.sh: Complete service health monitoring
- README.md: Operational scripts documentation

Monitoring (docs/MONITORING.md):
- Application health monitoring
- Docker container monitoring
- External monitoring setup (UptimeRobot, Pingdom)
- Log monitoring and rotation
- Alerting configuration
- Incident response procedures
- SLA targets and metrics

All scripts include:
- Environment support (dev/prod)
- Error handling and validation
- Detailed status reporting
- Safety confirmations where needed
186
scripts/README.md
Normal file
@@ -0,0 +1,186 @@
# spotlight.cam - Operations Scripts

Utility scripts for managing spotlight.cam deployment, backups, and monitoring.

## 📋 Available Scripts

### 1. Database Backup (`backup-db.sh`)

Creates a timestamped backup of the PostgreSQL database.

```bash
# Backup development database
./scripts/backup-db.sh dev

# Backup production database
./scripts/backup-db.sh prod
```

**Features:**
- Automatic timestamping
- Automatic cleanup (keeps last 7 days)
- Backup size reporting
- Error handling (exits non-zero if the container is not running or the dump fails)

**Backup location:** `./backups/` (relative to the directory the script is run from)

**Setup cron job for automatic backups:**
```bash
# Edit crontab
crontab -e

# Add daily backup at 2 AM (production)
0 2 * * * cd /path/to/spotlightcam && ./scripts/backup-db.sh prod >> logs/backup.log 2>&1
```
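
Before relying on the automatic cleanup, it can help to preview which files it would remove. The sketch below assumes the same `backup_*.sql` naming and uses `find` with `-print` instead of `-delete`. The function name `list_expired_backups` is hypothetical and not part of the scripts in this commit.

```bash
#!/bin/bash
# Hypothetical helper (not part of backup-db.sh): list the backups that a
# "-mtime +DAYS" cleanup would delete, without deleting anything.
list_expired_backups() {
    local dir=${1:-./backups}
    local days=${2:-7}
    # -mtime +N matches files whose age, in whole days, is greater than N,
    # so with N=7 a file must be at least 8 days old to match.
    find "$dir" -name "backup_*.sql" -mtime +"$days" -print
}

# Example: list_expired_backups ./backups 7
```

Note that `-mtime +7` ignores fractional days, so a backup is removed only once it is at least 8 full days old.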

---

### 2. Database Restore (`restore-db.sh`)

Restores the database from a backup file.

```bash
# Restore development database
./scripts/restore-db.sh ./backups/backup_dev_20251120_120000.sql dev

# Restore production database
./scripts/restore-db.sh ./backups/backup_prod_20251120_120000.sql prod
```

**Safety features:**
- Confirmation prompt (requires typing `yes`)
- File existence check
- Container status validation

---

### 3. Health Check (`health-check.sh`)

Checks that all service containers are running, that the API responds, and that the database accepts connections.

```bash
# Check development environment
./scripts/health-check.sh dev

# Check production environment
./scripts/health-check.sh prod
```

**Checks:**
- ✅ nginx container status
- ✅ Frontend container status
- ✅ Backend container status
- ✅ Database container status
- ✅ API health endpoint
- ✅ Database connection

**Exit codes:**
- `0` - All systems operational
- `1` - One or more services unhealthy
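
Because the script reports health through its exit code, other tooling can build on it. As a minimal sketch, a deploy step could retry the check a few times before giving up. `wait_until_healthy` and its arguments are hypothetical names, not part of this commit.

```bash
#!/bin/bash
# Hypothetical helper: run a command up to ATTEMPTS times, sleeping DELAY
# seconds between tries, and succeed as soon as the command exits 0.
wait_until_healthy() {
    local attempts=$1 delay=$2
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@" > /dev/null 2>&1; then
            return 0
        fi
        # Do not sleep after the final attempt.
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return 1
}

# Example: wait_until_healthy 5 10 ./scripts/health-check.sh prod
```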

**Setup monitoring:**
```bash
# Check every 5 minutes and send alerts on failure
*/5 * * * * /path/to/spotlightcam/scripts/health-check.sh prod || /path/to/alert-script.sh
```

---

## 🔧 Setup Instructions

### Make scripts executable
```bash
chmod +x scripts/*.sh
```

### Create backups directory
```bash
mkdir -p backups
```

### Test scripts
```bash
# Test backup
./scripts/backup-db.sh dev

# Test health check
./scripts/health-check.sh dev
```

---

## 📊 Monitoring Setup

### Option 1: Cron-based monitoring
```bash
# Add to crontab
crontab -e

# Health check every 5 minutes
*/5 * * * * /path/to/spotlightcam/scripts/health-check.sh prod || echo "Health check failed" | mail -s "spotlight.cam Alert" admin@example.com

# Daily backup at 2 AM (run from the project root, because backups are written to ./backups)
0 2 * * * cd /path/to/spotlightcam && ./scripts/backup-db.sh prod
```
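
One drawback of the health check cron line above is that it sends an email every 5 minutes for as long as the check keeps failing. A possible refinement, sketched below, is to remember the last result in a state file and report only transitions. The function name, state file path, and messages are assumptions for illustration.

```bash
#!/bin/bash
# Hypothetical helper: print a message only when the health status changes.
# STATE_FILE stores the previous status ("ok" or "fail"); a missing file is
# treated as a previous status of "ok".
report_on_change() {
    local state_file=$1 current=$2
    local previous="ok"
    if [ -f "$state_file" ]; then
        previous=$(cat "$state_file")
    fi
    printf '%s\n' "$current" > "$state_file"
    if [ "$current" != "$previous" ]; then
        if [ "$current" = "fail" ]; then
            echo "ALERT: health check started failing"
        else
            echo "RECOVERED: health check passing again"
        fi
    fi
}

# Example wrapper for cron (paths are placeholders):
# if /path/to/spotlightcam/scripts/health-check.sh prod > /dev/null 2>&1; then
#     report_on_change /tmp/slc-health.state ok
# else
#     report_on_change /tmp/slc-health.state fail
# fi
```

The wrapper's output could then be piped to `mail` in the same way as the cron line, so a message goes out once per outage and once on recovery.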

### Option 2: External monitoring
Recommended external services:
- **UptimeRobot** - Free, checks every 5 min
- **Pingdom** - Advanced monitoring
- **StatusCake** - Free tier available

Monitor these endpoints:
- `https://spotlight.cam` - Frontend
- `https://spotlight.cam/api/health` - Backend API

---

## 🚨 Troubleshooting

### Backup fails
```bash
# Check if database container is running
docker ps | grep slc-db

# Check database logs
docker logs slc-db-prod

# Manually test connection
docker exec slc-db-prod pg_isready -U spotlightcam
```

### Health check fails
```bash
# View detailed container status
docker ps -a

# Check specific service logs
docker logs slc-backend-prod
docker logs slc-db-prod

# Restart services
docker compose --profile prod restart
```

### Restore fails
```bash
# Inspect the start of the backup file (it should show the pg_dump header)
head -n 10 ./backups/backup_prod_20251120_120000.sql

# Verify database is accepting connections
docker exec slc-db-prod psql -U spotlightcam -c "SELECT version();"
```
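
A plain-text dump produced by `pg_dump` starts with a comment header containing the line `-- PostgreSQL database dump`. As a rough sanity check (it does not prove the dump is complete), the hypothetical helper below tests for a non-empty file with that header.

```bash
#!/bin/bash
# Hypothetical helper: succeed if FILE is non-empty and its first lines
# contain the header that pg_dump writes to plain-text dumps.
looks_like_pg_dump() {
    local file=$1
    [ -s "$file" ] && head -n 10 "$file" | grep -q "PostgreSQL database dump"
}

# Example:
# looks_like_pg_dump ./backups/backup_prod_20251120_120000.sql || echo "Suspicious backup file"
```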

---

## 📝 Best Practices

1. **Always test backups** - Regularly test the restore process in the dev environment
2. **Monitor disk space** - Ensure the backups directory has enough space
3. **Secure backups** - Store backups off-server (AWS S3, Backblaze B2)
4. **Regular testing** - Test health checks and disaster recovery procedures
5. **Log rotation** - Use logrotate for script logs
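
For the disk space point, a small pre-flight check could run before each backup. The sketch below reads the available space with `df -Pk` (POSIX output format, 1024-byte blocks). The function names and the threshold in the example are assumptions.

```bash
#!/bin/bash
# Hypothetical helpers for checking free space in the backups directory.

# Print the available space, in KiB, on the filesystem holding DIR.
available_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if DIR has at least MIN_KB KiB available.
has_free_space() {
    local dir=$1 min_kb=$2
    local avail
    avail=$(available_kb "$dir")
    [ -n "$avail" ] && [ "$avail" -ge "$min_kb" ]
}

# Example: has_free_space ./backups 1048576 || echo "Less than 1 GiB free"
```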

---

**Last Updated:** 2025-11-20
61
scripts/backup-db.sh
Normal file
@@ -0,0 +1,61 @@
#!/bin/bash
# Database backup script for spotlight.cam
# Usage: ./scripts/backup-db.sh [dev|prod]

set -e

# Default to development if no argument provided
ENV=${1:-dev}

# Configuration
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="./backups"

# Create backup directory if it doesn't exist
mkdir -p "$BACKUP_DIR"

# Set container name based on environment
if [ "$ENV" = "prod" ]; then
    DB_CONTAINER="slc-db-prod"
    DB_NAME="spotlightcam"
    BACKUP_FILE="$BACKUP_DIR/backup_prod_$DATE.sql"
else
    DB_CONTAINER="slc-db"
    DB_NAME="spotlightcam"
    BACKUP_FILE="$BACKUP_DIR/backup_dev_$DATE.sql"
fi

echo "🔄 Starting database backup..."
echo "📦 Environment: $ENV"
echo "🗄️ Container: $DB_CONTAINER"
echo "💾 Backup file: $BACKUP_FILE"

# Check if container is running
if ! docker ps --format '{{.Names}}' | grep -q "^${DB_CONTAINER}$"; then
    echo "❌ Error: Container $DB_CONTAINER is not running"
    exit 1
fi

# Create backup. The command is tested directly in the `if` because, under
# `set -e`, a separate `$?` check after a failed pg_dump would never be reached.
if docker exec "$DB_CONTAINER" pg_dump -U spotlightcam "$DB_NAME" > "$BACKUP_FILE"; then
    BACKUP_SIZE=$(du -h "$BACKUP_FILE" | cut -f1)
    echo "✅ Backup completed successfully!"
    echo "📊 Backup size: $BACKUP_SIZE"
    echo "📁 Location: $BACKUP_FILE"
else
    echo "❌ Backup failed!"
    # Remove the partial dump so it is not mistaken for a valid backup
    rm -f "$BACKUP_FILE"
    exit 1
fi

# Keep only last 7 days of backups
echo "🧹 Cleaning old backups (keeping last 7 days)..."
find "$BACKUP_DIR" -name "backup_*.sql" -mtime +7 -delete

# Count remaining backups
BACKUP_COUNT=$(find "$BACKUP_DIR" -name "backup_*.sql" | wc -l)
echo "📚 Total backups: $BACKUP_COUNT"

echo "✨ Done!"
88
scripts/health-check.sh
Normal file
@@ -0,0 +1,88 @@
#!/bin/bash
# Health check script for spotlight.cam
# Usage: ./scripts/health-check.sh [dev|prod]

set -e

ENV=${1:-dev}

# Set service names based on environment
if [ "$ENV" = "prod" ]; then
    NGINX_CONTAINER="slc-proxy-prod"
    FRONTEND_CONTAINER="slc-frontend-prod"
    BACKEND_CONTAINER="slc-backend-prod"
    DB_CONTAINER="slc-db-prod"
    API_URL="https://spotlight.cam/api/health"
else
    NGINX_CONTAINER="slc-proxy"
    FRONTEND_CONTAINER="slc-frontend"
    BACKEND_CONTAINER="slc-backend"
    DB_CONTAINER="slc-db"
    API_URL="http://localhost:8080/api/health"
fi

echo "🏥 spotlight.cam Health Check"
echo "📦 Environment: $ENV"
echo "================================"
echo ""

# Function to check container status
check_container() {
    local container=$1
    local service=$2

    # Use `docker ps -a` so that stopped containers are found too. Plain
    # `docker ps` lists only running containers, which would make the
    # "exists but not running" branch below unreachable.
    if docker ps -a --format '{{.Names}}' | grep -q "^${container}$"; then
        local status
        status=$(docker inspect --format='{{.State.Status}}' "$container")
        if [ "$status" = "running" ]; then
            echo "✅ $service: Running"
            return 0
        else
            echo "⚠️ $service: Container exists but not running (status: $status)"
            return 1
        fi
    else
        echo "❌ $service: Container not found"
        return 1
    fi
}

# Check all containers
ALL_OK=true

check_container "$NGINX_CONTAINER" "nginx" || ALL_OK=false
check_container "$FRONTEND_CONTAINER" "Frontend" || ALL_OK=false
check_container "$BACKEND_CONTAINER" "Backend" || ALL_OK=false
check_container "$DB_CONTAINER" "Database" || ALL_OK=false

echo ""

# Check API health endpoint (--max-time keeps a hung request from blocking cron runs)
echo "🔌 API Health Check:"
if curl -f -s --max-time 10 "$API_URL" > /dev/null 2>&1; then
    echo "✅ API responding at $API_URL"
else
    echo "❌ API not responding at $API_URL"
    ALL_OK=false
fi

echo ""

# Database connection test
echo "🗄️ Database Connection:"
if docker exec "$DB_CONTAINER" pg_isready -U spotlightcam > /dev/null 2>&1; then
    echo "✅ Database accepting connections"
else
    echo "❌ Database not accepting connections"
    ALL_OK=false
fi

echo ""
echo "================================"

if [ "$ALL_OK" = true ]; then
    echo "✅ All systems operational!"
    exit 0
else
    echo "⚠️ Some services are not healthy"
    exit 1
fi
65
scripts/restore-db.sh
Normal file
@@ -0,0 +1,65 @@
#!/bin/bash
# Database restore script for spotlight.cam
# Usage: ./scripts/restore-db.sh <backup-file> [dev|prod]

set -e

# Check if backup file is provided
if [ -z "$1" ]; then
    echo "❌ Error: Backup file not specified"
    echo "Usage: ./scripts/restore-db.sh <backup-file> [dev|prod]"
    echo "Example: ./scripts/restore-db.sh ./backups/backup_dev_20251120_120000.sql dev"
    exit 1
fi

BACKUP_FILE=$1
ENV=${2:-dev}

# Check if backup file exists
if [ ! -f "$BACKUP_FILE" ]; then
    echo "❌ Error: Backup file not found: $BACKUP_FILE"
    exit 1
fi

# Set container name based on environment
if [ "$ENV" = "prod" ]; then
    DB_CONTAINER="slc-db-prod"
    DB_NAME="spotlightcam"
else
    DB_CONTAINER="slc-db"
    DB_NAME="spotlightcam"
fi

echo "⚠️ WARNING: This will REPLACE the current database!"
echo "📦 Environment: $ENV"
echo "🗄️ Container: $DB_CONTAINER"
echo "💾 Backup file: $BACKUP_FILE"
echo ""
read -p "Are you sure you want to continue? (yes/no): " -r
echo

if [[ ! $REPLY =~ ^[Yy][Ee][Ss]$ ]]; then
    echo "❌ Restore cancelled"
    exit 0
fi

# Check if container is running
if ! docker ps --format '{{.Names}}' | grep -q "^${DB_CONTAINER}$"; then
    echo "❌ Error: Container $DB_CONTAINER is not running"
    exit 1
fi

echo "🔄 Starting database restore..."

# Restore backup. The command is tested directly in the `if` because, under
# `set -e`, a separate `$?` check after a failure would never be reached.
# Note: psql exits 0 even when individual SQL statements fail unless
# ON_ERROR_STOP is set, so review the output for errors as well.
if docker exec -i "$DB_CONTAINER" psql -U spotlightcam "$DB_NAME" < "$BACKUP_FILE"; then
    echo "✅ Restore completed successfully!"
else
    echo "❌ Restore failed!"
    exit 1
fi

echo "✨ Done!"