feat: add production operations scripts and monitoring guide
Add comprehensive tooling for production deployment:

Scripts (scripts/):
- backup-db.sh: Automated database backups with 7-day retention
- restore-db.sh: Safe database restore with confirmation prompts
- health-check.sh: Complete service health monitoring
- README.md: Operational scripts documentation

Monitoring (docs/MONITORING.md):
- Application health monitoring
- Docker container monitoring
- External monitoring setup (UptimeRobot, Pingdom)
- Log monitoring and rotation
- Alerting configuration
- Incident response procedures
- SLA targets and metrics

All scripts include:
- Environment support (dev/prod)
- Error handling and validation
- Detailed status reporting
- Safety confirmations where needed
427
docs/MONITORING.md
Normal file
@@ -0,0 +1,427 @@
# Monitoring Guide - spotlight.cam

Complete guide for monitoring spotlight.cam in production.

## 📊 Monitoring Strategy

### Three-Layer Approach

1. **Application Monitoring** - Health checks, logs, metrics
2. **Infrastructure Monitoring** - Docker containers, system resources
3. **External Monitoring** - Uptime, response times, SSL certificates

---

## 🏥 Application Monitoring

### Built-in Health Check

**Endpoint:** `GET /api/health`

**Response (healthy):**
```json
{
  "status": "ok",
  "timestamp": "2025-11-20T12:00:00.000Z",
  "uptime": 3600,
  "environment": "production"
}
```

**Usage:**
```bash
# Check health
curl https://spotlight.cam/api/health

# Automated check (exit code 0 = healthy)
curl -f -s https://spotlight.cam/api/health > /dev/null
```
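
The automated check above only looks at the HTTP status. A stricter check could also confirm that the body reports `"status": "ok"`. The following is a minimal sketch under that assumption; `health_body_ok` is a hypothetical helper, not part of the repository, and it uses a canned body so it runs without network access.

```bash
# Hypothetical helper: succeeds only when the JSON body passed as $1
# contains "status": "ok" (whitespace around the colon is tolerated).
health_body_ok() {
    printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}

# Canned body copied from the documented healthy response. In practice it
# could come from: curl -fsS https://spotlight.cam/api/health
body='{"status":"ok","timestamp":"2025-11-20T12:00:00.000Z","uptime":3600,"environment":"production"}'

if health_body_ok "$body"; then
    echo "healthy"
else
    echo "unhealthy"
fi
```

A pattern match keeps the sketch dependency-free; a JSON-aware tool would be more robust if the response format ever changes.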

### Health Check Script

Use the built-in health check script:
```bash
# Check all services
./scripts/health-check.sh prod

# Output:
# ✅ nginx: Running
# ✅ Frontend: Running
# ✅ Backend: Running
# ✅ Database: Running
# ✅ API responding
# ✅ Database accepting connections
```

---

## 🐳 Docker Container Monitoring

### Check Container Status

```bash
# List all containers
docker compose --profile prod ps

# Check specific container
docker inspect slc-backend-prod --format='{{.State.Status}}'

# View resource usage
docker stats --no-stream
```

### Container Health Checks

Built into docker-compose.yml:
- **Backend:** `curl localhost:3000/api/health`
- **Database:** `pg_isready -U spotlightcam`

```bash
# View health status
docker compose --profile prod ps
# Look for "(healthy)" in STATUS column
```
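
Reading the STATUS column by eye does not scale to automation. As a rough sketch, the same information can be filtered from `docker ps --format '{{.Names}}\t{{.Status}}'`, whose status text carries a `(healthy)` or `(unhealthy)` suffix for containers that define a health check. The `list_unhealthy` function name and the canned sample lines are illustrative assumptions.

```bash
# Hypothetical filter: reads "name<TAB>status" lines on stdin and prints
# the names of containers whose status contains "(unhealthy)".
list_unhealthy() {
    awk -F '\t' '$2 ~ /\(unhealthy\)/ { print $1 }'
}

# Canned sample; in practice the input could come from:
#   docker ps --format '{{.Names}}\t{{.Status}}'
printf 'slc-backend-prod\tUp 2 hours (healthy)\nslc-db-prod\tUp 2 hours (unhealthy)\n' | list_unhealthy
```

Containers without a health check have no suffix at all, so this filter deliberately ignores them rather than flagging them.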

---

## 📝 Log Monitoring

### View Logs

```bash
# All services
docker compose --profile prod logs -f

# Specific service
docker logs -f slc-backend-prod

# Last 100 lines
docker logs --tail 100 slc-backend-prod

# With timestamps
docker logs -f --timestamps slc-backend-prod

# Filter errors only
docker logs slc-backend-prod 2>&1 | grep -i error
```

### Log Rotation

Configured in docker-compose.yml:
```yaml
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"
```
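
With these options each container keeps at most three log files of up to 10 MB each. The arithmetic below is a small worked example; the container count of four (nginx, frontend, backend, database) assumes every service uses the same logging block, which this guide does not state.

```bash
# Worst-case disk usage for rotated container logs.
max_size_mb=10   # from max-size: "10m"
max_file=3       # from max-file: "3"
containers=4     # assumption: all four services share this config

per_container_mb=$((max_size_mb * max_file))
total_mb=$((per_container_mb * containers))

echo "Per container: ${per_container_mb} MB"
echo "All containers: ${total_mb} MB"
```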

### Important Log Patterns

`docker logs` writes the container's stderr stream to stderr, so each command redirects it with `2>&1` to make sure `grep` sees every line.

**Authentication errors:**
```bash
docker logs slc-backend-prod 2>&1 | grep "401\|403\|locked"
```

**Database errors:**
```bash
docker logs slc-backend-prod 2>&1 | grep -i "prisma\|database"
```

**Rate limiting:**
```bash
docker logs slc-backend-prod 2>&1 | grep "Too many requests"
```

**Email failures:**
```bash
docker logs slc-backend-prod 2>&1 | grep "Failed to send.*email"
```
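
The four patterns above can also be counted in a single pass, which is handy for a daily summary. This is a sketch only: `summarize_log_patterns` is a hypothetical helper, and the counts are per matching line, so one line can increment more than one category.

```bash
# Hypothetical helper: reads log lines on stdin and prints how many lines
# match each of the four pattern groups documented above.
summarize_log_patterns() {
    awk '
        /401|403|locked/                { auth++ }
        tolower($0) ~ /prisma|database/ { db++ }
        /Too many requests/             { rate++ }
        /Failed to send.*email/         { email++ }
        END {
            printf "auth=%d db=%d rate_limit=%d email=%d\n", auth, db, rate, email
        }
    '
}

# Possible usage (requires the running container):
#   docker logs --since 24h slc-backend-prod 2>&1 | summarize_log_patterns
```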

---

## 🌐 External Monitoring

### Recommended Services

#### 1. UptimeRobot (Free)
- **URL:** https://uptimerobot.com
- **Features:**
  - 5-minute checks
  - Email/SMS alerts
  - 50 monitors free
  - Status pages

**Setup:**
1. Create account
2. Add HTTP monitor: `https://spotlight.cam`
3. Add HTTP monitor: `https://spotlight.cam/api/health`
4. Set alert contacts
5. Create public status page (optional)

#### 2. Pingdom
- **URL:** https://pingdom.com
- **Features:**
  - 1-minute checks
  - Transaction monitoring
  - Real user monitoring
  - SSL monitoring

#### 3. Better Uptime
- **URL:** https://betteruptime.com
- **Features:**
  - Free tier available
  - Incident management
  - On-call scheduling
  - Status pages

### Monitor These Endpoints

| Endpoint | Check Type | Expected |
|----------|-----------|----------|
| `https://spotlight.cam` | HTTP | 200 OK |
| `https://spotlight.cam/api/health` | HTTP + JSON | `{"status":"ok"}` |
| `spotlight.cam` | SSL | Valid, not expiring |
| `spotlight.cam` | DNS | Resolves correctly |

---

## 📈 Metrics to Track

### Application Metrics

1. **Response Times**
   - API endpoints: < 200ms
   - Frontend load: < 1s

2. **Error Rates**
   - 4xx errors: < 1%
   - 5xx errors: < 0.1%

3. **Authentication**
   - Failed logins
   - Account lockouts
   - Password resets

4. **WebRTC**
   - Connection success rate
   - File transfer completions
   - Peer connection failures
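
The error-rate targets in item 2 are percentages of all requests. As an illustration, the sketch below computes both rates from a stream of HTTP status codes, one per line. Where those codes come from (for example a field extracted from an access log) is an assumption, and `error_rates` is a hypothetical helper.

```bash
# Hypothetical helper: reads one HTTP status code per line on stdin and
# prints the share of 4xx and 5xx responses among all lines read.
error_rates() {
    awk '
        /^4[0-9][0-9]$/ { c4++ }
        /^5[0-9][0-9]$/ { c5++ }
        { total++ }
        END {
            if (total == 0) { print "no requests"; exit }
            printf "4xx=%.2f%% 5xx=%.2f%%\n", 100 * c4 / total, 100 * c5 / total
        }
    '
}

# Canned sample: four requests, one 404 and one 500.
printf '200\n200\n404\n500\n' | error_rates
```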

### Infrastructure Metrics

1. **CPU Usage**
   ```bash
   docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}"
   ```

2. **Memory Usage**
   ```bash
   docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}"
   ```

3. **Disk Space**
   ```bash
   df -h
   du -sh /var/lib/docker
   ```

4. **Database Size**
   ```bash
   docker exec slc-db-prod psql -U spotlightcam -c "SELECT pg_size_pretty(pg_database_size('spotlightcam'));"
   ```

---

## 🚨 Alerting Setup

### Email Alerts (Simple)

Create an alert script:
```bash
#!/bin/bash
# /usr/local/bin/alert-spotlight.sh

SUBJECT="⚠️ spotlight.cam Alert"
RECIPIENT="admin@example.com"

# Run health check
if ! /path/to/spotlightcam/scripts/health-check.sh prod; then
    echo "Health check failed at $(date)" | mail -s "$SUBJECT" "$RECIPIENT"
fi
```

Add to crontab:
```bash
*/5 * * * * /usr/local/bin/alert-spotlight.sh
```

### Slack Alerts (Advanced)

```bash
#!/bin/bash
# /usr/local/bin/alert-slack.sh

SLACK_WEBHOOK="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

if ! /path/to/spotlightcam/scripts/health-check.sh prod; then
    curl -X POST "$SLACK_WEBHOOK" \
        -H 'Content-Type: application/json' \
        -d '{
            "text": "🚨 spotlight.cam health check failed",
            "username": "Monitoring Bot"
        }'
fi
```

---

## 📊 Dashboard (Optional)

### Simple Dashboard with Grafana

1. **Set up Prometheus:**
   ```yaml
   # docker-compose.monitoring.yml
   services:
     prometheus:
       image: prom/prometheus
       volumes:
         - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
       ports:
         - "9090:9090"

     grafana:
       image: grafana/grafana
       ports:
         - "3001:3000"
       volumes:
         - grafana_data:/var/lib/grafana

   # Named volumes must be declared at the top level
   volumes:
     grafana_data:
   ```

2. **Add a metrics endpoint to the backend** (optional enhancement)

---

## 🔍 Troubleshooting Monitoring

### Health Check Always Fails

```bash
# Test API manually
curl -v https://spotlight.cam/api/health

# Check nginx logs
docker logs slc-proxy-prod

# Check backend logs
docker logs slc-backend-prod

# Test from within container
docker exec slc-proxy-prod curl localhost:80/api/health
```

### High CPU/Memory Usage

```bash
# Identify problematic container
docker stats --no-stream

# Check container logs
docker logs --tail 100 slc-backend-prod

# Restart if needed
docker compose --profile prod restart backend-prod
```

### Logs Not Rotating

```bash
# Check Docker log files
ls -lh /var/lib/docker/containers/*/*-json.log

# Manual cleanup (careful: causes downtime, and prune also removes
# all unused images, networks, and build cache)
docker compose --profile prod down
docker system prune -af
docker compose --profile prod up -d
```

---

## ✅ Monitoring Checklist

### Daily Checks (Automated)
- [ ] Health check endpoint responding
- [ ] All containers running
- [ ] Database accepting connections
- [ ] No critical errors in logs

### Weekly Checks (Manual)
- [ ] Review error logs
- [ ] Check disk space
- [ ] Verify backups are running
- [ ] Test restore from backup
- [ ] Review failed login attempts

### Monthly Checks
- [ ] SSL certificate expiry (renew if < 30 days)
- [ ] Update dependencies
- [ ] Review and rotate secrets
- [ ] Performance review
- [ ] Security audit

---

## 📞 Incident Response

### When an Alert Triggers

1. **Check severity**
   ```bash
   ./scripts/health-check.sh prod
   docker compose --profile prod ps
   ```

2. **Check logs**
   ```bash
   docker logs --tail 100 slc-backend-prod
   docker logs --tail 100 slc-db-prod
   ```

3. **Attempt automatic recovery**
   ```bash
   docker compose --profile prod restart
   ```

4. **If still down, investigate**
   - Database connection issues
   - Disk space full
   - Memory exhaustion
   - Network issues

5. **Document incident**
   - Time of failure
   - Symptoms observed
   - Actions taken
   - Resolution

---

## 🎯 SLA Targets

### Uptime
- **Target:** 99.9% (about 43 minutes of downtime per 30-day month)
- **Measurement:** External monitoring (UptimeRobot)
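
The downtime figure follows directly from the target: the allowed downtime is the remaining fraction of the period. A small sketch of that arithmetic, with `downtime_budget_minutes` as a hypothetical helper name:

```bash
# Hypothetical helper: prints the downtime budget in minutes.
#   $1 = uptime target in percent, $2 = length of the period in days
downtime_budget_minutes() {
    awk -v target="$1" -v days="$2" \
        'BEGIN { printf "%.1f\n", (100 - target) / 100 * days * 24 * 60 }'
}

# 99.9% over a 30-day month: 0.1% of 43,200 minutes
downtime_budget_minutes 99.9 30
```

Because external checks run every 5 minutes on the free UptimeRobot tier, measured downtime is only accurate to roughly that interval.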

### Performance
- **API Response:** < 200ms (95th percentile)
- **Page Load:** < 2s (95th percentile)

### Recovery
- **Detection:** < 5 minutes
- **Response:** < 15 minutes
- **Resolution:** < 1 hour (non-critical)

---

**Last Updated:** 2025-11-20
186
scripts/README.md
Normal file
@@ -0,0 +1,186 @@
# spotlight.cam - Operations Scripts

Utility scripts for managing spotlight.cam deployment, backups, and monitoring.

## 📋 Available Scripts

### 1. Database Backup (`backup-db.sh`)

Creates a timestamped backup of the PostgreSQL database.

```bash
# Backup development database
./scripts/backup-db.sh dev

# Backup production database
./scripts/backup-db.sh prod
```

**Features:**
- Automatic timestamping
- Automatic cleanup (keeps last 7 days)
- Backup size reporting
- Error handling

**Backups location:** `./backups/`
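
The cleanup relies on `find ... -mtime +7 -delete`. Note that `-mtime +7` matches files whose age, counted in whole 24-hour periods, is greater than 7, so a backup is removed once it is at least 8 days old. The sketch below isolates that step; `prune_old_backups` is a hypothetical wrapper, not a function in the script.

```bash
# Hypothetical wrapper around the retention step used by backup-db.sh.
#   $1 = backup directory, $2 = age threshold in days
prune_old_backups() {
    find "$1" -name 'backup_*.sql' -mtime "+$2" -delete
}

# Possible usage from the repository root:
#   prune_old_backups ./backups 7
```

To preview what would be removed, the same `find` expression can be run with `-print` in place of `-delete`.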

**Set up a cron job for automatic backups:**
```bash
# Edit crontab
crontab -e

# Add daily backup at 2 AM (production)
0 2 * * * cd /path/to/spotlightcam && ./scripts/backup-db.sh prod >> logs/backup.log 2>&1
```

---

### 2. Database Restore (`restore-db.sh`)

Restores the database from a backup file.

```bash
# Restore development database
./scripts/restore-db.sh ./backups/backup_dev_20251120_120000.sql dev

# Restore production database
./scripts/restore-db.sh ./backups/backup_prod_20251120_120000.sql prod
```

**Safety features:**
- Confirmation prompt
- File existence check
- Container status validation
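
The confirmation prompt accepts only the full word "yes" in any letter case; anything else cancels the restore. A portable sketch of that check is shown below. `confirm_restore` is a hypothetical function name, and the script itself uses `read -p` with a bash regex instead of `case`.

```bash
# Hypothetical sketch of the confirmation step: returns 0 only when the
# user types "yes" (case-insensitive), 1 for any other input or EOF.
confirm_restore() {
    printf 'Are you sure you want to continue? (yes/no): '
    read -r reply || return 1
    case "$reply" in
        [Yy][Ee][Ss]) return 0 ;;
        *)            return 1 ;;
    esac
}

# Possible usage:
#   confirm_restore || { echo "Restore cancelled"; exit 0; }
```

Requiring the whole word rather than a single `y` makes an accidental confirmation less likely.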

---

### 3. Health Check (`health-check.sh`)

Monitors all services and checks system health.

```bash
# Check development environment
./scripts/health-check.sh dev

# Check production environment
./scripts/health-check.sh prod
```

**Checks:**
- ✅ nginx container status
- ✅ Frontend container status
- ✅ Backend container status
- ✅ Database container status
- ✅ API health endpoint
- ✅ Database connection

**Exit codes:**
- `0` - All systems operational
- `1` - One or more services unhealthy
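
The exit code is an aggregate: every check runs, and a single failure turns the final status into `1`. The pattern can be sketched as follows; `run_checks` and the check names passed to it are illustrative, not taken from the script.

```bash
# Hypothetical sketch of the aggregation pattern: run every check, report
# each result, and return non-zero if any check failed.
run_checks() {
    all_ok=true
    for check in "$@"; do
        if "$check"; then
            echo "OK   $check"
        else
            echo "FAIL $check"
            all_ok=false
        fi
    done
    [ "$all_ok" = true ]
}

# Example with the shell built-ins `true` and `false` standing in for
# real checks:
run_checks true false || echo "exit code would be 1"
```

Running all checks before exiting, instead of stopping at the first failure, gives a complete picture in a single run.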

**Set up monitoring:**
```bash
# Check every 5 minutes and send alerts on failure
*/5 * * * * /path/to/spotlightcam/scripts/health-check.sh prod || /path/to/alert-script.sh
```

---

## 🔧 Setup Instructions

### Make scripts executable
```bash
chmod +x scripts/*.sh
```

### Create backups directory
```bash
mkdir -p backups
```

### Test scripts
```bash
# Test backup
./scripts/backup-db.sh dev

# Test health check
./scripts/health-check.sh dev
```

---

## 📊 Monitoring Setup

### Option 1: Cron-based monitoring
```bash
# Add to crontab
crontab -e

# Health check every 5 minutes
*/5 * * * * /path/to/spotlightcam/scripts/health-check.sh prod || echo "Health check failed" | mail -s "spotlight.cam Alert" admin@example.com

# Daily backup at 2 AM (cd first: the script writes to the relative path ./backups)
0 2 * * * cd /path/to/spotlightcam && ./scripts/backup-db.sh prod
```

### Option 2: External monitoring
Recommended external services:
- **UptimeRobot** - Free, checks every 5 min
- **Pingdom** - Advanced monitoring
- **StatusCake** - Free tier available

Monitor these endpoints:
- `https://spotlight.cam` - Frontend
- `https://spotlight.cam/api/health` - Backend API

---

## 🚨 Troubleshooting

### Backup fails
```bash
# Check if database container is running
docker ps | grep slc-db

# Check database logs
docker logs slc-db-prod

# Manually test connection
docker exec slc-db-prod pg_isready -U spotlightcam
```

### Health check fails
```bash
# View detailed container status
docker ps -a

# Check specific service logs
docker logs slc-backend-prod
docker logs slc-db-prod

# Restart services
docker compose --profile prod restart
```

### Restore fails
```bash
# Check backup file integrity
head -n 10 ./backups/backup_prod_20251120_120000.sql

# Verify database is accepting connections
docker exec slc-db-prod psql -U spotlightcam -c "SELECT version();"
```

---

## 📝 Best Practices

1. **Always test backups** - Regularly test restore process in dev environment
2. **Monitor disk space** - Ensure backups directory has enough space
3. **Secure backups** - Store backups off-server (AWS S3, Backblaze B2)
4. **Regular testing** - Test health checks and disaster recovery procedures
5. **Log rotation** - Use logrotate for script logs

---

**Last Updated:** 2025-11-20
61
scripts/backup-db.sh
Normal file
@@ -0,0 +1,61 @@
#!/bin/bash
# Database backup script for spotlight.cam
# Usage: ./scripts/backup-db.sh [dev|prod]

set -e

# Default to development if no argument provided
ENV=${1:-dev}

# Configuration
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="./backups"

# Create backup directory if it doesn't exist
mkdir -p "$BACKUP_DIR"

# Set container name based on environment
if [ "$ENV" = "prod" ]; then
    DB_CONTAINER="slc-db-prod"
    DB_NAME="spotlightcam"
    BACKUP_FILE="$BACKUP_DIR/backup_prod_$DATE.sql"
else
    DB_CONTAINER="slc-db"
    DB_NAME="spotlightcam"
    BACKUP_FILE="$BACKUP_DIR/backup_dev_$DATE.sql"
fi

echo "🔄 Starting database backup..."
echo "📦 Environment: $ENV"
echo "🗄️ Container: $DB_CONTAINER"
echo "💾 Backup file: $BACKUP_FILE"

# Check if container is running
if ! docker ps --format '{{.Names}}' | grep -q "^${DB_CONTAINER}$"; then
    echo "❌ Error: Container $DB_CONTAINER is not running"
    exit 1
fi

# Create backup. The command is tested directly in the `if`: with `set -e`,
# a separate `$?` check afterwards would never be reached on failure.
if docker exec "$DB_CONTAINER" pg_dump -U spotlightcam "$DB_NAME" > "$BACKUP_FILE"; then
    BACKUP_SIZE=$(du -h "$BACKUP_FILE" | cut -f1)
    echo "✅ Backup completed successfully!"
    echo "📊 Backup size: $BACKUP_SIZE"
    echo "📁 Location: $BACKUP_FILE"
else
    echo "❌ Backup failed!"
    rm -f "$BACKUP_FILE"  # Do not leave a partial dump behind
    exit 1
fi

# Keep only last 7 days of backups
echo "🧹 Cleaning old backups (keeping last 7 days)..."
find "$BACKUP_DIR" -name "backup_*.sql" -mtime +7 -delete

# Count remaining backups
BACKUP_COUNT=$(find "$BACKUP_DIR" -name "backup_*.sql" | wc -l)
echo "📚 Total backups: $BACKUP_COUNT"

echo "✨ Done!"
88
scripts/health-check.sh
Normal file
@@ -0,0 +1,88 @@
#!/bin/bash
# Health check script for spotlight.cam
# Usage: ./scripts/health-check.sh [dev|prod]

set -e

ENV=${1:-dev}

# Set service names based on environment
if [ "$ENV" = "prod" ]; then
    NGINX_CONTAINER="slc-proxy-prod"
    FRONTEND_CONTAINER="slc-frontend-prod"
    BACKEND_CONTAINER="slc-backend-prod"
    DB_CONTAINER="slc-db-prod"
    API_URL="https://spotlight.cam/api/health"
else
    NGINX_CONTAINER="slc-proxy"
    FRONTEND_CONTAINER="slc-frontend"
    BACKEND_CONTAINER="slc-backend"
    DB_CONTAINER="slc-db"
    API_URL="http://localhost:8080/api/health"
fi

echo "🏥 spotlight.cam Health Check"
echo "📦 Environment: $ENV"
echo "================================"
echo ""

# Function to check container status
check_container() {
    local container=$1
    local service=$2
    local status

    if docker ps --format '{{.Names}}' | grep -q "^${container}$"; then
        status=$(docker inspect --format='{{.State.Status}}' "$container")
        if [ "$status" = "running" ]; then
            echo "✅ $service: Running"
            return 0
        else
            echo "⚠️ $service: Container exists but not running (status: $status)"
            return 1
        fi
    else
        echo "❌ $service: Container not found"
        return 1
    fi
}

# Check all containers
ALL_OK=true

check_container "$NGINX_CONTAINER" "nginx" || ALL_OK=false
check_container "$FRONTEND_CONTAINER" "Frontend" || ALL_OK=false
check_container "$BACKEND_CONTAINER" "Backend" || ALL_OK=false
check_container "$DB_CONTAINER" "Database" || ALL_OK=false

echo ""

# Check API health endpoint (--max-time keeps a hung request from
# blocking the script, e.g. when it runs from cron)
echo "🔌 API Health Check:"
if curl -f -s --max-time 10 "$API_URL" > /dev/null 2>&1; then
    echo "✅ API responding at $API_URL"
else
    echo "❌ API not responding at $API_URL"
    ALL_OK=false
fi

echo ""

# Database connection test
echo "🗄️ Database Connection:"
if docker exec "$DB_CONTAINER" pg_isready -U spotlightcam > /dev/null 2>&1; then
    echo "✅ Database accepting connections"
else
    echo "❌ Database not accepting connections"
    ALL_OK=false
fi

echo ""
echo "================================"

if [ "$ALL_OK" = true ]; then
    echo "✅ All systems operational!"
    exit 0
else
    echo "⚠️ Some services are not healthy"
    exit 1
fi
65
scripts/restore-db.sh
Normal file
@@ -0,0 +1,65 @@
#!/bin/bash
# Database restore script for spotlight.cam
# Usage: ./scripts/restore-db.sh <backup-file> [dev|prod]

set -e

# Check if backup file is provided
if [ -z "$1" ]; then
    echo "❌ Error: Backup file not specified"
    echo "Usage: ./scripts/restore-db.sh <backup-file> [dev|prod]"
    echo "Example: ./scripts/restore-db.sh ./backups/backup_dev_20251120_120000.sql dev"
    exit 1
fi

BACKUP_FILE=$1
ENV=${2:-dev}

# Check if backup file exists
if [ ! -f "$BACKUP_FILE" ]; then
    echo "❌ Error: Backup file not found: $BACKUP_FILE"
    exit 1
fi

# Set container name based on environment
if [ "$ENV" = "prod" ]; then
    DB_CONTAINER="slc-db-prod"
    DB_NAME="spotlightcam"
else
    DB_CONTAINER="slc-db"
    DB_NAME="spotlightcam"
fi

echo "⚠️ WARNING: This will REPLACE the current database!"
echo "📦 Environment: $ENV"
echo "🗄️ Container: $DB_CONTAINER"
echo "💾 Backup file: $BACKUP_FILE"
echo ""
read -p "Are you sure you want to continue? (yes/no): " -r
echo

if [[ ! $REPLY =~ ^[Yy][Ee][Ss]$ ]]; then
    echo "❌ Restore cancelled"
    exit 0
fi

# Check if container is running
if ! docker ps --format '{{.Names}}' | grep -q "^${DB_CONTAINER}$"; then
    echo "❌ Error: Container $DB_CONTAINER is not running"
    exit 1
fi

echo "🔄 Starting database restore..."

# Restore backup. psql exits 0 even when SQL statements fail unless
# ON_ERROR_STOP is set; --single-transaction rolls back a failed restore.
# The command is tested directly because `set -e` would skip a later `$?` check.
if docker exec -i "$DB_CONTAINER" psql -v ON_ERROR_STOP=1 --single-transaction \
        -U spotlightcam "$DB_NAME" < "$BACKUP_FILE"; then
    echo "✅ Restore completed successfully!"
else
    echo "❌ Restore failed!"
    exit 1
fi

echo "✨ Done!"