feat: add production operations scripts and monitoring guide

Add comprehensive tooling for production deployment:

Scripts (scripts/):
- backup-db.sh: Automated database backups with 7-day retention
- restore-db.sh: Safe database restore with confirmation prompts
- health-check.sh: Health checks for containers, API, and database
- README.md: Operational scripts documentation

Monitoring (docs/MONITORING.md):
- Application health monitoring
- Docker container monitoring
- External monitoring setup (UptimeRobot, Pingdom)
- Log monitoring and rotation
- Alerting configuration
- Incident response procedures
- SLA targets and metrics

All scripts include:
- Environment support (dev/prod)
- Error handling and validation
- Detailed status reporting
- Safety confirmations where needed
Radosław Gierwiało
2025-11-20 22:22:22 +01:00
parent 2e194e1640
commit 642c8f6d6f
5 changed files with 827 additions and 0 deletions

scripts/README.md (new file, 186 lines)

@@ -0,0 +1,186 @@
# spotlight.cam - Operations Scripts
Utility scripts for managing spotlight.cam deployment, backups, and monitoring.
## 📋 Available Scripts
### 1. Database Backup (`backup-db.sh`)
Creates a timestamped backup of the PostgreSQL database.
```bash
# Backup development database
./scripts/backup-db.sh dev
# Backup production database
./scripts/backup-db.sh prod
```
**Features:**
- Automatic timestamping
- Automatic cleanup (7-day retention; older backups are deleted)
- Backup size reporting
- Error handling
**Backups location:** `./backups/`
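The timestamping and 7-day cleanup listed above could be implemented roughly as follows. This is a minimal sketch, not the actual contents of `backup-db.sh`: the function names `backup_name` and `prune_backups` are hypothetical, and the file-name pattern is inferred from the restore examples in this README.

```bash
#!/usr/bin/env bash
# Sketch only; the real backup-db.sh may work differently.

# Build a name like backup_dev_20251120_120000.sql (pattern inferred from
# the restore examples; hypothetical helper).
backup_name() {
  local env="$1"
  printf 'backup_%s_%s.sql\n' "$env" "$(date +%Y%m%d_%H%M%S)"
}

# Delete one environment's backups whose modification time is more than
# N days old (default 7). Note: find's "-mtime +7" counts whole 24-hour
# periods, so a file is removed once it is at least 8 days old.
prune_backups() {
  local dir="$1" env="$2" days="${3:-7}"
  find "$dir" -maxdepth 1 -type f -name "backup_${env}_*.sql" \
    -mtime +"$days" -delete
}
```

For example, `prune_backups ./backups prod` would leave recent production backups and all dev backups untouched.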
**Set up a cron job for automatic backups:**
```bash
# Edit crontab
crontab -e
# Add daily backup at 2 AM (production)
0 2 * * * cd /path/to/spotlightcam && ./scripts/backup-db.sh prod >> logs/backup.log 2>&1
```
---
### 2. Database Restore (`restore-db.sh`)
Restores the database from a backup file.
```bash
# Restore development database
./scripts/restore-db.sh ./backups/backup_dev_20251120_120000.sql dev
# Restore production database
./scripts/restore-db.sh ./backups/backup_prod_20251120_120000.sql prod
```
**Safety features:**
- Confirmation prompt
- File existence check
- Container status validation
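The three safety features above might look like the helpers below. This is a hedged sketch rather than the actual `restore-db.sh`: the function names are hypothetical, and requiring the literal answer `yes` is an assumption about how strict the prompt is.

```bash
#!/usr/bin/env bash
# Sketch only; the real restore-db.sh may work differently.

# File existence check: succeed only if the backup exists and is non-empty.
check_backup_file() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "Backup file missing or empty: $file" >&2
    return 1
  fi
}

# Confirmation prompt: proceed only on the literal answer "yes"
# (assumed policy; the real script may accept other answers).
confirm_restore() {
  local env="$1" answer
  read -r -p "This will overwrite the $env database. Type 'yes' to continue: " answer
  [ "$answer" = "yes" ]
}

# Container status validation: true if the named container is running.
container_running() {
  [ "$(docker inspect -f '{{.State.Running}}' "$1" 2>/dev/null)" = "true" ]
}
```

A restore script could chain them, for instance `check_backup_file "$1" && container_running slc-db-dev && confirm_restore dev`, and abort before touching the database if any step fails.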
---
### 3. Health Check (`health-check.sh`)
Monitors all services and checks system health.
```bash
# Check development environment
./scripts/health-check.sh dev
# Check production environment
./scripts/health-check.sh prod
```
**Checks:**
- ✅ nginx container status
- ✅ Frontend container status
- ✅ Backend container status
- ✅ Database container status
- ✅ API health endpoint
- ✅ Database connection
**Exit codes:**
- `0` - All systems operational
- `1` - One or more services unhealthy
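One way to arrive at these exit codes is to run every check, count the failures, and derive the status at the end. The sketch below assumes that approach; `run_check` and `exit_status` are hypothetical names, and the real `health-check.sh` may be structured differently.

```bash
#!/usr/bin/env bash
# Sketch only; the real health-check.sh may work differently.

failures=0

# Run one check command quietly, print a status line, and count failures.
run_check() {
  local label="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "OK    $label"
  else
    echo "FAIL  $label"
    failures=$((failures + 1))
  fi
}

# Map the failure count to the documented exit codes:
# returns 0 when nothing failed, 1 otherwise.
exit_status() {
  [ "$failures" -eq 0 ]
}

# Example wiring, left commented out because it needs Docker and network
# access (container name, user, and URL are taken from this README):
# run_check "database connection" docker exec slc-db-prod pg_isready -U spotlightcam
# run_check "API health endpoint" curl -fsS --max-time 10 https://spotlight.cam/api/health
# exit_status; exit $?
```

Running all checks before exiting, instead of stopping at the first failure, means a single run reports every unhealthy service.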
**Set up monitoring:**
```bash
# Check every 5 minutes and send alerts on failure
*/5 * * * * /path/to/spotlightcam/scripts/health-check.sh prod || /path/to/alert-script.sh
```
---
## 🔧 Setup Instructions
### Make scripts executable
```bash
chmod +x scripts/*.sh
```
### Create backups directory
```bash
mkdir -p backups
```
### Test scripts
```bash
# Test backup
./scripts/backup-db.sh dev
# Test health check
./scripts/health-check.sh dev
```
---
## 📊 Monitoring Setup
### Option 1: Cron-based monitoring
```bash
# Add to crontab
crontab -e
# Health check every 5 minutes
*/5 * * * * /path/to/spotlightcam/scripts/health-check.sh prod || echo "Health check failed" | mail -s "spotlight.cam Alert" admin@example.com
# Daily backup at 2 AM (run from the project root so relative paths resolve)
0 2 * * * cd /path/to/spotlightcam && ./scripts/backup-db.sh prod >> logs/backup.log 2>&1
```
### Option 2: External monitoring
Recommended external services:
- **UptimeRobot** - Free tier, checks every 5 minutes
- **Pingdom** - Advanced monitoring
- **StatusCake** - Free tier available
Monitor these endpoints:
- `https://spotlight.cam` - Frontend
- `https://spotlight.cam/api/health` - Backend API
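Before configuring an external service, the same two endpoints can be probed by hand. The following is a minimal sketch using `curl`; the helper names are hypothetical, and treating any 2xx or 3xx status as "up" is an assumption, not a documented rule of this project.

```bash
#!/usr/bin/env bash
# Sketch only; external services perform the equivalent of this remotely.

# Treat any 2xx or 3xx HTTP status as "up" (an assumption; adjust as needed).
status_ok() {
  case "$1" in
    2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Fetch a URL and report its status. curl prints 000 when no HTTP
# response was received (DNS failure, timeout, refused connection).
probe() {
  local url="$1" code
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")" || code="000"
  if status_ok "$code"; then
    echo "UP    $code  $url"
  else
    echo "DOWN  $code  $url"
    return 1
  fi
}

# Usage (performs real network requests):
# probe https://spotlight.cam
# probe https://spotlight.cam/api/health
```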
---
## 🚨 Troubleshooting
### Backup fails
```bash
# Check if database container is running
docker ps | grep slc-db
# Check database logs
docker logs slc-db-prod
# Manually test connection
docker exec slc-db-prod pg_isready -U spotlightcam
```
### Health check fails
```bash
# View detailed container status
docker ps -a
# Check specific service logs
docker logs slc-backend-prod
docker logs slc-db-prod
# Restart services
docker compose --profile prod restart
```
### Restore fails
```bash
# Inspect the first lines of the backup file (should be readable SQL)
head -n 10 ./backups/backup_prod_20251120_120000.sql
# Verify database is accepting connections
docker exec slc-db-prod psql -U spotlightcam -c "SELECT version();"
```
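A slightly stronger check than reading the first lines is to look for the marker comments that `pg_dump` writes at the start and end of a plain-text dump. The sketch below assumes the backups are plain-SQL `pg_dump` output, which the `.sql` extension suggests but this README does not confirm; `verify_dump` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Sketch only; assumes the backup is plain-SQL pg_dump output.

# Succeed if the file is non-empty and contains pg_dump's opening and
# closing marker comments near its start and end.
verify_dump() {
  local file="$1"
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  head -n 10 "$file" | grep -q -- '-- PostgreSQL database dump' \
    || { echo "no pg_dump header: $file" >&2; return 1; }
  tail -n 10 "$file" | grep -q -- '-- PostgreSQL database dump complete' \
    || { echo "no completion marker (truncated?): $file" >&2; return 1; }
}
```

A missing completion marker usually points to a dump that was interrupted, so restoring from such a file is likely to produce an incomplete database.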
---
## 📝 Best Practices
1. **Test backups** - Regularly run the restore process against the dev environment
2. **Monitor disk space** - Ensure the backups directory has enough free space
3. **Secure backups** - Store copies off-server (for example AWS S3 or Backblaze B2)
4. **Rehearse recovery** - Periodically exercise the health checks and disaster recovery procedures
5. **Log rotation** - Use logrotate for script logs
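The disk space practice can be automated with a small guard that runs before each backup. This is a sketch under stated assumptions: the helper names and the 1 GiB default threshold are hypothetical and should be sized to the actual database.

```bash
#!/usr/bin/env bash
# Sketch only; the threshold and function names are hypothetical.

# Print the free space, in KiB, of the filesystem holding a directory.
# "df -Pk" selects the POSIX output format with 1024-byte units.
free_kb() {
  df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if at least min_kb KiB are free (default: 1 GiB).
enough_space() {
  local dir="$1" min_kb="${2:-1048576}"
  [ "$(free_kb "$dir")" -ge "$min_kb" ]
}
```

In a cron wrapper this could gate the backup, for example `enough_space ./backups && ./scripts/backup-db.sh prod`.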
---
**Last Updated:** 2025-11-20