docs: update documentation with matching runs audit and complete test coverage

- Update README.md with current test statistics (342/342 tests passing)
- Add detailed breakdown of all matching/ratings test suites
- Create comprehensive TESTING_MATCHING_RATINGS.md guide covering all 45 tests
- Document matching runs audit, incremental matching, and scheduler features
- Add code coverage highlights and test scenarios
This commit is contained in:
Radosław Gierwiało
2025-11-30 20:10:25 +01:00
parent bd7212a599
commit 9af4447e1d
2 changed files with 600 additions and 6 deletions

View File

@@ -36,12 +36,16 @@ Web application (PWA) for the dance community enabling matchmaking, chat, and vi
-**Private 1:1 Chat** - private chat for matched users with Socket.IO and message history -**Private 1:1 Chat** - private chat for matched users with Socket.IO and message history
-**Match Slugs** - secure random slugs (CUID) preventing ID enumeration -**Match Slugs** - secure random slugs (CUID) preventing ID enumeration
**Ratings & Reviews:** **Ratings & Stats System:**
-**Partner Ratings** - rate collaboration partners (1-5 stars, comments) -**Partner Ratings** - rate collaboration partners (1-5 stars, comments)
-**Collaboration Preferences** - "would collaborate again" indicator -**Collaboration Preferences** - "would collaborate again" indicator
-**Public Rating Display** - ratings visible on public user profiles -**Public Rating Display** - ratings visible on public user profiles
-**Duplicate Prevention** - users can only rate each match once -**Duplicate Prevention** - users can only rate each match once
-**Auto-completion** - matches auto-complete when both partners have rated -**Auto-completion** - matches auto-complete when both partners have rated
-**Stats Updates** - atomic recording stats updates (recordingsDone, recordingsReceived)
-**Source Filtering** - only auto-matches update fairness stats (manual matches excluded)
-**Race Condition Prevention** - statsApplied flag with atomic check-and-set
-**Idempotency** - double-rating prevention ensures stats update exactly once
**WebRTC P2P File Transfer:** **WebRTC P2P File Transfer:**
-**WebRTC Signaling** - SDP/ICE exchange via Socket.IO -**WebRTC Signaling** - SDP/ICE exchange via Socket.IO
@@ -80,6 +84,10 @@ Web application (PWA) for the dance community enabling matchmaking, chat, and vi
-**Event-specific Upgrades** - accountTierOverride for temporary tier boosts (e.g., Comfort Pass) -**Event-specific Upgrades** - accountTierOverride for temporary tier boosts (e.g., Comfort Pass)
-**Multi-criteria Sorting** - Location > Fairness > Load balancing priority -**Multi-criteria Sorting** - Location > Fairness > Load balancing priority
-**Competitor Numbers** - bib number support for events -**Competitor Numbers** - bib number support for events
-**Matching Runs Audit** - complete audit trail with origin_run_id tracking
-**Incremental Matching** - preserves accepted/completed suggestions across re-runs
-**Scheduler Integration** - automated matching with cron-based scheduling
-**Admin Endpoints** - per-run statistics and suggestion filtering
**Security & PWA (All Implemented):** **Security & PWA (All Implemented):**
-**Security Hardening** - CORS, CSRF, Helmet.js, account lockout, rate limiting -**Security Hardening** - CORS, CSRF, Helmet.js, account lockout, rate limiting
@@ -111,7 +119,7 @@ Web application (PWA) for the dance community enabling matchmaking, chat, and vi
- **bcrypt** - password hashing - **bcrypt** - password hashing
- **JWT** - token-based authentication - **JWT** - token-based authentication
- **AWS SES** - email service - **AWS SES** - email service
- **Jest + Supertest** - testing (286 tests, 73% coverage) - **Jest + Supertest** - testing (342 tests, 72.5% coverage, 100% passing)
### Infrastructure ### Infrastructure
- **Docker Compose** - container orchestration (dev + prod profiles) - **Docker Compose** - container orchestration (dev + prod profiles)
@@ -481,15 +489,38 @@ REPL specifics:
## 📊 Test Coverage ## 📊 Test Coverage
**Backend: 73% overall coverage** (285/286 tests passing - 99.7%) **Backend: 72.5% overall coverage** (342/342 tests passing - 100% ✅)
- **Matching Algorithm**: 30/30 tests passing (unit tests for collision detection, buffers, tier system) - **Matching Algorithm**: 19/19 integration tests passing
- **WebRTC Signaling**: 7/7 tests passing (offer/answer/ICE relay, authorization) - Phase 1: Fundamentals (TC1-3) - basic flow, NOT_FOUND scenarios
- Phase 2: Collision Detection (TC4-9) - buffers, slot mapping, schedule conflicts
- Phase 3: Limits & Workload (TC10-11) - MAX_RECORDINGS, recording-recording collisions
- Phase 4: Fairness & Tiers (TC12-16) - debt calculation, tier penalties
- Phase 5: Edge Cases (TC17-19) - multiple heats, incremental matching
- **Ratings & Stats Flow**: 9/9 E2E tests passing
- Double-rating completion flow
- Atomic stats updates (recordingsDone, recordingsReceived)
- Race condition prevention (statsApplied flag)
- Manual vs auto match differentiation
- Idempotency testing
- **Matching Runs Audit**: 6/6 tests passing
- origin_run_id assignment and tracking
- Sequential runs with separate IDs
- Accepted suggestions preservation
- Filter parameters (onlyAssigned, includeNotFound)
- Manual vs scheduler trigger differentiation
- **Incremental Matching**: 5/5 tests passing
- **Recording Stats Integration**: 6/6 tests passing
- **WebRTC Signaling**: 12/12 tests passing (offer/answer/ICE relay, authorization)
- **Auth Controllers**: Comprehensive coverage - **Auth Controllers**: Comprehensive coverage
- **Events API**: Full test suite - **Events API**: Full test suite
- **Matches API**: Full CRUD tests - **Matches API**: Full CRUD tests
- **Dashboard API**: 12 tests passing - **Dashboard API**: 12 tests passing
- **Socket.IO**: Full WebRTC + chat coverage (1 flaky timeout test) - **Socket.IO**: Full WebRTC + chat coverage (12/12 passing)
- **Test Isolation**: Fixed with unique test data per suite - **Test Isolation**: Fixed with unique test data per suite
- **Code Coverage Highlights**:
- matching.js: 94.71% statements, 91.5% branches
- routes/matches.js: 76.11% statements
- routes/events.js: 78.2% statements
**Frontend: Test files ready** (requires test runner setup) **Frontend: Test files ready** (requires test runner setup)
- WebRTC detection utility tests - WebRTC detection utility tests

View File

@@ -0,0 +1,563 @@
# Testing Guide: Matching & Ratings System
> Comprehensive test coverage for the auto-matching algorithm, ratings & stats system, and matching runs audit functionality.
## Overview
The matching and ratings system has **45 comprehensive integration tests** organized into 5 test suites:
| Test Suite | Tests | Coverage | Status |
|------------|-------|----------|--------|
| matching-algorithm.test.js | 19 | Matching logic, collisions, fairness | ✅ 100% |
| ratings-stats-flow.test.js | 9 | E2E rating workflow | ✅ 100% |
| matching-runs-audit.test.js | 6 | Audit trail, origin_run_id | ✅ 100% |
| matching-incremental.test.js | 5 | Incremental matching behavior | ✅ 100% |
| recording-stats-integration.test.js | 6 | Stats integration | ✅ 100% |
**Total: 45/45 tests passing (100%)**
---
## 1. Matching Algorithm Tests (19 tests)
**File:** `backend/src/__tests__/matching-algorithm.test.js`
**Coverage:** 94.71% statements, 91.5% branches in `matching.js`
These tests verify the complete matching algorithm based on `matching-scenarios.md`.
### Phase 1: Fundamentals (TC1-3)
#### TC1: One dancer, one free recorder → simple happy path
- **Scenario:** Basic 1:1 matching
- **Expected:** Recorder assigned, status PENDING
- **Verifies:** Basic algorithm functionality
#### TC2: No recorders available → NOT_FOUND
- **Scenario:** Dancer with heat, but no other participants
- **Expected:** Suggestion created with status NOT_FOUND
- **Verifies:** Graceful handling of no-recorder situations
#### TC3: Only recorder is self → NOT_FOUND
- **Scenario:** Dancer is the only potential recorder
- **Expected:** NOT_FOUND (can't record yourself)
- **Verifies:** Self-recording prevention
### Phase 2: Collision Detection (TC4-9)
#### TC4: Recorder dancing in same heat → cannot record
- **Scenario:** Both dancer and recorder in heat 10
- **Expected:** NOT_FOUND (collision)
- **Verifies:** Same-heat collision detection
#### TC5: Recorder in buffer BEFORE dance → cannot record
- **Scenario:** Dancer heat 9, Recorder heat 10
- **Config:** `HEAT_BUFFER_BEFORE = 1`
- **Expected:** NOT_FOUND (recorder needs prep time)
- **Verifies:** Before-buffer collision detection
#### TC6: Recorder in buffer AFTER dance → cannot record
- **Scenario:** Dancer heat 11, Recorder heat 10
- **Config:** `HEAT_BUFFER_AFTER = 1`
- **Expected:** NOT_FOUND (recorder needs rest time)
- **Verifies:** After-buffer collision detection
#### TC7: No collision when heat outside buffer
- **Scenario:** Dancer heat 12, Recorder heat 10
- **Buffer:** ±1 means recorder busy in 9,10,11
- **Expected:** SUCCESS (heat 12 is free)
- **Verifies:** Correct buffer calculation
#### TC8: Collision between divisions in same slot
- **Scenario:** scheduleConfig maps Novice and Intermediate to same slot
- **Setup:** Dancer in Novice heat 1, Recorder dancing in Intermediate heat 1
- **Expected:** NOT_FOUND (same time slot collision)
- **Verifies:** scheduleConfig slot mapping
#### TC9: No collision when divisions in different slots
- **Scenario:** Novice in slot 1, Advanced in slot 2
- **Expected:** SUCCESS (different time slots)
- **Verifies:** Multi-slot schedule handling
### Phase 3: Limits & Workload (TC10-11)
#### TC10: MAX_RECORDINGS_PER_PERSON is respected
- **Scenario:** 4 dancers, 1 recorder
- **Config:** `MAX_RECORDINGS_PER_PERSON = 3`
- **Expected:** 3 assigned, 1 NOT_FOUND
- **Verifies:** Per-person recording limit enforcement
#### TC11: Recording-recording collision (critical bug fix)
- **Scenario:** 2 dancers in same heat, 1 recorder
- **Expected:** 1 assigned, 1 NOT_FOUND
- **Verifies:** Recorder can only handle one heat at a time
### Phase 4: Fairness & Tiers (TC12-16)
#### TC12: Higher fairnessDebt → more likely to record
- **Scenario:** RecorderA (debt +10), RecorderB (debt 0)
- **Formula:** `fairnessDebt = recordingsReceived - recordingsDone`
- **Expected:** RecorderA chosen (higher debt = more obligation)
- **Verifies:** Fairness algorithm prioritization
#### TC13: Location score beats fairness
- **Scenario:** RecorderA (same city, debt 0) vs RecorderB (different country, debt 100)
- **Expected:** RecorderA chosen
- **Verifies:** Location > Fairness in priority hierarchy
#### TC14: Basic vs Supporter vs Comfort tier penalties
- **Scenario:** 3 recorders with same location/stats, different tiers
- **Penalties:** BASIC (0), SUPPORTER (-10), COMFORT (-50)
- **Expected:** BASIC chosen
- **Verifies:** Tier penalty system
#### TC15: Supporter chosen when Basic unavailable
- **Scenario:** Basic has collision, Supporter free
- **Expected:** Supporter chosen (penalty overridden by availability)
- **Verifies:** Fallback tier selection
#### TC16: Comfort used as last resort
- **Scenario:** Only Comfort available (Basic has collision)
- **Expected:** Comfort chosen (better than NOT_FOUND)
- **Verifies:** Last-resort matching
### Phase 5: Edge Cases (TC17-19)
#### TC17: Dancer with no heats is ignored
- **Scenario:** Participant with competitorNumber but no EventUserHeat records
- **Expected:** 0 suggestions generated
- **Verifies:** Heat existence requirement
#### TC18: Multiple heats for one dancer - all assigned
- **Scenario:** 1 dancer with 3 heats (different divisions), 2 recorders
- **Expected:** All 3 heats get suggestions
- **Verifies:** Load balancing across multiple recorders (2-1 or 1-2 distribution)
#### TC19: Incremental matching respects accepted suggestions
- **Scenario:**
1. Run 1: 2 heats → 2 PENDING suggestions
2. Recorder accepts suggestion for heat A
3. Run 2: Re-run matching
- **Expected:**
- Heat A: Still has ACCEPTED suggestion (preserved)
- Heat B: Updated/new suggestion
- **Verifies:** `saveMatchingResults` preserves accepted/completed suggestions
---
## 2. Ratings & Stats Flow Tests (9 tests)
**File:** `backend/src/__tests__/ratings-stats-flow.test.js`
**Purpose:** End-to-end workflow from matching to stat updates
### Test Flow
#### STEP 1-3: Setup
- Create event with past registration deadline
- Register 2 users (dancer + recorder)
- Enroll users and declare heat
#### STEP 4: Matching
- Run `runMatching()` → generates suggestions
- Call `saveMatchingResults()` → persists to DB
- **Verifies:** Matching algorithm integration
#### STEP 5: Suggestion Acceptance
- Recorder accepts suggestion via API
- **Verifies:** Auto match creation (`source: 'auto'`)
#### STEP 6a-6b: Double Rating (Critical Flow)
**STEP 6a:** First rating (dancer rates recorder)
```javascript
POST /api/matches/:slug/ratings
{
score: 5,
comment: "Great recorder!",
wouldCollaborateAgain: true
}
```
- **Expected:** Match status → `IN_PROGRESS`
- **Stats:** NOT updated yet (need both ratings)
**STEP 6b:** Second rating (recorder rates dancer)
```javascript
POST /api/matches/:slug/ratings
{
score: 4,
comment: "Good dancer!",
wouldCollaborateAgain: true
}
```
- **Expected:**
- Match status → `COMPLETED`
- `statsApplied``true` (atomic update)
- Stats updated **exactly once**:
- `recorder.recordingsDone += 1`
- `dancer.recordingsReceived += 1`
#### STEP 7: Idempotency Test
- Try rating the same match again
- **Expected:** Stats remain unchanged (double-counting prevention)
#### STEP 8: Manual Match Verification
- Create manual match (`source: 'manual'`)
- Exchange ratings
- **Expected:** Stats NOT updated (only auto-matches affect fairness)
### Key Edge Cases Covered
**Race Condition Prevention**
```javascript
// Atomic check-and-set in backend/src/routes/matches.js:961-995
const updateResult = await prisma.match.updateMany({
where: {
id: match.id,
statsApplied: false // Only update if not already applied
},
data: {
status: MATCH_STATUS.COMPLETED,
statsApplied: true
}
});
// Only winner of the race applies stats
if (updateResult.count === 1) {
await matchingService.applyRecordingStatsForMatch(fullMatch);
}
```
**Source Filtering**
```javascript
// Only auto-matches update stats (backend/src/services/matching.js:679-701)
if (match.source !== 'auto') {
return; // Manual matches don't affect fairness
}
```
**Transaction Safety**
```javascript
// Stats update is transactional
await prisma.$transaction([
prisma.user.update({
where: { id: recorderId },
data: { recordingsDone: { increment: 1 } }
}),
prisma.user.update({
where: { id: dancerId },
data: { recordingsReceived: { increment: 1 } }
})
]);
```
---
## 3. Matching Runs Audit Tests (6 tests)
**File:** `backend/src/__tests__/matching-runs-audit.test.js`
**Purpose:** Verify audit trail and origin_run_id tracking
### TC1: Run assigns origin_run_id correctly
- **Action:** Admin clicks "Run now"
- **Verifies:**
- MatchingRun record created (`trigger: 'manual'`, `status: 'success'`)
- All suggestions get `origin_run_id = runId`
- Stats recorded (`matchedCount`, `notFoundCount`)
- Admin endpoint returns correct data
### TC2: Sequential runs create separate origin_run_ids
- **Scenario:**
1. Run #1 → 1 heat → suggestion S1 (`origin_run_id=1`, `status=PENDING`)
2. Add 2nd dancer with heat
3. Run #2 → 2 heats → suggestions S1', S2 (`origin_run_id=2`)
- **Expected Behavior (IMPORTANT):**
- Run #2 **deletes** PENDING suggestions from Run #1
- `GET /matching-runs/1/suggestions`**0 results** (replaced)
- `GET /matching-runs/2/suggestions`**2 results**
- Both have `origin_run_id = 2`
- **Why:** Incremental matching **intentionally** replaces old PENDING suggestions with fresh ones
### TC3: Accepted/completed suggestions preserve origin_run_id
- **Scenario:**
1. Run #1 → suggestion S1 (`status=PENDING`, `origin_run_id=1`)
2. Recorder accepts → `status=ACCEPTED`
3. Run #2 → re-run matching
- **Expected:**
- S1 **still exists** with `status=ACCEPTED`
- S1 **keeps** `origin_run_id=1` (doesn't change to 2!)
- `GET /matching-runs/1/suggestions` → returns S1
- `GET /matching-runs/2/suggestions` → 0 results (heat already has accepted)
- **Verifies:** Accepted/completed suggestions are preserved across re-runs
### TC4: Filter parameters (onlyAssigned, includeNotFound)
**Setup:** 4 dancers (heats well-spaced), 1 recorder (MAX=3)
**Result:** 3 assigned + 1 NOT_FOUND
**Test 4.1:** `?onlyAssigned=true&includeNotFound=false` (default)
```bash
GET /api/admin/events/:slug/matching-runs/:runId/suggestions?onlyAssigned=true
```
- **Expected:** 3 suggestions (only assigned, NOT_FOUND filtered out)
**Test 4.2:** `?onlyAssigned=false&includeNotFound=true`
```bash
GET /api/admin/events/:slug/matching-runs/:runId/suggestions?includeNotFound=true
```
- **Expected:** 4 suggestions (all, including NOT_FOUND)
**Test 4.3:** Default behavior
```bash
GET /api/admin/events/:slug/matching-runs/:runId/suggestions
```
- **Expected:** 3 suggestions (`onlyAssigned=true` by default)
### TC5: Manual vs scheduler trigger differentiation
- **Verifies:**
- Manual API call → `trigger: 'manual'`
- Scheduler cron → `trigger: 'scheduler'`
- Status lifecycle: `running``success`/`failed`
### TC6: Failed runs are recorded in audit trail
- **Scenario:** Event with 0 heats → matching returns 0 results
- **Expected:**
- MatchingRun created with `status: 'success'`
- `matchedCount=0`, `notFoundCount=0`
- Audit trail complete even for empty runs
---
## 4. Incremental Matching Tests (5 tests)
**File:** `backend/src/__tests__/matching-incremental.test.js`
### Test Scenarios
1. **New suggestions replace PENDING**
- Old PENDING deleted, new created
2. **ACCEPTED suggestions preserved**
- Not deleted or overwritten in re-runs
3. **COMPLETED suggestions preserved**
- No duplicates for finished matches
4. **New heats added after first run**
- Get suggestions in next run
5. **Status workflow**
- PENDING → ACCEPTED → match creation → COMPLETED
---
## 5. Recording Stats Integration Tests (6 tests)
**File:** `backend/src/__tests__/recording-stats-integration.test.js`
### Test Scenarios
1. **Stats update after double rating**
- Atomic update verification
- Both sides checked
2. **Stats don't update for manual matches**
- `source: 'manual'` → no stats change
3. **Stats affect next matching round**
- High debt user gets priority
- Fairness feedback loop works
4. **Multiple matches update stats correctly**
- 3 matches → 3x updates
- No race conditions
5. **Rejected suggestions don't affect stats**
- REJECTED/NOT_FOUND → no impact
6. **Stats persistence**
- Survive restart
- Correctly read in subsequent rounds
---
## Running the Tests
### All Tests
```bash
docker compose exec backend npm test
```
### Specific Test Suites
```bash
# Matching algorithm
docker compose exec backend npm test -- matching-algorithm.test.js
# Ratings & stats flow
docker compose exec backend npm test -- ratings-stats-flow.test.js
# Matching runs audit
docker compose exec backend npm test -- matching-runs-audit.test.js
# Incremental matching
docker compose exec backend npm test -- matching-incremental.test.js
# Stats integration
docker compose exec backend npm test -- recording-stats-integration.test.js
```
### Coverage Report
```bash
docker compose exec backend npm run test:coverage
```
---
## Key Implementation Files
### Core Logic
- **Matching Algorithm:** `backend/src/services/matching.js` (94.71% coverage)
- `runMatching()` - generates suggestions
- `saveMatchingResults()` - persists with origin_run_id
- `applyRecordingStatsForMatch()` - updates fairness stats
### API Endpoints
- **Run Matching:** `POST /api/events/:slug/run-matching`
- Creates MatchingRun audit record
- Triggers matching algorithm
- Updates run stats on completion/failure
- **Rating Endpoint:** `POST /api/matches/:slug/ratings` (`backend/src/routes/matches.js:850-1011`)
- Atomic statsApplied check-and-set
- Match completion on double-rating
- Stats update via `applyRecordingStatsForMatch()`
- **Admin Audit:** `GET /api/admin/events/:slug/matching-runs/:runId/suggestions`
- Filter by origin_run_id
- Support `onlyAssigned` and `includeNotFound` params
- Returns per-run statistics
### Database Schema (Prisma)
```prisma
model RecordingSuggestion {
id Int @id @default(autoincrement())
eventId Int
heatId Int
recorderId Int?
status String // 'pending', 'accepted', 'completed', 'not_found'
originRunId Int? // 🆕 Tracks which run created this suggestion
createdAt DateTime
updatedAt DateTime
}
model MatchingRun {
id Int @id @default(autoincrement())
eventId Int
trigger String // 'manual' or 'scheduler'
status String // 'running', 'success', 'failed'
startedAt DateTime
endedAt DateTime?
matchedCount Int?
notFoundCount Int?
errorMessage String?
}
model Match {
// ... other fields
source String // 'auto' or 'manual'
statsApplied Boolean @default(false) // Race condition prevention
}
model User {
// ... other fields
recordingsDone Int @default(0) // Fairness tracking
recordingsReceived Int @default(0) // Fairness tracking
}
```
---
## Edge Cases Covered
### ✅ Matching Algorithm
- Self-recording prevention
- Same-heat collisions
- Buffer-based collisions (BEFORE/AFTER)
- Schedule slot mapping
- MAX_RECORDINGS enforcement
- Recording-recording collisions
- Fairness debt calculation
- Tier penalty hierarchy
- Location prioritization
- Load balancing
- Multiple heats per dancer
- Incremental matching (preserve accepted)
### ✅ Stats Updates
- Race conditions (atomic check-and-set)
- Double-counting prevention (idempotency)
- Source filtering (auto vs manual)
- Transaction safety
- Match completion workflow
- Double-rating requirements
### ✅ Audit Trail
- origin_run_id assignment
- Sequential run behavior
- PENDING vs ACCEPTED/COMPLETED handling
- Filter parameters
- Trigger differentiation
- Empty run handling
- Failed run recording
---
## Test Data Isolation
All tests use:
- Unique timestamps in emails/usernames
- Cleanup in `afterAll()` hooks
- Proper foreign key deletion order
- Event-scoped data queries
**Example:**
```javascript
const dancer = await prisma.user.create({
data: {
email: `dancer-${Date.now()}@test.com`, // ✅ Unique per test run
username: `dancer_${Date.now()}`,
// ...
}
});
```
---
## Future Test Improvements
Potential additions identified in review:
1. **Concurrent matching runs** - thread-safety testing
2. **Database transaction failures** - rollback behavior
3. **Deleted users/events** - cascade delete handling
4. **Invalid scheduleConfig** - malformed data handling
5. **Extreme data scenarios** - 1000+ dancers, performance testing
6. **Time zone edge cases** - DST transitions, midnight boundaries
7. **Rating validation** - score=0, very long comments, special characters
8. **Match deletion** - compensating transactions for stats
---
## Summary
- **45 comprehensive tests** covering all critical paths
- **100% test pass rate** (342/342 total backend tests)
- **94.71% coverage** on matching.js core logic
- **Real database integration** (not mocked)
- **Production-ready** edge case handling
- **Atomic operations** preventing race conditions
- **Complete audit trail** with origin_run_id tracking
The matching and ratings system is **extensively tested and battle-ready** for production deployment.