Radosław Gierwiało 9af4447e1d docs: update documentation with matching runs audit and complete test coverage

- Update README.md with current test statistics (342/342 tests passing)
- Add detailed breakdown of all matching/ratings test suites
- Create comprehensive TESTING_MATCHING_RATINGS.md guide covering all 45 tests
- Document matching runs audit, incremental matching, and scheduler features
- Add code coverage highlights and test scenarios

2025-11-30 20:10:25 +01:00

Testing Guide: Matching & Ratings System

Comprehensive test coverage for the auto-matching algorithm, ratings & stats system, and matching runs audit functionality.

Overview

The matching and ratings system has 45 comprehensive integration tests organized into 5 test suites:

Test Suite	Tests	Coverage	Status
matching-algorithm.test.js	19	Matching logic, collisions, fairness	✅ 100%
ratings-stats-flow.test.js	9	E2E rating workflow	✅ 100%
matching-runs-audit.test.js	6	Audit trail, origin_run_id	✅ 100%
matching-incremental.test.js	5	Incremental matching behavior	✅ 100%
recording-stats-integration.test.js	6	Stats integration	✅ 100%

Total: 45/45 tests passing (100%)

1. Matching Algorithm Tests (19 tests)

File: backend/src/__tests__/matching-algorithm.test.js
Coverage: 94.71% statements, 91.5% branches in matching.js

These tests verify the complete matching algorithm based on matching-scenarios.md.

Phase 1: Fundamentals (TC1-3)

TC1: One dancer, one free recorder → simple happy path

Scenario: Basic 1:1 matching
Expected: Recorder assigned, status PENDING
Verifies: Basic algorithm functionality

TC2: No recorders available → NOT_FOUND

Scenario: Dancer with heat, but no other participants
Expected: Suggestion created with status NOT_FOUND
Verifies: Graceful handling of no-recorder situations

TC3: Only recorder is self → NOT_FOUND

Scenario: Dancer is the only potential recorder
Expected: NOT_FOUND (can't record yourself)
Verifies: Self-recording prevention

Phase 2: Collision Detection (TC4-9)

TC4: Recorder dancing in same heat → cannot record

Scenario: Both dancer and recorder in heat 10
Expected: NOT_FOUND (collision)
Verifies: Same-heat collision detection

TC5: Recorder in buffer BEFORE dance → cannot record

Scenario: Dancer heat 9, Recorder heat 10
Config: HEAT_BUFFER_BEFORE = 1
Expected: NOT_FOUND (recorder needs prep time)
Verifies: Before-buffer collision detection

TC6: Recorder in buffer AFTER dance → cannot record

Scenario: Dancer heat 11, Recorder heat 10
Config: HEAT_BUFFER_AFTER = 1
Expected: NOT_FOUND (recorder needs rest time)
Verifies: After-buffer collision detection

TC7: No collision when heat outside buffer

Scenario: Dancer heat 12, Recorder heat 10
Buffer: ±1 means recorder busy in 9,10,11
Expected: SUCCESS (heat 12 is free)
Verifies: Correct buffer calculation
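
The buffer rule exercised by TC4-TC7 can be sketched as follows. This is a minimal illustration, not the code in matching.js: the function names busyHeats and hasBufferCollision are hypothetical, and only the two config names and the ±1 values come from the scenarios above.

```javascript
// Config names follow the guide; the helper functions are hypothetical.
const HEAT_BUFFER_BEFORE = 1;
const HEAT_BUFFER_AFTER = 1;

// Heats during which a recorder who dances in `ownHeat` is unavailable.
function busyHeats(ownHeat, before = HEAT_BUFFER_BEFORE, after = HEAT_BUFFER_AFTER) {
  const heats = [];
  for (let h = ownHeat - before; h <= ownHeat + after; h += 1) {
    heats.push(h);
  }
  return heats;
}

// True when recording `dancerHeat` clashes with the recorder's own heat or its buffers.
function hasBufferCollision(recorderHeat, dancerHeat, before, after) {
  return busyHeats(recorderHeat, before, after).includes(dancerHeat);
}

console.log(busyHeats(10));              // [ 9, 10, 11 ]
console.log(hasBufferCollision(10, 9));  // true  (TC5)
console.log(hasBufferCollision(10, 11)); // true  (TC6)
console.log(hasBufferCollision(10, 12)); // false (TC7)
```

Expressing the rule as a set of busy heats keeps the same-heat case (TC4) and both buffer cases in one check.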

TC8: Collision between divisions in same slot

Scenario: scheduleConfig maps Novice and Intermediate to same slot
Setup: Dancer in Novice heat 1, Recorder dancing in Intermediate heat 1
Expected: NOT_FOUND (same time slot collision)
Verifies: scheduleConfig slot mapping

TC9: No collision when divisions in different slots

Scenario: Novice in slot 1, Advanced in slot 2
Expected: SUCCESS (different time slots)
Verifies: Multi-slot schedule handling
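
A rough sketch of the slot mapping behind TC8 and TC9 is shown below. The shape of scheduleConfig used here (a slots array with order and divisions) is an assumption made for illustration, as are the helper names slotOf and sameTimeSlot; the real structure may differ. Buffers are left out to keep the example short.

```javascript
// Hypothetical scheduleConfig shape: each slot lists the divisions that run in parallel.
const scheduleConfig = {
  slots: [
    { order: 1, divisions: ['Novice', 'Intermediate'] },
    { order: 2, divisions: ['Advanced'] },
  ],
};

// Returns the slot order for a division, or null when the division is not mapped.
function slotOf(config, division) {
  const slot = config.slots.find((s) => s.divisions.includes(division));
  return slot ? slot.order : null;
}

// Two heats overlap in time when they share a slot and a heat number.
// Unmapped divisions fall back to comparing the division name itself.
function sameTimeSlot(config, a, b) {
  const slotA = slotOf(config, a.division);
  const slotB = slotOf(config, b.division);
  if (slotA === null || slotB === null) {
    return a.division === b.division && a.heatNumber === b.heatNumber;
  }
  return slotA === slotB && a.heatNumber === b.heatNumber;
}

const dancerHeat = { division: 'Novice', heatNumber: 1 };
console.log(sameTimeSlot(scheduleConfig, dancerHeat, { division: 'Intermediate', heatNumber: 1 })); // true  (TC8)
console.log(sameTimeSlot(scheduleConfig, dancerHeat, { division: 'Advanced', heatNumber: 1 }));     // false (TC9)
```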

Phase 3: Limits & Workload (TC10-11)

TC10: MAX_RECORDINGS_PER_PERSON is respected

Scenario: 4 dancers, 1 recorder
Config: MAX_RECORDINGS_PER_PERSON = 3
Expected: 3 assigned, 1 NOT_FOUND
Verifies: Per-person recording limit enforcement

TC11: Recording-recording collision (critical bug fix)

Scenario: 2 dancers in same heat, 1 recorder
Expected: 1 assigned, 1 NOT_FOUND
Verifies: Recorder can only handle one heat at a time
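
The two limits in this phase can be illustrated with a simplified greedy assignment. This is a sketch under stated assumptions: assignRecorders and its data shapes are hypothetical, candidates are taken in list order rather than by score, and heat buffers are ignored.

```javascript
// Config name follows the guide; everything else in this block is hypothetical.
const MAX_RECORDINGS_PER_PERSON = 3;

function assignRecorders(heats, recorders, max = MAX_RECORDINGS_PER_PERSON) {
  const load = new Map();   // recorderId -> recordings assigned so far
  const booked = new Map(); // recorderId -> Set of heat numbers already being recorded

  return heats.map((heat) => {
    const recorder = recorders.find((r) => {
      const alreadyBooked = booked.get(r.id) || new Set();
      return r.id !== heat.dancerId               // no self-recording (TC3)
        && (load.get(r.id) || 0) < max            // per-person cap (TC10)
        && !alreadyBooked.has(heat.heatNumber);   // one recording per heat (TC11)
    });

    if (!recorder) {
      return { heatId: heat.id, recorderId: null, status: 'NOT_FOUND' };
    }

    load.set(recorder.id, (load.get(recorder.id) || 0) + 1);
    if (!booked.has(recorder.id)) booked.set(recorder.id, new Set());
    booked.get(recorder.id).add(heat.heatNumber);
    return { heatId: heat.id, recorderId: recorder.id, status: 'PENDING' };
  });
}

// TC11: two dancers in heat 5, one recorder.
const result = assignRecorders(
  [
    { id: 1, dancerId: 'd1', heatNumber: 5 },
    { id: 2, dancerId: 'd2', heatNumber: 5 },
  ],
  [{ id: 'r1' }],
);
console.log(result.map((s) => s.status)); // [ 'PENDING', 'NOT_FOUND' ]
```

Tracking booked heats separately from the recorder's own dancing heats is what catches the recording-recording collision in TC11.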

Phase 4: Fairness & Tiers (TC12-16)

TC12: Higher fairnessDebt → more likely to record

Scenario: RecorderA (debt +10), RecorderB (debt 0)
Formula: fairnessDebt = recordingsReceived - recordingsDone
Expected: RecorderA chosen (higher debt = more obligation)
Verifies: Fairness algorithm prioritization

TC13: Location score beats fairness

Scenario: RecorderA (same city, debt 0) vs RecorderB (different country, debt 100)
Expected: RecorderA chosen
Verifies: Location > Fairness in priority hierarchy

TC14: Basic vs Supporter vs Comfort tier penalties

Scenario: 3 recorders with same location/stats, different tiers
Penalties: BASIC (0), SUPPORTER (-10), COMFORT (-50)
Expected: BASIC chosen
Verifies: Tier penalty system

TC15: Supporter chosen when Basic unavailable

Scenario: Basic has collision, Supporter free
Expected: Supporter chosen (penalty overridden by availability)
Verifies: Fallback tier selection

TC16: Comfort used as last resort

Scenario: Only Comfort available (Basic has collision)
Expected: Comfort chosen (better than NOT_FOUND)
Verifies: Last-resort matching
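
One way to picture the priority hierarchy from TC12-TC16 is a two-level sort. The fairnessDebt formula and the tier penalties are taken from the scenarios above; the numeric locationScore field, the rankRecorders name, and the choice to add the tier penalty to the fairness debt are assumptions for illustration, and the actual weighting in matching.js may differ.

```javascript
// Penalties as listed in TC14.
const TIER_PENALTY = { BASIC: 0, SUPPORTER: -10, COMFORT: -50 };

// Formula from TC12: positive debt means the user has received more than they have given.
function fairnessDebt(user) {
  return user.recordingsReceived - user.recordingsDone;
}

// Assumed ordering: location first, then fairness debt adjusted by tier penalty.
function rankRecorders(candidates) {
  return [...candidates].sort((a, b) =>
    (b.locationScore - a.locationScore)
    || ((fairnessDebt(b) + TIER_PENALTY[b.tier]) - (fairnessDebt(a) + TIER_PENALTY[a.tier])));
}

// locationScore is a hypothetical field: 2 = same city, 0 = different country.
const candidates = [
  { id: 'farAway', locationScore: 0, tier: 'BASIC', recordingsReceived: 100, recordingsDone: 0 },
  { id: 'supporter', locationScore: 2, tier: 'SUPPORTER', recordingsReceived: 0, recordingsDone: 0 },
  { id: 'sameCity', locationScore: 2, tier: 'BASIC', recordingsReceived: 0, recordingsDone: 0 },
];
console.log(rankRecorders(candidates).map((c) => c.id)); // [ 'sameCity', 'supporter', 'farAway' ]
```

Because unavailable recorders are filtered out before ranking, a Supporter or Comfort candidate is still chosen when no Basic candidate remains (TC15, TC16).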

Phase 5: Edge Cases (TC17-19)

TC17: Dancer with no heats is ignored

Scenario: Participant with competitorNumber but no EventUserHeat records
Expected: 0 suggestions generated
Verifies: Heat existence requirement

TC18: Multiple heats for one dancer - all assigned

Scenario: 1 dancer with 3 heats (different divisions), 2 recorders
Expected: All 3 heats get suggestions
Verifies: Load balancing across multiple recorders (2-1 or 1-2 distribution)

TC19: Incremental matching respects accepted suggestions

Scenario:
1. Run 1: 2 heats → 2 PENDING suggestions
2. Recorder accepts suggestion for heat A
3. Run 2: Re-run matching
Expected:
- Heat A: Still has ACCEPTED suggestion (preserved)
- Heat B: Updated/new suggestion
Verifies: saveMatchingResults preserves accepted/completed suggestions

2. Ratings & Stats Flow Tests (9 tests)

File: backend/src/__tests__/ratings-stats-flow.test.js
Purpose: End-to-end workflow from matching to stat updates

Test Flow

STEP 1-3: Setup

Create event with past registration deadline
Register 2 users (dancer + recorder)
Enroll users and declare heat

STEP 4: Matching

Run runMatching() → generates suggestions
Call saveMatchingResults() → persists to DB
Verifies: Matching algorithm integration

STEP 5: Suggestion Acceptance

Recorder accepts suggestion via API
Verifies: Auto match creation (source: 'auto')

STEP 6a-6b: Double Rating (Critical Flow)

STEP 6a: First rating (dancer rates recorder)

POST /api/matches/:slug/ratings
{
  "score": 5,
  "comment": "Great recorder!",
  "wouldCollaborateAgain": true
}

Expected: Match status → IN_PROGRESS
Stats: NOT updated yet (need both ratings)

STEP 6b: Second rating (recorder rates dancer)

POST /api/matches/:slug/ratings
{
  "score": 4,
  "comment": "Good dancer!",
  "wouldCollaborateAgain": true
}

Expected:
- Match status → COMPLETED
- statsApplied → true (atomic update)
- Stats updated exactly once:
  - recorder.recordingsDone += 1
  - dancer.recordingsReceived += 1

STEP 7: Idempotency Test

Try rating the same match again
Expected: Stats remain unchanged (double-counting prevention)

STEP 8: Manual Match Verification

Create manual match (source: 'manual')
Exchange ratings
Expected: Stats NOT updated (only auto-matches affect fairness)

Key Edge Cases Covered

✅ Race Condition Prevention

// Atomic check-and-set in backend/src/routes/matches.js:961-995
const updateResult = await prisma.match.updateMany({
  where: {
    id: match.id,
    statsApplied: false  // Only update if not already applied
  },
  data: {
    status: MATCH_STATUS.COMPLETED,
    statsApplied: true
  }
});

// Only winner of the race applies stats
if (updateResult.count === 1) {
  await matchingService.applyRecordingStatsForMatch(fullMatch);
}
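
To see why the conditional update lets only one request apply stats, the excerpt above can be simulated in memory. This is a sketch, not the route code: createStore, completeIfNotApplied, onFinalRating, and simulateRace are hypothetical names, and the in-memory object stands in for the database row. In the real system the atomicity comes from the single conditional UPDATE; here it comes from JavaScript running the check and the write without an await between them.

```javascript
// In-memory stand-in for one match row and the affected user stats (assumption).
function createStore() {
  return {
    match: { id: 1, status: 'IN_PROGRESS', statsApplied: false },
    stats: { recordingsDone: 0, recordingsReceived: 0 },
  };
}

// Mimics updateMany({ where: { id, statsApplied: false }, data: ... }) returning { count }.
async function completeIfNotApplied(store) {
  await Promise.resolve(); // yield once so concurrent callers interleave
  if (store.match.statsApplied) return { count: 0 };
  store.match.status = 'COMPLETED';
  store.match.statsApplied = true;
  return { count: 1 };
}

// Only the caller that flipped statsApplied applies the stats.
async function onFinalRating(store) {
  const { count } = await completeIfNotApplied(store);
  if (count === 1) {
    store.stats.recordingsDone += 1;
    store.stats.recordingsReceived += 1;
  }
  return count;
}

// Fires several "final rating" requests at once against a fresh store.
async function simulateRace(requests = 3) {
  const store = createStore();
  const counts = await Promise.all(
    Array.from({ length: requests }, () => onFinalRating(store)),
  );
  return { counts, store };
}

simulateRace().then(({ counts, store }) => {
  console.log(counts, store.stats); // [ 1, 0, 0 ] { recordingsDone: 1, recordingsReceived: 1 }
});
```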

✅ Source Filtering

// Only auto-matches update stats (backend/src/services/matching.js:679-701)
if (match.source !== 'auto') {
  return; // Manual matches don't affect fairness
}

✅ Transaction Safety

// Stats update is transactional
await prisma.$transaction([
  prisma.user.update({
    where: { id: recorderId },
    data: { recordingsDone: { increment: 1 } }
  }),
  prisma.user.update({
    where: { id: dancerId },
    data: { recordingsReceived: { increment: 1 } }
  })
]);

3. Matching Runs Audit Tests (6 tests)

File: backend/src/__tests__/matching-runs-audit.test.js
Purpose: Verify audit trail and origin_run_id tracking

TC1: Run assigns origin_run_id correctly

Action: Admin clicks "Run now"
Verifies:
- MatchingRun record created (trigger: 'manual', status: 'success')
- All suggestions get origin_run_id = runId
- Stats recorded (matchedCount, notFoundCount)
- Admin endpoint returns correct data

TC2: Sequential runs create separate origin_run_ids

Scenario:
1. Run #1 → 1 heat → suggestion S1 (origin_run_id=1, status=PENDING)
2. Add 2nd dancer with heat
3. Run #2 → 2 heats → suggestions S1', S2 (origin_run_id=2)
Expected Behavior (IMPORTANT):
- Run #2 deletes PENDING suggestions from Run #1
- GET /matching-runs/1/suggestions → 0 results (replaced)
- GET /matching-runs/2/suggestions → 2 results
- Both have origin_run_id = 2
Why: Incremental matching intentionally replaces old PENDING suggestions with fresh ones

TC3: Accepted/completed suggestions preserve origin_run_id

Scenario:
1. Run #1 → suggestion S1 (status=PENDING, origin_run_id=1)
2. Recorder accepts → status=ACCEPTED
3. Run #2 → re-run matching
Expected:
- S1 still exists with status=ACCEPTED
- S1 keeps origin_run_id=1 (doesn't change to 2!)
- GET /matching-runs/1/suggestions → returns S1
- GET /matching-runs/2/suggestions → 0 results (heat already has accepted)
Verifies: Accepted/completed suggestions are preserved across re-runs
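
The behavior described in TC2 and TC3 can be modeled with plain arrays. This is a sketch under assumptions: applyRun and byRun are hypothetical helpers, not the saveMatchingResults implementation, and the in-memory rows stand in for RecordingSuggestion records.

```javascript
// Rows that a re-run must not touch.
const PRESERVED = new Set(['ACCEPTED', 'COMPLETED']);

// Replaces every non-preserved suggestion with the fresh results of run `runId`.
function applyRun(existing, fresh, runId) {
  const kept = existing.filter((s) => PRESERVED.has(s.status));
  const lockedHeats = new Set(kept.map((s) => s.heatId));
  const created = fresh
    .filter((s) => !lockedHeats.has(s.heatId))      // heat already has an accepted suggestion
    .map((s) => ({ ...s, originRunId: runId }));     // new rows are stamped with the current run
  return [...kept, ...created];
}

// What the per-run admin endpoint would list for a given run.
function byRun(rows, runId) {
  return rows.filter((s) => s.originRunId === runId);
}

// TC3: the suggestion from run 1 was accepted before run 2.
const afterRun2 = applyRun(
  [{ heatId: 1, recorderId: 7, status: 'ACCEPTED', originRunId: 1 }],
  [{ heatId: 1, recorderId: 8, status: 'PENDING' }],
  2,
);
console.log(byRun(afterRun2, 1).length); // 1
console.log(byRun(afterRun2, 2).length); // 0
```

The same helper reproduces TC2: when the run 1 suggestion is still PENDING, it is dropped and both fresh rows carry originRunId 2.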

TC4: Filter parameters (onlyAssigned, includeNotFound)

Setup: 4 dancers (heats well-spaced), 1 recorder (MAX=3)
Result: 3 assigned + 1 NOT_FOUND

Test 4.1: ?onlyAssigned=true&includeNotFound=false (default)

GET /api/admin/events/:slug/matching-runs/:runId/suggestions?onlyAssigned=true

Expected: 3 suggestions (only assigned, NOT_FOUND filtered out)

Test 4.2: ?onlyAssigned=false&includeNotFound=true

GET /api/admin/events/:slug/matching-runs/:runId/suggestions?onlyAssigned=false&includeNotFound=true

Expected: 4 suggestions (all, including NOT_FOUND)

Test 4.3: Default behavior

GET /api/admin/events/:slug/matching-runs/:runId/suggestions

Expected: 3 suggestions (onlyAssigned=true by default)
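
The filter semantics in Tests 4.1-4.3 are consistent with a predicate like the one below. This is an illustrative sketch only: filterSuggestions is a hypothetical function, and how the real endpoint combines the two parameters is an assumption inferred from the expected counts.

```javascript
// Defaults follow the guide: onlyAssigned=true, includeNotFound=false.
function filterSuggestions(suggestions, { onlyAssigned = true, includeNotFound = false } = {}) {
  return suggestions.filter((s) => {
    if (s.status === 'NOT_FOUND') return includeNotFound; // NOT_FOUND rows are opt-in
    if (onlyAssigned) return s.recorderId != null;        // keep rows that have a recorder
    return true;
  });
}

// TC4 setup: 3 assigned suggestions and 1 NOT_FOUND.
const runSuggestions = [
  { heatId: 1, recorderId: 7, status: 'PENDING' },
  { heatId: 2, recorderId: 7, status: 'PENDING' },
  { heatId: 3, recorderId: 7, status: 'PENDING' },
  { heatId: 4, recorderId: null, status: 'NOT_FOUND' },
];

console.log(filterSuggestions(runSuggestions).length); // 3
console.log(filterSuggestions(runSuggestions, { onlyAssigned: false, includeNotFound: true }).length); // 4
```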

TC5: Manual vs scheduler trigger differentiation

Verifies:
- Manual API call → trigger: 'manual'
- Scheduler cron → trigger: 'scheduler'
- Status lifecycle: running → success/failed

TC6: Failed runs are recorded in audit trail

Scenario: Event with 0 heats → matching returns 0 results (an empty run, which finishes successfully rather than failing)
Expected:
- MatchingRun created with status: 'success'
- matchedCount=0, notFoundCount=0
- Audit trail complete even for empty runs
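
The run lifecycle from TC5 and TC6 (running → success/failed) could be wrapped roughly as follows. The field names mirror the MatchingRun model in this guide; the recordRun helper and the in-memory runs array are hypothetical stand-ins for the real endpoint and table.

```javascript
// In-memory stand-in for the MatchingRun table (assumption).
const runs = [];

// Records a run around a matching function that returns an array of suggestions.
function recordRun(trigger, runMatchingFn) {
  const run = {
    id: runs.length + 1,
    trigger,               // 'manual' or 'scheduler'
    status: 'running',
    startedAt: new Date(),
    endedAt: null,
    matchedCount: null,
    notFoundCount: null,
    errorMessage: null,
  };
  runs.push(run);

  try {
    const results = runMatchingFn();
    run.notFoundCount = results.filter((r) => r.status === 'NOT_FOUND').length;
    run.matchedCount = results.length - run.notFoundCount;
    run.status = 'success'; // an empty result set still counts as success
  } catch (err) {
    run.status = 'failed';
    run.errorMessage = err.message;
  } finally {
    run.endedAt = new Date();
  }
  return run;
}

const emptyRun = recordRun('manual', () => []);
console.log(emptyRun.status, emptyRun.matchedCount, emptyRun.notFoundCount); // success 0 0
```

Creating the record before matching starts means a run that throws still leaves a row behind with status 'failed'.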

4. Incremental Matching Tests (5 tests)

File: backend/src/__tests__/matching-incremental.test.js

Test Scenarios

New suggestions replace PENDING
- Old PENDING deleted, new created
ACCEPTED suggestions preserved
- Not deleted or overwritten in re-runs
COMPLETED suggestions preserved
- No duplicates for finished matches
New heats added after first run
- Get suggestions in next run
Status workflow
- PENDING → ACCEPTED → match creation → COMPLETED
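
The status workflow in the last scenario can be written down as a small transition table. This is a sketch, not the project's code: the PENDING → ACCEPTED → COMPLETED path comes from the list above, while the PENDING → REJECTED edge and the canTransition helper are assumptions.

```javascript
// Allowed suggestion status transitions (REJECTED edge is assumed).
const ALLOWED_TRANSITIONS = {
  PENDING: ['ACCEPTED', 'REJECTED'],
  ACCEPTED: ['COMPLETED'],
  COMPLETED: [],
  REJECTED: [],
  NOT_FOUND: [],
};

// True when a suggestion may move from `from` to `to`.
function canTransition(from, to) {
  return (ALLOWED_TRANSITIONS[from] || []).includes(to);
}

console.log(canTransition('PENDING', 'ACCEPTED'));   // true
console.log(canTransition('ACCEPTED', 'COMPLETED')); // true
console.log(canTransition('COMPLETED', 'PENDING'));  // false
```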

5. Recording Stats Integration Tests (6 tests)

File: backend/src/__tests__/recording-stats-integration.test.js

Test Scenarios

Stats update after double rating
- Atomic update verification
- Both sides checked
Stats don't update for manual matches
- source: 'manual' → no stats change
Stats affect next matching round
- High debt user gets priority
- Fairness feedback loop works
Multiple matches update stats correctly
- 3 matches → 3x updates
- No race conditions
Rejected suggestions don't affect stats
- REJECTED/NOT_FOUND → no impact
Stats persistence
- Survive restart
- Correctly read in subsequent rounds

Running the Tests

All Tests

docker compose exec backend npm test

Specific Test Suites

# Matching algorithm
docker compose exec backend npm test -- matching-algorithm.test.js

# Ratings & stats flow
docker compose exec backend npm test -- ratings-stats-flow.test.js

# Matching runs audit
docker compose exec backend npm test -- matching-runs-audit.test.js

# Incremental matching
docker compose exec backend npm test -- matching-incremental.test.js

# Stats integration
docker compose exec backend npm test -- recording-stats-integration.test.js

Coverage Report

docker compose exec backend npm run test:coverage

Key Implementation Files

Core Logic

Matching Algorithm: backend/src/services/matching.js (94.71% coverage)
- runMatching() - generates suggestions
- saveMatchingResults() - persists with origin_run_id
- applyRecordingStatsForMatch() - updates fairness stats

API Endpoints

Run Matching: POST /api/events/:slug/run-matching
- Creates MatchingRun audit record
- Triggers matching algorithm
- Updates run stats on completion/failure
Rating Endpoint: POST /api/matches/:slug/ratings (backend/src/routes/matches.js:850-1011)
- Atomic statsApplied check-and-set
- Match completion on double-rating
- Stats update via applyRecordingStatsForMatch()
Admin Audit: GET /api/admin/events/:slug/matching-runs/:runId/suggestions
- Filter by origin_run_id
- Support onlyAssigned and includeNotFound params
- Returns per-run statistics

Database Schema (Prisma)

model RecordingSuggestion {
  id           Int     @id @default(autoincrement())
  eventId      Int
  heatId       Int
  recorderId   Int?
  status       String  // 'pending', 'accepted', 'completed', 'not_found'
  originRunId  Int?    // 🆕 Tracks which run created this suggestion
  createdAt    DateTime
  updatedAt    DateTime
}

model MatchingRun {
  id            Int       @id @default(autoincrement())
  eventId       Int
  trigger       String    // 'manual' or 'scheduler'
  status        String    // 'running', 'success', 'failed'
  startedAt     DateTime
  endedAt       DateTime?
  matchedCount  Int?
  notFoundCount Int?
  errorMessage  String?
}

model Match {
  // ... other fields
  source        String    // 'auto' or 'manual'
  statsApplied  Boolean @default(false)  // Race condition prevention
}

model User {
  // ... other fields
  recordingsDone     Int @default(0)      // Fairness tracking
  recordingsReceived Int @default(0)      // Fairness tracking
}

Edge Cases Covered

✅ Matching Algorithm

Self-recording prevention
Same-heat collisions
Buffer-based collisions (BEFORE/AFTER)
Schedule slot mapping
MAX_RECORDINGS enforcement
Recording-recording collisions
Fairness debt calculation
Tier penalty hierarchy
Location prioritization
Load balancing
Multiple heats per dancer
Incremental matching (preserve accepted)

✅ Stats Updates

Race conditions (atomic check-and-set)
Double-counting prevention (idempotency)
Source filtering (auto vs manual)
Transaction safety
Match completion workflow
Double-rating requirements

✅ Audit Trail

origin_run_id assignment
Sequential run behavior
PENDING vs ACCEPTED/COMPLETED handling
Filter parameters
Trigger differentiation
Empty run handling
Failed run recording

Test Data Isolation

All tests use:

Unique timestamps in emails/usernames
Cleanup in afterAll() hooks
Proper foreign key deletion order
Event-scoped data queries

Example:

const dancer = await prisma.user.create({
  data: {
    email: `dancer-${Date.now()}@test.com`,  // ✅ Unique per test run
    username: `dancer_${Date.now()}`,
    // ...
  }
});
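
Date.now() has millisecond resolution, so two records created within the same millisecond would get the same suffix. One possible hardening, sketched here with hypothetical helpers uniqueSuffix and testUserData (not part of the current test suites), adds the process id and a per-process counter:

```javascript
// Per-process counter guarantees uniqueness even within one millisecond.
let sequence = 0;

function uniqueSuffix() {
  sequence += 1;
  return `${Date.now()}-${process.pid}-${sequence}`;
}

// Builds the unique fields for a test user; other fields are left to the caller.
function testUserData(role) {
  const suffix = uniqueSuffix();
  return {
    email: `${role}-${suffix}@test.com`,
    username: `${role}_${suffix}`,
  };
}

const first = testUserData('dancer');
const second = testUserData('dancer');
console.log(first.email !== second.email); // true
```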

Future Test Improvements

Potential additions identified in review:

Concurrent matching runs - overlapping runs for the same event (concurrency safety)
Database transaction failures - rollback behavior
Deleted users/events - cascade delete handling
Invalid scheduleConfig - malformed data handling
Extreme data scenarios - 1000+ dancers, performance testing
Time zone edge cases - DST transitions, midnight boundaries
Rating validation - score=0, very long comments, special characters
Match deletion - compensating transactions for stats

Summary

45 integration tests across 5 suites covering the critical matching, rating, and audit paths
100% test pass rate (342/342 total backend tests)
94.71% coverage on matching.js core logic
Real database integration (not mocked)
Production-ready edge case handling
Atomic operations preventing race conditions
Complete audit trail with origin_run_id tracking

The matching and ratings system is covered by integration tests for its core algorithm, stats updates, and audit trail; the gaps listed under Future Test Improvements remain open.
