Methodology
How GameTan measures cognitive abilities, where the norms come from, and what we do not measure.
1. Scientific foundation
GameTan implements a set of cognitive paradigms from the peer-reviewed literature. We do not invent metrics; each paradigm below is an established tool that has been used in cognitive psychology research for decades.
Paradigms used
| Paradigm | Measures | Primary citation |
|---|---|---|
| Simple Reaction Time | Pure motor response latency | Bridges et al. 2020 (human-benchmark.com data, N≈81M) |
| Stroop Task | Interference suppression | Stroop 1935; MacLeod 1991 meta-analysis |
| Eriksen Flanker | Selective attention | Eriksen & Eriksen 1974 |
| Go/No-Go | Response inhibition | Donders 1869; Logan 1994 |
| N-Back (2-back spatial) | Working memory under load | Owen et al. 2005; Jaeggi et al. 2010 |
| Corsi Block | Visuospatial STM span | Corsi 1972; Kessels et al. 2000 meta-analysis |
| Multiple Object Tracking (MOT) | Visual tracking capacity | Pylyshyn & Storm 1988 |
| Task Switching | Cognitive flexibility | Monsell 2003; Kiesel et al. 2010 |
| UFOV (Useful Field of View) | Visual attention breadth | Ball et al. 1988; Edwards et al. 2005 |
| Dual-Task | Concurrent attention allocation | Pashler 1994; Wickens 2002 |
| BART | Risk sensitivity | Lejuez et al. 2002 |
| Mental Rotation | Spatial reasoning | Shepard & Metzler 1971 |
| Perspective Taking | Visuospatial perspective shift | Michelon & Zacks 2006; Samson et al. 2010 |
| Posner Cueing (rebuild in progress) | Attentional orienting | Posner 1980; Posner & Petersen 1990 |
| Iowa Gambling Task (rebuild in progress) | Decision under uncertainty | Bechara et al. 1994 |
| Sensorimotor Synchronization (rebuild in progress) | Rhythm tapping / timing precision | Repp 2005 review |
2. Scoring
Raw scores (reaction time in ms, accuracy %, span length, effect size in ms, threshold duration, etc.) are converted to percentile ranks using the normal CDF (Abramowitz & Stegun approximation) with population-level mean and standard deviation from the cited literature. Percentile values are clamped to the 1–99 range to avoid degenerate output at the tails.
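The conversion above can be sketched in a few lines. This is a minimal illustration, not the production code: the specific polynomial (Abramowitz & Stegun 26.2.17) is one common choice for the normal CDF, and the norm values in the usage line are hypothetical.

```python
import math

def norm_cdf(z: float) -> float:
    """Standard normal CDF via the Abramowitz & Stegun 26.2.17
    polynomial approximation (absolute error below 7.5e-8)."""
    if z < 0:
        return 1.0 - norm_cdf(-z)
    t = 1.0 / (1.0 + 0.2316419 * z)
    poly = t * (0.319381530 + t * (-0.356563782 + t * (1.781477937
           + t * (-1.821255978 + t * 1.330274429))))
    pdf = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
    return 1.0 - pdf * poly

def percentile(raw: float, pop_mean: float, pop_sd: float,
               lower_is_better: bool = False) -> float:
    """Convert a raw score to a percentile rank against published
    population norms, clamped to the 1-99 range."""
    z = (raw - pop_mean) / pop_sd
    if lower_is_better:  # e.g. reaction time: smaller = faster = better
        z = -z
    return min(99.0, max(1.0, 100.0 * norm_cdf(z)))

# Hypothetical simple-RT norms for illustration: mean 284 ms, SD 40 ms.
print(round(percentile(250.0, 284.0, 40.0, lower_is_better=True), 1))  # ≈ 80.2
```

The clamp guarantees an extreme raw score reports as 1st or 99th percentile rather than 0 or 100, which would overstate the precision of a normal-distribution fit at the tails.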
For each dimension, we report:
- Percentile against general population norms
- 95% confidence interval derived from published test-retest reliability values (when available)
- Literature-sourced comparison to published cognitive profiles of gamer subgroups (Dale & Green 2017 for FPS; Kowal et al. 2018 for MOBA; Thompson et al. 2013 for RTS)
3. Normative data sources
We use published norms wherever they exist. We do NOT collect or publish our own professional player norms — no such claim appears anywhere in the product.
Where literature norms are unavailable (pursuit rotor, Posner cueing in gaming context, etc.), we mark the dimension as beta in the report and say so explicitly.
4. Test-retest reliability
Per published literature:
- Simple RT: r ≈ 0.68–0.88
- Stroop effect: r ≈ 0.80+
- Flanker effect: r ≈ 0.80
- Corsi span: r ≈ 0.75–0.85 (Kessels 2000)
- N-Back accuracy: r ≈ 0.75
- Task-switch cost: r ≈ 0.70
- UFOV threshold: r ≈ 0.80+
- BART: r ≈ 0.55–0.75 (lower — behavioural variability)
These values drive the 95% confidence intervals in Deep Assessment reports. Lower reliability = wider interval.
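The reliability-to-interval link can be made concrete with the standard error of measurement, the usual psychometric route from test-retest reliability to interval width. Whether the reports use exactly this formula, and the SD values below, are assumptions for illustration.

```python
import math

def reliability_ci(score: float, sd: float, reliability: float,
                   z: float = 1.96) -> tuple[float, float]:
    """95% confidence interval around an observed score from
    test-retest reliability, via the standard error of measurement:
        SEM = SD * sqrt(1 - r)
    Lower reliability r -> larger SEM -> wider interval."""
    sem = sd * math.sqrt(1.0 - reliability)
    return (score - z * sem, score + z * sem)

# Same observed score and population SD, different reliabilities:
stroop = reliability_ci(100.0, 15.0, 0.80)  # r = 0.80 -> narrower interval
bart = reliability_ci(100.0, 15.0, 0.55)    # r = 0.55 -> wider interval
```

With r = 0.80 the half-width is 1.96 × 15 × √0.20 ≈ 13.1 score points; at r = 0.55 it grows to ≈ 19.7, which is why BART results carry visibly wider bands than Stroop results.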
5. Trainability estimates (in Deep Assessment reports)
Based on meta-analyses of cognitive training:
- Simple RT: ~10–15% ceiling (Draper 2009 meta-analysis)
- Working Memory: ~20–30% (Melby-Lervåg & Hulme 2013)
- Attention / UFOV: ~15–25% (Green & Bavelier 2003; Edwards 2009)
- Task switching / Cognitive flexibility: ~15% (Karbach & Verhaeghen 2014)
6. What we explicitly cannot do
- Diagnose ADHD, autism, dyslexia, or any clinical condition
- Predict whether you can become a professional esports player (that depends on practice hours, coaching, mental health, luck, and opportunity — not just cognition)
- Measure game-specific skill (mechanics, map awareness, champion pool, economy management)
- Measure teamwork (our Perspective Taking is a solo cognitive task)
- Measure mental toughness under tournament pressure
- Provide a cognitive profile of named professional players
7. Known sources of measurement variance
- Device input: Mouse vs trackpad vs touch differ by 20–80 ms in simple RT tasks. Scores are not directly comparable across devices.
- Network and display latency: All our games use client-side timing, so network jitter does not affect measurements. Display latency, however, varies with monitor refresh rate (60 Hz vs 144 Hz) and browser frame rate.
- Practice effects: First-time users often score 5–15% below their eventual stable level. Retest after 1–2 sessions for a more reliable profile.
- Fatigue: The Deep Assessment takes ~25 minutes; games later in the sequence show fatigue effects.
8. Changelog
- v1.0 (current): Honest positioning, Pattern/Decision/Rhythm flagged as rebuild-in-progress, dimension renames aligned with paradigms, "professional player benchmarks" replaced with literature-sourced population comparisons
Questions, critiques, corrections
If you spot an error in citations, a better normative source, or want to collaborate on validation work, please open an issue at our public research repo.