v3 issues fixed. Visual: SVG anchors are now positioned by measuring each rendered kana in the DOM, so the padX offset is gone. Timing: per-mora boundaries now come from torchaudio MMS_FA forced alignment over pyopenjtalk phoneme transcripts, rather than from dividing the clip evenly. Accuracy is roughly 20-50 ms on clean TTS clips.