2026-06-23
i built a clean indonesian text corpus because every existing one is broken
CulturaX taught IDK-1 nothing. so i built Cleanesia — 17,761 docs, 19M tokens, open API.
2026-06-23
i cancelled 53,000 steps of training because of one line of code
the loss wasn't flat because of learning rate. it was flat because i was training on scrambled tokens.
2026-06-22
IDK-1 at 47,500 steps: what the output actually looks like
47.5% done. loss still flat. ran inference for the first time. here's what came out.
2026-06-21
IDK-1 week 1: what 8,000 steps of training actually looks like
val loss 10.7 → 7.79. free compute, real numbers, zero drama.
2026-06-20
why i went from 500M to 100M parameters (it's not what you think)
DFD-1 wasn't a failure. it was a tradeoff. here's the math.
2026-06-20
i trained a crisis PR model and it said 'bitch investigations' once
630K params, character-level tokenizer, val loss 1.03. it works if you squint.
2026-06-20
what i learned building an indonesian slm as a non-cs student
i'm a digital PR major. here's what three models taught me that no course would.
2026-06-20
i fine-tuned qwen3-4b on indonesian government documents and it actually works
1,966 Q&A pairs, 30 minutes on kaggle, and it now cites specific law numbers correctly.