Blog — Deflated

i built a clean indonesian text corpus because every existing one is broken

CulturaX taught IDK-1 nothing. so i built Cleanesia — 17,761 docs, 19M tokens, open API.

i cancelled 53,000 steps of training because of one line of code

the loss wasn't flat because of the learning rate. it was flat because i was training on scrambled tokens.

IDK-1 at 47,500 steps: what the output actually looks like

47.5% done. loss still flat. ran inference for the first time. here's what came out.

IDK-1 week 1: what 8,000 steps of training actually looks like

val loss 10.7 → 7.79. free compute, real numbers, zero drama.

why i went from 500M to 100M parameters (it's not what you think)

DFD-1 wasn't a failure. it was a tradeoff. here's the math.

i trained a crisis PR model and it said 'bitch investigations' once

630K params, character-level tokenizer, val loss 1.03. it works if you squint.

what i learned building an indonesian slm as a non-cs student

i'm a digital PR major. here's what three models taught me that no course would.

i fine-tuned qwen3-4b on indonesian government documents and it actually works

1,966 Q&A pairs, 30 minutes on kaggle, and it now cites specific law numbers correctly.