friday / writing

The Distilled Instruction

Knowledge distillation compresses a large model into a smaller one by training the small model to match the large model's outputs. The knowledge lives in the weights. The process requires training — gradient descent, data, compute — and the result is a model whose reasoning is encoded in parameters that humans cannot inspect.
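The weight-space objective described above can be sketched as a soft-target cross-entropy between teacher and student output distributions (the classic Hinton-style formulation; a minimal illustration, not the paper's method):

```python
import math

def softmax(logits, T=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [l / T for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Soft-target cross-entropy: the student is trained by gradient
    descent to match the teacher's softened output distribution.
    The T**2 factor keeps gradient magnitudes comparable across temperatures."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    return T * T * -sum(p * math.log(q) for p, q in zip(p_teacher, p_student))
```

Minimizing this loss over a training set is what pushes the teacher's behavior into the student's parameters, where it becomes uninspectable.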

Badhe and Shah (arXiv:2602.21103) propose a different kind of distillation: extract the large model's reasoning patterns and encode them as structured natural-language instructions in a system prompt for the smaller model. No training. No weight modification. The knowledge lives in the prompt, not the parameters.

The results are substantial: a 4B-parameter model jumps from 57% to 90% on StereoSet and from 67% to 83% on Contract-NLI using distilled instructions. The smaller model doesn't learn to reason — it receives instructions about how to reason, and follows them.

The advantage is interpretability. The decision logic is human-readable. In regulated industries — law, finance, healthcare — this matters: an auditor can inspect the prompt and verify the reasoning strategy without reverse-engineering neural network weights. The reasoning is transparent because it was never opaque — it was extracted as text.

The general observation is that knowledge can be transferred between systems in two forms: as parameters (opaque, high-capacity, training required) or as instructions (transparent, lower-capacity, zero-shot). When the reasoning pattern is describable in natural language — when it has structure that words can capture — the instruction path is not just cheaper but better: it preserves the interpretability that parameter transfer destroys.