A/B the Context, Not the Instruction

A cheap tip for synthetic-data enrichment.

While putting together Song-for-Jane, a DIY lyric writer for my imaginary friend, I kept running into the same issue I’ve seen with financial reports, customer-support tickets, and travel-website requests. When you need to teach a model a very specific behavior, truly relevant examples are scarce. The task has a clear structure, but good samples are rare and hand-picked. If you try to augment directly from those few samples, you don’t grow a tree at all; you get one hammered together from pickets.

The problem isn’t “more data.” It’s diverse data. Edge cases are the true treasure. The same holds for deep personalisation. If we already had rich, on-target examples, we wouldn’t be here, would we? Burning tokens on LLM generation won’t conjure what isn’t in the corpus. Hiring an expert data writer is costly, and frequently not an option. Usually, when generating synthetics, all you have are loosely related documents.

So I’ve been looking for a way to make generation as cheap as it gets. How do you squeeze everything you can from what you have? If the data itself follows one stable plan (form, requirements, constraints) and the instructions are mostly straightforward, why not branch the context for generation?
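For concreteness, here is a minimal Python sketch of that idea: the instruction stays fixed while the context is branched into variants, each one yielding its own synthetic sample. Everything here is illustrative, not from the post: the `complete` stub stands in for whatever model call you actually use, and the example contexts are made up.

```python
# A minimal sketch: one fixed instruction, many branched contexts.
# `complete` is a hypothetical placeholder for your LLM call.

INSTRUCTION = (
    "Write song lyrics that match the style, structure, and constraints "
    "described in the context below."
)

# The instruction never changes; only the context branches.
contexts = [
    "Form: verse-chorus-verse. Mood: wistful. Constraint: no rhymes.",
    "Form: ballad. Mood: triumphant. Constraint: second person only.",
    "Form: haiku sequence. Mood: playful. Constraint: one color per stanza.",
]

def complete(prompt: str) -> str:
    """Placeholder for your model endpoint (API client, local model, etc.)."""
    raise NotImplementedError

def branch_contexts(instruction: str, contexts: list[str]) -> list[dict]:
    """Generate one synthetic sample per context variant."""
    samples = []
    for ctx in contexts:
        prompt = f"{instruction}\n\nContext:\n{ctx}"
        samples.append({"context": ctx, "output": complete(prompt)})
    return samples
```

The point of keeping the instruction fixed is that diversity then comes from the cheap part, recombining the structural knobs you already have (form, requirements, constraints), rather than from asking the model to invent variety on its own.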

To continue reading, follow the link:

https://miperlabs.substack.com/p/ab-the-context-not-the-instruction
