A/B the Context, Not the Instruction

A cheap tip for synthetic-data enrichment.

While putting together Song-for-Jane, a DIY lyric writer for my imaginary friend, I kept running into the same issue I’ve seen with financial reports, customer-support tickets, and travel-website requests. When you need to teach a model a very specific behavior, truly relevant examples are scarce. The task has a clear structure, but good samples are rare and hand-picked. If you try to augment directly from those few samples, you don’t grow a tree at all; you get one hammered together from pickets.

The problem isn’t “more data.” It’s diverse data. Edge cases are the true treasure. The same holds for deep personalisation. If we already had rich, on-target examples, we wouldn’t be here, would we? Burning tokens on LLM generation won’t conjure what isn’t in the corpus. Hiring an expert data writer is costly, and frequently not an option. Usually, when generating synthetics, all you have are loosely related documents.

So I’ve been looking for a way to make generation as cheap as it gets. How do you squeeze everything you can from what you have? If the data itself follows one stable plan (form, requirements, constraints) and the instructions are mostly straightforward, why not branch the context for generation?
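For concreteness, here is a minimal Python sketch of that idea: the instruction stays fixed while the context is branched into variants, each one yielding its own synthetic sample. Everything here is illustrative, not from the post: the `complete` stub stands in for whatever model call you actually use, and the example contexts are made up.

```python
# A minimal sketch: one fixed instruction, many branched contexts.
# `complete` is a hypothetical placeholder for your LLM call.

INSTRUCTION = (
    "Write song lyrics that match the style, structure, and constraints "
    "described in the context below."
)

# The instruction never changes; only the context branches.
contexts = [
    "Form: verse-chorus-verse. Mood: wistful. Constraint: no rhymes.",
    "Form: ballad. Mood: triumphant. Constraint: second person only.",
    "Form: haiku sequence. Mood: playful. Constraint: one color per stanza.",
]

def complete(prompt: str) -> str:
    """Placeholder for your model endpoint (API client, local model, etc.)."""
    raise NotImplementedError

def branch_contexts(instruction: str, contexts: list[str]) -> list[dict]:
    """Generate one synthetic sample per context variant."""
    samples = []
    for ctx in contexts:
        prompt = f"{instruction}\n\nContext:\n{ctx}"
        samples.append({"context": ctx, "output": complete(prompt)})
    return samples
```

The point of keeping the instruction fixed is that diversity then comes from the cheap part, recombining the structural knobs you already have (form, requirements, constraints), rather than from asking the model to invent variety on its own.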

To continue reading, follow the link:

https://miperlabs.substack.com/p/ab-the-context-not-the-instruction
