From 4M comments to a style-controlled comment generator

tl;dr summary

I cleaned and deduped 4M scraped comments, bootstrapped style labels with a DeBERTaV3 classifier + pseudo-labeling, then fine-tuned SmolLM3 with LoRA to generate comments in controllable styles.

I ended up with 4M+ comments, each paired with a username and a short description of the content being commented on.

I didn’t want a chatbot. I wanted a generator that can produce believable comments in a requested style.


The whole pipeline

  • Clean + dedup: normalize text, drop very short comments (< 12 chars), exact dedup by hash, near-dedup with MinHash/LSH.
  • Label: hand-label ~1k, fine-tune DeBERTaV3 classifier, then pseudo-label with a human in the loop.
  • Filter hard: keep only high-confidence labels (>= 70%). Treat noise as a first-class bucket and exclude it from training.
  • Train: fine-tune SmolLM3 (3B) with LoRA (SFT-only) to condition on the target style.
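
The clean + dedup step can be sketched roughly as follows. This is a minimal illustration, not the exact code behind the pipeline: the normalization rules (NFKC, lowercasing, whitespace collapsing), the 5-character shingles, and the 64-hash signature are all assumptions. Only the 12-character cutoff comes from the list above. The LSH bucketing is left out; the sketch only shows how a MinHash signature estimates similarity between two comments.

```python
import hashlib
import re
import unicodedata

MIN_CHARS = 12  # from the post: drop comments shorter than 12 characters


def normalize(text: str) -> str:
    """Assumed normalization: NFKC, collapse whitespace, lowercase."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def clean_and_dedup(comments):
    """Yield normalized comments, skipping short ones and exact duplicates."""
    seen = set()
    for raw in comments:
        text = normalize(raw)
        if len(text) < MIN_CHARS:
            continue
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # exact duplicate after normalization
        seen.add(digest)
        yield text


def shingles(text: str, k: int = 5) -> set:
    """Character k-grams; k=5 is an illustrative choice."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}


def minhash_signature(text: str, num_hashes: int = 64) -> list:
    """For each salted hash function, keep the minimum hash over all shingles."""
    grams = shingles(text)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for g in grams
        ))
    return signature


def estimated_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of matching signature slots approximates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

At 4M comments, comparing every pair of signatures is too slow, which is where LSH comes in: signatures are split into bands and only comments that collide in at least one band are compared. A library such as datasketch provides this, so the hand-rolled version above is only meant to show the idea.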

Styles:

  • happy, toxic, sarcastic, cringe, wholesome (+ noise)
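
To make the filtering and style conditioning concrete, here is a small sketch of how labeled rows could be turned into SFT examples. The style names and the 70% threshold come from the post. The row fields (`description`, `comment`, `label`, `confidence`) and the prompt template are hypothetical, chosen only for illustration.

```python
STYLES = {"happy", "toxic", "sarcastic", "cringe", "wholesome"}
MIN_CONFIDENCE = 0.70  # from the post: keep only labels with >= 70% confidence


def keep(label: str, confidence: float) -> bool:
    """Keep a row only if it has a real style label (not noise) and is confident."""
    return label in STYLES and confidence >= MIN_CONFIDENCE


def to_sft_example(description: str, style: str, comment: str) -> dict:
    """Hypothetical prompt template that puts the target style in the prompt."""
    prompt = f"Content: {description}\nStyle: {style}\nComment:"
    return {"prompt": prompt, "completion": " " + comment}


def build_dataset(rows):
    """Filter pseudo-labeled rows and format the survivors for SFT."""
    return [
        to_sft_example(row["description"], row["label"], row["comment"])
        for row in rows
        if keep(row["label"], row["confidence"])
    ]
```

Because "noise" is not in `STYLES`, noisy rows are dropped by the same check that drops low-confidence ones, so the model never sees them during training.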

Quick sanity check

I ran a simple arena-style “spot the model” test (n = 500) and picked out the model-generated comment correctly only about 57% of the time.
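
For a rough sense of how far 57% is from guessing, the sketch below computes a normal-approximation confidence interval and a z-score against chance. It assumes each trial was a two-way choice, so chance is 50%, and that trials were independent. Neither assumption is stated in the test setup above.

```python
import math


def accuracy_interval(rate: float, n: int, z: float = 1.96):
    """Normal-approximation confidence interval for an observed accuracy."""
    stderr = math.sqrt(rate * (1 - rate) / n)
    return rate - z * stderr, rate + z * stderr


def z_vs_chance(rate: float, n: int, chance: float = 0.5) -> float:
    """How many standard errors the observed accuracy sits above chance."""
    stderr = math.sqrt(chance * (1 - chance) / n)
    return (rate - chance) / stderr


low, high = accuracy_interval(0.57, 500)
print(f"95% interval: {low:.3f} to {high:.3f}")
print(f"z vs 50% chance: {z_vs_chance(0.57, 500):.2f}")
```

Under those assumptions the interval runs from roughly 53% to 61%, so the result is above chance but not by much.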

The main lesson: data cleaning + strict filtering beats raw scale when you want controllable style.
