A/B Testing Newsletters: A Practical Guide
From hypothesis to sample size to evaluation: how to run newsletter A/B tests that produce reliable results — and avoid the mistakes that ruin every optimisation round.
Mailaura Team
Mailaura.io
In newsletter marketing, A/B tests are the only reliable way to replace gut feel with data. At the same time, they are the most frequently botched process. This guide shows how to set up structured A/B tests that yield reliable decisions and how to avoid the mistakes that spoil every round of optimisation.
What an A/B test actually proves
A clean A/B test measures the effect of one variable on one target metric under equal conditions. If you change the subject line AND the send time at the same time, you will not know afterwards which variable caused the change.
What you can test:
- Subject line
- Preheader
- Sender name
- Send time
- Content (mainly first paragraph, CTA, layout)
- CTA text
- CTA colour / position
- Personalisation (first name yes/no)
- Image vs. no image
What NOT to test in a single test: two of these at once.
Statistical basics: sample size
The most common problem with A/B tests is a sample that is too small. The result is then statistical noise, not signal.
Rule of thumb for open rate:
- At a 25 % baseline and an expected 5 % relative uplift (to 26.25 %): ≥ 10,000 recipients per variant.
- With an expected 20 % relative uplift (to 30 %): ≥ 800 per variant.
The smaller the expected difference, the larger the needed sample.
Rule of thumb for click rate (low baseline):
- 3 % CTR → 4 % CTR (33 % relative uplift): ≥ 4,000 per variant.
Mailaura's A/B module shows the minimum sample size for your hypothesis at setup time.
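If you want to see where figures like these come from, here is a minimal sketch of a standard two-proportion sample-size approximation in Python. It assumes 95 % confidence and 80 % statistical power, so its output can differ from the rules of thumb above depending on the power you target; Mailaura's built-in calculator may use a different method.

```python
from math import ceil

def sample_size_per_variant(baseline: float, relative_uplift: float) -> int:
    """Recipients needed per variant for a two-proportion test (~95 % confidence, ~80 % power)."""
    z_alpha = 1.96  # two-sided 95 % confidence
    z_beta = 0.84   # 80 % power
    p1 = baseline
    p2 = baseline * (1 + relative_uplift)
    p_bar = (p1 + p2) / 2
    n = 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / (p2 - p1) ** 2
    return ceil(n)

print(sample_size_per_variant(0.25, 0.05))  # ~19,000 (open rate, small expected uplift)
print(sample_size_per_variant(0.25, 0.20))  # ~1,300  (open rate, large expected uplift)
print(sample_size_per_variant(0.03, 0.33))  # ~5,400  (click rate, low baseline)
```

The pattern matches the rule above: halving the expected uplift roughly quadruples the sample you need.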
Proper test structure
Variant A (control) vs. variant B (hypothesis)
Never more than 2 variants if your list is under 50,000. With 3+ variants, every variant still needs the full per-variant sample and you have to correct for multiple comparisons, so the traffic requirement grows quickly.
Randomised split
Sample splitting must be random. "The first 500 on the list vs. the next 500" is not random; it can introduce systematic bias, for example by sign-up date. Mailaura distributes automatically via a hash function.
Hold-out of the remaining list
In a typical split test, 20–30 % of the list runs the test in two variants (10–15 % per variant); the rest (70–80 %) receives the winning variant later. This way you maximise revenue and still get valid data.
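To make the hash-based split and the hold-out concrete, here is a simplified sketch in Python (not Mailaura's actual implementation; the subscriber ID and test name are made up). Each subscriber is hashed together with a per-test salt, so the assignment ignores list order, stays stable if you re-run the send, and reshuffles for every new test.

```python
import hashlib

def assign_group(subscriber_id: str, test_id: str, test_share: float = 0.2) -> str:
    """Deterministically assign a subscriber to variant 'A', 'B' or the hold-out."""
    # Hashing the subscriber ID together with a per-test salt makes the split
    # independent of list order and reshuffles it for every new test.
    digest = hashlib.sha256(f"{test_id}:{subscriber_id}".encode()).hexdigest()
    bucket = (int(digest, 16) % 10_000) / 10_000  # pseudo-uniform value in [0, 1)
    if bucket < test_share / 2:
        return "A"
    if bucket < test_share:
        return "B"
    return "holdout"  # receives the winning variant after the test matures

# 20 % of the list in the test (10 % per variant), 80 % hold-out
print(assign_group("subscriber-4711", "subject-line-test-2026-03"))
```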
Formulate a clean hypothesis
A good hypothesis is specific, measurable and causal:
- ❌ "We want more clicks."
- ✅ "If we change the CTA from 'Learn more' to 'Start for free', CTR rises by at least 15 %, because the phrasing is more concrete."
Formula: If [change], then [expected outcome], because [reason].
Typical test ideas and expectable effects
Subject line
- Question vs. statement: 5–15 % difference
- With emoji vs. without: ±5 %, highly audience-dependent
- With vs. without personalisation (first name): +2–4 %
- Benefit vs. curiosity: ±10 %
Sender name
- Company vs. person ("Lisa @ Mailaura"): +10–25 %. By far the biggest lever, and one of the most rarely tested.
CTA
- Text change: ±10–30 %
- Button colour (if contrast rises): ±5–15 %
- Position: ±10 %
Send time
- Morning vs. afternoon: up to 20 %.
- Weekday: see The best time to send a newsletter
Test duration and decision
- Open rate stabilises for most campaigns after 4–8 hours.
- Click rate after 12–24 hours.
- Conversion rate can take 3–7 days (depends on buy cycle).
Mailaura calculates "test maturity" live. Results are valid once p < 0.05 (95 % confidence) is reached.
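If you want to check a result by hand, a two-proportion z-test is sufficient for open and click rates. Below is a minimal sketch in Python; the recipient and open counts are invented for illustration, and Mailaura's own maturity calculation may use a different method.

```python
from math import erf, sqrt

def two_proportion_p_value(hits_a: int, sent_a: int, hits_b: int, sent_b: int) -> float:
    """Two-sided p-value for the difference between two open (or click) rates."""
    p_a, p_b = hits_a / sent_a, hits_b / sent_b
    p_pool = (hits_a + hits_b) / (sent_a + sent_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / sent_a + 1 / sent_b))
    z = (p_a - p_b) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # standard normal tail, both sides

# Example: 26.5 % vs. 25.0 % open rate on 8,000 recipients per variant
print(round(two_proportion_p_value(2120, 8000, 2000, 8000), 3))  # ~0.03, i.e. significant
```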
Common mistakes
1. Aborting the test when the desired trend appears
After 2 hours it is 27 % vs. 25 %? That is not a signal. Always wait for statistical significance.
2. Multiple tests in parallel on the same list
If you test subject lines on Tuesday and CTAs on Thursday on the same list, the effects of the two tests overlap with each other and with natural week-to-week fluctuation, and you can no longer attribute changes cleanly.
3. Not measuring conversion
More clicks are worthless if the clicks convert worse. Always measure through to the business goal.
4. No learning log
After 10 tests you should have a clear picture of what works for your list. Without notes, that knowledge evaporates as soon as team members change.
Simple format — A/B test log (spreadsheet):
| Date | Variable | A | B | Winner | Uplift | Significance |
|---|---|---|---|---|---|---|
| 12 Mar 26 | Subject | question | statement | A | +12 % | p=0.02 |
After 20 entries you spot patterns that help with every new campaign.
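Once the log grows, a short script can summarise it for you. Here is a sketch that assumes the spreadsheet is exported as a CSV with the columns shown above; the file name and the "p=0.02" format of the Significance column are illustrative assumptions.

```python
import csv
from collections import defaultdict

# Assumes columns: Date, Variable, A, B, Winner, Uplift, Significance ("p=0.02")
tests_run = defaultdict(int)
significant = defaultdict(int)

with open("ab_test_log.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        variable = row["Variable"]
        tests_run[variable] += 1
        if float(row["Significance"].removeprefix("p=")) < 0.05:
            significant[variable] += 1

for variable, total in sorted(tests_run.items()):
    print(f"{variable}: {significant[variable]}/{total} tests reached significance")
```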
5. Tests without hypothesis
"Let's try something" produces results you cannot learn from. Always if-then-because.
Cross-campaign tests
Some things are not tested in a single campaign but over several:
- Frequency: 1× vs. 2× per week across two separate segments over 8 weeks.
- Subscription type: "daily digest" vs. "weekly" for new subscribers.
- Design system: migrate 50 % of the list to the new design, keep the old design for the other 50 %.
These long-term tests need 30–60 days but reveal effects that single-campaign tests never show.
Tool check
Good A/B test tools offer:
- Sample-size calculator before setup
- Automatic significance display
- Hold-out send of winner
- Searchable test history
- Multi-variant support (at least A/B/n with n=3)
Mailaura covers all of these points, including conversion tracking once events arrive from your own shop or API.
Conclusion
A/B tests are not a gimmick; they are the engine of every email optimisation. With a clean hypothesis, a sufficient sample size and a documented learning log, you reach open and click rates within a quarter that would otherwise take years. Start with the biggest lever, the sender name, and work through the list. All important metrics are covered in parallel in Newsletter KPIs.
Ready for your next newsletter?
Mailaura makes newsletter marketing easy, GDPR-compliant and AI-powered. Start for free.