A/B Testing Newsletters: A Practical Guide
From hypothesis to sample size to evaluation: how to run newsletter A/B tests that produce reliable results — and avoid the mistakes that ruin every optimisation round.
Mailaura Team
Mailaura.io
In newsletter marketing, A/B tests are the only reliable way to replace gut feel with data. At the same time, they are the most frequently botched process. This guide shows how to set up structured A/B tests that yield reliable decisions and how to avoid the mistakes that spoil every round of optimisation.
What an A/B test actually proves
A clean A/B test measures the effect of one variable on one target metric under equal conditions. If you change the subject line AND the send time at the same time, you will not know afterwards which variable caused the change.
What you can test:
- Subject line
- Preheader
- Sender name
- Send time
- Content (mainly first paragraph, CTA, layout)
- CTA text
- CTA colour / position
- Personalisation (first name yes/no)
- Image vs. no image
What NOT to test in a single test: two of these at once.
Statistical basics: sample size
The most common problem with A/B tests is a sample that is too small. The result is then statistical noise, not signal.
Rule of thumb for open rate:
- At a 25 % baseline and an expected 5 % relative uplift (to 26.25 %): ≥ 10,000 recipients per variant.
- With an expected 20 % relative uplift (to 30 %): ≥ 800 per variant.
The smaller the expected difference, the larger the needed sample.
Rule of thumb for click rate (low baseline):
- 3 % CTR → 4 % CTR (33 % relative uplift): ≥ 4,000 per variant.
Mailaura's A/B module shows the minimum sample size for your hypothesis at setup time.
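If you want to see where figures like these come from, here is a minimal sketch of a standard two-proportion sample-size approximation in Python. It assumes 95 % confidence and 80 % statistical power, so its output can differ from the rules of thumb above depending on the power you target; Mailaura's built-in calculator may use a different method.

```python
from math import ceil

def sample_size_per_variant(baseline: float, relative_uplift: float) -> int:
    """Recipients needed per variant for a two-proportion test (~95 % confidence, ~80 % power)."""
    z_alpha = 1.96  # two-sided 95 % confidence
    z_beta = 0.84   # 80 % power
    p1 = baseline
    p2 = baseline * (1 + relative_uplift)
    p_bar = (p1 + p2) / 2
    n = 2 * (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) / (p2 - p1) ** 2
    return ceil(n)

print(sample_size_per_variant(0.25, 0.05))  # ~19,000 (open rate, small expected uplift)
print(sample_size_per_variant(0.25, 0.20))  # ~1,300  (open rate, large expected uplift)
print(sample_size_per_variant(0.03, 0.33))  # ~5,400  (click rate, low baseline)
```

The pattern matches the rule above: halving the expected uplift roughly quadruples the sample you need.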
Proper test structure
Variant A (control) vs. variant B (hypothesis)
Never more than 2 variants if your list is under 50,000. With 3+ variants, every variant still needs the full per-variant sample and you have to correct for multiple comparisons, so the traffic requirement grows quickly.
Randomised split
Sample splitting must be random. "The first 500 on the list vs. the next 500" is not random; it can introduce systematic bias, for example by sign-up date. Mailaura distributes automatically via a hash function.
Hold-out of the remaining list
In a typical split test, 20–30 % of the list runs the test in two variants (10–15 % per variant); the rest (70–80 %) receives the winning variant later. This way you maximise revenue and still get valid data.
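To make the hash-based split and the hold-out concrete, here is a simplified sketch in Python (not Mailaura's actual implementation; the subscriber ID and test name are made up). Each subscriber is hashed together with a per-test salt, so the assignment ignores list order, stays stable if you re-run the send, and reshuffles for every new test.

```python
import hashlib

def assign_group(subscriber_id: str, test_id: str, test_share: float = 0.2) -> str:
    """Deterministically assign a subscriber to variant 'A', 'B' or the hold-out."""
    # Hashing the subscriber ID together with a per-test salt makes the split
    # independent of list order and reshuffles it for every new test.
    digest = hashlib.sha256(f"{test_id}:{subscriber_id}".encode()).hexdigest()
    bucket = (int(digest, 16) % 10_000) / 10_000  # pseudo-uniform value in [0, 1)
    if bucket < test_share / 2:
        return "A"
    if bucket < test_share:
        return "B"
    return "holdout"  # receives the winning variant after the test matures

# 20 % of the list in the test (10 % per variant), 80 % hold-out
print(assign_group("subscriber-4711", "subject-line-test-2026-03"))
```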
Formulate a clean hypothesis
A good hypothesis is specific, measurable and causal:
- ❌ "We want more clicks."
- ✅ "If we change the CTA from 'Learn more' to 'Start for free', CTR rises by at least 15 %, because the phrasing is more concrete."
Formula: If [change], then [expected outcome], because [reason].
Typical test ideas and expectable effects
Subject line
- Question vs. statement: 5–15 % difference
- With emoji vs. without: ±5 %, highly audience-dependent
- With vs. without personalisation (first name): +2–4 %
- Benefit vs. curiosity: ±10 %
Sender name
- Company vs. person ("Lisa @ Mailaura"): +10–25 %. By far the biggest lever, and one of the most rarely tested.
CTA
- Text change: ±10–30 %
- Button colour (if contrast rises): ±5–15 %
- Position: ±10 %
Send time
- Morning vs. afternoon: up to 20 %.
- Weekday: see The best time to send a newsletter
Test duration and decision
- Open rate stabilises for most campaigns after 4–8 hours.
- Click rate after 12–24 hours.
- Conversion rate can take 3–7 days (depends on buy cycle).
Mailaura calculates "test maturity" live. Results are valid once p < 0.05 (95 % confidence) is reached.
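If you want to check a result by hand, a two-proportion z-test is sufficient for open and click rates. Below is a minimal sketch in Python; the recipient and open counts are invented for illustration, and Mailaura's own maturity calculation may use a different method.

```python
from math import erf, sqrt

def two_proportion_p_value(hits_a: int, sent_a: int, hits_b: int, sent_b: int) -> float:
    """Two-sided p-value for the difference between two open (or click) rates."""
    p_a, p_b = hits_a / sent_a, hits_b / sent_b
    p_pool = (hits_a + hits_b) / (sent_a + sent_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / sent_a + 1 / sent_b))
    z = (p_a - p_b) / se
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # standard normal tail, both sides

# Example: 26.5 % vs. 25.0 % open rate on 8,000 recipients per variant
print(round(two_proportion_p_value(2120, 8000, 2000, 8000), 3))  # ~0.03, i.e. significant
```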
Common mistakes
1. Aborting the test when the desired trend appears
After 2 hours it is 27 % vs. 25 %? That is not a signal. Always wait for statistical significance.
2. Multiple tests in parallel on the same list
If you test subject lines on Tuesday and CTAs on Thursday on the same list, the effects of the two tests overlap with each other and with natural week-to-week fluctuation, and you can no longer attribute changes cleanly.
3. Not measuring conversion
More clicks are worthless if the clicks convert worse. Always measure through to the business goal.
4. No learning log
After 10 tests you should have a clear picture of what works for your list. Without notes, that knowledge evaporates as soon as team members change.
Simple format — A/B test log (spreadsheet):
| Date | Variable | A | B | Winner | Uplift | Significance |
|---|---|---|---|---|---|---|
| 12 Mar 26 | Subject | question | statement | A | +12 % | p=0.02 |
After 20 entries you spot patterns that help with every new campaign.
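Once the log grows, a short script can summarise it for you. Here is a sketch that assumes the spreadsheet is exported as a CSV with the columns shown above; the file name and the "p=0.02" format of the Significance column are illustrative assumptions.

```python
import csv
from collections import defaultdict

# Assumes columns: Date, Variable, A, B, Winner, Uplift, Significance ("p=0.02")
tests_run = defaultdict(int)
significant = defaultdict(int)

with open("ab_test_log.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        variable = row["Variable"]
        tests_run[variable] += 1
        if float(row["Significance"].removeprefix("p=")) < 0.05:
            significant[variable] += 1

for variable, total in sorted(tests_run.items()):
    print(f"{variable}: {significant[variable]}/{total} tests reached significance")
```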
5. Tests without hypothesis
"Let's try something" produces results you cannot learn from. Always if-then-because.
Cross-campaign tests
Some things are not tested in a single campaign but over several:
- Frequency: 1× vs. 2× per week across two separate segments over 8 weeks.
- Subscription type: "daily digest" vs. "weekly" for new subscribers.
- Design system: migrate 50 % of the list to the new design, keep the old design for the other 50 %.
These long-term tests need 30–60 days but reveal effects that single-campaign tests never show.
Tool check
Good A/B test tools offer:
- Sample-size calculator before setup
- Automatic significance display
- Hold-out send of winner
- Searchable test history
- Multi-variant support (at least A/B/n with n=3)
Mailaura covers all of these points, including conversion tracking once events arrive from your own shop or API.
Conclusion
A/B tests are not a gimmick; they are the engine of every email optimisation. With a clean hypothesis, a sufficient sample size and a documented learning log, you reach open and click rates within a quarter that would otherwise take years. Start with the biggest lever, the sender name, and work through the list. All important metrics are covered in parallel in Newsletter KPIs.
Ready for your next newsletter?
Mailaura makes newsletter marketing easy, GDPR-compliant and AI-powered. Start for free.