Results Analysis

Interpreting statistics and making decisions based on experiment data.

What Analysis Shows

After running analysis, the system calculates results for each metric.

For A/B tests (2 groups):

Effect (Δ) — absolute and relative change between groups
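Confidence Interval (CI Low, CI Up) — range of plausible values for the true effect at the chosen confidence level
P-value — probability of observing a difference at least this large if there were no true difference between groups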
Processing — applied data transformation (for example, log)
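A minimal sketch of how these quantities could be computed for a conversion metric, assuming a two-proportion z-test. The function name, arguments, and returned keys are illustrative and are not the system's API; the actual method the system picks may differ.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Illustrative z-test for two proportions (A = control, B = test)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b

    abs_effect = p_b - p_a          # absolute change
    rel_effect = abs_effect / p_a   # relative change vs. control

    # Pooled standard error is used for the test statistic.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = abs_effect / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

    # Unpooled standard error is used for the confidence interval.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    return {
        "abs_effect": abs_effect,
        "rel_effect": rel_effect,
        "ci_low": abs_effect - z_crit * se,
        "ci_up": abs_effect + z_crit * se,
        "p_value": p_value,
    }


result = two_proportion_ztest(conv_a=500, n_a=10_000, conv_b=550, n_b=10_000)
print(result)
```

With these example counts the interval spans zero and the p-value is above 0.05, so the +10% relative change would not be reported as significant.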

For A/B/C+ tests (3+ groups):

Global p-value — overall significance of differences between all groups for each metric
Pairwise comparisons — detailed comparison of each pair of groups (0-1, 0-2, 1-2) with p-values and confidence intervals
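To illustrate the pair labels above (0-1, 0-2, 1-2), the following sketch enumerates every pair of group indices. The helper name is hypothetical. It also shows why multiple-testing correction matters: the number of pairs grows as k(k-1)/2 with the number of groups k.

```python
from itertools import combinations


def pairwise_comparisons(n_groups):
    """Return every unordered pair of group indices, control = 0."""
    return list(combinations(range(n_groups), 2))


print(pairwise_comparisons(3))       # [(0, 1), (0, 2), (1, 2)]
print(len(pairwise_comparisons(5)))  # 10
```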

Results are displayed in tables divided by metric types: Conversion, Numeric, Ratio.

Analysis Methods

The system automatically selects optimal statistical methods for each experiment type and metric type.

Selection principle: For each combination (experiment type + metric type), a hierarchy of methods is used — from simple (for example, z-test for proportions) to more complex (for example, Welch ANOVA + Games-Howell). The system chooses the most appropriate method based on data characteristics: sample size, distribution, variances.

Experiment types:

A/B test (2 groups) — classical methods for comparing two samples
A/B/C+ test (3+ groups) — multiple comparison methods with correction for multiple testing

Metric types:

Conversion — binary metrics use methods for proportions
Numeric — numerical metrics use methods for continuous quantities
Ratio — composite metrics use Delta Method or linearization to correctly account for numerator and denominator variability
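As a rough illustration of the Delta Method for a ratio metric, the sketch below estimates the variance of mean(numerator) / mean(denominator) from per-user values. It assumes one numerator and one denominator value per user; the function name and inputs are illustrative, not the system's implementation.

```python
from statistics import mean


def ratio_delta_variance(num, den):
    """Ratio of means and its Delta Method variance estimate."""
    n = len(num)
    mean_x, mean_y = mean(num), mean(den)
    ratio = mean_x / mean_y

    # Sample variances and covariance of numerator and denominator.
    var_x = sum((x - mean_x) ** 2 for x in num) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in den) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(num, den)) / (n - 1)

    # Var(R) ~ (var_x - 2*R*cov_xy + R^2*var_y) / (n * mean_y^2)
    var_ratio = (var_x - 2 * ratio * cov_xy + ratio ** 2 * var_y) / (n * mean_y ** 2)
    return ratio, var_ratio


revenue = [10.0, 25.0, 30.0, 35.0]  # made-up per-user revenue
orders = [1, 2, 3, 4]               # made-up per-user order counts
print(ratio_delta_variance(revenue, orders))
```

The covariance term is what a naive per-component calculation misses: when numerator and denominator move together, the ratio is more stable than either component alone.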

Resampling for Small Samples

For experiments with a small sample (fewer than 20,000 users), the system may use Monte Carlo resampling. This method draws thousands of simulated samples from the original data to estimate statistics such as confidence intervals more reliably.

Advantages:

More accurate confidence intervals on small samples
Does not require normality assumptions
Accounts for real data structure

Resampling can be enabled or disabled in analysis settings. By default, it activates automatically when the sample has fewer than 20,000 users.
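The general idea of resampling can be sketched with a percentile bootstrap for the difference in means. This is an assumption about one common resampling scheme, not a description of the system's exact procedure; the function name, the number of resamples, and the seed are illustrative.

```python
import random
from statistics import mean


def bootstrap_diff_ci(control, test, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(test) - mean(control)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        c = rng.choices(control, k=len(control))
        t = rng.choices(test, k=len(test))
        diffs.append(mean(t) - mean(c))
    diffs.sort()

    # Take the alpha/2 and 1 - alpha/2 percentiles of the simulated effects.
    low_idx = int((alpha / 2) * n_resamples)
    up_idx = max(low_idx, int((1 - alpha / 2) * n_resamples) - 1)
    return diffs[low_idx], diffs[up_idx]


control = [12.0, 0.0, 7.5, 30.0, 0.0, 4.0, 18.0, 0.0]  # made-up values
test = [15.0, 0.0, 9.0, 42.0, 3.0, 6.0, 20.0, 0.0]
print(bootstrap_diff_ci(control, test))
```

Because the interval comes from the empirical distribution of resampled effects, it does not rely on a normality assumption.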

Analysis Settings

Before running analysis, you can configure:

Significance level (α) — threshold for p-value (default 0.05)
Resampling — enable/disable for small samples
Show confidence intervals — display CI in results
FDR correction — apply Benjamini-Hochberg correction to key metrics to protect against "random" significant results when analyzing a family of metrics (see more in FDR correction)
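For reference, the Benjamini-Hochberg step-up procedure can be sketched as follows. The function name and return shape are illustrative; how the system groups metrics into a family is not shown here.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the metric stays significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Every hypothesis ranked at or below max_rank is rejected.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Without correction, three of these four p-values are below 0.05.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
# [True, False, False, False]
```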

How to Read Results

P-value and Statistical Significance

P < α (highlighted in green): Result is statistically significant. A difference this large would be unlikely if the change had no real effect, so random variation alone is an improbable explanation.

P ≥ α (gray): Result is not significant — there is insufficient evidence that the change had an effect. Perhaps a larger sample is needed, or the effect is absent.

Important: The p-value measures how surprising the observed data would be if there were no true difference; it does not indicate effect size or practical importance. Set the significance level α before running the analysis.

Confidence Interval (CI)

A range of plausible values for the true effect. At confidence level 1 − α (for example, 95% with α = 0.05), intervals built this way would contain the true effect in about 95% of repeated experiments.

Interval does not include 0: Effect is statistically significant. For example, CI = [+2.1%, +5.3%] — all values are positive.

Interval includes 0: Effect is not significant. For example, CI = [-1.2%, +3.5%] — both positive and negative values are possible.

Interval width:

Narrow interval = high estimation precision
Wide interval = low precision, possibly an insufficient sample
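To show how interval width depends on sample size, the sketch below computes the approximate half-width of a normal-approximation interval for a difference of two proportions. It assumes equal group sizes and the same baseline rate in both groups; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(p, n_per_group, alpha=0.05):
    """Approximate CI half-width for a difference of two proportions."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(2 * p * (1 - p) / n_per_group)
    return z_crit * se


for n in (5_000, 20_000, 80_000):
    print(n, round(ci_half_width(0.05, n), 5))
```

The half-width scales with 1/sqrt(n), so halving the interval width requires roughly four times as many users per group.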

Relative Change

Percentage change relative to control group.

Example: Control = 5%, Test = 5.5%. Absolute change = 5.5% − 5% = +0.5 percentage points; relative change = 0.5 / 5 = +10% (not +0.5%).

Always read the relative change together with the absolute change and the confidence interval to get the full picture.

Creating Ratio Metrics

If the file has two numeric metrics (for example, revenue and orders), you can create a ratio metric directly in the analysis interface.

Process:

Go to Ratio tab
Click "Add ratio metric"
Select numerator (for example, revenue)
Select denominator (for example, orders)
Specify name (for example, average_order_value)

The system will calculate the metric with correct statistics accounting for variability of both components.

Examples of ratio metrics:

revenue / orders — average order value
clicks / impressions — CTR
revenue / sessions — revenue per session

Decision Making

Success:

Effect is positive
Statistically significant (p-value less than chosen α level)
Sufficient from business perspective

Failure:

Effect is absent or negative
Or insufficient to justify resources

Ambiguous:

Effect is close to significance boundary
Contradictory results across different metrics
Additional research needed

Always consider not only statistics but also business context, implementation cost, and risks.
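The three outcomes above could be approximated for a single metric with a rule like the one below. This is only a sketch: the function name, the `min_effect` business threshold, and the `boundary_margin` that defines "close to the significance boundary" are hypothetical parameters, and contradictory results across metrics still need human review.

```python
def classify_result(effect, p_value, alpha=0.05, min_effect=0.0, boundary_margin=0.02):
    """Map one metric's result to 'success', 'failure', or 'ambiguous'.

    min_effect and boundary_margin are illustrative thresholds that a team
    would have to choose for itself.
    """
    significant = p_value < alpha

    if significant:
        # Positive, significant, and large enough for the business.
        if effect > 0 and effect >= min_effect:
            return "success"
        # Significant but negative, or too small to justify resources.
        return "failure"

    # Not significant, but close to the significance boundary.
    if p_value < alpha + boundary_margin:
        return "ambiguous"

    # No evidence of an effect at this sample size.
    return "failure"


print(classify_result(effect=0.02, p_value=0.01, min_effect=0.01))  # success
print(classify_result(effect=0.02, p_value=0.06))                   # ambiguous
```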
