Results Analysis
Interpreting statistics and making decisions based on experiment data.
What Analysis Shows
After running analysis, the system calculates results for each metric.
For A/B tests (2 groups):
- Effect (Δ) — absolute and relative change between groups
- Confidence Interval (CI Low, CI Up) — range of possible effect values
- P-value — statistical significance of difference
- Processing — applied data transformation (for example, log)
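As a rough illustration of how these quantities relate for a conversion metric, the sketch below runs a two-proportion z-test using only the Python standard library. The function name `ab_conversion_summary`, the dictionary keys, and the sample counts are hypothetical; the system's actual method choice and output format may differ.

```python
from math import sqrt
from statistics import NormalDist

def ab_conversion_summary(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Summarize an A/B conversion test: effect, CI, and p-value (illustrative)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    abs_effect = p_b - p_a          # absolute change (difference in rates)
    rel_effect = abs_effect / p_a   # relative change versus control

    # Pooled standard error for the test of H0: both rates are equal.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = abs_effect / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided

    # Unpooled standard error for the confidence interval of the difference.
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "effect_abs": abs_effect,
        "effect_rel": rel_effect,
        "ci_low": abs_effect - z_crit * se,
        "ci_up": abs_effect + z_crit * se,
        "p_value": p_value,
    }

# Hypothetical counts: 500/10,000 conversions in control, 550/10,000 in test.
result = ab_conversion_summary(conv_a=500, n_a=10_000, conv_b=550, n_b=10_000)
```

With these made-up counts the relative change is +10%, yet the interval still spans zero and the p-value stays above 0.05, which shows why the effect, the interval, and the p-value are read together.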
For A/B/C+ tests (3+ groups):
- Global p-value — overall significance of differences between all groups for each metric
- Pairwise comparisons — detailed comparison of each pair of groups (0-1, 0-2, 1-2) with p-values and confidence intervals
Results are displayed in tables divided by metric types: Conversion, Numeric, Ratio.
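To make the pairwise comparisons for A/B/C+ tests more concrete, here is a minimal sketch for a conversion metric. It enumerates the pairs (0-1, 0-2, 1-2) and applies a Bonferroni adjustment, which is a simple stand-in chosen for illustration and not necessarily the correction the system uses. The function name, the result keys, and the counts are assumptions.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist

def pairwise_conversion_tests(groups, alpha=0.05):
    """groups: list of (conversions, users) tuples; index 0 is the control."""
    pairs = list(combinations(range(len(groups)), 2))
    results = []
    for i, j in pairs:
        (c_i, n_i), (c_j, n_j) = groups[i], groups[j]
        p_i, p_j = c_i / n_i, c_j / n_j
        pooled = (c_i + c_j) / (n_i + n_j)
        se = sqrt(pooled * (1 - pooled) * (1 / n_i + 1 / n_j))
        z = (p_j - p_i) / se
        p_raw = 2 * (1 - NormalDist().cdf(abs(z)))
        # Bonferroni: multiply by the number of comparisons, cap at 1.
        p_adj = min(1.0, p_raw * len(pairs))
        results.append({
            "pair": f"{i}-{j}",
            "effect": p_j - p_i,
            "p_adjusted": p_adj,
            "significant": p_adj < alpha,
        })
    return results

# Hypothetical three-group experiment.
results = pairwise_conversion_tests([(500, 10_000), (520, 10_000), (600, 10_000)])
```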
Analysis Methods
The system automatically selects optimal statistical methods for each experiment type and metric type.
Selection principle: For each combination (experiment type + metric type), a hierarchy of methods is used — from simple (for example, z-test for proportions) to more complex (for example, Welch ANOVA + Games-Howell). The system chooses the most appropriate method based on data characteristics: sample size, distribution, variances.
Experiment types:
- A/B test (2 groups) — classical methods for comparing two samples
- A/B/C+ test (3+ groups) — multiple comparison methods with correction for multiple testing
Metric types:
- Conversion — binary metrics use methods for proportions
- Numeric — numerical metrics use methods for continuous quantities
- Ratio — composite metrics use Delta Method or linearization to correctly account for numerator and denominator variability
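For ratio metrics, the Delta Method mentioned above can be sketched as follows. The code approximates the variance of mean(numerator) / mean(denominator) from per-user values, taking the covariance between the two into account. The function names and the sample data are hypothetical, and the system's own implementation may differ in detail.

```python
from statistics import mean, variance

def sample_covariance(xs, ys):
    """Unbiased sample covariance of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def ratio_with_delta_variance(numerators, denominators):
    """Return (ratio, approximate variance of the ratio) from per-user values."""
    n = len(numerators)
    mean_num, mean_den = mean(numerators), mean(denominators)
    ratio = mean_num / mean_den
    var_num = variance(numerators)
    var_den = variance(denominators)
    cov = sample_covariance(numerators, denominators)
    # First-order (delta method) approximation of Var(mean_num / mean_den).
    var_ratio = (var_num - 2 * ratio * cov + ratio**2 * var_den) / (n * mean_den**2)
    return ratio, var_ratio

revenue = [20.0, 0.0, 45.0, 30.0, 15.0]  # hypothetical per-user revenue
orders = [1.0, 0.0, 2.0, 2.0, 1.0]       # hypothetical per-user order counts
aov, aov_variance = ratio_with_delta_variance(revenue, orders)
```

Treating the ratio as an ordinary per-user average would ignore the variability of the denominator; the covariance term is what corrects for that.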
Resampling for Small Samples
For experiments with a small sample size (fewer than 20,000 users), the system may use Monte Carlo resampling. This method draws thousands of simulated samples from the original data to estimate statistics more reliably.
Advantages:
- More accurate confidence intervals on small samples
- Does not require normality assumptions
- Accounts for real data structure
Resampling can be enabled or disabled in analysis settings. By default, it activates automatically when the sample is smaller than 20,000 users.
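The sketch below shows one common form of resampling, a percentile bootstrap for the difference in means, together with the default activation rule described above. The text does not say which resampling scheme the system uses, so the bootstrap is an assumption, as are the names `should_resample` and `bootstrap_diff_ci`, the number of resamples, and the data.

```python
import random
from statistics import mean

SMALL_SAMPLE_THRESHOLD = 20_000  # users; default threshold from the text

def should_resample(n_users, enabled=None):
    """Use the explicit setting if given, else auto-enable below the threshold."""
    if enabled is not None:
        return enabled
    return n_users < SMALL_SAMPLE_THRESHOLD

def bootstrap_diff_ci(control, test, alpha=0.05, n_resamples=2000, seed=0):
    """Percentile bootstrap CI for mean(test) - mean(control)."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        c = rng.choices(control, k=len(control))
        t = rng.choices(test, k=len(test))
        diffs.append(mean(t) - mean(c))
    diffs.sort()
    low = diffs[int((alpha / 2) * n_resamples)]
    up = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return low, up

# Hypothetical numeric metric for two groups of 200 users each.
control = [float(i % 10) for i in range(200)]
test = [float(i % 10) + 1.0 for i in range(200)]
if should_resample(len(control) + len(test)):
    ci_low, ci_up = bootstrap_diff_ci(control, test)
```

Because the interval comes from the empirical distribution of resampled differences, no normality assumption is needed.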
Analysis Settings
Before running analysis, you can configure:
- Significance level (α) — threshold for p-value (default 0.05)
- Resampling — enable/disable for small samples
- Show confidence intervals — display CI in results
- FDR correction — apply Benjamini-Hochberg correction to key metrics to protect against "random" significant results when analyzing a family of metrics (see more in FDR correction)
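To show what the FDR correction setting does to a family of metrics, here is a minimal sketch of the Benjamini-Hochberg procedure. The function name and the p-values are hypothetical; the system may compute or present adjusted values differently.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return BH-adjusted p-values and significance flags, in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    significant = [p <= alpha for p in adjusted]
    return adjusted, significant

# Hypothetical raw p-values for four key metrics.
adjusted, significant = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

In this example three metrics have raw p-values below 0.05, but only the first remains significant after correction.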
How to Read Results
P-value and Statistical Significance
P < α (highlighted in green): Result is statistically significant. A difference at least this large would be unlikely if there were no real difference between groups, so chance alone is an implausible explanation.
P ≥ α (gray): Result is not significant — there is insufficient evidence that the change had an effect. Perhaps a larger sample is needed, or the effect is absent.
Important: A p-value measures how surprising the data would be if there were no effect; it does not indicate effect size or practical importance. Set the significance level α before running the analysis.
Confidence Interval (CI)
A range of plausible values for the true effect at a given confidence level, 1 − α (for example, 95% with α = 0.05).
Interval does not include 0: Effect is statistically significant. For example, CI = [+2.1%, +5.3%] — all values are positive.
Interval includes 0: Effect is not significant. For example, CI = [-1.2%, +3.5%] — both positive and negative values are possible.
Interval width:
- Narrow interval = precise estimate
- Wide interval = imprecise estimate, possibly an insufficient sample
Relative Change
Percentage change relative to control group.
Example: Control = 5%, Test = 5.5%. Relative change = +10%; absolute change = +0.5 percentage points.
Always look at the relative change together with the absolute change and the confidence interval for the full picture.
Creating Ratio Metrics
If the file has two numeric metrics (for example, revenue and orders), you can create a ratio metric directly in the analysis interface.
Process:
- Go to Ratio tab
- Click "Add ratio metric"
- Select numerator (for example, revenue)
- Select denominator (for example, orders)
- Specify name (for example, average_order_value)
The system will calculate the metric with correct statistics accounting for variability of both components.
Examples of ratio metrics:
- revenue / orders: average order value
- clicks / impressions: CTR
- revenue / sessions: revenue per session
Decision Making
Success:
- Effect is positive
- Statistically significant (p-value less than chosen α level)
- Sufficient from business perspective
Failure:
- Effect is absent or negative
- Or insufficient to justify resources
Ambiguous:
- Effect is close to significance boundary
- Contradictory results across different metrics
- Additional research needed
Always consider not only statistics but also business context, implementation cost, and risks.
