Shapiro-Wilk test


The **Shapiro-Wilk test** is a statistical test used to determine if a sample of data was drawn from a normally distributed population. It is one of the most powerful tests for normality, particularly for smaller sample sizes (n < 50), though it performs well across a broader range of sample sizes. This article provides a comprehensive introduction to the Shapiro-Wilk test, covering its history, underlying principles, calculation, interpretation, assumptions, limitations, and practical applications, especially within the context of analyzing data used in Technical Analysis.

History and Development

The test was developed by Samuel Shapiro and Martin Wilk in 1965. Prior to their work, tests for normality such as the Kolmogorov-Smirnov test and the Anderson-Darling test were commonly used. However, Shapiro and Wilk demonstrated that their test had greater power (a higher probability of correctly rejecting the null hypothesis when it is false), especially for smaller samples. Their test built on earlier work by Alexander Aitken on generalized least-squares estimation, which they used to derive optimal weights for the ordered sample values. The original paper detailing the test, "An Analysis of Variance Test for Normality (Complete Samples)", was published in the journal *Biometrika*.

Underlying Principles

The Shapiro-Wilk test is a goodness-of-fit test. It assesses how well a dataset aligns with the characteristics of a normal distribution. The core idea is to measure the correlation between the ordered statistics of the sample and the expected order statistics of a normal distribution with the same sample size.

  • **Ordered Statistics:** When a dataset is sorted from smallest to largest, the resulting values are called ordered statistics. For a sample of size *n*, these are denoted as *x(1), x(2), ..., x(n)*, where *x(1)* is the smallest value and *x(n)* is the largest.
  • **Normal Order Statistics:** These are the expected values of the ordered statistics if the data were truly drawn from a normal distribution.
  • **Correlation Coefficient:** The test calculates the correlation coefficient (*W*) between the observed ordered statistics and the expected normal order statistics. If the sample data are normally distributed, the correlation will be close to 1. Departures from normality will result in a *W* value further from 1.
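The correlation idea above can be sketched in a few lines of Python. This is a minimal illustration, not the exact *W* computation: the true test uses specially derived coefficients, and the expected normal order statistics here are approximated with Blom's plotting positions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=30)

# Ordered statistics of the sample
ordered = np.sort(sample)

# Approximate expected normal order statistics (Blom's plotting positions)
n = len(ordered)
i = np.arange(1, n + 1)
expected = stats.norm.ppf((i - 0.375) / (n + 0.25))

# Correlation between observed and expected order statistics:
# close to 1 for normal data, lower for non-normal data
r = np.corrcoef(ordered, expected)[0, 1]
print(round(r, 3))
```

This is exactly the logic behind a Q-Q plot: for normal data, the points fall close to a straight line, and the correlation is near 1.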

How the Test is Calculated

The calculation of the Shapiro-Wilk test statistic *W* is complex and typically performed using statistical software. Here’s a breakdown of the steps involved:

1. **Sort the Data:** Arrange the sample data in ascending order.
2. **Calculate the Mean and Standard Deviation:** Compute the sample mean (x̄) and sample standard deviation (s) of the data.
3. **Determine the Coefficients (aᵢ):** This is the most computationally intensive part. A set of coefficients *aᵢ*, specific to the sample size (*n*), provides the weights assigned to each ordered statistic. They are derived from the means and covariances of the order statistics of a standard normal sample and are chosen to give the test maximum power against deviations from normality, particularly in the tails of the distribution. Tables of these coefficients are available in statistical texts and are pre-programmed into statistical software.
4. **Calculate the Weighted Sum:** Compute the sum of the differences between each ordered statistic and the sample mean, weighted by the corresponding coefficient *aᵢ*. This is represented as:

  ```
  b = Σ_{i=1}^{n} a_i * (x_(i) - x̄)
  ```

5. **Calculate the Test Statistic (W):** The Shapiro-Wilk test statistic *W* is calculated as the square of the weighted sum *b*, divided by the total sum of squares:

  ```
  W = b² / Σ_{i=1}^{n} (x_(i) - x̄)²
  ```
  The *W* statistic ranges from 0 to 1. A value of 1 indicates perfect normality.
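The steps above can be sketched in Python. Note that this is an approximation for illustration: the coefficients are taken as normalized expected normal order statistics (strictly, this gives the closely related Shapiro-Francia statistic), whereas the exact *aᵢ* also involve the covariance matrix of the order statistics.

```python
import numpy as np
from scipy import stats

def shapiro_w_approx(x):
    """Approximate W using normalized expected normal order statistics
    as the coefficients a_i (the Shapiro-Francia variant; the exact a_i
    also involve the covariance matrix of the order statistics)."""
    x = np.sort(np.asarray(x, dtype=float))        # step 1: sort
    n = len(x)
    i = np.arange(1, n + 1)
    m = stats.norm.ppf((i - 0.375) / (n + 0.25))   # expected order statistics
    a = m / np.sqrt(np.sum(m ** 2))                # step 3: normalized coefficients
    b = np.sum(a * (x - x.mean()))                 # step 4: weighted sum
    w = b ** 2 / np.sum((x - x.mean()) ** 2)       # step 5: test statistic
    return w

rng = np.random.default_rng(1)
normal_data = rng.normal(size=50)
skewed_data = rng.exponential(size=50)
print(shapiro_w_approx(normal_data))  # close to 1
print(shapiro_w_approx(skewed_data))  # noticeably below 1
```

In practice you would never compute *W* by hand; this sketch only makes the structure of the statistic concrete.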

Interpreting the Results

The Shapiro-Wilk test provides a *p*-value. The *p*-value represents the probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the sample data, *assuming that the data are normally distributed* (this is the null hypothesis).

  • **Null Hypothesis (H0):** The data are normally distributed.
  • **Alternative Hypothesis (H1):** The data are not normally distributed.

The decision rule is as follows:

  • **If p ≤ α:** Reject the null hypothesis. This indicates that the data are likely *not* normally distributed. Here, α is the significance level, typically set to 0.05.
  • **If p > α:** Fail to reject the null hypothesis. This does *not* mean that the data are definitely normally distributed, only that there is not enough evidence to conclude that they are not.
    • **Example:** If the Shapiro-Wilk test yields a *p*-value of 0.02 and the significance level (α) is 0.05, we would reject the null hypothesis and conclude that the data are significantly different from a normal distribution.
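The decision rule maps directly onto a few lines of SciPy (the exponential sample here is simply a stand-in for clearly non-normal data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.exponential(size=200)   # deliberately non-normal sample

stat, p_value = stats.shapiro(data)
alpha = 0.05  # conventional significance level

print(f"W = {stat:.4f}, p-value = {p_value:.4g}")
if p_value <= alpha:
    print("Reject H0: the data are likely not normally distributed")
else:
    print("Fail to reject H0: no evidence against normality")
```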

Assumptions of the Shapiro-Wilk Test

While robust, the Shapiro-Wilk test has certain assumptions:

  • **Independence:** The data points must be independent of each other. This is a fundamental assumption of many statistical tests.
  • **Random Sampling:** The sample should be a random sample from the population.
  • **Data Type:** The data should be continuous (or at least approximately continuous). The test is not appropriate for discrete data with a limited number of values.
  • **Sample Size:** The test is most useful for small and moderate samples. For very large samples (e.g., n > 5000), it becomes extremely sensitive and will flag even trivial departures from normality as statistically significant; some implementations (such as SciPy's) also warn that the *p*-value may be inaccurate beyond this size. In that regime, other tests like the Anderson-Darling test, together with visual inspection, might be preferred.

Limitations of the Shapiro-Wilk Test

  • **Sensitivity to Outliers:** The test can be sensitive to outliers, which can distort the results. Consider investigating and addressing outliers before applying the test. Outlier Detection techniques are crucial in this context.
  • **Over-Sensitivity with Large Samples:** As mentioned earlier, with very large samples the test gains so much power that it rejects the null hypothesis for deviations from normality that are too small to matter in practice. Statistical significance should therefore be weighed against practical significance.
  • **Doesn't Indicate *How* Non-Normal:** The test only tells you if the data are non-normal; it doesn't provide information about the nature of the non-normality (e.g., skewness, kurtosis). Further analysis, such as examining Histograms and Q-Q Plots, is needed to understand the deviations from normality.
  • **Not a Proof of Normality:** Failing to reject the null hypothesis does not *prove* that the data are normally distributed. It simply indicates that there isn't enough evidence to reject normality.
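The large-sample caveat can be demonstrated directly. A Student's t distribution with 8 degrees of freedom is only mildly heavier-tailed than a normal curve, yet with thousands of observations the test will typically still reject (the distribution and sample size here are illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Mildly heavy-tailed data: t distribution with 8 degrees of freedom,
# visually very close to a normal curve
large_sample = stats.t.rvs(df=8, size=4000, random_state=rng)

stat, p = stats.shapiro(large_sample)
print(f"W = {stat:.4f}, p = {p:.2e}")
# With n in the thousands, even this mild deviation is usually
# flagged as highly significant
```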

Practical Applications in Financial Markets and Technical Analysis

Many statistical and analytical techniques used in Financial Markets and Trading Strategies assume that data are normally distributed. The Shapiro-Wilk test is vital for verifying this assumption. Here are some specific applications:

  • **Volatility Modeling:** Models like the Black-Scholes model assume that log-returns are normally distributed (equivalently, that prices are log-normally distributed). Before applying such models, it's crucial to test the normality of the underlying asset's log-returns using the Shapiro-Wilk test.
  • **Risk Management:** Value at Risk (VaR) calculations often assume a normal distribution of returns. If the returns are not normally distributed, VaR estimates may be inaccurate.
  • **Statistical Arbitrage:** Strategies based on mean reversion or other statistical anomalies often require normally distributed data to ensure the validity of the statistical inferences.
  • **Backtesting Trading Strategies:** When Backtesting a trading strategy, it’s important to verify the normality of the strategy’s returns. Non-normality can invalidate statistical significance tests used to evaluate the strategy’s performance. Monte Carlo Simulation can be used to assess the robustness of a strategy under different distribution assumptions.
  • **Indicator Analysis:** Assessing the distribution of values generated by technical indicators like Moving Averages, Relative Strength Index (RSI), MACD, Bollinger Bands, Fibonacci Retracements, Ichimoku Cloud, Parabolic SAR, Stochastic Oscillator, Average True Range (ATR), Commodity Channel Index (CCI), Donchian Channels, and Elliott Wave Theory can help understand their behavior and potential applications.
  • **Trend Analysis:** Evaluating the normality of price changes can provide insights into the nature of a Market Trend. For example, a non-normal distribution might indicate the presence of fat tails, suggesting a higher probability of extreme price movements. Understanding Support and Resistance levels often relies on statistical distributions.
  • **Correlation Analysis:** Before calculating correlations between assets, ensure that the returns of those assets are approximately normally distributed. Non-normality can distort correlation coefficients. Regression Analysis also benefits from normally distributed residuals.
  • **Time Series Analysis:** Many time series models, such as ARIMA models, assume normally distributed residuals. The Shapiro-Wilk test can be used to check this assumption. Candlestick Patterns can be further analyzed with normally distributed data.
  • **Portfolio Optimization:** Techniques like the Markowitz portfolio optimization model assume normally distributed asset returns.
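The return-testing workflow described above can be sketched as follows. The return series is simulated, standing in for real market data, and all parameter values are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Simulated daily log-returns: mostly Gaussian noise, plus occasional
# 5% jumps to mimic the fat tails seen in real asset returns
n_days = 250
base = rng.normal(loc=0.0005, scale=0.01, size=n_days)
jump = rng.binomial(1, 0.05, size=n_days) * rng.choice([-0.05, 0.05], size=n_days)
log_returns = base + jump

stat, p = stats.shapiro(log_returns)
print(f"W = {stat:.3f}, p = {p:.4f}")
if p <= 0.05:
    print("Returns deviate from normality; normal-based VaR may understate tail risk")
```

This is the typical pre-check before fitting a normal-based VaR, Black-Scholes, or Markowitz model: if normality is rejected, consider fat-tailed alternatives or resampling methods instead.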

Alternatives to the Shapiro-Wilk Test

If the Shapiro-Wilk test is not appropriate for a particular dataset (e.g., due to very large sample size or discrete data), other tests for normality can be considered:

  • **Kolmogorov-Smirnov Test:** A non-parametric test that compares the empirical cumulative distribution function of the sample to the cumulative distribution function of a normal distribution.
  • **Anderson-Darling Test:** Similar to the Kolmogorov-Smirnov test, but gives more weight to the tails of the distribution.
  • **Lilliefors Test:** A modification of the Kolmogorov-Smirnov test specifically designed for testing normality when the mean and variance are unknown.
  • **Jarque-Bera Test:** A test based on skewness and kurtosis, which are measures of the shape of the distribution.
  • **Visual Inspection:** Q-Q Plots and Histograms provide a visual assessment of normality and can be helpful in identifying deviations from normality.
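Several of these alternatives are available in SciPy and can be run side by side (the exponential sample is just a stand-in for clearly non-normal data):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.exponential(size=300)   # clearly non-normal sample

# Kolmogorov-Smirnov against a normal fitted to the data; the reported
# p-value is only approximate when parameters are estimated from the
# sample (the Lilliefors test corrects for this)
ks_stat, ks_p = stats.kstest(data, 'norm', args=(data.mean(), data.std(ddof=1)))

# Anderson-Darling: returns the statistic and critical values, not a p-value
ad = stats.anderson(data, dist='norm')

# Jarque-Bera: based on sample skewness and kurtosis
jb_stat, jb_p = stats.jarque_bera(data)

print(f"KS: {ks_stat:.3f}, AD: {ad.statistic:.3f}, JB: {jb_stat:.1f}")
```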

Software Implementation

The Shapiro-Wilk test is readily available in most statistical software packages:

  • **R:** `shapiro.test(data)`
  • **Python (SciPy):** `scipy.stats.shapiro(data)`
  • **SPSS:** Analyze > Descriptive Statistics > Explore > Normality Plots with Tests
  • **Excel:** Requires add-ins or custom formulas.


Related topics: Statistical Significance, Hypothesis Testing, Data Analysis, Probability Distribution, Normal Distribution, Skewness, Kurtosis, Data Visualization, Statistical Software, Trading Psychology
