I began this particular journey after reading parts of **Statistics Done Wrong: The Woefully Complete Guide** by Alex Reinhart (2015). It was interesting but too tough for me, so I sought out other books.

You can probably infer from my earlier post that I chanced upon **Statistics 101** by David Borman. One key idea I tried to understand was this:

Reinhart advises users of statistics to replace bare p values with confidence intervals, which report an estimate together with its uncertainty.

This is because Reinhart felt: “misinterpreted p values cause numerous false positives.”

Cory Doctorow. (22 May 2015). Statistics Done Wrong: The Woefully Complete Guide. Boing Boing. https://boingboing.net/2015/05/22/statistics-done-wrong-the-woe.html.

After a second or third reading of Borman I gained more insight, but it was still not enough, so I did more research. What follows is my attempt to make sense of these two terms.

Confidence Interval

A Confidence Interval is a range of values we are fairly sure our true value lies in.

Confidence Intervals. (no date). https://www.mathsisfun.com/data/confidence-interval.html. MathsisFun.

The value from the sample (the specific term is *statistic*) can relate to a population parameter such as the mean (*average*) or relative frequency. Some suggest that the “ultimate goal of the field of statistics is to estimate a population parameter by use of sample statistics.” [Courtney Taylor. (24 Jun 2019). **Learn the Difference Between a Parameter and a Statistic**. ThoughtCo.]

Let’s consider the mean height of trees in *Country A*. A 99% confidence interval does not mean that 99% of the data matches the entire population; it means that if we repeatedly drew samples and computed an interval from each, about 99% of those intervals would contain the true mean height. [The multipliers come from the normal distribution/bell curve: roughly 95% of values fall within 2 standard deviations of the mean, and roughly 99.7% within 3. Borman, p. 129.]
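As a quick illustration, here is a Python sketch of computing confidence intervals for a sample mean. The tree heights are invented for the example, and the z multipliers (about 1.96 for 95%, about 2.58 for 99%) are the standard normal cut-offs:

```python
import statistics

# Hypothetical sample of tree heights (metres) from Country A
heights = [12.1, 13.4, 11.8, 14.2, 12.9, 13.1, 12.5, 13.8, 12.2, 13.6]

n = len(heights)
mean = statistics.mean(heights)
sem = statistics.stdev(heights) / n ** 0.5  # standard error of the mean

# z multipliers from the normal distribution: ~1.96 for 95%, ~2.58 for 99%
for label, z in [("95%", 1.96), ("99%", 2.58)]:
    lower, upper = mean - z * sem, mean + z * sem
    print(f"{label} CI: ({lower:.2f}, {upper:.2f})")
```

Note that the 99% interval comes out wider than the 95% one: to be more confident of capturing the true mean, we must accept a wider range.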

P-value Definition (with other definitions for clarity)

Hypothesis: A statement that might be true, which can then be tested.

Chi-Square Test. (no date). https://www.mathsisfun.com/data/chi-square-test.html. MathsisFun.

A p-value is

- “the level of marginal significance within a statistical hypothesis test representing the probability of the occurrence of a given event.”
- “an alternative to rejection points to provide the smallest level of significance at which the null hypothesis would be rejected. A smaller p-value means that there is stronger evidence in favor of the alternative hypothesis.”
- “calculated using p-value tables or spreadsheet/statistical software.”

Brian Beers. (26 Apr 2019). P-Value Definition. https://www.investopedia.com/terms/p/p-value.asp .
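To make the spreadsheet/software part of the definition concrete, here is a small Python sketch (standard library only) that turns a z statistic into a two-sided p-value; the z values chosen are the familiar normal-distribution cut-offs:

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A z statistic of about 1.96 corresponds to the familiar p ≈ 0.05,
# and about 2.58 corresponds to p ≈ 0.01
print(two_sided_p_value(1.96))
print(two_sided_p_value(2.58))
```

This is the same calculation a p-value table encodes: the probability, under the null hypothesis, of seeing a result at least as extreme as the one observed.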

Applying the P-value

Because different researchers use different levels of significance when examining a question, a reader may sometimes have difficulty comparing results from two different tests…

The p-value approach to hypothesis testing uses the calculated probability to determine whether there is evidence to reject the null hypothesis. The null hypothesis, also known as the conjecture, is the initial claim about a population statistic.

The alternative hypothesis states that the population parameter differs from the value stated in the conjecture. In practice, the significance level is stated in advance to determine how small the p-value must be to reject the null hypothesis.

Brian Beers. (26 Apr 2019). P-Value Definition. https://www.investopedia.com/terms/p/p-value.asp.

How small of a p-value do we need in order to reject the null hypothesis? The answer to this is, “It depends.” A common rule of thumb is that the p-value must be less than or equal to 0.05, but there is nothing universal about this value.

Courtney Taylor. (18 May 2017). What Is a P-Value? https://www.thoughtco.com/what-is-a-p-value-3126392.
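That “it depends” decision rule can be sketched in a few lines of Python; here alpha = 0.05 is just the common default, not a universal constant:

```python
def reject_null(p_value, alpha=0.05):
    """Reject the null hypothesis when p <= alpha; otherwise the test is inconclusive."""
    return p_value <= alpha

print(reject_null(0.03))         # rejected at the 5% level
print(reject_null(0.03, 0.01))   # not rejected at the stricter 1% level
```

The same p-value can lead to different conclusions under different significance levels, which is exactly why readers may struggle to compare results across studies.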

Why p<0.05 ?

It is just a choice! Using p<0.05 is common, but we could have chosen p<0.01 to be even more sure…

Chi-Square Test. (no date). https://www.mathsisfun.com/data/chi-square-test.html. MathsisFun.

The null hypothesis states a commonly held belief or premise which the researcher tests to see whether they can reject it. The key point to grasp is that the researcher typically hopes to reject the null hypothesis, and the P-test helps them decide whether they can. Another point to note is that if the P-test fails to reject the null hypothesis, the test is deemed inconclusive; it is in no way an affirmation of the null hypothesis.

Akhilesh Ganti. (1 Jun 2019). P-test. https://www.investopedia.com/terms/p/p-test.asp.

Example of P-value Testing

Assume an investor claims that their investment portfolio’s performance is equivalent to that of the Standard & Poor’s (S&P) 500 Index. In order to determine this, the investor conducts a two-tailed test. The null hypothesis states that the portfolio’s returns are equivalent to the S&P 500’s returns over a specified period, while the alternative hypothesis states that the portfolio’s returns and the S&P 500’s returns are not equivalent. If the investor conducted a one-tailed test, the alternative hypothesis would state that the portfolio’s returns are either less than or greater than the S&P 500’s returns.

One commonly used significance level is 0.05. If the investor finds that the p-value is less than 0.05, there is strong evidence against the null hypothesis. As a result, the investor would reject the null hypothesis and accept the alternative hypothesis.

Conversely, if the p-value is greater than 0.05, there is only weak evidence against the conjecture, so the investor would fail to reject the null hypothesis. If the investor finds that the p-value is 0.001, there is strong evidence against the null hypothesis, and the portfolio’s returns and the S&P 500’s returns may well not be equivalent.

Brian Beers. (26 Apr 2019). P-Value Definition. https://www.investopedia.com/terms/p/p-value.asp.
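Here is a rough Python sketch of the investor’s two-tailed test. The monthly returns are made up for the example, and for simplicity it uses a normal approximation to get the p-value (with samples this small, a t-test would be more appropriate):

```python
from statistics import NormalDist, mean, stdev

# Hypothetical monthly returns (%) for the portfolio and the S&P 500
portfolio = [1.2, -0.5, 2.1, 0.8, -1.0, 1.5, 0.3, 2.4, -0.2, 1.1, 0.9, 1.7]
sp500     = [1.0, -0.4, 1.8, 0.9, -1.2, 1.3, 0.5, 2.0, -0.1, 1.0, 0.8, 1.5]

# Two-sample z statistic for the difference in mean returns
n1, n2 = len(portfolio), len(sp500)
se = (stdev(portfolio) ** 2 / n1 + stdev(sp500) ** 2 / n2) ** 0.5
z = (mean(portfolio) - mean(sp500)) / se

# Two-tailed p-value: probability of a difference at least this extreme,
# in either direction, if the two sets of returns were truly equivalent
p = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"z = {z:.3f}, p = {p:.3f}")
```

With the made-up figures above, the difference in means is small relative to the noise, so the p-value comes out well above 0.05 and the investor would fail to reject the null hypothesis — which, as noted earlier, is an inconclusive result, not proof that the returns are equivalent.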

Other views on Reinhart’s book

- Sandra Henry-Stocker. (4 Apr 2015). Statistics Done Wrong: The Woefully Complete Guide by Alex Reinhart. ITworld. https://www.itworld.com/article/2906134/statistics-done-wrong-the-woefully-complete-guide-by-alex-reinhart.html.
- Ben Rothke. (8 Apr 2015). Statistics Done Wrong: The Woefully Complete Guide. RSA Conference. https://www.rsaconference.com/blogs/statistics-done-wrong-the-woefully-complete-guide.
