The p-value is defined as the probability of obtaining a test statistic at least 'as extreme' as the value observed for the data at hand, under the assumption that the null hypothesis is correct. (Recall that, in EST's test statistics, the null hypothesis is that AR, CAR, AAR, or CAAR is equal to 0.)
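
To make the definition concrete, here is a minimal sketch of how a two-sided p-value could be computed. It assumes, purely for illustration, that the test statistic follows a standard normal distribution under the null hypothesis; the test statistics actually used may follow other distributions. The function name `two_sided_p_value` is hypothetical.

```python
import math

def two_sided_p_value(z):
    """Two-sided p-value for a test statistic z.

    Assumption (illustrative only): z is standard normal under the null.
    Returns the probability of a statistic at least as extreme as z
    in absolute value, that is, 2 * (1 - Phi(|z|)).
    """
    # erfc(x / sqrt(2)) equals 2 * (1 - Phi(x)) for x >= 0
    return math.erfc(abs(z) / math.sqrt(2))

# A statistic near 1.96 gives a p-value close to 0.05
print(two_sided_p_value(1.96))
```

Note that the sign of the statistic does not matter here: "at least as extreme" refers to distance from zero in either direction.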

Arguably, this definition is hard to understand for users not versed in statistical theory and, over the years, it has created a lot of confusion. We therefore propose to focus instead on what a p-value essentially means: the amount of evidence contained in the data against the null hypothesis or, equivalently, in favor of the alternative hypothesis. A p-value is a number between zero and one; the smaller the number, the stronger the evidence. Common cut-off values are as follows: a p-value less than 0.1 means `some evidence', a p-value less than 0.05 means `solid evidence', and a p-value less than 0.01 means `very strong evidence'. Most researchers use the cut-off of 0.05 to determine whether there is evidence or not.
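
The cut-off values listed above can be sketched as a small lookup. The function name `evidence_label` and the label returned for p-values of 0.1 or more are our own illustrative choices, not part of any established convention.

```python
def evidence_label(p_value):
    """Map a p-value to the verbal evidence categories described in the text."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between zero and one")
    if p_value < 0.01:
        return "very strong evidence"
    if p_value < 0.05:
        return "solid evidence"
    if p_value < 0.1:
        return "some evidence"
    # Hypothetical label: the null hypothesis is not rejected, but not proven either
    return "insufficient evidence"

for p in (0.003, 0.03, 0.08, 0.6):
    print(p, evidence_label(p))
```

The checks run from the strictest cut-off to the loosest, so each p-value receives the strongest label it qualifies for.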

There is an important asymmetry that is missed by many users and even quite a few academic researchers: whereas a small p-value constitutes evidence in favor of the alternative hypothesis, a large p-value (say, a p-value of 0.6) does not constitute evidence in favor of the null hypothesis. In other words, a small p-value `proves' (beyond a reasonable doubt) that the alternative hypothesis is true, whereas a large p-value does not `prove' that the null hypothesis is true. All one can say in the latter case is that the null hypothesis is `plausible' or `not rejected' by the data.
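
A small simulation may help to see why a large p-value does not `prove' the null hypothesis. The sketch below rests on illustrative assumptions of our own: observations are normally distributed with a known standard deviation of 1, the true mean is 0.2 (so the null hypothesis of a zero mean is false), and a two-sided test is used. The function name `share_not_rejected` and all parameter values are hypothetical.

```python
import math
import random

def share_not_rejected(true_mean, n_obs, n_sims=2000, alpha=0.05, seed=0):
    """Fraction of simulated samples in which the null 'mean = 0' is not rejected."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    not_rejected = 0
    for _ in range(n_sims):
        draws = [rng.gauss(true_mean, 1.0) for _ in range(n_obs)]
        sample_mean = sum(draws) / n_obs
        # z statistic for a known standard deviation of 1
        z = sample_mean * math.sqrt(n_obs)
        p_value = math.erfc(abs(z) / math.sqrt(2))
        if p_value >= alpha:
            not_rejected += 1
    return not_rejected / n_sims

# The null is false here, yet with only 10 observations it is rarely rejected
print(share_not_rejected(true_mean=0.2, n_obs=10))
```

With so few observations, most simulated samples yield a p-value above 0.05 even though the true mean is not zero. A large p-value therefore only signals that the data were not informative enough to reject the null hypothesis.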

An analogy might help to understand this asymmetry better: a court case. In a court case, the null hypothesis plays the role of "the defendant is innocent" and the alternative hypothesis plays the role of "the defendant is guilty". During the court case, one looks at "data" in order to determine which hypothesis to go with in the verdict. If there is strong evidence against the null, say in the form of trustworthy testimony or crime-scene analysis, one arrives at the verdict of "guilty" and the defendant is sentenced. In this case, the guilt (that is, the alternative) is considered proven (beyond a reasonable doubt). On the other hand, in the absence of such evidence, one arrives at the verdict of "innocent" and the defendant is set free. But in this case, innocence is not necessarily considered proven. Perhaps there was some evidence, just not enough to arrive at a guilty verdict. So if the defendant is set free (that is, one goes with the null hypothesis), one is not necessarily convinced of his or her innocence; a leading example is the O.J. Simpson murder trial. Of course, there may be cases where an innocent verdict goes along with proven innocence (beyond a reasonable doubt), say if a trustworthy alibi can be produced; but such cases are the exception rather than the rule.