りんだろぐ rindalog: Bayesian Testing: ROPE and HDI

This post is based on Chapter 12, "Bayesian Approaches to Testing a Point ("Null") Hypothesis", of Doing Bayesian Data Analysis by John Kruschke.

Doing Bayesian Data Analysis, Second Edition: A Tutorial with R, JAGS, and Stan

John Kruschke

Academic Press

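The table of contents has no post on this chapter because, at the time, I only skimmed it or skipped it altogether. Reading it now, I find the content very interesting, and I am keenly aware of how little I understood back then.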

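The conclusions sought from a data analysis are often posed as questions such as

Was there an effect? Is the coin biased? Do the groups differ?

that is, as an either/or decision.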

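The NHST (null hypothesis significance testing) of the previous chapter is carried out here with a Bayesian approach. The natural source would be the book's section "12.1.1 Region of practical equivalence", but this post instead follows the author's blog article "How much of a Bayesian posterior distribution falls inside a region of practical equivalence (ROPE)".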

I believe what is written here covers the basics of HDI and ROPE.

How much of a Bayesian posterior distribution falls inside a region of practical equivalence (ROPE)

In short: the degree to which a Bayesian posterior distribution fits inside the ROPE.

The posterior distribution of a parameter shows explicitly the relative credibility of the parameter values, given the data. But sometimes people want to make a yes/no decision about whether a particular parameter value is credible, such as the "null" values of 0.0 difference or 0.50 chance probability. For making decisions about null values, I advocate using a region of practical equivalence (ROPE) along with a posterior highest density interval (HDI). The null value is declared to be rejected if the (95%, say) HDI falls completely outside the ROPE, and the null value is declared to be accepted (for practical purposes) if the 95% HDI falls completely inside the ROPE. This decision rule accepts the null value only when the posterior estimate is precise enough to fall within a ROPE. The decision rule rejects the null only when the posterior exceeds the buffer provided by the ROPE, which protects against hyperinflated false alarms in sequential testing. And the decision rule is intuitive: You can graph the posterior, its HDI, its ROPE, and literally see what it all means in terms of the meaningful parameter being estimated. (For more details about the decision rule, its predecessors in the literature, and alternative approaches, see the article on this linked web page.)

For HDI, see the post "Bayesian HDI". ROPE stands for region of practical equivalence. When testing a coin for bias, the ROPE is centered on 0.5, for example [0.45, 0.55] or [0.40, 0.60].

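The decision rule stated in the quotation above ("The null value is declared to be rejected ... completely inside the ROPE") tests the null value as follows:

  Reject: the HDI lies outside the ROPE (the HDI and the ROPE do not overlap at all)
  Accept: the HDI lies inside the ROPE (the HDI is completely contained in the ROPE)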

The null value is, for example, 0.5 when testing a coin for bias.

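If the HDI is completely contained in the ROPE, the most credible values of the posterior distribution are all practically equivalent to the value the ROPE represents. This amounts to accepting the null value, for example 0.5, for practical purposes. Naturally, widening the ROPE to an extreme makes it easier to contain the HDI; conversely, the narrower the ROPE that still contains the HDI, the more credible the result.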

What about the case where the ROPE and the HDI only partly overlap? For this "third case", I quote from the book.

We withhold a decision. This means merely that the current data are insufficient to yield a clear decision one way or the other, according to the stated decision criteria.

That is, the decision is withheld: under the stated decision criteria, the data are insufficient to reach a clear conclusion.
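The three outcomes (reject, accept, withhold) can be written down as a small function. This is only a minimal sketch in Python, not code from the book or the article: the name `rope_decision` and the treatment of intervals that exactly touch a ROPE limit are my own assumptions. The intervals in the usage lines are the ones from the coin examples later in this post, plus one made-up overlapping interval.

```python
def rope_decision(hdi, rope):
    """Apply the HDI + ROPE decision rule to two (low, high) intervals.

    Hypothetical helper for illustration. Boundary handling is an
    assumption: an HDI limit equal to a ROPE limit counts as inside.
    """
    hdi_lo, hdi_hi = hdi
    rope_lo, rope_hi = rope
    # Reject: the HDI and the ROPE do not overlap at all.
    if hdi_hi < rope_lo or hdi_lo > rope_hi:
        return "reject"
    # Accept: the HDI is completely contained in the ROPE.
    if rope_lo <= hdi_lo and hdi_hi <= rope_hi:
        return "accept"
    # Third case: partial overlap, so the decision is withheld.
    return "withhold"


rope = (0.49, 0.51)
print(rope_decision((0.491, 0.509), rope))  # HDI inside the ROPE
print(rope_decision((0.513, 0.587), rope))  # HDI outside the ROPE
print(rope_decision((0.500, 0.530), rope))  # made-up HDI, partial overlap
```

Returning a label instead of a boolean keeps the "withhold" case explicit, which is the case that is easy to forget.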

So how should the limits of the ROPE be set?

But how do we set the limits of the ROPE? How big is "practically equivalent to the null value"? There is typically no uniquely correct answer to this question, because it depends on the particular domain of application and the current practice in that domain. But the rule does not need a uniquely correct ROPE, it merely needs a reasonable ROPE that is justifiable to the audience of the analysis. As a field matures, the limits of practical equivalence may change, and therefore what may be most useful to an audience is a description of how much of the posterior falls inside or outside the ROPE as a function of the width of the ROPE. The purpose of this blog post is to provide some examples and an R program for doing that.

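There is no single correct answer, because it depends on the domain of application and on current practice in that domain. The guideline is that the ROPE should be one the audience of the analysis can accept as justified. As the field matures, the ROPE may change. A useful thing to report is therefore how much of the posterior distribution falls inside or outside the ROPE, as a function of the ROPE's width.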

The article includes an R script that plots this quantity.
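The quantity itself is easy to compute from posterior samples. The following is a rough Python sketch of the idea, not a port of the R script in the article. The function names are hypothetical, and the samples are made-up normal draws standing in for real MCMC output.

```python
import random


def mass_in_rope(samples, null_value, radius):
    """Fraction of posterior samples within `radius` of `null_value`."""
    inside = sum(1 for s in samples if abs(s - null_value) <= radius)
    return inside / len(samples)


def rope_curve(samples, null_value, radii):
    """(radius, mass inside ROPE) pairs: the curve in the right-hand panels."""
    return [(r, mass_in_rope(samples, null_value, r)) for r in radii]


# Assumption: made-up posterior draws for illustration only.
# In practice these would come from an MCMC sampler.
rng = random.Random(0)
samples = [rng.gauss(0.03, 0.023) for _ in range(20000)]

radii = [i / 100 for i in range(16)]  # ROPE radius from 0.00 to 0.15
for radius, mass in rope_curve(samples, 0.0, radii):
    print(f"radius={radius:.2f}  mass inside ROPE={mass:.3f}")
```

The curve can only rise as the radius grows, since a wider ROPE centered on the same null value contains every sample a narrower one does.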

Consider a situation in which we want to estimate an effect size parameter, which I'll call δ. Suppose we collect a lot of data so that we have a very precise estimate, and the posterior distribution shows that the effect size is very nearly zero, as plotted in the left panel below:

The parameter δ is estimated as follows. As the left panel shows, the posterior distribution is concentrated very near zero.

We see that zero is among the 95% most credible values of the effect size, but we can ask whether we should decide to "accept" the value zero (for practical purposes). If we establish a ROPE around zero, from -0.1 to +0.1 as shown above, then we see that the 95% HDI falls entirely within the ROPE and we accept zero. What if we think a different ROPE is appropriate, either now or in the future? Simply display how much of the posterior distribution falls inside the ROPE, as a function of the ROPE size, as shown in the right panel above. The curve plots how much of the posterior distribution falls inside a ROPE centered at the null value, as a function of the radius (i.e., half width) of the ROPE. As a landmark, the plot shows dashed lines at the 95% HDI limit farthest from the null value. From the plot, readers can decide for themselves.

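The ROPE runs from -0.1 to +0.1 and the 95% HDI is completely contained in it, so the null value of zero is accepted. What happens if the ROPE limits are changed? The right panel shows the proportion of the posterior distribution contained in the ROPE as a function of the ROPE size.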

Some additional notes on the chart in the right panel.

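The chart shows how the proportion of the posterior distribution contained in the ROPE (y axis) changes as the ROPE widens (x axis). The x axis is the radius, meaning half the width of the ROPE: the distance from the null value at the center of the ROPE. Whether the marked radius refers to the right or the left half of the ROPE depends on the side toward which the HDI is shifted, that is, the "95% HDI limit farthest from the null value". In this example the null value is 0, and the right end of the 95% HDI, 0.0756, is where the red dashed line labeled "95% HDI limit farthest from 0" meets the x axis.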

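So in this example the red dashed line corresponds to a ROPE of [-0.0756, +0.0756], and the 0.975 shown on the y axis is the proportion of the posterior distribution that this ROPE contains. It is not 1.00 because the part of the posterior to the right of 0.0756 is not included in the ROPE.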

Used as supplementary information for accepting the null value, the right-panel chart reads as follows.

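The ROPE [-0.0756, 0.0756], which completely contains the 95% HDI [-0.0161, 0.0756], holds 97.5% of the posterior distribution. The null value would therefore be accepted even with a ROPE narrower than the assumed ROPE [-0.1, 0.1], so accepting it under ROPE [-0.1, 0.1] is reasonable.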
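As a sanity check on the 97.5% figure, here is a small calculation under a loudly stated assumption: the posterior is treated as a normal distribution whose central 95% interval equals the reported HDI [-0.0161, 0.0756]. The article does not say the posterior is normal, so this is only an approximation, and the variable names are mine.

```python
from statistics import NormalDist

hdi_lo, hdi_hi = -0.0161, 0.0756  # 95% HDI reported in the article
null_value = 0.0

# Assumption: normal posterior whose central 95% interval matches the HDI.
z = NormalDist().inv_cdf(0.975)  # about 1.96
posterior = NormalDist(mu=(hdi_lo + hdi_hi) / 2,
                       sigma=(hdi_hi - hdi_lo) / (2 * z))


def mass_in_rope(dist, center, radius):
    """Probability mass of `dist` inside [center - radius, center + radius]."""
    return dist.cdf(center + radius) - dist.cdf(center - radius)


# Radius at the HDI limit farthest from the null value (the red dashed line).
far_radius = max(abs(hdi_lo - null_value), abs(hdi_hi - null_value))
mass_far = mass_in_rope(posterior, null_value, far_radius)
mass_rope = mass_in_rope(posterior, null_value, 0.1)

print(f"radius {far_radius:.4f}: mass = {mass_far:.3f}")  # close to 0.975
print(f"radius 0.1000: mass = {mass_rope:.3f}")
```

Under this approximation only the right tail beyond 0.0756 (about 2.5%) falls outside the ROPE, since the left limit -0.0756 is far out in the left tail. That is consistent with the 0.975 read off the chart.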

The next example estimates the bias θ of a coin.

Here is another example. Suppose we are spinning a coin (or doing something more interesting that has dichotomous outcomes) and we want to estimate its underlying probability of heads, denoted θ. Suppose we collect a lot of data so that we have a precise estimate, as shown in the left panel below.

We see that the chance value of 0.50 is among the 95% most credible values of the underlying probability of heads, but we can ask whether we should decide to "accept" the value 0.50. If we establish a ROPE around zero, from 0.49 to 0.51 as shown above, we can see that the 95% HDI falls entirely within the ROPE and we would decide to accept 0.50. But we can provide more information by displaying the amount of the posterior distribution inside a ROPE centered on the null value as a function of the width of the ROPE. The right panel, above, allows the readers to decide for themselves.

The ROPE runs from 0.49 to 0.51 and the 95% HDI [0.491, 0.509] is completely contained in it, so accepting the null value 0.50 seems fine. The right panel provides additional information on top of this result.

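The right-panel chart supplements the decision to accept.

The ROPE [0.491, 0.509], which just contains the 95% HDI [0.491, 0.509], holds 95.1% of the posterior distribution. Since this is narrower than the assumed ROPE [0.49, 0.51], accepting is reasonable.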
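To make the coin example concrete, here is a hedged Python sketch that draws from a Beta posterior and computes a sample-based HDI. The data (6000 heads in 12000 spins) and the uniform Beta(1, 1) prior are my own made-up assumptions, chosen only because they give an HDI of roughly the same width as the article's; the article does not state its data. The helper names are hypothetical as well.

```python
import math
import random


def hdi_from_samples(samples, mass=0.95):
    """Narrowest interval that contains `mass` of the sorted samples."""
    ordered = sorted(samples)
    n = len(ordered)
    n_in = math.ceil(mass * n)  # number of samples inside the interval
    # Slide a window of n_in consecutive samples and keep the narrowest one.
    best = min(range(n - n_in + 1),
               key=lambda i: ordered[i + n_in - 1] - ordered[i])
    return ordered[best], ordered[best + n_in - 1]


def mass_in_interval(samples, lo, hi):
    """Fraction of samples inside [lo, hi]."""
    return sum(lo <= s <= hi for s in samples) / len(samples)


# Assumptions: made-up data and a uniform Beta(1, 1) prior.
heads, tails = 6000, 6000
rng = random.Random(1)
samples = [rng.betavariate(1 + heads, 1 + tails) for _ in range(20000)]

hdi_lo, hdi_hi = hdi_from_samples(samples)
rope = (0.49, 0.51)
accept = rope[0] <= hdi_lo and hdi_hi <= rope[1]

print(f"95% HDI = [{hdi_lo:.3f}, {hdi_hi:.3f}]")
print(f"mass inside ROPE = {mass_in_interval(samples, *rope):.3f}")
print("accept null value" if accept else "do not accept")
```

Because the HDI is estimated from random draws, its limits wobble slightly from run to run. When an HDI limit sits as close to the ROPE limit as it does here, more samples help keep the decision stable.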

Finally, an example of rejection.

The plots can be used for cases of rejecting the null, too. Suppose we spin a coin and the estimate shows a value apparently not at chance, as shown in the left panel below.

Should we reject the null value? The 95% HDI falls entirely outside the ROPE from 0.49 to 0.51, which means that the 95% most credible values are all not practically equivalent to 0.5. But what if we used a different ROPE? The posterior distribution, along with the explicit marking of the 95% HDI limits, is all we really need to see what the biggest ROPE could be and still let us reject the null. If we want more detailed information, the panel on the right, above, reveals the answer --- although we need to mentally subtract from 1.0 to get the posterior area not in the ROPE (on either side).

The 95% HDI [0.513, 0.587] lies entirely outside the ROPE from 0.49 to 0.51. In other words, the 95% most credible values are all different from the ROPE that stands for "practically 0.5", so the null value 0.5 is rejected.

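Examining the decision to reject using the right-panel chart.

A ROPE that completely contains the 95% HDI [0.513, 0.587] would have to span [0.413 (= 0.50 - 0.087), 0.587], and it would then hold 97.5% of the posterior distribution. That is far wider than the assumed ROPE [0.49, 0.51]. Conversely, any ROPE with a radius below 0.013 (= 0.513 - 0.50) leaves the HDI entirely outside it. Rejecting the null value 0.5 under ROPE [0.49, 0.51] is therefore reasonable.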
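The two radii used in this argument can be computed directly from the HDI limits. A minimal sketch follows; the function name `critical_radii` is hypothetical and the logic is only my reading of the decision rule for a ROPE centered on the null value.

```python
def critical_radii(hdi, null_value):
    """Return (nearest, farthest) distances from the null value to the HDI.

    For a ROPE centered on the null value:
      radius < nearest    -> HDI entirely outside the ROPE (reject)
      radius >= farthest  -> HDI entirely inside the ROPE (accept)
      anything in between -> partial overlap (withhold)
    """
    lo, hi = hdi
    farthest = max(abs(lo - null_value), abs(hi - null_value))
    if lo <= null_value <= hi:
        nearest = 0.0  # the null value is inside the HDI: no ROPE can reject
    else:
        nearest = min(abs(lo - null_value), abs(hi - null_value))
    return nearest, farthest


nearest, farthest = critical_radii((0.513, 0.587), 0.5)
print(f"reject while radius < {nearest:.3f}")
print(f"accept once radius >= {farthest:.3f}")
```

For the rejection example this gives 0.013 and 0.087, so the assumed radius of 0.01 falls in the "reject" range.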

Toward practical use

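Near the start of this post I wrote that "the basics of HDI and ROPE are covered", but the hard part in real-world use is still setting the ROPE limits. As the article points out, they depend on the purpose and the situation. For example, one could imagine computing an expected value over the ROPE range, or taking the maximum loss into account.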

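These are practical, business matters rather than data analysis as such. Then again, without such a business problem or a concrete goal to solve, there is little point in doing data analysis at all. I get the impression that this point is surprisingly often overlooked when attention goes only to analysis methods.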

Continued in "Looking at differences".


Sunday, September 11, 2016
