りんだろぐ rindalog: Why the Beta Distribution Is Used in Bayesian Analysis

Continued from "The Bernoulli Distribution in Bayesian Analysis".

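So far these posts have skipped the mathematical explanations and concentrated on practical matters. The reasons were that I am "not an academic" and, above all, that "I cannot explain it well = my mathematical understanding is shallow".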

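The text used in "Bayes Basics" was for beginners, while this book is intermediate, so I wanted to go deeper than "Bayes Basics" even where the content is the same. Or rather, I felt I ought to, so the beta distribution, already covered in "Bayes Basics", is taken up again here.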

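Topics such as "why the beta distribution?" and "the characteristics of the beta distribution" are covered in more detail than last time. I have tried to make the text understandable on its own, so any formula that is unclear can be skipped. Still, I picked formulas that seem likely to deepen understanding, and I hope they help.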

Two elements of Bayes' rule

I summarize the book's content below, but the original text is better, so I quote it alongside each summary.

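1. The prior and the posterior have the same functional form

It is convenient when the product of p(y | θ) and p(θ), the numerator of Bayes' rule, has the same functional form as p(θ). In that case the posterior also has the same functional form as the prior p(θ).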

First, it would be convenient if the product of p(y | θ) and p(θ), which is in the numerator of Bayes’ rule, results in a function of the same form as p(θ). When this is the case, the prior and posterior beliefs are described using the same form of function. This quality allows us to include subsequent additional data and derive another posterior distribution, again of the same form as the prior. Therefore, no matter how much data we include, we always get a posterior of the same functional form.

2. p(θ) is a conjugate prior

A prior is conjugate only with respect to a particular likelihood function.

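The "denominator of Bayes' rule" mentioned in the quotation below is

$$\int d\theta \, p(y \mid \theta) \, p(\theta)$$

Here θ plays the role of the usual integration variable x. The book writes Bayes' rule with this integral as its denominator (this is the continuous case).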

Second, we desire the denominator of Bayes’ rule, namely ∫ dθp(y|θ)p(θ), to be solvable analytically. This quality also depends on how the form of the function p(θ) relates to the form of the function p(y|θ). When the forms of p(y|θ) and p(θ) combine so that the posterior distribution has the same form as the prior distribution, then p(θ) is called a conjugate prior for p(y|θ). Notice that the prior is conjugate only with respect to a particular likelihood function.
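To make the denominator concrete, here is a minimal sketch that approximates ∫ dθ p(y|θ) p(θ) numerically with a midpoint rule. The function names (`bernoulli_likelihood`, `evidence`, `uniform_prior`), the flat prior, and the grid size are assumptions made for illustration and do not come from the book.

```python
def bernoulli_likelihood(y, theta):
    """p(y | theta) for a single flip: y = 1 is heads, y = 0 is tails."""
    return theta ** y * (1.0 - theta) ** (1 - y)


def evidence(y, prior, steps=10_000):
    """Approximate the denominator of Bayes' rule with a midpoint rule on [0, 1]."""
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * width  # midpoint of the i-th slice
        total += bernoulli_likelihood(y, theta) * prior(theta)
    return total * width


def uniform_prior(theta):
    """Flat prior density on [0, 1] (an illustrative choice, not from the book)."""
    return 1.0


# Probability of observing heads when every theta is equally plausible a priori.
p_heads = evidence(1, uniform_prior)
print(p_heads)
```

A grid like this works for any prior, but it has to be recomputed for every new data set. When the prior is conjugate to the likelihood, the same integral has a closed form and no grid is needed.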

Enter the beta function

We now look for a prior p(θ) that is conjugate to the Bernoulli likelihood function covered last time, p(y | θ) = θ^y (1 - θ)^(1-y).

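If the prior were p(θ) = θ^a (1 - θ)^b, its product with the Bernoulli likelihood, p(y | θ) p(θ), would be θ^(y+a) (1 - θ)^(1-y+b), which has the same form as the prior. The probability density function p(θ) we are looking for is therefore the beta distribution: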

$$p(\theta \mid a, b) = \mathrm{beta}(\theta; a, b) = \frac{\theta^{a-1} (1 - \theta)^{b-1}}{B(a, b)}$$

The denominator B(a, b) is a normalizing constant. It makes the area under the density equal to 1.0, which every probability density function requires. That is,

$$B(a, b) = \int_0^1 d\theta \, \theta^{a-1} (1 - \theta)^{b-1}$$

Take care to distinguish the beta function B(a, b) from the beta distribution beta(θ; a, b). The beta function is not a function of θ. R keeps them apart as well: beta(θ; a, b) is dbeta(θ, a, b), while B(a, b) is beta(a, b).

Supplement: B(a, b) is defined by the integral ∫₀¹ dθ θ^(a-1) (1 - θ)^(b-1), but it can also be written as B(a, b) = Γ(a)Γ(b) / Γ(a + b), where Γ is the Gamma function, Γ(a) = ∫₀^∞ dt t^(a-1) exp(-t).
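The distinction between the beta function and the beta distribution can be sketched in a few lines of Python using only the standard library. This is an illustration under assumptions: the names `beta_function`, `beta_density`, and `integrate_01` are hypothetical, and a = b = 4 is just a sample value. Here `beta_function` plays the role of R's beta(a, b) and `beta_density` the role of R's dbeta(θ, a, b).

```python
import math


def beta_function(a, b):
    """B(a, b) via the Gamma-function identity; note there is no theta argument."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def beta_density(theta, a, b):
    """beta(theta; a, b): a density over theta, normalized by B(a, b)."""
    return theta ** (a - 1) * (1.0 - theta) ** (b - 1) / beta_function(a, b)


def integrate_01(f, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [0, 1]."""
    width = 1.0 / steps
    return sum(f((i + 0.5) * width) for i in range(steps)) * width


a, b = 4, 4  # sample shape parameters (an assumption for this sketch)

# Integrating the unnormalized kernel should reproduce B(a, b).
kernel_area = integrate_01(lambda t: t ** (a - 1) * (1.0 - t) ** (b - 1))

# Integrating the normalized density should give an area of 1.
density_area = integrate_01(lambda t: beta_density(t, a, b))

print(beta_function(a, b), kernel_area, density_area)
```

The two routes to B(a, b), the integral and the Gamma-function formula, should agree up to the error of the numerical integration.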

The beta distribution as a prior (beta prior)

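The mean of the beta distribution beta(θ; a, b) is (mean of θ) = a / (a + b), and the standard deviation is √(ab / ((a + b)²(a + b + 1))) (see the "Note" at the end of this section). Because a + b sits in the denominator, the standard deviation shrinks as a + b grows.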

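These a and b correspond to previously observed data, as in "a heads and b tails". When nothing is known in advance, we set a = 1, b = 1, and the beta distribution becomes the uniform distribution.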

The values of a and b can be chosen from the mean m = a / (a + b). That is, if we assume the probability of heads is m and that it was seen over n = a + b trials, then

$$a = mn, \qquad b = (1 - m)\,n \qquad (5.5)$$

This m corresponds to the prior belief about θ (the probability of heads). The stronger the belief in θ, the larger n should be, for example m = 0.5, n = 8, a = 4, b = 4. For a weaker belief, n = 4, a = 2, b = 2.
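Equation (5.5) and the effect of n on the spread can be sketched as follows. The helper names are hypothetical, and n = 100 is an extra value added here (not in the post) to make the trend easier to see.

```python
import math


def beta_params_from_mean(m, n):
    """Equation (5.5): a = m * n, b = (1 - m) * n."""
    return m * n, (1.0 - m) * n


def beta_sd(a, b):
    """Standard deviation of beta(theta; a, b)."""
    return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


# Same prior mean m = 0.5, increasing confidence n.
for n in (4, 8, 100):
    a, b = beta_params_from_mean(0.5, n)
    print(n, a, b, beta_sd(a, b))
```

The mean stays at 0.5 in every row while the standard deviation falls, which is what "stronger belief" means in terms of the shape of the prior.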

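Note: According to Wikipedia, the variance of the beta distribution is written as ab / ((a + b)²(a + b + 1)), which differs from the expression in the book. I will add the reason here once I find it.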
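One possible explanation, offered as an assumption rather than something this post or the book confirms: a standard deviation is the square root of a variance, and with m = a / (a + b) the identity ab / (a + b)² = m(1 - m) holds. An expression of the form √(m(1 - m) / (a + b + 1)) would then be the square root of the variance ab / ((a + b)²(a + b + 1)), so the two could differ in appearance only. The sketch below checks just that arithmetic; the function names and the sample (a, b) pairs are hypothetical.

```python
import math


def variance_ab_form(a, b):
    """Variance written directly in terms of a and b."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


def sd_mean_form(a, b):
    """Standard deviation written in terms of the mean m = a / (a + b)."""
    m = a / (a + b)
    return math.sqrt(m * (1.0 - m) / (a + b + 1))


# Compare the square root of the first form with the second form.
for a, b in [(1, 1), (2, 2), (4, 4), (3, 9)]:
    print(a, b, math.sqrt(variance_ab_form(a, b)), sd_mean_form(a, b))
```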

Computing the posterior

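The book continues with "5.2.2 The posterior beta". It goes into more detail, but the content is the same as "Bayes Basics: Estimation with the β Distribution", so only an outline is given here.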

The posterior is obtained from the product of the beta-distribution prior and the Bernoulli likelihood function.

When heads comes up z times in N trials, Bayes' rule, the prior, and the likelihood are

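  Bayes' rule: $p(\theta \mid z, N) = p(z, N \mid \theta) \, p(\theta) / p(z, N)$
  Prior: $\mathrm{beta}(\theta; a, b) = \theta^{a-1} (1 - \theta)^{b-1} / B(a, b)$
  Likelihood: $p(z, N \mid \theta) = \theta^{z} (1 - \theta)^{N-z}$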

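Leaving aside the constants B(a, b) and p(z, N), the θ-dependent part of the product of the prior and the likelihood is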

$$\theta^{z} (1 - \theta)^{N - z} \, \theta^{a - 1} (1 - \theta)^{b - 1} = \theta^{(z + a) - 1} (1 - \theta)^{(N - z + b) - 1}$$

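Therefore the posterior is again a beta distribution:

$$p(\theta \mid z, N) = \mathrm{beta}(\theta; z + a, N - z + b)$$

Continued in "The role of the HDI (summarizing a probability distribution, ROPE)".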
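As a check on the posterior derived in this post, the sketch below compares the closed-form update (z + a, N - z + b) with a brute-force computation of likelihood × prior / evidence on a grid. The function names, the prior a = b = 2, and the data (7 heads in 10 flips) are hypothetical values chosen for illustration.

```python
import math


def beta_density(theta, a, b):
    """beta(theta; a, b) with B(a, b) computed from the Gamma function."""
    norm = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return theta ** (a - 1) * (1.0 - theta) ** (b - 1) / norm


def update_beta(a, b, z, N):
    """Posterior parameters after z heads in N flips: (z + a, N - z + b)."""
    return z + a, N - z + b


def grid_posterior(theta, a, b, z, N, steps=20_000):
    """Posterior density at theta by brute force: likelihood * prior / evidence."""
    def unnormalized(t):
        return t ** z * (1.0 - t) ** (N - z) * beta_density(t, a, b)

    width = 1.0 / steps
    evidence = sum(unnormalized((i + 0.5) * width) for i in range(steps)) * width
    return unnormalized(theta) / evidence


a, b = 2, 2   # weak prior centered on 0.5 (illustrative values)
z, N = 7, 10  # hypothetical data: 7 heads in 10 flips

post_a, post_b = update_beta(a, b, z, N)
print(post_a, post_b)
print(beta_density(0.6, post_a, post_b), grid_posterior(0.6, a, b, z, N))

# Updating in two batches (3 of 4, then 4 of 6) ends at the same parameters
# as updating once with all 10 flips.
step1 = update_beta(a, b, 3, 4)
step2 = update_beta(*step1, 4, 6)
print(step2)
```

Because the posterior has the same form as the prior, it can serve as the prior for the next batch of data, which is why the batched update and the single update agree.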


Tuesday, October 20, 2015
