りんだろぐ rindalog: Model Building, Evaluation & Judgment: The Yardsticks of Evaluation

Continued from "Model Validation".

Based on "5.2. Evaluating models".

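The first thing to do after building an analytic model is evaluating models (model evaluation), carried out on the training data: we measure the model's performance with a quantified yardstick.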

For most model evaluations, we just want to compute one or two summary scores that tell us if the model is effective. To decide if a given score is high or low, we have to appeal to a few ideal models: a null model (which tells us what low performance looks like), a Bayes rate model (which tells us what high performance looks like), and the best single-variable model (which tells us what a simple model can achieve).

In most model evaluations, we compute summary scores that show whether the model is effective. To judge whether a score is high or low, we need the following ideal models: the null model, the Bayes rate model, and the best single-variable model.

Below are the key points from Table 5.3, Ideal models to calibrate against. The note after each quotation is a loose paraphrase.

Ideal model	Description
Null model	A null model is the best model of a very simple form you’re trying to outperform. (The null model is a simple model, and it is the first one your own model has to beat.) We use null models to lower-bound desired performance, so we usually compare to a best null model. For example, in a categorical problem, the null model would always return the most popular category. (Null models set the floor for the performance we want. In a classification problem, the null model simply returns the most common category every time.) The idea is this: if you’re not out-performing the null model, you’re not delivering value. (A model that cannot beat the null model delivers no value.)
Bayes rate model	A Bayes rate model (also sometimes called a saturated model) is a best possible model given the data at hand. (The Bayes rate model is the best model achievable with the data at hand.) If we feel our model is performing significantly above the null model rate and is approaching the Bayes rate, then we can stop tuning. (If the model performs well above the null model and is approaching the Bayes rate, we can stop tuning it.)
Single-variable models	A complicated model can’t be justified if it doesn’t outperform the best single-variable model available from your training data. (A complicated model cannot be justified unless it beats the best single-variable model built from the training data.)
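
The three reference points in the table can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the book's code: the toy `rows` and `labels` are made up, the function names are hypothetical, accuracy is used as the summary score, and the saturated model is approximated by predicting the majority label for each distinct combination of feature values. Because everything is measured on the training data, the saturated figure is an optimistic estimate of the Bayes rate.

```python
from collections import Counter, defaultdict

def null_model_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

def lookup_accuracy(keys, labels):
    """Accuracy of predicting, for each key, the majority label seen with it."""
    by_key = defaultdict(Counter)
    for key, label in zip(keys, labels):
        by_key[key][label] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_key.values())
    return correct / len(labels)

def best_single_variable(rows, labels):
    """Return (column index, accuracy) of the best one-variable lookup model."""
    scores = [lookup_accuracy([row[j] for row in rows], labels)
              for j in range(len(rows[0]))]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

def saturated_accuracy(rows, labels):
    """Training-data estimate of the Bayes rate: key on all variables at once."""
    return lookup_accuracy([tuple(row) for row in rows], labels)

# Hypothetical toy data: two categorical variables and a binary outcome.
rows = [("a", "x"), ("a", "x"), ("a", "x"), ("a", "y"), ("a", "y"),
        ("b", "x"), ("b", "x"), ("b", "y"), ("b", "y"), ("b", "y")]
labels = [1, 1, 1, 0, 1, 0, 0, 1, 1, 0]

null_acc = null_model_accuracy(labels)
best_var, single_acc = best_single_variable(rows, labels)
bayes_acc = saturated_accuracy(rows, labels)

print(f"null model:           {null_acc:.2f}")
print(f"best single variable: {single_acc:.2f} (column {best_var})")
print(f"saturated model:      {bayes_acc:.2f}")
```

On this toy data the three scores come out in increasing order, which gives a rough range for judging a candidate model: it should at least beat the null model and the best single-variable model, and there is little to gain from tuning once it is close to the saturated figure.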

In this section, we’ll present the standard measures of model quality, which are useful in model construction. In all cases, we suggest that in addition to the standard model quality assessments you try to design your own custom “business-oriented loss function” with your project sponsor or client.

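This section covers the standard yardsticks for measuring model quality, which are also useful during model construction. In addition to the standard quality assessments, the authors recommend designing a custom "business-oriented loss function" together with the project sponsor or client.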
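
As a rough illustration of what a "business-oriented loss function" might look like, the sketch below charges a different cost for each kind of classification error. The function name `business_loss`, the cost values, and the two sets of predictions are all hypothetical; real costs would come from the sponsor or client.

```python
def business_loss(y_true, y_pred, cost_false_positive, cost_false_negative):
    """Average cost per case, given assumed per-error costs."""
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        if pred and not truth:
            total += cost_false_positive   # flagged a case that was negative
        elif truth and not pred:
            total += cost_false_negative   # missed a case that was positive
    return total / len(y_true)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical outcomes and predictions from two candidate models.
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
model_a = [1, 0, 0, 1, 0, 0, 0, 0]   # two false negatives
model_b = [1, 1, 1, 1, 1, 0, 1, 0]   # two false positives

# Assumed costs: a missed positive is ten times as expensive as a false alarm.
loss_a = business_loss(y_true, model_a, cost_false_positive=1.0, cost_false_negative=10.0)
loss_b = business_loss(y_true, model_b, cost_false_positive=1.0, cost_false_negative=10.0)

print(f"model A: accuracy {accuracy(y_true, model_a):.2f}, loss {loss_a:.2f}")
print(f"model B: accuracy {accuracy(y_true, model_b):.2f}, loss {loss_b:.2f}")
```

Both hypothetical models have the same accuracy, yet their losses differ by a factor of ten under the assumed costs, which is the kind of distinction a standard score alone does not show.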

From the standpoint of evaluation, model types are divided as follows.

Classification
Scoring
Probability estimation
Ranking
Clustering

Rather than covering every model type, I plan to pick out evaluation methods that are distinctive and concrete.

Continues in "Classification Evaluation".


Friday, October 7, 2016
