りんだろぐ rindalog: Model Building, Evaluation & Validation: Scoring Evaluation

Continued from "Classification Evaluation."

This post is based on "5.2.2 Evaluating scoring models" in the book.

Evaluating models that assign scores can be a somewhat visual task. The main concept is looking at what is called the residuals or the difference between our predictions f(x[i,]) and actual outcomes y[i].

The main idea is to examine the so-called residuals, that is, the differences between the predicted values and the actual values.

The training data are the black points, x = {1, 2, ..., 10} with y = x². The blue line is the linear regression y = ax + c fit to this data.

> d <- data.frame(y=(1:10)^2,x=1:10)
> model <- lm(y~x,data=d)
> d$prediction <- predict(model,newdata=d)
> library(ggplot2)
> ggplot(data=d) + geom_point(aes(x=x,y=y)) +
geom_line(aes(x=x,y=prediction),color="blue") +
geom_segment(aes(x=x,y=prediction,yend=y,xend=x)) + scale_y_continuous('')

Note: the annotation text in the figure was not produced by R; it was added to the image after it was output.
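As a minimal sketch, the residuals drawn as vertical segments in the plot can also be inspected numerically. The column name `residual` is my own choice for illustration and does not come from the book.

```r
# Rebuild the same data and model as above
d <- data.frame(y = (1:10)^2, x = 1:10)
model <- lm(y ~ x, data = d)
d$prediction <- predict(model, newdata = d)

# Residual = actual outcome minus prediction (column name is illustrative)
d$residual <- d$y - d$prediction

# For a model fit by lm(), residuals() should return the same values
all.equal(unname(residuals(model)), unname(d$residual))

print(round(d$residual, 3))
```

Because a straight line is being fit to a curve, the residuals are positive at both ends of the range and negative in the middle. That systematic pattern is what the plot makes visible.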

RMSE: Root Mean Square Error

The most common goodness-of-fit measure is called root mean square error (RMSE). This is the square root of the average square of the difference between our prediction and actual values. Think of it as being like a standard deviation: how much your prediction is typically off.

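RMSE is a measure of goodness of fit: the square root of the mean of the squared differences between the predicted and actual values. Like a standard deviation, it shows how far off the predictions typically are.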

The RMSE of the linear regression above is:

> sqrt(mean((d$prediction-d$y)^2))
[1] 7.266361
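The one-liner above can be wrapped in a small helper so that it can be reused for other models. This is only a sketch: the function name `rmse` and its argument names are hypothetical, not something defined in the book. The second call scores a baseline that always predicts the mean of y, which gives a rough sense of scale for the first number.

```r
# Hypothetical helper (not from the book): RMSE of predictions vs. actual values
rmse <- function(prediction, actual) {
  sqrt(mean((prediction - actual)^2))
}

d <- data.frame(y = (1:10)^2, x = 1:10)
model <- lm(y ~ x, data = d)
d$prediction <- predict(model, newdata = d)

rmse(d$prediction, d$y)   # same computation as the one-liner above
rmse(mean(d$y), d$y)      # baseline that always predicts mean(y)
```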

Because RMSE is in the same units as y, the RMSE value can be used as is when stating a business goal, for example:

“We want the RMSE on account valuation to be under $1,000 per account.”

R-squared

> 1 - sum((d$prediction-d$y)^2)/sum((mean(d$y)-d$y)^2)

[1] 0.9497645

1.0 minus how much unexplained variance your model leaves (measured relative to a null model of just using the average y as a prediction).

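Subtracting the fraction the model leaves unexplained (measured relative to a null model that uses the mean of y as its prediction) from 1.0 gives the fraction the model does explain.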
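To make the two parts of the formula explicit, here is a sketch that names each term and then compares the result with the value `lm()` reports. The variable names `unexplained`, `null_model`, and `r_squared` are my own. The agreement with `summary(model)$r.squared` is expected here because the model is a linear regression with an intercept, evaluated on its own training data.

```r
d <- data.frame(y = (1:10)^2, x = 1:10)
model <- lm(y ~ x, data = d)
d$prediction <- predict(model, newdata = d)

unexplained <- sum((d$prediction - d$y)^2)   # squared error left by the model
null_model  <- sum((mean(d$y) - d$y)^2)      # squared error of predicting mean(y)
r_squared   <- 1 - unexplained / null_model

# Should agree with the value computed by summary.lm()
all.equal(r_squared, summary(model)$r.squared)
```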

As an analysis goal, this could be stated as, for example, "We want the model to explain 70% of account value."

R-squared can be derived from RMSE plus a few facts about the data (so R-squared can be thought of as a normalized version of RMSE).

In other words, R-squared can be regarded as a "normalized RMSE."
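One way to see the normalization, sketched under my own variable names: dividing both sums in the R-squared formula by n turns the numerator into RMSE² and the denominator into the variance of y, so R-squared = 1 - RMSE² / Var(y). The variance here divides by n, whereas R's `var()` divides by n - 1, so a correction factor is needed.

```r
d <- data.frame(y = (1:10)^2, x = 1:10)
model <- lm(y ~ x, data = d)
d$prediction <- predict(model, newdata = d)

rmse <- sqrt(mean((d$prediction - d$y)^2))

# Variance of y dividing by n; var() divides by n - 1, hence the correction
n <- nrow(d)
var_y <- var(d$y) * (n - 1) / n

# R-squared as RMSE^2 normalized by the variance of y
r_squared_from_rmse <- 1 - rmse^2 / var_y
r_squared_from_rmse
```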

However, R-squared is not always the best business-oriented metric. For example, it’s hard to tell what a 10% reduction of RMSE would mean in relation to the Netflix Prize.

However, R-squared is not always the best yardstick for business purposes. For example, it is hard to explain what a 10% reduction in RMSE means in the context of the Netflix Prize.

The following is quoted from the Wikipedia article on the Netflix Prize.

Submitted predictions are scored against the true grades in terms of root mean squared error (RMSE), and the goal is to reduce this error as much as possible.

There was some controversy as to the choice of RMSE as the defining metric. Would a reduction of the RMSE by 10% really benefit the users? It has been claimed that even as small an improvement as 1% RMSE results in a significant difference in the ranking of the "top-10" most recommended movies for a user.

There was controversy over the use of RMSE: would a 10% reduction in RMSE really benefit users?

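I already knew about the Netflix Prize, but this time I dug a little deeper and read several articles about it, and found it quite interesting. I have never heard of a Japanese company attempting anything like this, which makes me keenly aware of how far behind Japan is in the field of information processing. In the first place, not much can be expected from Japanese companies that publish far too little information in English.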

Continued in "Probability Evaluation."


Sunday, October 9, 2016
