Metrics

This chapter introduces popular metrics for ML applications:

  • Classification - confusion matrix, accuracy, precision, recall, F1-score, ROC, AUC.
  • Regression - MSE, MAE, RMSE, R squared.
  • Recommender systems (learning to rank) - MRR, AP, mAP@k, nDCG.

Classification

  • Confusion matrix
    • True positive (TP): predicted positive, actually positive
    • True negative (TN): predicted negative, actually negative
    • False positive (FP): predicted positive, actually negative
    • False negative (FN): predicted negative, actually positive
  • Accuracy = (TP + TN) / (TP + TN + FP + FN)
  • Precision = TP / (TP + FP)
  • Recall (True positive rate) = TP / (TP + FN)
  • False positive rate = FP / (FP + TN)
  • F1-score = 2 * Precision * Recall / (Precision + Recall)
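  • ROC: curve of true positive rate (y-axis) against false positive rate (x-axis) as the decision threshold sweeps from 0 to 1.
  • AUC: area under the ROC curve. 1 is a perfect ranking, 0.5 is no better than random, and 0 means every positive is scored below every negative.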
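
The formulas above can be sketched in plain Python. This is a minimal illustration, assuming binary labels encoded as 0/1; the function names and the sample data are hypothetical. AUC is computed here through the equivalent pairwise formulation (the probability that a random positive is scored above a random negative, ties counting one half) rather than by integrating the curve.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, FPR and F1 from the confusion matrix."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    # Assumption: undefined ratios (zero denominator) are reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "fpr": fpr,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative), ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical sample data: scores thresholded at 0.5 give y_pred.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.3, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(confusion_counts(y_true, y_pred))        # (3, 3, 1, 1)
print(classification_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))                 # 0.9375
```

Note that precision, recall and F1 depend on the chosen threshold, while AUC depends only on how the scores order the examples.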

Regression

  • MSE = mean((y - y_pred)^2)
  • MAE = mean(abs(y-y_pred))
  • RMSE = sqrt(MSE)
  • R_squared = 1 - SSR/SST = 1 - sum((y - y_pred)^2)/sum((y - y_avg)^2), where SSR is the sum of squared residuals and SST is the total sum of squares.
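
A minimal sketch of the four regression metrics in plain Python, assuming `y` and `y_pred` are equal-length lists of numbers. The function names and sample values are hypothetical.

```python
import math


def mse(y, y_pred):
    """Mean squared error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_pred)) / len(y)


def mae(y, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_pred)) / len(y)


def rmse(y, y_pred):
    """Root mean squared error, in the same units as y."""
    return math.sqrt(mse(y, y_pred))


def r_squared(y, y_pred):
    """1 - (sum of squared residuals) / (total sum of squares)."""
    y_avg = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_pred))
    ss_tot = sum((a - y_avg) ** 2 for a in y)
    return 1 - ss_res / ss_tot


# Hypothetical sample data.
y = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 10.0]

print(mse(y, y_pred))        # 0.375
print(mae(y, y_pred))        # 0.5
print(rmse(y, y_pred))
print(r_squared(y, y_pred))
```

MSE and RMSE penalize large errors more heavily than MAE because the residuals are squared before averaging.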

Learning to Rank (Recommender systems)

Learning to rank predicts the order of items by relevance for a given query or user.

  • Mean reciprocal rank (MRR)

    The average, over a set of queries, of the reciprocal rank of the first relevant item in each result list. MRR = mean(1/rank).
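
A small sketch of MRR, assuming each query is given as a pair of (set of relevant items, ranked list of predicted items). The function names and the sample queries are hypothetical, and scoring a query with no relevant item as 0 is one common convention, taken here as an assumption.

```python
def reciprocal_rank(relevant, ranked):
    """1 / rank of the first relevant item (1-based), or 0.0 if none appears."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0  # assumption: a query with no hit contributes 0


def mean_reciprocal_rank(queries):
    """Average reciprocal rank over (relevant_set, ranked_list) pairs."""
    return sum(reciprocal_rank(rel, ranked) for rel, ranked in queries) / len(queries)


# Hypothetical queries: first hit at rank 1, at rank 3, and no hit.
queries = [
    ({"a"}, ["a", "b", "c"]),
    ({"c"}, ["a", "b", "c"]),
    ({"z"}, ["a", "b"]),
]
print(mean_reciprocal_rank(queries))  # (1 + 1/3 + 0) / 3
```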

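  • Precision@k

    Fraction of the top k items that are relevant.

    • P@k = (# relevant items in the top k) / k
    • AP@N = 1/m * sum over k = 1..N of P@k * rel(k), where rel(k) is 1 if the item at rank k is relevant and 0 otherwise, and m is the number of relevant items (either all true items or only those that appear in the top N; both conventions are in use)
    • mAP@N = mean(AP@N) over all users or queries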

    Example:

    true_items = {"a", "b", "c", "d", "e", "k"}
    predict_items = ["a", "f", "d", "e", "g"]
    relevant_list = [1, 0, 1, 1, 0]
    AP@N = 1/len(true_items) * (1/1 + 0/2 + 2/3 + 3/4 + 0/5) ≈ 0.40
    or, normalizing by the number of hits in the list,
    AP@N = 1/sum(relevant_list) * (1/1 + 0/2 + 2/3 + 3/4 + 0/5) ≈ 0.81
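
The example above can be reproduced with a short sketch. The function names are hypothetical, and the optional `n_relevant` argument is an assumption introduced here to switch between the two normalizations shown (all true items versus hits in the list).

```python
def precision_at_k(relevant_list, k):
    """Fraction of the top k predictions that are relevant."""
    return sum(relevant_list[:k]) / k


def average_precision(relevant_list, n_relevant=None):
    """AP@N: sum of P@k over the ranks k that hold a relevant item.

    n_relevant is the normalizer; if omitted, the number of hits
    in relevant_list is used instead.
    """
    denom = n_relevant if n_relevant is not None else sum(relevant_list)
    if denom == 0:
        return 0.0
    total = sum(
        precision_at_k(relevant_list, k)
        for k, rel in enumerate(relevant_list, start=1)
        if rel
    )
    return total / denom


def mean_average_precision(relevant_lists):
    """mAP@N: mean of AP@N over several users or queries."""
    return sum(average_precision(r) for r in relevant_lists) / len(relevant_lists)


true_items = {"a", "b", "c", "d", "e", "k"}
predict_items = ["a", "f", "d", "e", "g"]
relevant_list = [1 if item in true_items else 0 for item in predict_items]

print(relevant_list)                                      # [1, 0, 1, 1, 0]
print(average_precision(relevant_list, len(true_items)))  # about 0.403
print(average_precision(relevant_list))                   # about 0.806
```

Normalizing by `len(true_items)` penalizes a list that is too short to contain every relevant item, while normalizing by the number of hits only measures how well the hits are ordered.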
    
  • Normalized Discounted Cumulative Gain (NDCG)

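    • Cumulative Gain: sum of all relevance values in a search result list, sum(rel_i).
    • Discounted Cumulative Gain: sum(rel_i / log2(i+1)), where i is the 1-based rank, so relevance found lower in the list counts less.
    • NDCG = DCG / IDCG, where IDCG is the DCG of the ideal (best possible) ordering.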

    Example:

    true_items = ["a", "b", "c", "d", "e", "k"]
    relevant_scores = [6, 5, 4, 3, 2, 1]
    predict_items = ["a", "f", "d", "e", "g"]
    relevant_list = [6, 0, 3, 2, 0]
    DCG = 6/1 + 0 + 3/2 + 2/2.32 + 0 ≈ 8.36
    ideal_relevant_list = [6, 3, 2, 0, 0]
    IDCG = 6/1 + 3/1.58 + 2/2 + 0 + 0 ≈ 8.89
    NDCG = DCG / IDCG ≈ 0.94
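
A minimal sketch that reproduces the numbers in this example. The function names are hypothetical. It follows the convention used above, where the ideal list is the predicted list's own relevance values sorted in descending order; another common convention builds the ideal list from the top scores of all true items instead, which gives a lower NDCG.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum(rel_i / log2(i + 1)), i starting at 1."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG divided by the DCG of the same values sorted best-first."""
    idcg = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / idcg if idcg > 0 else 0.0


true_items = ["a", "b", "c", "d", "e", "k"]
relevant_scores = [6, 5, 4, 3, 2, 1]
predict_items = ["a", "f", "d", "e", "g"]

# Look up the relevance of each predicted item; unknown items score 0.
score_of = dict(zip(true_items, relevant_scores))
relevant_list = [score_of.get(item, 0) for item in predict_items]

print(relevant_list)         # [6, 0, 3, 2, 0]
print(dcg(relevant_list))    # about 8.36
print(ndcg(relevant_list))   # about 0.94
```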
    

Reference

  • https://towardsdatascience.com/20-popular-machine-learning-metrics-part-2-ranking-statistical-metrics-22c3e5a937b6
  • http://sdsawtelle.github.io/blog/output/mean-average-precision-MAP-for-recommender-systems.html
  • https://machinelearningmedium.com/2017/07/24/discounted-cumulative-gain/
  • https://sigir.org/wp-content/uploads/2017/06/p243.pdf
  • https://gist.github.com/tgsmith61591/d8aa96ac7c74c24b33e4b0cb967ca519