📎 Sampling Metrics

Data Sampling

"Not all data is equal"

Expert anecdote by Jennifer Prendki, founder and CEO of Alectio

If you care about your nutrition, you don't go to the supermarket and randomly select items from the shelves. You might eventually get the nutrients you need by eating random items from supermarket shelves, but you will eat a lot of junk food in the process. I think it is weird that in machine learning, people still think it's better to sample the supermarket randomly than figure out what they need and focus their efforts there.

Credits: Human-in-the-Loop Machine Learning by Robert Munro

The assumption is that some data points are more valuable for the model than others. The focus is on how to identify these valuable data points.

Uncertainty Sampling Techniques

  • Least Confidence Sampling

  • Marginal Confidence

  • Ratio Confidence

  • Entropy Confidence

Least Confidence Sampling

Least confidence sampling is the most common method for uncertainty sampling. It takes the difference between 100% confidence and the confidence of the most confidently predicted label for each item. The score is sensitive to the base used for the softmax function. The raw difference peaks at (n-1)/n for n labels, so it is usually rescaled by n/(n-1) to fall in the 0-1 range, where 1 is most uncertain.
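As a rough illustration, the score can be sketched in a few lines of Python. The function name `least_confidence` and the example probabilities are assumptions made for this sketch, and the n/(n-1) rescaling is one common way to stretch the score over the full 0-1 range.

```python
def least_confidence(probs):
    """Normalized least confidence: 0 = fully certain, 1 = most uncertain.

    `probs` is assumed to be a probability distribution over labels
    (non-negative values that sum to 1), e.g. a softmax output.
    """
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two labels")
    most_confident = max(probs)
    # The raw score 1 - max(p) tops out at (n - 1) / n for a uniform
    # distribution, so rescale by n / (n - 1) to reach 1 in that case.
    return (1.0 - most_confident) * n / (n - 1)


# A confident prediction scores low (about 0.15 here),
# a uniform prediction scores about 1.
print(least_confidence([0.9, 0.05, 0.05]))
print(least_confidence([1 / 3, 1 / 3, 1 / 3]))
```

Items with the highest scores would be the first candidates to send for human labeling.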

Marginal Confidence

The most intuitive form of uncertainty sampling is margin of confidence: the difference between the two most confident predictions. Margin of confidence is less sensitive than least confidence sampling to the base used for the softmax function, but it is still sensitive. Subtracting that difference from 1 gives a score in the 0-1 range, where 1 is most uncertain.
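A minimal sketch of this score, assuming the input is a probability distribution over labels. The name `margin_confidence` is chosen for this example only.

```python
def margin_confidence(probs):
    """Margin of confidence: 0 = fully certain, 1 = most uncertain.

    `probs` is assumed to be a probability distribution over labels.
    """
    if len(probs) < 2:
        raise ValueError("need at least two labels")
    # Take the two highest probabilities, largest first.
    first, second = sorted(probs, reverse=True)[:2]
    # A small gap between the top two labels means high uncertainty,
    # so subtract the gap from 1.
    return 1.0 - (first - second)


# A tie between the top two labels gives the maximum score of 1.
print(margin_confidence([0.5, 0.5, 0.0]))
# A clear winner gives a lower score (about 0.7 here).
print(margin_confidence([0.6, 0.3, 0.1]))
```

Unlike least confidence, this score ignores every label below the top two.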

Ratio Confidence

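Ratio of confidence is a variation on margin of confidence that looks at the ratio between the top two scores instead of the difference. The ratio of two softmax outputs depends only on the gap between their logits, so the ranking of items by ratio of confidence is the same for any base used in softmax. Taking the second-highest score over the highest gives a value in the 0-1 range, where 1 is most uncertain.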
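The sketch below illustrates the ratio and how it behaves under different softmax bases. The helper `softmax` with a `base` parameter, the function `ratio_confidence`, and the example logits are all assumptions made for this illustration. With these inputs the ratio values change with the base, but the ordering of the two items does not.

```python
import math


def softmax(logits, base=math.e):
    """Softmax with a configurable base (hypothetical helper for this sketch)."""
    top = max(logits)
    # Subtracting the max logit keeps the exponentials numerically stable.
    exps = [base ** (z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ratio_confidence(probs):
    """Ratio of confidence: second-highest over highest probability.

    Values near 0 mean a clear winner; 1 means the top two labels are tied.
    `probs` is assumed to be a valid probability distribution.
    """
    if len(probs) < 2:
        raise ValueError("need at least two labels")
    first, second = sorted(probs, reverse=True)[:2]
    return second / first


item_a = [2.0, 1.0, 0.1]  # clear gap between the top two logits
item_b = [2.0, 1.8, 0.1]  # top two logits are close

for base in (2, math.e, 10):
    score_a = ratio_confidence(softmax(item_a, base))
    score_b = ratio_confidence(softmax(item_b, base))
    # item_b is ranked as more uncertain for every base tried here.
    print(base, score_a, score_b, score_b > score_a)
```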

Entropy Confidence

Entropy measures the information (surprise) in the model's prediction. Entropy is high when all labels are almost equally likely. Divided by the log of the number of labels, it falls in the 0-1 range, so in our case an entropy of 1 means the model is most confused.
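A minimal sketch of an entropy-based score, assuming the input is a probability distribution over labels. The name `entropy_score` is chosen for this example, and dividing by the log of the number of labels is the assumed normalization that keeps the score between 0 and 1.

```python
import math


def entropy_score(probs):
    """Normalized entropy: 0 = fully certain, 1 = most uncertain.

    `probs` is assumed to be a probability distribution over labels.
    """
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two labels")
    # Terms with p == 0 contribute nothing (0 * log 0 is treated as 0).
    raw = -sum(p * math.log2(p) for p in probs if p > 0)
    # Entropy is at most log2(n), reached by the uniform distribution,
    # so dividing by log2(n) maps the score onto the 0-1 range.
    return raw / math.log2(n)


# Uniform over four labels: the model is most confused.
print(entropy_score([0.25, 0.25, 0.25, 0.25]))
# Probability mass split over two of four labels: about 0.5.
print(entropy_score([0.5, 0.5, 0.0, 0.0]))
```

Unlike the margin and ratio scores, entropy takes the full distribution into account, not only the top two labels.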
