A way of measuring how closely two pieces of data resemble each other. This is helpful when dividing your data points into categories, creating clusters, or using the k-nearest neighbors algorithm.