# Entropy Estimation

Last updated on 14 January 2012.

## Contents

Information theoretic (IT) methods are now routinely used for natural sciences data analysis. For example, in neuroscience one uses IT tools to understand mechanisms by which neurons encode sensory information, while in molecular systems biology the same techniques help to uncover statistical associations among concentrations of various molecules. The strength of the methods comes from the very limited assumptions that go into them. For example, it is appealing to quantify all statistical dependencies among variables, but not just linear ones. However, the widespread use of the IT methods has been hindered by a major difficulty: it is hard to estimate entropy from experimental datasets of realistic sizes. It turns out that estimators are typically biased. For example, for estimation of entropies of discrete data, most common methods result in a bias of ${\displaystyle \propto 2^{S}/N}$, where ${\displaystyle S}$ is the (unknown) entropy and ${\displaystyle N}$ is the data set size (see this review for details). In the recent years, many methods (some described here) have been developed to solve the problem, but the majority only works when ${\displaystyle 2^{S}/N\ll 1}$ (for discrete variables), or requires strong assumptions about the smoothness of the underlying probability density (for continuous variables).

Our goal has been to solve this problem, at least under certain weak assumptions. We've made some progress. In particular, for severely undersampled data, our NSB method is probably still the most powerful, ten years after we proposed it. For example, the plot to the left show the convergence of various entropy estimators to the true entropy value when the data set size grows. The NSB method clearly converges better.

## Results

### NSB entropy estimator

Our NSB entropy estimator reduced the entropy estimation bias to ${\displaystyle 2^{S/2}/N\ll 1}$ for a large class of data. The method doesn't work always, but it is possible to diagnose when it fails and, even in these cases, it performs not worse than most traditional estimators.

If you are interested in the NSB method, the following link will be of use:

### Differences of information quantities

In many cases, in particular when IT quantities are used as a measure of statistical interactions, the precise values of these quantities are less important than their rankings (i.e., the sign of their differences). In the case of mutual information, we showed in Margolin et al., 2006a that the latter task is much easier than the former, and it can be completed with almost any estimator with very little prior tuning.