CompleXimple

Diversity Number

Calculation of the Diversity Number di#
The diversity number di# is a simple descriptor for the distribution of elements among categories. It assigns a single numeric value to the shape of a distribution, on a scale from "simple" to "complicated". This property is often intuitively clear but hard to quantify.
If the category sizes are scaled to proportions $p_i$ that sum to 1, $\mathrm{di\#} = 1/\sum_i p_i^2$. If they are scaled to percentages that sum to 100 %, $\mathrm{di\#} = 10000/\sum_i p_i^2$ (with $p_i$ in percent), and for unscaled category sizes $x_i$, $\mathrm{di\#} = (\sum_i x_i)^2/\sum_i x_i^2$. The value is close to 1 for a distribution with one major category, and it equals the number of categories if all categories are the same size. For many categories of varying size, di# lies between 1 and the number of categories. Because the sizes enter squared, di# is dominated by the large categories, while the small ones contribute little: a Matthew effect.
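The three formulas above can be checked against each other with a short Python sketch. The function names (`diversity_number`, `diversity_from_proportions`, `diversity_from_percentages`) and the example counts are illustrative assumptions, not taken from the original page. The sketch also shows the two limiting cases and how little a group of small categories adds.

```python
# Sketch: three equivalent ways to compute the diversity number di#.
# Function names and example data are illustrative assumptions.


def diversity_number(values):
    """di# = (sum x_i)^2 / sum(x_i^2) for unscaled category sizes x_i."""
    total = sum(values)
    sum_sq = sum(x * x for x in values)
    if sum_sq == 0:
        raise ValueError("at least one category must be non-zero")
    return total * total / sum_sq


def diversity_from_proportions(p):
    """di# = 1 / sum(p_i^2); assumes the proportions p_i sum to 1."""
    return 1.0 / sum(x * x for x in p)


def diversity_from_percentages(q):
    """di# = 10000 / sum(q_i^2); assumes the percentages q_i sum to 100."""
    return 10000.0 / sum(x * x for x in q)


counts = [50, 30, 20]
n = sum(counts)
proportions = [c / n for c in counts]
percentages = [100 * c / n for c in counts]

# All three forms give the same value (about 2.63), up to float rounding.
print(diversity_number(counts))
print(diversity_from_proportions(proportions))
print(diversity_from_percentages(percentages))

# Equal categories: di# equals the number of categories.
print(diversity_number([5, 5, 5, 5]))  # 4.0

# One dominant category: di# is close to 1 (about 1.06).
print(diversity_number([97, 1, 1, 1]))

# Matthew effect: ten extra categories of size 1 raise di# only from
# about 2.63 to about 3.18, although the category count goes from 3 to 13.
print(diversity_number(counts + [1] * 10))
```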
In a 1949 paper in Nature (163:688), E. H. Simpson proposed $D = \sum_i p_i^2$ as a measure of diversity. Simpson's index can be understood as the probability that two randomly selected elements (drawn independently, with replacement) belong to the same category. It ranges from near 0 (a very large number of categories, so that two elements are almost never in the same one) to 1 (a single category, so that both elements always match). For n categories of equal size, the probability is 1/n, e.g. 0.5 for 2 categories and 0.1 for 10. The reciprocal of Simpson's index therefore gives the number of equally common categories that would produce the same index; it is also known, though not widely, as Hill's N2 (M. O. Hill, Ecology 54:427 (1973)). To avoid scaling the raw values $x_i$ to proportions $p_i = x_i/\sum_j x_j$, the factor $(\sum_i x_i)^2$ is moved into the numerator, which gives the numerically identical diversity number $\mathrm{di\#} = 1/D = (\sum_i x_i)^2/\sum_i x_i^2$. This is simply another way of computing the same value, under a more descriptive name.
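The probability interpretation of Simpson's index can be verified by brute force in a minimal Python sketch, assuming the two elements are drawn independently with replacement (which is what $\sum_i p_i^2$ describes). The helper names and example counts are hypothetical. Exact fractions are used so that the enumerated probability and $D$ can be compared without rounding.

```python
# Sketch: Simpson's D = sum(p_i^2) equals the probability that two elements,
# drawn independently WITH replacement, fall into the same category.
# Helper names and example counts are illustrative assumptions.
from fractions import Fraction
from itertools import product


def simpson_index(counts):
    """D = sum(p_i^2), computed exactly with fractions."""
    n = sum(counts)
    return sum(Fraction(c, n) ** 2 for c in counts)


def same_category_probability(counts):
    """Enumerate every ordered pair of elements (with replacement) and
    return the fraction of pairs whose two members share a category."""
    # One entry per element, holding the index of its category.
    elements = [cat for cat, c in enumerate(counts) for _ in range(c)]
    pairs = list(product(elements, repeat=2))
    same = sum(1 for a, b in pairs if a == b)
    return Fraction(same, len(pairs))


counts = [5, 3, 2]
d = simpson_index(counts)
print(d)                                   # 19/50
print(same_category_probability(counts))   # 19/50, matches D
print(1 / d)                               # 50/19, the reciprocal 1/D
print(sum(counts) ** 2 / sum(c * c for c in counts))  # same value as di#

# n equal categories give D = 1/n, so 1/D = n.
print(1 / simpson_index([4, 4, 4, 4]))     # 4
```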
Graphically, the square of the sum (the big square) can be compared with the sum of the squares (in grey):
[Figure: diversity squares, the square of the sum vs. the sum of the squares]
Some applications: the diversity number of species in a tropical rain forest is much higher than in a monoculture corn field; the diversity number of companies in the pharmaceutical market is higher than that of computer operating systems; the diversity number of drug metabolites is low when one preferred reaction leads to one major product, and high if there are many similar metabolic pathways.
In Microsoft Excel, the diversity number di# of values in, e.g., cells A1 to A20 can easily be calculated with the formula =SUM(A1:A20)^2/SUMSQ(A1:A20). Blank cells and zero values are tolerated and do not change the value of di#.
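Outside a spreadsheet, the Excel formula can be mirrored in a few lines of Python. This is a sketch under the assumption that blank cells are represented as `None`; the function name `excel_style_diversity` is hypothetical. Skipping `None` mimics how SUM and SUMSQ ignore empty cells, and zeros need no special handling because they add nothing to either sum.

```python
# Sketch of =SUM(A1:A20)^2/SUMSQ(A1:A20) in Python.
# Blank cells are modelled as None (an assumption for illustration).


def excel_style_diversity(cells):
    """Return (sum of values)^2 / (sum of squares), skipping blank cells."""
    numbers = [c for c in cells if c is not None]
    total = sum(numbers)
    sum_sq = sum(x * x for x in numbers)
    if sum_sq == 0:
        # Excel would show #DIV/0! for an empty or all-zero range.
        raise ZeroDivisionError("range contains no non-zero values")
    return total ** 2 / sum_sq


column_a = [50, None, 30, 0, 20, None]   # hypothetical contents of A1:A6
print(excel_style_diversity(column_a))   # same result as for [50, 30, 20]
print(excel_style_diversity([50, 30, 20]))
```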
The diversity number di# is a powerful way to reduce the (possibly detailed, complicated, and interesting) information contained in a distribution to a single numeric value. This extreme data reduction might be regarded as a regrettable loss of valuable information. But if done properly and deliberately, it can help in handling huge amounts of data and in understanding structures within them that would otherwise stay hidden behind too many details.





http://www.christianhauck.net   christianhauck@christianhauck.net