Post by Dr Chris Lloyd
Together with Vera Pawlowsky-Glahn and Juan José Egozcue, two internationally renowned researchers in compositional data analysis (CODA), I have just published a paper in Annals of the Association of American Geographers (one of the top-ranking geographical journals in the world), which shows why percentages cannot be properly analysed with standard statistical methods. Percentages are common in the spatial sciences, but it is often overlooked that there are restrictions on how they can be analysed. An obvious example is a regression of one percentage against another: the fitted model may predict values that are smaller than 0 or larger than 100.
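The regression problem is easy to reproduce. The sketch below fits an ordinary least-squares line to a handful of percentages and then predicts at the ends of the valid range. The numbers are invented purely for illustration and are not taken from the paper; `fit_line` and `predict` are hypothetical helper names.

```python
# Illustrative only: these percentages are made up, not data from the paper.
x = [5, 10, 20, 40, 60, 80, 95]   # predictor, in percent
y = [2, 4, 15, 45, 75, 92, 98]    # response, in percent


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = fit_line(x, y)


def predict(value):
    """Predicted response (in percent) for a predictor value (in percent)."""
    return intercept + slope * value


# Nothing in the linear model keeps predictions inside the 0-100 range.
for value in (0, 50, 100):
    print(f"x = {value:3d}%  ->  predicted y = {predict(value):.1f}%")
```

With these invented values the prediction at 0% comes out negative and the prediction at 100% exceeds 100, even though every observation is a valid percentage.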
Percentages and proportions are referred to as compositional data, and complete compositions typically sum to 100 (the case for percentages) or one (proportions). The AAAG article focuses on population studies and, using the example of religion in Northern Ireland in 2001, it shows how population data can be transformed into log-ratios; these new data can then be analysed using standard statistical approaches. The case study gives insights into how the population of Northern Ireland is distributed by religion. It shows that the most obvious geographical pattern relates to the ratio of Catholics to Protestants, although there are also distinct relationships between Catholics and individual Protestant denominations. Whether you are a physical or human geographer, or indeed from an entirely different discipline, percentages or proportions should be used with caution, and I hope that this paper helps some researchers make more informed choices.
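For readers who want to see what a log-ratio transformation looks like in practice, here is a minimal sketch using the centred log-ratio (clr), one common choice in compositional data analysis. It is not necessarily the transformation used in the paper, the three-part composition is hypothetical, and the function names are my own. Note that the logarithm requires every part to be greater than zero, so zero counts need separate treatment.

```python
import math


def closure(parts, total=100.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]


def clr(parts):
    """Centred log-ratio: log of each part relative to the geometric mean.

    All parts must be strictly positive.
    """
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def clr_inverse(coords, total=100.0):
    """Map log-ratio values back to a composition summing to `total`."""
    return closure([math.exp(c) for c in coords], total)


# Hypothetical three-group composition in percent (not data from the paper).
composition = [40.0, 45.0, 15.0]
coords = clr(composition)

# Log-ratio values are unbounded, so they can be shifted freely ...
shifted = [coords[0] + 5.0, coords[1], coords[2] - 5.0]

# ... and whatever is done to them, the back-transformed result is
# still a valid set of percentages.
back = clr_inverse(shifted)

print("clr values:      ", [round(c, 3) for c in coords])
print("back-transformed:", [round(b, 3) for b in back])
```

The point of the round trip is that analysis happens on the unbounded log-ratio scale, and any result mapped back stays between 0 and 100 and sums to 100.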