
Simpson's Paradox

by Kate Armel (March 17, 2011)

Last week we looked at IT software productivity trends for 1000 completed IT systems and noted that average productivity has declined over the last 15 years.

The post sparked some interesting responses. Two readers wanted to know whether productivity actually increases over time for projects within the same size range. If so, this would be an illustration of Simpson's Paradox, a counterintuitive phenomenon we've seen from time to time in our own research. Simply put, a trend that appears in the pooled sample can reverse direction when the sample is broken into categories.
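To make the reversal concrete, here is a minimal Python sketch built on a dozen invented projects (hypothetical numbers, not QSM data; `PROJECTS`, `ols_slope`, and `trend_slopes` are illustrative names). Within each size bin, productivity rises by one unit per year, but because the mix shifts over time from larger, higher-productivity projects toward smaller ones, the pooled trend points downward. The trend is summarized here as an ordinary least-squares slope of productivity against completion year, which is an assumption of this sketch rather than the method used in SLIM-Metrics.

```python
from collections import defaultdict

# Hypothetical data, invented for illustration only (not QSM data):
# (completion_year, size_bin, productivity). Early years are dominated by
# large projects, later years by small ones.
PROJECTS = (
    [(0, "small", 10.0)] * 1 + [(0, "large", 30.0)] * 3
    + [(1, "small", 11.0)] * 2 + [(1, "large", 31.0)] * 2
    + [(2, "small", 12.0)] * 3 + [(2, "large", 32.0)] * 1
)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def trend_slopes(projects):
    """Return the pooled slope and a dict of slopes within each size bin."""
    pooled = ols_slope([p[0] for p in projects], [p[2] for p in projects])
    by_bin = defaultdict(list)
    for year, size_bin, prod in projects:
        by_bin[size_bin].append((year, prod))
    per_bin = {
        b: ols_slope([yr for yr, _ in pts], [pr for _, pr in pts])
        for b, pts in by_bin.items()
    }
    return pooled, per_bin


pooled, per_bin = trend_slopes(PROJECTS)
print(f"pooled slope: {pooled:+.2f}")
for size_bin, slope in sorted(per_bin.items()):
    print(f"{size_bin:>5} bin slope: {slope:+.2f}")
```

Running it prints a pooled slope of -4.00 and a slope of +1.00 for both the large and small bins: the direction flips once the sample is stratified.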

To answer their question, I used our SLIM-Metrics tool to stratify the sample into four size bins:

Under 5,000 effective (new + modified) SLOC
5,000 to <10,000 effective (new + modified) SLOC
10,000 to <20,000 effective (new + modified) SLOC
20,000 to <30,000 effective (new + modified) SLOC
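The binning step itself is simple to reproduce outside a tool. A minimal sketch, assuming each project carries a single effective SLOC count; `UPPER_BOUNDS`, `LABELS`, and `size_bin` are hypothetical names, and the sample values are invented:

```python
import bisect

# Exclusive upper bound of each of the four bins, in effective
# (new + modified) SLOC. Each bin is half-open, e.g. 5,000 to <10,000.
UPPER_BOUNDS = [5_000, 10_000, 20_000, 30_000]
LABELS = ["<5K", "5K-<10K", "10K-<20K", "20K-<30K"]


def size_bin(esloc):
    """Return the bin label for a project, or None if it is >= 30,000 ESLOC."""
    # bisect_right puts a value equal to a bound into the next bin up,
    # so 5,000 lands in "5K-<10K" rather than "<5K".
    i = bisect.bisect_right(UPPER_BOUNDS, esloc)
    return LABELS[i] if i < len(LABELS) else None


sample = [3_200, 5_000, 12_750, 29_999, 45_000]  # hypothetical ESLOC values
print([size_bin(s) for s in sample])
# ['<5K', '5K-<10K', '10K-<20K', '20K-<30K', None]

binned = [s for s in sample if size_bin(s) is not None]
print(f"{len(binned) / len(sample):.0%} of projects fall in the four bins")
# 80% of projects fall in the four bins
```

Returning `None` for larger projects makes it easy to measure how much of the sample the bins cover before comparing trends within them.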

These 4 size bins span a little over two thirds of the data. As a sanity check, I applied the same queries to both the original sample of 1000 IT projects and a larger sample of nearly 2200 IT projects. As the following chart shows, stratifying the data into size bins doesn't affect the overall direction of the trend:

[Chart: Productivity over Time]

For conventional productivity (function points per person-month), the decline was even more pronounced:

[Chart: FP per PM over Time]

I repeated this process for the next four size bins (30-40K, 40-50K, 50-60K, and 60-70K ESLOC) and beyond, with the same results. This approach is similar to an analysis Paul Below performed on the relationship between project size, team size, and conventional productivity (SLOC/Person Month).
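The check repeated across the 10K-wide bins can be expressed compactly. A minimal sketch, assuming each bin's trend has already been summarized as a single slope (for example, a least-squares fit of productivity against completion year); `ten_k_bins`, `reversed_bins`, and the slope values are hypothetical, chosen only to show the mechanics:

```python
def ten_k_bins(start=30_000, stop=70_000, width=10_000):
    """Half-open [lo, hi) ESLOC bins: 30-40K, 40-50K, 50-60K, 60-70K by default."""
    return [(lo, lo + width) for lo in range(start, stop, width)]


def reversed_bins(pooled_slope, bin_slopes):
    """Return the bins whose trend runs opposite to the pooled trend.

    A non-empty result is the signature of Simpson's Paradox; an empty
    result means stratifying did not change the direction of the trend.
    Bins with a slope of exactly zero are not counted as reversals.
    """
    return [b for b, s in bin_slopes.items() if s * pooled_slope < 0]


bins = ten_k_bins()
print(bins)  # [(30000, 40000), (40000, 50000), (50000, 60000), (60000, 70000)]

# Hypothetical per-bin slopes (productivity change per year), illustration only.
slopes = dict(zip(bins, [-0.8, -1.1, -0.5, -0.9]))
print(reversed_bins(-0.7, slopes))  # []
```

Comparing signs rather than magnitudes keeps the test focused on the question the readers asked: whether the direction of the trend changes within a size range.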

So does this mean that larger projects cause teams to be more productive? Not necessarily; what we're seeing is a correlation between increasing project size and higher average productivity. Still, whether you're using productivity metrics to calibrate estimates or to benchmark your completed projects, looking at the same metrics from several perspectives will help you understand the context behind the data.
