The value of data is dropping
I’ve been writing this blog since 2010. In that very first post, I quoted figures estimating that the content of the Internet had reached 487 billion gigabytes. Ten years later, in 2020, the best estimate I can find is that 64 trillion GB of new data was created in that year alone. By 2023 this had nearly doubled to 120 trillion GB (or 120 zettabytes).
To put this into perspective, global data volumes are now growing by more than 13 billion gigabytes every hour. It’s a good thing we don’t rely on paper as much as we used to: if we printed this on A4 pages holding about 2,000 characters each, it would equate to about 7,000 trillion pages every hour! Assuming each page is 0.1mm thick, the resulting stack of paper would grow at a rate of 700 million km per hour!
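For anyone who wants to check the arithmetic, here is the back-of-envelope calculation as a short Python sketch, using the rough assumptions stated above (one byte per character, 2,000 characters per page, 0.1mm per sheet):

```python
# Back-of-envelope check of the paper-stack arithmetic above.
# Assumptions from the text: 1 GB = 1e9 bytes, one character per byte,
# 2,000 characters per A4 page, 0.1 mm per sheet.

DATA_CREATED_2023_GB = 120e12      # 120 zettabytes, expressed in GB
HOURS_PER_YEAR = 365 * 24          # 8,760 hours

gb_per_hour = DATA_CREATED_2023_GB / HOURS_PER_YEAR
pages_per_hour = gb_per_hour * 1e9 / 2_000     # bytes -> printed pages
stack_km_per_hour = pages_per_hour * 0.1 / 1e6 # pages -> mm of paper -> km

print(f"{gb_per_hour:,.0f} GB per hour")        # ~13,698,630,137 (13.7 billion)
print(f"{pages_per_hour:.2e} pages per hour")   # ~6.85e+15 (~7,000 trillion)
print(f"{stack_km_per_hour:.2e} km per hour")   # ~6.85e+08 (~700 million km)
```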
A more serious consideration is the relationship between this data growth and the economic value it supports. In 2010 the world’s GDP was about US$67 trillion. By 2020 it had reached US$85 trillion and, at the time of writing, the estimated GDP for 2023 was US$105 trillion. The 487 billion gigabytes I wrote about in 2010 was only a fraction of the 2 zettabytes estimated to have been generated that year. Even taking the higher number, each GB could be associated with US$33 of GDP. Despite the intervening economic growth, the explosion in data by 2023 means that each GB is now associated with less than US$1 of GDP.
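The same quick-and-dirty style of calculation makes the trend stark, using only the figures quoted above:

```python
# GDP supported per gigabyte of data created, from the figures above.
# Data volumes in GB (1 zettabyte = 1e12 GB); GDP in US dollars.

figures = {
    2010: {"gdp_usd": 67e12,  "data_gb": 2e12},    # ~2 ZB created
    2020: {"gdp_usd": 85e12,  "data_gb": 64e12},   # ~64 ZB created
    2023: {"gdp_usd": 105e12, "data_gb": 120e12},  # ~120 ZB created
}

for year, f in figures.items():
    print(f"{year}: US${f['gdp_usd'] / f['data_gb']:.2f} of GDP per GB created")

# 2010: US$33.50 of GDP per GB created
# 2020: US$1.33 of GDP per GB created
# 2023: US$0.88 of GDP per GB created
```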
My favourite definition of information was penned by Robert Losee from the University of North Carolina at Chapel Hill: “Information is produced by all processes and it is the values of characteristics in the processes’ output that are information.” Given that information is carried by data, it follows that all data originates from some process.
While more data is captured from processes than ever before, simply collecting it without generating more value is a poor outcome. Either we should extract more value from every process through the data we harvest, or we should enable more value-adding processes.
Global shortages across the technology ecosystem, from semiconductors to data centre capacity, show that growth in digital activity is not free. That is especially true for the environment: data centres alone now draw as much power from the grid as the whole of the UK, or even Italy.
A major driver of all this data growth is the entertainment and media industry, including everything from TikTok and Netflix to Disney and free-to-air networks. Even though entertainment is a multi-trillion-dollar industry, its annual growth is much lower than the growth in the data volumes it generates. Rather than creating new value, the high-growth areas are simply eating legacy media and their revenue.
Arguably, streamers are taking highly efficient processes (such as broadcast television) and replacing them with data-intensive streaming, often of the same content. While there were bursts of investment in high-value material early in the disruption (when the likes of Netflix, Amazon and Apple invested in high-production-value programs that would otherwise never have been made), the industry seems to be settling into the same old economics, with little that is genuinely new despite the burden it places on the Internet and data centres.
The revolutionary launch of ChatGPT, and with it the exponential growth of Generative AI, has added even more new data to the mix without yet delivering dramatic productivity growth. The latest market volatility is a reminder from investors that simply scaling data without demonstrating value is unsustainable as a business proposition. As I’ve written before, getting productivity from IT has always been a hard journey, but the gains do come over time. Adding orders of magnitude more data very rapidly, however, makes the hurdle to payback even higher. Just because we have the capacity to move and store more data than ever before doesn’t mean we shouldn’t insist that every byte is worth its load on the economy and the environment.
It is too easy to regard data growth as an acceptable, ongoing and infinite trend. The current or implied “net neutrality” policies of most countries mean that business models which create and move ever more data across the network are being subsidised by all of us through our Internet Service Providers.
We have seen enormous opportunities emerge over the three decades that the Internet has been available to the general public. An important reason for its success has been ubiquitous connectivity that pays little heed to data volume and none to distance, freeing innovators to find creative ways to connect people, services and content. But with global data creation and movement decoupling from the real economy, we can’t afford for it to continue growing without constraint.
An alternative approach is to start prioritising data growth based on value. An unbelievable amount of data is created, duplicated and simply moved around unnecessarily. With even a little extra tension in the system, unnecessary duplication, the harvesting of data that is never used, and the running of inefficient code would gradually be squeezed out of the systems and networks that surround us.
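To make “unnecessary duplication” concrete, here is a minimal illustrative sketch, not a production tool, of the simplest kind of clean-up: finding byte-identical files by content hash (the /data/archive path is purely hypothetical):

```python
# Illustrative sketch: group files by content hash to surface duplicates.
import hashlib
from collections import defaultdict
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file's contents in chunks so large files don't exhaust memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by content hash; groups of >1 are duplicates."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            groups[sha256_of(p)].append(p)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}

if __name__ == "__main__":
    for digest, paths in find_duplicates(Path("/data/archive")).items():
        print(digest[:12], [str(p) for p in paths])
```

Real deduplication inside storage systems and networks works at the block or object level rather than on whole files, but the principle is the same: identical bytes should not be stored or shipped twice.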
As with most previous periods of rapid technological, market and social change, the revolution has to be followed by a refocusing on optimisation, one that ideally provides better outcomes for our society and more value for everyone.