Which countries are the top data producers? After all, with data-fueled applications of artificial intelligence projected by McKinsey to generate $13 trillion in new global economic activity by 2030, this could determine the next world order, much like the role that oil production played in creating the economic power players of the past century.
While China and the U.S. could emerge as two AI superpowers, data cannot be concentrated in a few places the way oil reserves are in an oil-driven economy; it must be drawn from many diverse sources, and future AI applications will emerge from new and unexpected players. The new world order taking shape is likely to be more complex than a simple bipolar structure, especially since data is being produced at a mind-boggling pace.
Building on our past work mapping the digital evolution and digital competitiveness of countries around the world, we wanted to locate the deepest and widest pools of useful data, the raw material for the myriad machine learning models critical to AI. To do so, it is useful to distinguish between the raw volume of data and a measure we shall call "gross data product," our version of the new GDP. To identify the world's top gross data product producers, we propose four criteria:
- Volume: Absolute amount of broadband consumed by a country, as a proxy for the raw data generated.
- Usage: Number of users active on the internet, as a proxy for the breadth of usage behaviors, needs and contexts.
- Accessibility: Institutional openness to data flows as a way to assess whether the data generated in a country permits wider usability and accessibility by multiple AI researchers, innovators, and applications.
- Complexity: Volume of broadband consumption per capita, as a proxy for the sophistication and complexity of digital activity.
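The article does not specify how the four criteria are combined, so the sketch below is purely illustrative: it assumes an equal-weighted composite built from min-max-normalized criterion values, and the country names and figures are invented placeholders, not the authors' data or methodology.

```python
# Illustrative sketch only: the scoring method, weights, and figures below
# are invented assumptions, not the article's actual data or methodology.

def normalize(values):
    """Min-max scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def gross_data_product(countries, criteria):
    """Average the normalized criterion scores for each country."""
    normalized = [normalize(column) for column in criteria]
    return {
        country: sum(column[i] for column in normalized) / len(normalized)
        for i, country in enumerate(countries)
    }

# Hypothetical inputs, one list per criterion (values are arbitrary):
countries = ["Country A", "Country B", "Country C"]
criteria = [
    [900, 500, 100],   # volume: total broadband consumed
    [800, 700, 50],    # usage: active internet users
    [0.4, 0.9, 0.7],   # accessibility: institutional-openness index
    [1.2, 2.5, 3.0],   # complexity: broadband consumption per capita
]

scores = gross_data_product(countries, criteria)
```

An equal-weighted average is only one possible design choice; a real index would need justified weights and comparable units for each criterion.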
There are several nuances to note. For one, we recognize that the digital trace that is generated by computers around the world spans a very wide range of activities, from sending an SMS text message to making a financial transaction. To enable an apples-to-apples comparison across the world, we use broadband per capita as a measure of such breadth and complexity (in some ways, mimicking the use of per capita income as a proxy for overall prosperity).
Second, countries differ in how private data is shared across agencies and in whether they have digital identity frameworks that can connect individuals to their digital activities. These institutional factors could affect how data is eventually pieced together, but we do not call out these distinctions here. We chose the countries included in our analysis based on a few considerations:
- Countries that are the most significant contributors to the global digital economy, either because they score high on our earlier digital evolution index or because they have strong momentum in their digital activities;
- Countries that represent a reasonable spread of regions and socio-economic positions; and
- Countries that provided us with a solid data and evidence base for the analyses.
Finally, an important consideration in determining accessibility is privacy. Privacy concerns and data-protection regulations can help or hinder the ability of algorithms to develop new capabilities. For this analysis, we take the position that an established framework for ensuring privacy and data protection, combined with openness to the mobility of data, is a net benefit and a positive contributor to the development of AI over the long term. As an example, consider fraud detection in financial transactions. Applications that draw on insights from diverse geographic locations and multiple usage contexts help establish patterns of trustworthiness and flag security risks; such applications benefit from systems that meet the accessibility criterion. That said, we acknowledge that in the near term some countries (China being the pre-eminent example) may pair extensive data-sharing between public- and private-sector agencies with very little mobility beyond national borders, violating privacy and openness norms and yet yielding a temporary advantage in training algorithms inside a "walled garden."
Which of these criteria should be used in assessing a potential new world order based on data? We believe accessibility should remain a foundational criterion. If one takes the view that the biggest and highest-impact AI applications are those that serve the greatest public purpose, access to data is key. In its recent study of AI for the public good, McKinsey cites access as one of the principal barriers: of the 18 bottlenecks it identifies, six relate to data availability, volume, quality, and usability.
Excerpt: Harvard Business Review.