It's crucial to remember that data, especially large data sets, has gravity. WANs are significantly faster than they were 20 years ago, but data volumes have grown at an even faster pace: data grows faster than Moore's Law, and Moore's Law grows much faster than WAN links. Even if network links could keep pace with Moore's Law, at some point the CAP theorem becomes a limit to scale. Balancing consistency, availability, and partition tolerance gets very hard at scale and across inter-site data center links. A few very specific applications, typically read-only or in the content distribution niche, can tolerate this; most real applications can't.
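To make the gravity argument concrete, here is a quick back-of-envelope sketch; the dataset size, link speeds, and utilization factor are illustrative assumptions, not measurements.

```python
# Back-of-envelope: how long does it take to move a large dataset over a WAN?
# All figures here are illustrative assumptions.

SECONDS_PER_DAY = 86_400
PETABYTE = 10 ** 15  # bytes

def transfer_days(dataset_bytes: float, link_bits_per_sec: float,
                  utilization: float = 0.7) -> float:
    """Days needed to push dataset_bytes over a link at a sustained utilization."""
    effective_bps = link_bits_per_sec * utilization
    return (dataset_bytes * 8) / effective_bps / SECONDS_PER_DAY

for gbps in (1, 10, 100):
    days = transfer_days(1 * PETABYTE, gbps * 10 ** 9)
    print(f"1 PB over a {gbps} Gbps WAN link at 70% utilization: ~{days:.1f} days")
```

Even at 100 Gbps, a petabyte is more than a day of sustained transfer, and the compute that wants that data rarely waits that long.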
Moving data between data centers over long distances invariably turns into a battle against network limitations. Physics imposes constraints that are difficult to overcome, regardless of what you call your data prefetching technique. AFS-like global namespaces with old-school opportunistic locking semantics might be suitable for niche applications like data distribution, but the idea that they will become the dominant computing paradigm is implausible.
None of this means I think poorly of ideas like replicating data for protection or distributed erasure coding across availability zones for object storage resiliency. These techniques work well, but they belong in a special category of data protection and recovery techniques; they aren't exactly what people mean when they say global namespaces.
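For readers who want to see what that category of technique looks like, here is a deliberately simplified sketch of erasure coding: an object is split into data shards plus one XOR parity shard, so the loss of any single shard (think of each shard living in a different availability zone) can be repaired from the survivors. Production object stores use Reed-Solomon codes with multiple parity shards; everything below is a toy illustration and all names are made up.

```python
# Simplified erasure-coding sketch: k data shards + 1 XOR parity shard.
# This toy version only survives the loss of a single shard ("zone").

def encode(obj: bytes, k: int) -> list:
    """Split obj into k equal-size data shards plus one XOR parity shard."""
    shard_len = -(-len(obj) // k)                  # ceiling division
    padded = obj.ljust(shard_len * k, b"\0")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = bytes(shard_len)                      # zero-filled accumulator
    for shard in shards:
        parity = bytes(a ^ b for a, b in zip(parity, shard))
    return shards + [parity]

def reconstruct(shards: list) -> list:
    """Rebuild at most one missing shard (marked None) by XOR-ing the rest."""
    missing = [i for i, s in enumerate(shards) if s is None]
    assert len(missing) <= 1, "this toy code survives only one lost shard"
    if missing:
        survivors = [s for s in shards if s is not None]
        rebuilt = bytes(len(survivors[0]))
        for s in survivors:
            rebuilt = bytes(a ^ b for a, b in zip(rebuilt, s))
        shards[missing[0]] = rebuilt
    return shards

# Place 3 data shards + 1 parity shard across 4 "zones", lose one, recover.
placed = encode(b"object payload that must survive a zone failure", k=3)
placed[1] = None                                   # simulate losing zone 1
recovered = reconstruct(placed)
print(b"".join(recovered[:3]).rstrip(b"\0"))
```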
Content distribution itself is a niche use case, and with the increasing prevalence of content dynamically generated by GPUs, the role of storage in this context diminishes. It is true that GPUs are a scarce commodity, and moving data to wherever a GPU happens to be may be a necessary Band-Aid for supply chain constraints, but the scarcity will be short-lived enough that it's unlikely to lead to a meaningful and lasting architectural change.
One of the most intriguing distributed systems with a compelling namespace in recent years is Google's Spanner. Google, with the advantage of building its applications from scratch, tackled some of the hardest problems in distributed storage. It recognized that many queries could be answered with older data, so it designed the storage system to be queryable at any point in time. This approach lets an application decide whether to wait for the most current data or answer a query with older data. While this is a remarkable technique, only a handful of companies globally have the resources to build and maintain such a system, because applications must be modified to interact with it; traditional read/write semantics don't work here.
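As an illustration of that read model (and only the model; this is not how Spanner is implemented, which depends on TrueTime and globally coordinated commit timestamps), here is a toy multi-version store where every write is timestamped and a reader chooses between the freshest value and a snapshot "as of" an earlier time. All names are hypothetical.

```python
import bisect
import time

class VersionedStore:
    """Toy multi-version store: reads can be 'latest' or 'as of' a timestamp."""

    def __init__(self) -> None:
        self._versions: dict = {}          # key -> list of (timestamp, value)

    def write(self, key: str, value: str) -> int:
        ts = time.perf_counter_ns()        # stand-in for a commit timestamp
        self._versions.setdefault(key, []).append((ts, value))
        return ts

    def read_latest(self, key: str) -> str:
        """Strong read: always returns the most recently committed value."""
        return self._versions[key][-1][1]

    def read_as_of(self, key: str, timestamp: int) -> str:
        """Snapshot read: returns the value current at `timestamp`, so a query
        can tolerate staleness instead of waiting for the freshest data."""
        versions = self._versions[key]
        idx = bisect.bisect_right(versions, timestamp, key=lambda v: v[0]) - 1
        if idx < 0:
            raise KeyError(f"{key!r} had no committed value at {timestamp}")
        return versions[idx][1]

store = VersionedStore()
t0 = store.write("user:42:plan", "free")
store.write("user:42:plan", "enterprise")
print(store.read_latest("user:42:plan"))      # -> enterprise
print(store.read_as_of("user:42:plan", t0))   # -> free
```

The point of the sketch is that the application, not the storage layer, chooses the staleness it can live with, which is exactly the contract that makes traditional read/write semantics insufficient.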
Moore's Law continues to drive advancements in GPUs, the most data-intensive processors available, and as a result we're only at the beginning of exploring GPU computing in enterprise data centers. As GPUs become the dominant form of computing, we will either recompute data directly where it is needed, or data gravity will pull compute to execute close to where the data lives.
Despite industry buzz around global namespaces, history has shown that data gravity, network limitations, and consistency challenges impose real barriers to widespread adoption. Niche applications may benefit, but enterprise leaders should remain skeptical of solutions that claim to eliminate fundamental storage constraints. The real trend to watch is how compute moves closer to data, whether through GPUs or architectures that prioritize locality. In this era, just as in past eras, placing compute resources in proximity to storage will remain the dominant architecture; betting against data gravity is a fool's errand.