But, because you’re now writing new versions of data into different flash pages, eventually you accumulate data in those blocks that could be considered “garbage” because the data has either been overwritten or logically deleted.
How Garbage Collection Works in SSDs
To reclaim this physical capacity, a “garbage collector” process in the drive firmware copies the data that is still valid to a new location so that it can then erase the entire block containing the “tombstoned” data. For the garbage collector to work, each drive needs extra flash memory, known as “overprovisioned space,” and every garbage collection event consumes one of the block’s finite number of program/erase cycles. The number of physical writes to flash that each logical write ends up causing is known as “write amplification.”
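To make write amplification concrete, here is a minimal Python sketch of the arithmetic behind a single garbage collection pass. It assumes a simplified model that is not taken from any particular drive’s firmware: a block holds a fixed number of pages, and reclaiming it means copying every still-valid page elsewhere before the erase. The function name and the 256-pages-per-block figure are illustrative assumptions.

```python
def gc_write_amplification(pages_per_block: int, valid_pages: int) -> float:
    """Physical page writes per page of new host data when reclaiming a block.

    Simplified model (an assumption, not real firmware behavior):
    erasing the block frees (pages_per_block - valid_pages) pages for new
    host data, but valid_pages pages must first be rewritten elsewhere.
    """
    if not 0 <= valid_pages < pages_per_block:
        raise ValueError("valid_pages must be in [0, pages_per_block)")
    freed = pages_per_block - valid_pages
    # Physical writes = new host pages that fill the freed space
    #                 + live pages relocated to make the erase possible.
    physical_writes = freed + valid_pages
    return physical_writes / freed


if __name__ == "__main__":
    pages_per_block = 256  # hypothetical block geometry
    for valid in (0, 64, 128, 192):
        wa = gc_write_amplification(pages_per_block, valid)
        print(f"{valid:3d} valid pages -> write amplification {wa:.2f}")
```

In this model, a block that is still 75% valid when it is reclaimed costs four physical page writes for every page of new host data, while a block with no valid pages costs exactly one.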
Write amplification leads to premature wear and a shortened life span for the SSD, and overprovisioning sets aside flash that can’t be used to store data. There are also performance impacts from this design: whenever a flash die is busy with garbage collection, it can’t service reads or writes. As a result, the performance of the SSD fluctuates unpredictably as the garbage collector becomes more or less active.
What makes this even more challenging is that an SSD has no way to communicate this garbage collection activity to the system that’s accessing it. Rather, the SSD has to maintain the illusion that it’s just like a hard drive. As the number of bits per cell in NAND flash increases, these performance inconsistencies only get worse, because program/erase cycles take longer and longer, leading to longer periods of data inaccessibility.
The Benefits of Using DirectFlash
DirectFlash takes a different approach to flash media management. Rather than deputizing every SSD to perform its own wear leveling, garbage collection, and overprovisioning, the Purity operating system performs these functions in software at the array level. This means each DirectFlash Module is simpler than a traditional solid-state disk, as it only has to provide access to media itself and handle low-level data and signaling tasks.
The benefits that this provides are numerous:
- Improved Density and Efficiency. Our DirectFlash Modules (DFMs) deliver two to three times the storage density of our closest competitors today and consume 39% to 54% fewer watts per terabyte. Because Pure Storage DFMs do not emulate mechanical HDDs, the silicon-based flash media can be managed in a way that significantly improves performance, storage density, effective capacity, media endurance, and cost per usable TB relative to commercial off-the-shelf (COTS) SSDs. Pure Storage is shipping 48TB DFMs today, is adding 75TB DFMs later this year, will add 150TB DFMs within 18 months, and is planning for 300TB DFMs by 2026.
- Smart Data Placement. Instead of each SSD making decisions about data placement and media management in a vacuum, Purity knows about all ongoing and scheduled system activity, such as current IO, data reduction operations, pending garbage collection cycles, and overall array workload and health. This allows Purity to make much smarter placement and scheduling decisions than a single drive could make on its own.
- By making smarter data placement decisions, Purity can co-locate data with similar expected life spans on the same blocks, minimizing cases where some pages in a block are “tombstoned” while others are still valid. Purity knows whether certain pages belong to the same file or object or come from the same host system. By grouping those pages into the same blocks, it can free an entire block at once when that file or object is deleted, without rewriting other live data and causing write amplification.
- They Outperform and Outlast. By performing no garbage collection and causing no write amplification, DirectFlash Modules outperform and outlast their commodity counterparts. Fewer writes means less wear and thus longer drive life spans. Fewer writes also means more IO cycles are available to service “real” client IO. And because Purity knows about current IO activity and has visibility into the entire system, it’s never surprised by one of these program/erase cycles blocking access to data. In the worst case, Purity can just reconstruct that data from parity rather than waiting for a program/erase cycle to finish. This significantly reduces the worst-case latency of our systems, even when using QLC flash.
- They Improve Over Time. Because we perform all these media management tasks in software, we can improve this software over time. All Pure Storage systems connected to the internet securely phone home telemetry data, and since we have deep insight into the health and activity of the underlying flash memory, we aggregate and analyze this data to improve how our software works in the real world. This means over time, our systems’ reliability and performance can improve with regular software updates.
- They’re Simpler and More Reliable. Because we perform all these activities at the array level in software, our DirectFlash Modules don’t need complex controllers and large amounts of RAM to do all this work on their own. Thus, our modules are simpler and therefore more reliable, in addition to being more efficient. We can also scale the size of our drives with advances in NAND flash fabrication technology, without needing to increase drive complexity or cost.
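The smart data placement point above can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not a description of how Purity is implemented: blocks hold four pages, each page is tagged with the object that owns it, and the helper names (`fill_blocks`, `relocation_cost`) are hypothetical. The sketch compares how many live pages must be copied to reclaim space after one object is deleted, first when pages are written in arrival order and then when pages from the same object are grouped together.

```python
PAGES_PER_BLOCK = 4  # hypothetical block size, chosen to keep the example small


def fill_blocks(pages):
    """Pack a sequence of page owners into fixed-size blocks, in order."""
    return [
        pages[i:i + PAGES_PER_BLOCK]
        for i in range(0, len(pages), PAGES_PER_BLOCK)
    ]


def relocation_cost(blocks, deleted_owner):
    """Count live pages that must be copied to erase every block that
    contains at least one page of the deleted object."""
    cost = 0
    for block in blocks:
        dead = sum(1 for owner in block if owner == deleted_owner)
        if dead:
            cost += len(block) - dead  # remaining pages are still live
    return cost


# Four objects of four pages each, with their writes arriving interleaved.
arrival_order = ["A", "B", "C", "D"] * 4

mixed = fill_blocks(arrival_order)            # every block holds A, B, C, D
grouped = fill_blocks(sorted(arrival_order))  # every block holds one object

if __name__ == "__main__":
    print("mixed placement, delete A:  ", relocation_cost(mixed, "A"))
    print("grouped placement, delete A:", relocation_cost(grouped, "A"))
```

With interleaved placement, deleting object A leaves one dead page in each of the four blocks, so twelve live pages have to be rewritten before that space can be erased. With grouped placement, the one block that held A can be erased without copying anything.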
What this means for customers is systems that deliver more performance, more consistently, with greater reliability and longevity than other all-flash or hybrid systems designed around SSDs.
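The parity fallback mentioned in the list above, where data is reconstructed instead of waiting on a busy die, can also be sketched in a few lines of Python. The article does not describe the erasure-coding scheme Purity actually uses, so this example substitutes plain single-parity XOR, the simplest scheme that allows one unavailable chunk to be rebuilt from the others. The function names and the `busy` set are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    if len(a) != len(b):
        raise ValueError("chunks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(chunks):
    """Single XOR parity over all data chunks in a stripe."""
    return reduce(xor_bytes, chunks)


def read_chunk(index, chunks, parity, busy):
    """Return chunk `index`, rebuilding it from parity if its die is busy.

    `busy` is a set of chunk indexes whose dies are mid program/erase
    (an assumed stand-in for the array's knowledge of media activity).
    """
    if index not in busy:
        return chunks[index]  # normal read path
    if any(i in busy for i in range(len(chunks)) if i != index):
        raise RuntimeError("single parity can only cover one unavailable chunk")
    others = [c for i, c in enumerate(chunks) if i != index]
    # parity XOR all other chunks leaves exactly the missing chunk
    return reduce(xor_bytes, others, parity)


if __name__ == "__main__":
    stripe = [b"abcd", b"efgh", b"ijkl"]
    parity = make_parity(stripe)
    print(read_chunk(1, stripe, parity, busy={1}))
```

The design point is that the read never has to wait: because the software layer knows which die is busy, it can choose the reconstruction path up front rather than discovering the delay after issuing the read.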
Pure Storage was founded around the belief that the future of the data center was all flash—and we’ve built our DirectFlash technology around making this vision a reality. We believe the best way to build all-flash systems is to build the system from the ground up for all-flash. That means eliminating the parts of the system designed around legacy interfaces and paradigms and letting the technology truly shine.
Want to take advantage of DirectFlash technology in your data center? Check out our suite of all-flash storage solutions today.