
What Is Amazon FSx for Lustre?

Amazon FSx for Lustre is a fully managed, high-performance file system for compute-intensive workloads that provides fast processing, scalability, and cost-efficiency.

The “FSx” name is generally understood to stand for a fully managed file system. Amazon currently offers FSx services for several widely used file systems, including the open source Lustre file system.

What Is the Lustre File System?

With a name coined from the combination of “Linux” and “cluster,” Lustre is a parallel, distributed file system. It’s most commonly used for cluster computing on a very large scale. In fact, Lustre has been the file system of choice for at least five of the world’s 10 fastest supercomputers, including Frontier, which ranked number one as of November 2022.

Lustre has been a popular choice for supercomputers, massive data centers, simulators, and other high-performance computing organisations because of its extreme scalability. It can serve clusters with tens of thousands of nodes, dozens of petabytes of storage across hundreds of servers, and an aggregate throughput of more than a terabyte per second (TB/s).

How Is FSx for Lustre Used?

Because it’s a fully managed service, Amazon FSx for Lustre simplifies your organisation’s Lustre system operation and management. The service helps you avoid the need to set up, configure, and manage Lustre yourself—it’s no hassle to get the high-performance file system you need in just minutes. And with multiple deployment options, you can choose the model that is most cost-effective for your needs.

What Are the Differences Between EFS, EBS, and FSx?

In addition to FSx, AWS offers a range of data storage options, including Elastic File System (EFS) and Elastic Block Store (EBS). It can sometimes be a bit confusing for organisations to understand the differences between these offerings and which options can best serve a company’s unique needs.

The short answer is that AWS provides options for different types of storage, which are file, block, and object storage. Each of these storage types is simply a different way to store data. Before comparing EFS and EBS with FSx, let’s take a closer look at EFS and EBS individually.

Elastic File System (EFS)

EFS is a file storage system, which means data is saved in hierarchies (much like the directory, folder, and file system storage of most PCs). Highly scalable and fully managed, EFS can be attached to EC2 instances running macOS or Linux as well as compute resources in on-premises data centers. The storage can expand to petabytes of capacity and offer low latency across thousands of instances. Thanks to its low latency and scalability, many organisations use EFS to move on-premises applications and workloads directly to the cloud.

Pros of EFS include centralized file storage that is affordable, scalable, and easily accessible. Its shared storage is compatible with the cloud and easy to integrate without having to go deep into re-coding.

Cons of EFS include the fact that it doesn’t work with Windows, and file storage generally can’t match the performance of block storage with regard to input/output operations per second (IOPS). File storage can also be difficult to manage once data volumes get large enough, and users have to know the path to a specific file to be able to find it.

Elastic Block Store (EBS)

This is AWS’s block storage option. Block storage is known for being fast and stable, mostly because it carries very little metadata and because blocks can be stored in the most efficient locations, regardless of operating system, or even distributed among multiple servers. EBS volumes attach to instances of Amazon Elastic Compute Cloud (EC2) and suit workloads that are transaction-heavy and need to scale easily. For instance, some organisations use EBS to back self-managed relational or NoSQL databases.

Pros of EBS include its speed, flexibility, and reliability. That makes it ideal for transaction-heavy use cases that require low latency. And because you can update block storage by overwriting individual blocks (and not an entire object, as in object storage), updates and changes are fast and efficient.

Cons of EBS include the lack of metadata, which makes it fast to store but slower to search. And EBS storage can be attached to just one server at a time (although there is an EBS multi-attach capability in some situations).

FSx for Lustre vs. EFS and EBS

Amazon FSx for Lustre offers ultra-high performance. Like EFS, it’s a file storage system, and it is accessed from Linux clients. One big difference lies in the wider FSx family, which also includes FSx for Windows File Server for Windows workloads. FSx for Lustre is designed to outperform EFS and EBS on heavy-duty workloads such as AI and machine learning, massive data analytics projects, video processing and digital effects, and financial analytics.

Depending on the workload, FSx can offer a lower total cost of ownership (TCO) than EFS or EBS, and it gives organisations flexible data processing options for both short- and long-term storage. With FSx, the throughput you provision determines how fast the file servers can serve up file data. Higher throughput levels also come with higher levels of IOPS and more memory for caching.

How Amazon FSx for Lustre Works

With Amazon FSx for Lustre, organisations can easily access their Lustre file systems. These systems can scale as needed across multiple servers and storage disks. Because of that scalability, FSx can eliminate many of the traditional bottlenecks users find in other file systems.

An Amazon FSx for Lustre file system is composed of file servers that clients communicate with and a set of storage disks, attached to each file server, that hold the data. Each file server uses a speedy, in-memory cache to optimise performance for the data that is accessed most regularly. Part of what makes it so fast is that when a client requests data held in the in-memory cache or the SSD cache, the server doesn’t need to read it from disk. Latency is therefore lower and throughput is higher.

FSx also offers two storage options: solid-state drive (SSD) and hard-disk drive (HDD). Which option is best depends on an organisation’s needs:

  • SSD storage is ideal for workloads sensitive to latency or those workloads that require the highest throughput or IOPS.
  • HDD storage is ideal for workloads that require high throughput but aren’t highly dependent on ultra-low latency.

To set up Amazon FSx for Lustre:

  1. Using the AWS Management Console, create your file system. You can also use the command line interface (CLI) or a software development kit (SDK). This is also where you choose your deployment option: scratch or persistent (see below).
  2. If you’re using Amazon S3 storage, link your newly created file system to your S3 bucket so you can access and process any data sets stored in S3.
  3. Use any Linux client, including EC2, EKS, or on-premises clients, to access your file system.
  4. Now you can run your applications, from machine learning to high-performance computing to media rendering and more, with shared file storage that offers the high performance you need.
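The steps above can be sketched with the SDK route. The following is a minimal Python sketch, not a definitive implementation: it assumes the boto3 SDK, a scratch deployment, and placeholder values (the subnet ID, bucket path, DNS name, and mount name are all hypothetical). The helper names are invented for illustration. The request is built as a plain dictionary so it can be inspected before anything is sent to AWS.

```python
def build_create_request(subnet_id, capacity_gib=1200, s3_import_path=None):
    """Build keyword arguments for boto3's FSx create_file_system call.

    Assumptions for this sketch: a SCRATCH_2 deployment and a single subnet.
    capacity_gib is the storage capacity in GiB (1200 GiB is 1.2 TiB).
    """
    lustre_config = {"DeploymentType": "SCRATCH_2"}
    if s3_import_path is not None:
        # Step 2: link the file system to an S3 bucket or prefix.
        lustre_config["ImportPath"] = s3_import_path
    return {
        "FileSystemType": "LUSTRE",
        "StorageCapacity": capacity_gib,
        "SubnetIds": [subnet_id],
        "LustreConfiguration": lustre_config,
    }


def create_file_system(request):
    """Step 1: send the request. Needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    return boto3.client("fsx").create_file_system(**request)


def build_mount_command(dns_name, mount_name, mount_point="/fsx"):
    """Step 3: the shell command a Linux client with the Lustre client
    installed would run. dns_name and mount_name come from the file
    system's description once it has been created."""
    return (
        f"sudo mount -t lustre -o relatime,flock "
        f"{dns_name}@tcp:/{mount_name} {mount_point}"
    )


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    request = build_create_request(
        "subnet-0123456789abcdef0",
        s3_import_path="s3://example-bucket/datasets",
    )
    print(request)
    print(build_mount_command("fs-example.fsx.us-east-1.amazonaws.com", "abcdef"))
```

Running the script only prints the request and the mount command. Calling create_file_system(request) is what would actually provision (and start billing for) a file system.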

Differences Between Scratch and Persistent Mode in FSx for Lustre

Organisations can choose from two deployment options when they use Amazon FSx for Lustre: scratch and persistent. Which one to choose depends on how long you need to store data.

Scratch file systems are meant for short-term data processing and temporary data storage. The system does not replicate scratch data, which means it can be lost if a file server malfunctions. The advantage of scratch file systems is that they provide excellent throughput: a burst of up to six times the standard baseline of 200 MBps per TiB of storage capacity (1 TiB is roughly 1.1 TB).
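To make the throughput figures concrete, here is a small Python sketch of the arithmetic. It uses only the numbers quoted above (a 200 MBps per TiB baseline and a six-times burst); the function and constant names are invented for this example, and actual limits depend on the deployment type and current AWS documentation.

```python
BASELINE_MBPS_PER_TIB = 200   # baseline quoted in the article
BURST_MULTIPLIER = 6          # "up to six times" the baseline

BYTES_PER_TIB = 2 ** 40       # tebibyte (binary unit)
BYTES_PER_TB = 10 ** 12       # terabyte (decimal unit)


def tib_to_tb(tib):
    """Convert tebibytes to terabytes."""
    return tib * BYTES_PER_TIB / BYTES_PER_TB


def baseline_throughput_mbps(capacity_tib):
    """Baseline throughput in MBps for a given storage capacity in TiB."""
    return capacity_tib * BASELINE_MBPS_PER_TIB


def burst_throughput_mbps(capacity_tib):
    """Upper bound on burst throughput in MBps for the same capacity."""
    return baseline_throughput_mbps(capacity_tib) * BURST_MULTIPLIER


if __name__ == "__main__":
    capacity_tib = 10
    print(f"{capacity_tib} TiB is about {tib_to_tb(capacity_tib):.2f} TB")
    print(f"baseline: {baseline_throughput_mbps(capacity_tib)} MBps")
    print(f"burst:    {burst_throughput_mbps(capacity_tib)} MBps")
```

For a 10 TiB scratch file system this works out to a 2,000 MBps baseline and a burst ceiling of 12,000 MBps, which shows how throughput scales linearly with provisioned capacity.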

Best use cases for scratch file systems include cost-effective storage for workloads that are heavy on processing and only needed for a short period of time.

Persistent file systems are meant for workloads whose data needs to be stored for a longer period of time. With this deployment type, the file servers are highly available and data is replicated automatically within the AWS Availability Zone where the file system is located. The advantage here is that if a file server fails, it is replaced automatically within minutes.

Common use cases for persistent file systems include persistent storage for containers, data lakes stored in S3, high-performance computing that needs longer-term storage, throughput-focused workloads that need to run indefinitely, and workloads that are sensitive to disruptions in availability.
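In SDK terms, the choice between the two modes comes down to the deployment type passed when the file system is created. The sketch below is a hedged illustration in Python: the helper name and the long_term flag are invented here, and the SCRATCH_2 and PERSISTENT_2 type names and the throughput tiers listed are assumptions taken from the AWS API documentation at the time of writing, so verify them before relying on them.

```python
# Assumed throughput tiers (MBps per TiB) for PERSISTENT_2 file systems.
PERSISTENT_2_THROUGHPUT_TIERS = (125, 250, 500, 1000)


def lustre_configuration(long_term, per_unit_throughput=125):
    """Return a LustreConfiguration dictionary for a create request.

    long_term=False -> scratch: no replication, suited to temporary data.
    long_term=True  -> persistent: data replicated within the Availability
                       Zone, with an explicitly provisioned throughput tier.
    """
    if not long_term:
        return {"DeploymentType": "SCRATCH_2"}
    if per_unit_throughput not in PERSISTENT_2_THROUGHPUT_TIERS:
        raise ValueError(
            f"per_unit_throughput must be one of {PERSISTENT_2_THROUGHPUT_TIERS}"
        )
    return {
        "DeploymentType": "PERSISTENT_2",
        "PerUnitStorageThroughput": per_unit_throughput,
    }


if __name__ == "__main__":
    print(lustre_configuration(long_term=False))
    print(lustre_configuration(long_term=True, per_unit_throughput=250))
```

The returned dictionary would be passed as the LustreConfiguration argument of a create request. Rejecting unknown tiers locally is a design choice for this sketch; the service performs its own validation as well.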

Benefits of Amazon FSx for Lustre

Benefits of using Amazon FSx for Lustre include:

  • Flexible high performance: You get speedy performance that is also scalable, consistent, and predictable.
  • High availability and durability: Deployment options allow you to ensure the right level of availability and data durability.
  • Ease of use: The fully managed service makes it simple to spin up a file system in minutes with no need to worry about backups, updates, or consumption.
  • Cost-effectiveness: FSx offers many storage options and choices that allow you to find the right balance between cost and performance; plus, there are no setup charges or minimum fees and you’re responsible for paying only for the resources you use.
  • Security and compliance: Data is encrypted automatically at rest, and in transit when accessed from supported EC2 instances, and you can also control network access as desired.
  • Simple integration with AWS services: You can use FSx with all of your other AWS services without hassle.

High-performance persistent storage for file systems

While FSx and EFS are viable persistent storage options for file systems based on Amazon Elastic Kubernetes Service, Pure’s Portworx offers key advantages over both. Portworx empowers you to run any cloud-native data service, in any cloud, using any Kubernetes platform, with built-in high availability, data protection, data security, and hybrid-cloud mobility. Thanks to all of the above, Portworx offers significant advantages in:

  1. Performance
  2. Cost
  3. Disaster recovery

Experience the simplicity and performance that comes with using cloud-native persistent data storage for your Kubernetes workloads. Learn more about Portworx here.
