15:08 Video

Building a Distributed Data Infrastructure with Roblox

With over 1,250 developers building games around the world, Roblox’s Modern Data Experience is all about cloud native innovation.
00:06
Unknown: The metaverse, a collective virtual world where people come to create, play, work and socialize. It's a place of limitless exploration of the imagination, where anything you dream can be built and shared. Roblox provides a platform making it possible for a global community of developers and
00:29
players to converge for a shared experience beyond the physical world. Data is the common thread. It powers the metaverse and propels Roblox users to new creative realms. Pure Storage helps maintain the reliability and performance the Roblox platform demands to keep the metaverse spinning. When your
00:52
mission is bringing the world together through play, controlling your technology's destiny is the wisest move you can make. Wow, such a cool video and company. Thanks so much for joining us here today, Charles. It's really great to have you.
01:14
Thanks, Rob. It's really great to be here. So after seeing that video and all the news and buzz around Roblox this year, I've just got to ask: what's it like working at the hottest gaming company around? It's fantastic. The Roblox platform is incredible. To work
01:29
so closely with one of the most dynamic and engaging communities on the planet, while supporting all their creativity, is nothing short of amazing. All of our technology comes together to produce, in my opinion, one of the coolest metaverse experiences around.
01:46
That's fantastic. It must be truly rewarding and really just super exciting to be a part of all of that. So you know, it's been a blockbuster year for you guys. You're the hottest gaming platform out there. You're the leader in the metaverse, you doubled in size in 2020. You're just coming off of a spectacular
02:01
IPO and you're continuing to grow even as we start to return to normal. And you know, actually, on a personal note, I have to share that I mentioned to my niece that we'd be having you here with us at Accelerate, and she was really excited to tell me all about how Roblox has helped her to stay in touch with
02:15
her friends over this last year. So you've also become a beacon of connection and entertainment for almost 200 million people all over the world, helping really to keep us going through all of the hardships over the last year. Yeah, it's been a really wild year. As you know, our mission is to
02:34
bring the world together through play. And it's been great to be a part of delivering that in a year when people needed a place to escape and be a community. Okay, so you know, for those maybe less familiar with Roblox, let's just set the stage. Roblox, you know, you aren't just a
02:50
single game. You're a gaming platform, right? Yeah, that's correct. Okay. So then that must mean, you know, you've got two customer audiences then, right? You've got the millions of game creators out there who are building the games, whether it's, you know, Murder Mystery 2 or my niece's favorite, Adopt
03:04
Me. And these aren't Roblox employees, right? That's correct. Okay. So these are adults and kids alike, you know, in their homes, who are using their imagination and your platform to invent and create these games. And then, you know, you've got your other set of customers, the millions
03:19
and millions of players out there who are online at any given moment playing these games, right? Yeah, that's right. It's two very demanding customers that are quite different. The way I see it, they may have different outcomes, but their expectations for the platform are the same.
03:35
It has to be fast, secure, and always available. So that's got to be challenging, right? You know, you've got two sets of customers. Help us understand how those demands translate to infrastructure. How have you and your team been stretched and challenged to keep up with the
03:51
demands on your platform? We've been stretched and challenged in a lot of different ways. So in 2020, we did really well with reliability. And as you alluded to, we brought an entire year's worth of new users on in just one week in 2020. So to say that we've
04:08
experienced astronomical growth is a bit of an understatement. Yeah, that's pretty amazing. I mean, you know, that is astronomical growth. And I guess it begs the question, how do you sustain that kind of growth with your infrastructure?
04:22
The way that we do it is by building truly rock-solid infrastructure, just something that can handle anything. And that's a big reason why having the best technology partners is always critically important. Makes sense. So, you know, maybe going back a few years. You
04:40
know, I first read about Roblox in an interview that your CIO Dan Williams did with TechCrunch. And one of the things that really stood out to me was that he talked about the three main levers of infrastructure modernization as he saw them. He called out reliability,
04:57
performance and cost. What stood out to me was how closely aligned those three are to our own values here at Pure. Can you maybe share a bit about how these principles have shaped your infrastructure decisions over the years? Yes. So if we have an infrastructure which is always
05:15
reliable, and always performant, and always cost-effective, we're able to provide something to our developers which is fast and available, as well as pay our developers the most money that we possibly can. Another way that we do it is by having everything reliable and
05:35
performant, we're able to design an infrastructure which appears as a single global platform to all players. And one of the ways that we do this is, first, everything is bare metal. I think Roblox is really unique in this regard. Not only is it bare metal, but we have data centers and PoPs all over the world. And
05:56
by running containerization within everything, we're able to orchestrate containers on bare metal exactly the same as if we were a cloud. And so one of the things that's been really key to that has been how instrumental Portworx has been in all of this.
06:15
I see. I see. So in that move from your edge PoPs to the core data centers, I understand you've replatformed from Windows to Linux. You've talked about deploying containers as part of that journey to gain better flexibility and cost savings. How has that shift to containers gone? You know, what additional
06:32
benefits have you found along the way? It's gone relatively well. We moved our first service to containers, which is our games, several years ago, and ever since then, we've just had an exponential explosion of containerized services within Roblox. So moving to a more
06:50
microservice-based architecture, modernizing everything, having consistent deployment strategies. Not just that, but through the use of things like Portworx, we've been able to take a lot of our traditional monolithic services, which are operationally heavy, and make them a lot more trivial to run. So
07:10
things like databases and very big stateful services, traditionally, they've been locked to a handful of large bare-metal boxes. But through our platform today, through our infrastructure, we're able to containerize all of that and make it highly available, and just, you know, like the snap of a
07:31
finger, it's super easy. So then, to net it out containers are helping you to build a faster, more agile software base, giving you access to a wider variety of open source technologies along the path, and then unlocking the flexibility to help you deploy
07:49
those on different platforms, whether it's bare metal, or in different places, such as the edge or in your core data centers. Excellent. So, you know, shifting gears, in addition to building one of the fastest growing online services out there, you know, you mentioned
08:06
before the single globally connected platform. I'm guessing that that adds an additional challenge where, you know, now that everybody's logged in at the same time, they're interacting with one another. That's got to present additional infrastructure challenges above and beyond what you'd see in
08:20
other services. So you know, how do you design for that? So one of the ways that we design for that is by trying to break out the most difficult-to-scale services into their own kind of little piece and push those out to the edge. So that way, we have pieces with ways to scale them in very predictable
08:40
ways that can be worked around during failures. So one of the things that our platform is really good at is making sure that players are always playing, and that people who aren't just playing games but also want to have experiences (there are a lot of things on Roblox that aren't games) are always able to do that
08:59
with their friends. And by building kind of a decentralized architecture, we're able to build this shared global platform so that everyone can play and enjoy as much as possible. So that sounds great. But you know, in an environment like
09:14
that, how do you think about reliability, right? You know, what are the implications? How do you manage for reliability at that type of scale? Today, we have a pretty small team, actually, for what we're talking about: it's just seven individuals. We're the SRE team for the
09:29
orchestration system. And the key is kind of site reliability engineering, right? It's exactly what it says in the title. As site reliability engineers, we're responsible for large portions of the site itself. And if Roblox ever has any issues, our team is frequently paged because we live at many layers of the
09:49
stack. One of the keys to our success is driving everything as code. So we drive infrastructure and configuration as code by leveraging all kinds of open source and in-house technologies. If there isn't a best-of-breed option readily available on the market, we make it ourselves. And we dogfood all
10:07
of our own systems in order to have reactive automation around everything. So, you know, I'm glad you brought up automation. This year at Accelerate, automation and the role of modern business applications is a huge theme. So, you know, can
10:21
you share a little bit about how technology partners such as Pure and Portworx help SREs like yourself to achieve that? Yeah, Portworx helps us by making one of the most challenging topics of the past, present, and likely future very easy. Rob Cameron, who is one of my coworkers, really likes to say, you
10:39
have to make it so easy, you can't not do it. And that's how easy Portworx makes distributed storage. As a person who's personally run a number of distributed storage systems in the past, nothing even comes close. It's not flawless, because it's such a challenging problem space. However, all the
10:55
issues that we've had up to this date have been completely manageable with our very small team. Well, that's really inspiring to hear. You know, so maybe, maybe for those members of our audience who might be on a similar journey to either building or enhancing their SRE
11:08
capabilities or, or modernizing through automation, what type of advice could you give? What are maybe some of the biggest lessons learned and where should people get started? It's really about finding the right folks who want to help everywhere, who are great operators who know how to
11:24
develop and great developers who know how to operate. It's about getting those folks to talk to everyone, and implement a lot of structure in places where there isn't a ton. So it's also about setting standards and thinking about things in abstract ways. For instance, can everything be monitored, alerted on, rolled back,
11:41
rolled forward? Does everything have a runbook? Can you write one? Is there a standard RFC process across the organization? Is there a team of folks who are trusted system reviewers? That's something that a lot of SRE teams need to handle. And there's a lot of other things to think about as well. Like, as a
11:59
distributed system, if something is slow, is that actually a problem? Can something be too fast? I'd say the biggest lessons learned are the hardest, which is that SRE is going to be totally different for every company and situation. And the second hardest lesson learned is where and when to get started. I
12:17
think the where is everything you have today with your existing folks, and the when was yesterday. Alright, so bring the dev and the ops closer together, get organized, get repeatable and get started. Yes, while staying available and fast. Excellent. Alright, so you know, this year
12:35
at Accelerate, we're talking superheroes, we're talking breakthroughs. Who's your favorite superhero? My favorite superhero is the Hulk. You could say that we've smashed through a lot of barriers in the last year and come out the other side relatively unscathed.
12:48
So you know, we really appreciate you taking the time to come here and share your story with us today. Charles, maybe one last question for those of us in the metaverse. You know, where can we find you? Where are you hanging out? What's your favorite game?
13:00
My favorite game today is Bad Business. Nice action game there. Well, you know, we'll be looking over our shoulders for your avatar. All right. Well, you know, last thing before you go, you know, I'd also like to take this opportunity to announce this year's winner of our cloud
13:16
champion award. And after what we've already learned about this company, it should be no surprise. It's been really inspiring to learn more about your approach to bringing the world together through play. So let's take a look. Each day, tens of millions of players cross the Roblox portal
13:42
to access games generated by over 8 million developers. It's a community bursting at the seams, especially during the pandemic when kids flocked to it to stay connected with friends. Ease of use for developers and safely storing player data hold utmost importance. Using proprietary algorithms, Roblox
14:03
matches players with the best game instance using its distributed compute cloud, a collection of worldwide servers, to optimize the experience. At peak, it has hosted millions of concurrent players. Managing such high-volume traffic demands a bulletproof strategy with rock-solid storage as a top priority.
14:23
The Roblox and Pure Storage relationship is a perfect match. With new people constantly exploring millions of immersive 3D experiences, Roblox brilliantly carries out its mission of bringing the world together through play. Congratulations, Roblox. We're really proud to award you the
14:48
cloud champion breakthrough award. It's been an absolute pleasure to support your success, and thanks for being here with us today. You're welcome. And I hope all of you have a fantastic couple of days here at Accelerate. Thanks for tuning
15:01
in.
  • Portworx
  • Containers
  • Gaming

It takes a modern data infrastructure to support an entire metaverse. In order to handle hundreds of millions of MAUs and maintain the astronomical growth of their player base, Roblox needs to be fast, secure, and always available. In this presentation featuring Rob Lee (Chief Architect, Pure Storage) and Charles Zaffery (Sr. SRE Manager, Orchestration - Roblox), find out how Roblox has built the bedrock of their containerized, bare-metal data infrastructure using Portworx, and how they maintain their entire distributed data infrastructure with a team of seven SREs.
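
The readiness questions raised in the transcript (can everything be monitored, alerted on, rolled back, and rolled forward, and does it have a runbook?) can be sketched as a simple checklist. This is a minimal Python illustration only, not Roblox's tooling: the `ServiceReadiness` class, its field names, and the `missing_items` helper are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical checklist; the fields mirror the questions asked in the talk.
@dataclass(frozen=True)
class ServiceReadiness:
    name: str
    monitored: bool
    alerted: bool
    can_roll_back: bool
    can_roll_forward: bool
    has_runbook: bool


def missing_items(service: ServiceReadiness) -> list[str]:
    """Return the checklist items this service does not yet satisfy."""
    checks = {
        "monitored": service.monitored,
        "alerted": service.alerted,
        "can_roll_back": service.can_roll_back,
        "can_roll_forward": service.can_roll_forward,
        "has_runbook": service.has_runbook,
    }
    # Dicts preserve insertion order, so gaps are reported in checklist order.
    return [item for item, ok in checks.items() if not ok]


if __name__ == "__main__":
    # "example-db" is an invented service name used only for illustration.
    svc = ServiceReadiness("example-db", True, True, False, True, False)
    print(missing_items(svc))
```

A standard like this is only useful if every service is held to it; the point made in the talk is about setting the standard, not about any particular tool.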
