55:34 Webinar

What DBAs Need to Know About Snapshots

In this session, Brent Ozar will explain how he came to love storage snapshots. He'll discuss how they work, why they're safe to rely on, and how to tell when they're a good fit for your workloads.
This webinar first aired on 12 May 2022
00:00
In this session, I'm going to talk about what database administrators need to know about snapshots. A long time ago, in a galaxy far, far away, I was a database administrator and my SAN administrator left, so I had to step into his shoes. I had to take over the SANs and figure out why SQL Server was slow and why my backups weren't working well enough.
00:21
And my SAN vendor started talking to me about SAN snapshots, about things for my data warehouse that would make things run faster. When they started talking to me about it, I simply didn't believe it. It sounded utterly insane. It didn't seem like it would work. Well, frankly, when this was going down 10, 15 years ago,
00:40
there were cases where it didn't work very well. But the idea behind this was: what if every backup took, like, 10 seconds to run, max? What if every restore took, like, 10 seconds to run, max? What if I could magically take one of these 10-second backups and immediately present it over to
01:02
another SQL Server? As a database administrator, a classic pain point for me was someone wanting to refresh the development or QA environments, wanting to be able to reproduce the problem that we were having in production right now. They wanted data as close to up to date as they could possibly get from the current production databases.
01:25
Let me also just make sure it doesn't show my notifications on my desktop. Here, I'll set myself in Do Not Disturb mode, because it looks like I just got a couple of, uh, notifications popping up on there, and God only knows what y'all are going to see, given what my browser history has. Dmitry says hello from Greece, and a bunch of other people say hello from different places.
01:45
Florida. So this is a common problem that I ran into, where we had to take a quick backup in production and then present it over to development. When you have multi-terabyte databases, there is no such thing as a quick backup: backups suck, restores suck, CHECKDB sucks, and then heaven forbid someone come running in and say,
02:10
Hey, Brent, we accidentally dropped a table. Could you do me a favour and just restore this one table? It's amazing to me that in the year 2022 we don't have a built-in way inside SQL Server of pulling just one table quickly from backups. All of this is the idea behind SAN snapshots. And in order to teach you basically what SAN
02:32
snapshots do, I'm gonna have to step back a little bit and talk about some underlying concepts inside of shared storage. As a database administrator, you should probably never have to learn this stuff, but in order to really believe how cool SAN snapshots are, you're gonna have to learn some of the basics.
02:49
Let's say that we've got a table full of bloggers from the SQL Server community, and it's going to be a really large table. Let's just say that I'm only showing you a small percentage of the rows here. My table consists of the blogger's last name and the time that they last blogged. Let's say that I'm constantly keeping track,
03:15
and Nocentino says, why am I not on this list? Well, you don't blog often enough. I track your blog updates in date format. No, I kid. So if I'm going to think about where that lives on disk: historically, when I was your age, we would divide our data, or stripe our data, across several drives. Back when I was your age, these were hard
03:41
drives, and they were magnetic spinning rusty Frisbees, but those days are gone. If you're deploying SQL Server today on any kind of meaningful production server, it should be on flash storage, because it's so inexpensive relative to the cost overhead of magnetics and how slowly they perform. But SQL Server would tell the operating system, I want you to lay this down on storage,
04:04
and SQL Server didn't have any idea of what physical drives these things lived on. So what Windows and a RAID controller would do is take a redundant array of independent drives, or, as some people said, a redundant array of inexpensive drives. That's what the RAID stood for. And Windows or Linux or whatever operating system you chose would stripe your
04:31
data across all of these drives. This is how we got performance, because any one drive wasn't really fast enough to handle our workloads. This is especially true when we start to talk about multi-terabyte databases, when some yo-yo wants to go do a table scan across a 10-terabyte table. It's going to take a really long time to do that 10-terabyte table scan.
04:56
So SQL Server, as far as it's concerned, it's just one big table, and it doesn't need to have any advance knowledge. Now, you would never do what I had over on that last slide, where I stored all those bloggers across individual hard drives but I didn't have backups. As you can see there, Ms Adams only had her blog on one specific drive.
05:19
Well, that would be suicide in a SQL Server production environment. You need your data stored across multiple drives, so we would store multiple copies. We would have one copy across, say, four drives and then another copy on four drives. And there are other kinds of RAID levels,
05:37
like RAID 5 and RAID 6. I'm not going to talk about those today. The idea here is just that your data is spread across several drives. So through the rest of this presentation, even though I only show one copy of your data, I want you to understand that there are multiple copies of your data underneath the hood. I just only have so much space that I can use
05:56
across a slide deck, and I want to use that for more valuable, meaningful things. Well, a long time ago, 5, 10, 15, 20 years ago, the way that we ran storage was that we would carve out these four drives for the SQL Server, these four drives for the email server,
06:16
these four drives for the file server. But that was stupid and inefficient, because we couldn't really predict how much space each server needed. We couldn't really predict how much performance people needed. And of course, if you asked a database administrator, they'd say things like,
06:36
I want the fastest drives that you have, and I want as many of them as possible. And yet when you went to go look at performance, most of the time you weren't actually using all of that throughput. Your storage would look kind of bored, sitting alone most of the day. Why on earth are my notifications still popping
06:55
up? Let me go kill Safari altogether just to make sure that that doesn't happen. You would look at performance, and hardly anything would be happening all day long. You wouldn't be pushing storage that much, and then all of a sudden your backups would take off or your CHECKDB would take off and
07:13
your workloads would go through the roof. To make matters worse, most of us would schedule our backups, or schedule our storage corruption checking, to happen every night all around exactly the same time. So you would run your backups on the hour, like 10 PM or 11 PM. You would run your CHECKDB and your index rebuilds exactly on the hour, and your storage would all get overwhelmed at the
07:38
same time, because everybody was doing backups around the same time. So this method was wildly inefficient. What storage vendors ended up doing was having pools of storage, and they would stripe your data across lots of drives. So as far as my SQL Server, and as far as Windows, was concerned,
08:01
my data was going to one volume. But the storage device was magically mapping where my data actually lived. They could even move that data from one set of drives to another in real time without my SQL Server going down, because they're just maintaining this list of where all of my data lives. What's awesome about this is that now,
08:26
if you do a SELECT * across the bloggers table and you rip through that five-terabyte table, or whatever it is, you have all of the drives working for you in concert. Yes, you are also competing with everyone else's requests. But as long as you provision the storage well enough and give yourself some flexibility on your bursty workloads, this makes everything, I like to think of it as, unpredictably faster.
08:54
Now, as far as SQL Server is concerned, SQL Server still sees it as one table, but the storage is the device that's actually tracking where all of this data lives. Your storage device tracks which bloggers live on which drives; the storage is responsible for protecting this and making sure that it lives on one particular drive at any given point in time.
09:19
This led to all kinds of interesting advances around virtual RAID, around storage tiering, because in the transition of the last 5, 10, 15 years, flash used to be really expensive. Now, of course, your laptop has flash. Your phone has flash. Your tablets have flash. Flash is progressively more commonplace,
09:42
but there are also different speeds of flash. You have different kinds of NAND flash. They respond at different speeds, or respond differently to writes, over long saturated writes as opposed to small bursty writes. So the storage vendors were doing all this mapping to go figure out which hot data lived on hot drives. As a database administrator, this is so awesome for me, because what we used to
10:08
try to do manually was data partitioning. Those of you who have been around for five or 10 years might remember that over the years you tried to do things like partitioning in a data warehouse, and the idea was that you were going to put the new data on hot storage. In a data warehouse every night, what do you do? You load last night's sales, and then you query
10:31
the bejesus out of it. You're just constantly querying that, because everybody wants to know how last month's sales went or this month's sales went. But the data from, like, six months ago or 18 months ago, people are less concerned with, so you could drift that down to slower storage. But it didn't really work at all, because with partitioning, sure,
10:54
you can separate stuff onto different storage, but you can't actively move it around from one part of storage to another. You can't say, okay, this month is over, let's move May's sales down to slower storage. That would be a big, time-intensive, logged operation. Plus, you don't have enough hours in the day to write scripts and tools that would manage that
11:17
kind of thing for you. The storage does. The storage can track which blocks you're writing like crazy, which blocks you're reading like crazy, and can automatically promote stuff up to faster storage, because the shared storage was responsible for tracking the map. Ah, Melody, that's funny.
11:47
Um, so the shared storage was responsible for tracking the map of where all this data lived. And from SQL Server's perspective, there's just absolutely nothing to worry about. That's a foundational concept, and then the next thing that builds on top of that is storage snapshots. This is where it starts to become meaningful for you and me as database administrators,
12:10
because with storage tiering, and with automated RAID and adaptive RAID, we didn't really have anything to do with that. That just happened manually or automatically from the storage's perspective. But come back to our blogger table. Now, on our blogger table, the SAN is keeping this map of where all our storage lives. The SAN is keeping track of which bloggers live on which physical drives.
12:39
Let's say that Aaron Bertrand goes and publishes a blog post. So right now, Aaron Bertrand's data lives on drive two, spot A. If Aaron decides to publish an update to his blog post (he writes a brand new blog post, or publishes an update to his existing one), we're going to run an UPDATE command in SQL Server, and SQL Server has to update the page
13:05
where Aaron's data lives. But from the storage's perspective, there are a couple of different ways that we could tackle this. One way is that we could update his data exactly where it lives today, on drive two at spot A. We could simply overwrite his 1 a.m. blog post with a 9 a.m. blog post instead,
13:29
but another way that we could do it is the storage could write the data to a new place, leaving Aaron's old blog post still in place. And as long as the storage kept track of Aaron's history throughout the entire course of time, it could know, and let us rewind back to an earlier map.
13:57
It could say, just before Aaron published his blog post, I want you to rewind back to that exact point in time and show me the data as if we were querying the data at that moment in time. Now, this does require a little bit of overhead, and one of the overheads is space. The more that you're writing, the more space you're going to have to keep around in terms of
14:22
history. Database administrators, for a long time we've been so paranoid, so focused on space. I've got to shrink my files, I don't have enough space, you know, I've got to minimise my log files. But when you think about it holistically across the entire environment,
14:38
there's not quite as much space pressure, because storage has also gotten a whole lot cheaper. It's funny, I still know people who think of 100 gigabytes as big data, and then they turn around and look at their phone and they've got half a terabyte worth of RAM or half a terabyte worth of flash storage just on their phone alone.
14:58
Modern file systems do this even on laptops: I run a Mac, and I have a journaling file system where the Mac is constantly keeping track of different versions of files for me. And that's what the SAN is doing by keeping all this history. That's what the SAN can do, or what the SAN gear can do. I say SAN a lot; SAN stands for storage area networking.
15:21
Technically, that's the network of fabrics that connects the SQL Servers and their storage. People tend to say SAN gear, just meaning the kinds of gear that actually hold the data. SAN snapshots are a copy of the map. They're not a copy of the data.
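To make that "copy of the map" idea concrete, here's a toy sketch. It's hand-rolled PowerShell with a hashtable standing in for the array's block map; none of this is vendor code, just an illustration of the concept:

```powershell
# Toy model of a block map: which physical location holds each blogger's current page.
$map = @{ 'Bertrand' = 'Drive2:SpotA'; 'Ozar' = 'Drive1:SpotC' }

# Taking a snapshot = copying the map, not the data.
$snapshot = $map.Clone()

# Aaron updates his post. With redirect-on-write, the new version goes to a NEW spot,
# and only the live map's pointer moves; the snapshot still references the old block.
$map['Bertrand'] = 'Drive7:SpotQ'

"Live volume reads from: $($map['Bertrand'])"       # Drive7:SpotQ (the 9 a.m. version)
"Snapshot reads from:    $($snapshot['Bertrand'])"  # Drive2:SpotA (the 1 a.m. version)
```

That's the whole trick: the only extra cost is keeping the old blocks around for as long as a snapshot still points at them.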
15:40
SQL Server backups take a long time because, of course, SQL Server has to copy all of the data. SQL Server reads every 8K page and writes every 8K page off to another location. That's not what a SAN snapshot is. A SAN snapshot is only a copy of the map, as in: here's where your data lives at this exact moment in time,
16:05
and every time that we go to write, we're just writing to a new location anyway. SAN clones are much more time intensive: if you wanted to copy all of the underlying data, that would suck. But under most scenarios, you shouldn't have to do that. People worry: oh my gosh, what if everybody's fighting over the same storage?
16:30
Well, if I take a snapshot of production and then I show it to development, am I really going to put that much workload on production from what I'm doing over in development? If I'm doing writes over in development, remember, every write goes to a new place anyway. So if we're worried about testing a big deployment script,
16:50
we're going to be writing to new areas of storage. We're not really under that much contention with the production drives to begin with. When I take a snapshot, I don't even have to copy the whole map. When you start to think about performance tuning, you don't have to copy the whole map. You only have to keep copies of whatever stuff starts to change,
17:11
and in most cases we're not really changing that much on our SQL Servers. We think that we have a lot of activity on a day-to-day basis, but think about a data warehouse with five years' worth of history. If you have five years' worth of history, how much of that really changes every day or every week? Hardly any of it.
17:32
You're just loading in new data as it comes in. And maybe in some theoretical world, you're actually deleting the oldest data as it goes past a five-year rolling history mark. Although these days everybody seems to want to keep all data around absolutely forever, because they think they're going to learn something valuable from it someday. Whatever, it just makes the database larger.
17:56
Now, that does give me a takeaway that I need as a database administrator. If I think about the rate of change, I don't want to work against the shared storage; I want to work with it. And what I mean by against it is, I bet you create a very high rate of change because you think that you need to
18:21
rebuild indexes very frequently. You think that rebuilding indexes is going to improve performance. Well, as you move towards storage that focuses on snapshots, you want to minimise the change rate, unless you can prove that your changes are actually making everything better.
18:45
What index rebuilds do is make a copy of the entire index, and that's wasting time. When we're talking about a data warehouse with five years' worth of history, why would I make a whole new copy of the five years and throw away the old five years when, essentially, it's the same data and I'm only getting extremely small performance improvements?
19:06
And the cost of that is I'm generating a big, huge rate of change. I don't want to rebuild indexes every night. When you start to talk about a multi-terabyte database, you can't really do that anyway. In a multi-terabyte database environment, you're usually time-constrained by things like data loads that are running overnight. Every night you want to shrink your maintenance window down.
19:29
One of the things that you do as part of that is you move your index rebuilds to every weekend, or in a lot of data warehouses closer towards, say, every month. Then, when it's time to do a restore, all we have to do is switch maps out. Normally, when we think about doing a SQL Server database restore, you're like, oh God, no, please, anything but that, because you have to read the whole entire
19:55
backup file out, usually across the network, and SQL Server has to go write out the entire log file. We think about the performance advantage of instant file initialization, where SQL Server can instantly grow data files. Okay, that's cool, but it doesn't work for log files. And when I'm dealing with multi-terabyte databases, often they'll have a log file of 100
20:19
to 500 gigabytes in size. Windows has to write out zeros, so to speak, across that entire log file before the restore can begin. That's additional performance overhead that I just don't want to have to hassle with. The magic of doing SAN restores is that I can simply take away the old map,
20:42
show the older map, and say, here's what it looked like as of 8 a.m. this morning. I don't have to do a bunch of time-intensive writes. In reality, when you go to do a database restore with snapshots, that's not really what you want to do anyway, because what most people come to you for restores for is, oh,
21:02
my God, I dropped a table. Oh, my God, I ran a DELETE without a WHERE clause. And what you really need to do is just pluck specific objects out of the backup. Again, SAN snapshots just totally save your day. What you end up doing is just attaching that snapshot as another set of databases on another volume.
21:23
Then you can go in and do surgical strikes to just pluck out the specific tables that you need and the specific rows that you need. This is so much easier than having to deal with native backups, because with native backups, you actually have to provision the space to go do the restore. You've gotta go to your SAN administrators and say, hey,
21:43
look, I need five terabytes' worth of space, I've got to do a database restore just in order to pluck a few rows out for some nincompoop who, you know, did a DELETE without a WHERE clause. That adds latency to this whole entire operation. Whereas if it's all just SAN snapshots, you just attach the snapshot, you don't need extra space, and off you go to rock and roll.
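As a rough sketch of that attach-and-pluck workflow, assuming the snapshot volume has already been presented to Windows and mounted (the server name, database names, file paths, and table below are all made up for illustration):

```powershell
Import-Module SqlServer

# 1. Attach the snapshot copy of the database under a different name.
#    The file paths point at wherever the snapshot volume is mounted.
Invoke-Sqlcmd -ServerInstance 'SQLPROD01' -Query @"
CREATE DATABASE SalesDB_Snap
ON (FILENAME = N'E:\SnapMount\SalesDB.mdf'),
   (FILENAME = N'F:\SnapMount\SalesDB_log.ldf')
FOR ATTACH;
"@

# 2. Surgical strike: copy just the dropped table's rows back into production.
#    (Identity columns, constraints, and triggers may need extra handling.)
Invoke-Sqlcmd -ServerInstance 'SQLPROD01' -Query @"
INSERT INTO SalesDB.dbo.Orders
SELECT * FROM SalesDB_Snap.dbo.Orders;
"@
```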
22:02
Gerald says, with the advent of columnstore tables, the need for indexes vastly reduces, so why bother rebuilding? Gerald, sounds like someone hasn't been to my Fundamentals of Columnstore class, because columnstore index maintenance is even more important than rowstore maintenance.
22:22
BrentOzar.com, tell your friends. So this is way better, too, than SQL Server's disaster recovery. Like that subtle plug there, Gerald. This is way better than SQL Server's built-in disaster recovery, which also effectively relies on restores: transaction
22:42
log restores via log shipping, database mirroring, Always On Availability Groups. All of these involve replaying the transaction log across the wire. Well, sorry, something popped up there. Replaying the transaction log across the wire works fine. I love log shipping. I love availability groups.
23:05
I tolerate database mirroring. But the problem with all of those is when it comes time to fail back. Usually when I fail over, there was something like asynchronous database mirroring or asynchronous availability groups or log shipping, where I wasn't able to take a tail-of-the-log backup. And when I want to fail back, coming back in the other direction on a five-terabyte database,
23:31
buckle up. What you have to do is take a full backup over on the other side and copy that full backup across to reseed your database mirroring, log shipping, or Always On Availability Groups. It takes so much time, it takes so much network throughput. SAN snapshots and SAN replication are so much better, because they can just look at the
23:55
changed blocks on either side and only replicate the specific blocks that have changed, something that SQL Server isn't able to do. But wait, there's more. With things like log shipping and Always On Availability Groups (not database mirroring, but Always On Availability Groups and log shipping), SQL Server will re-copy the backup for every
24:19
replica. If you have, say, five Always On Availability Group replicas, SQL Server's built-in default behaviour for automatic seeding seeds it all from the primary. So it's copying stuff across the network wire five times, wildly inefficient compared to things like SAN replication. Now, like any technology, there's some fine print on things that don't work quite as well
24:45
as you might enjoy. There are two ways of doing SAN snapshots. One is for the SAN gear to take a copy of the map without telling the application. When this happens, the application could be in the middle of doing something: could be in the middle of rebuilding an index, could be in the middle of starting up and doing crash recovery,
25:08
committing a transaction, whatever. If the SAN gear just simply takes that snapshot, takes a copy of the map, without telling the application, you can run into problems where, when you go to attach that snapshot, SQL Server looks at the storage and goes, sweet potato, this smells bad.
25:28
My data files don't match my log files, this is corrupt, I'm out of here. And you saw this a lot with early implementations of crash-consistent snapshots from a lot of vendors, where if you didn't tell SQL Server to hold up on what it was doing, then sometimes when you went to restore that snapshot, it would come back as corrupt.
25:50
That was where application-aware snapshots came in and worked with the Windows VSS provider, which told Windows, the SAN's about to take a snapshot, stop writing. Windows told the applications, like SQL Server, and then SQL Server would stop writing, single-threaded, one database at a time. So it would freeze one database, then another database, then another
26:18
database, then another database. And if you had, say, 50 or 300 databases or more, it could take a really long time to freeze all of these databases, and you would see messages in your event log about the number of seconds that it was taking to freeze those databases and then again to thaw those databases to allow writes to happen back in.
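Those freeze and thaw events also show up in the SQL Server error log as "I/O is frozen" and "I/O was resumed" messages, so if you want to see how long your own VSS-based backups are pausing writes, a quick check along these lines works (the instance name is a placeholder):

```powershell
Import-Module SqlServer

# xp_readerrorlog parameters: log number (0 = current), log type (1 = SQL Server error log), search string.
Invoke-Sqlcmd -ServerInstance 'SQLPROD01' -Query @"
EXEC xp_readerrorlog 0, 1, N'I/O is frozen';
EXEC xp_readerrorlog 0, 1, N'I/O was resumed';
"@ | Format-Table LogDate, Text -AutoSize
```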
26:43
So this led to Microsoft having a knowledge base article saying that if you have VSS snapshots, you don't really want to freeze more than, say, 30 databases at a time, or else you'll end up with long delays when you go to do snaps and thaws. You also didn't get to pick which databases on a volume got snapshotted: with Windows VSS-aware snapshots, or application-aware snapshots,
27:07
all of the databases on a given volume got snapped at the same time, so everybody had to get frozen and thawed in the same order every time. This wasn't very efficient for shops that had hundreds or thousands of databases on exactly the same volumes. This is also problematic because sometimes discussions weren't very good between teams.
27:30
The SAN team and the database administration team weren't always on good speaking terms, so the database administration team would point the finger at the SAN team, and everybody would point the finger at the network team when it came time for troubleshooting. There's one other weakness around snapshots that you need to be aware of.
27:48
Sometimes some vendors and some SAN admins will configure snapshots in a way that they can still be deleted, or that they'll expire after a certain period of time. In the day and age of ransomware, this is kind of dangerous. You want the kinds of snapshots that survive when some yo-yo in the sales department decides to open an Excel spreadsheet from the CIO that supposedly has their quarterly bonus in it,
28:20
and it takes over their computer. And of course, it turns out that that person has sysadmin rights on your SQL Server, because everybody in your company has sysadmin rights on your SQL Server. And next thing you know, your databases are trashed and your backups are trashed. You want snapshots that can't be trashed, things that are effectively write once, read
28:42
many, where you can't write over them. That's something that you just want to be aware of during the purchasing process and the implementation process so that you don't get hit by ransomware. I'm constantly just shocked by how many of my own clients have been hit by ransomware-type stuff. So, to sum up, where storage snapshots make sense is where your databases are starting to
29:04
approach a terabyte in size and you're starting to ask questions like: how am I going to do backups faster? How am I going to do restores faster? How am I going to do CHECKDB in a reasonable amount of time? So often, as I deal with customers whose databases have exceeded one terabyte in size, they simply stop doing CHECKDB.
29:25
They say, we're only going to do it, say, once a month or less frequently, or, heaven forbid, not do CHECKDB at all. Um, other things to think about: when you get back, ask your SAN admins and sysadmins what make and model of storage you have and whether it can do snapshots. Sometimes I work with clients whose SAN team just knows peripherally that there is a snapshot feature.
29:50
They just haven't ever done anything with it, because they haven't had anyone from the DBA team go and ask about the capability to do snapshots. This is so much better than native backups and restores when you hit the one-terabyte mark and beyond. Now, that's my spiel about what DBAs need to know about snapshots in general.
30:15
Now I'm going to hand it over to Anthony Nocentino from Pure, who's going to talk about what you need to know from Pure Storage's perspective. I believe I've got to stop sharing my desktop here so that I can let him take over. And several of you have asked questions, too; we'll get to those at the end of the session, the ones that Anthony hasn't answered over there in text.
30:34
I've been feverishly typing; my fingers are tired, Brent. All right, let's get cooking. Let me share my screen. This will work. Yeah, technology. I am recovering from a laptop crash; I'm on the backup laptop today, team. Awesome.
30:58
Looking good there, Brent. There we go. Yep, yep, got the slide deck full screen. Cool, cool. Thanks, team. So hey, I'm Anthony Nocentino. Uh, we're gonna talk a little bit about snapshots for SQL Server on Pure Storage, and specifically on FlashArray, which we're going to
31:13
focus on today. All the capabilities and kind of underlying theory that Brent just walked us all through apply inside of FlashArray, right? One of the cool things about FlashArray is it's an all-flash array, so the concept of, like, volume groups or tiered storage kind of goes out the window, in that the data that you
31:30
store on FlashArray will get consistent performance out of that device regardless of where it lives in the device, because it's all a uniform, all-flash array. So I have the luxury of talking to you all about snapshots, and we're going to focus on the types of snapshots that FlashArray supports, implementing the theory that Brent covered in the first part of the session.
31:52
We're gonna talk about crash-consistent snapshots and application-consistent snapshots, implemented via VSS, or Volume Shadow Copy, snapshots. We're going to talk about the capabilities and use cases for both of those, specifically inside of a Pure Storage environment. So if you're using a crash-consistent snapshot, that's a copy of the map, as Brent described during his part of the
32:12
session. And to do that, we don't require any additional software in the universe of things. You just tell the array to take a snapshot at a point in time, and it makes a copy of the map pointing to the blocks where they physically live inside of the device. One of the cool things about FlashArray's snapshot capability is it uses a redirect-on-
32:30
write technology rather than a copy-on-write technology. And if there are any folks who have used VMware in the house, and I'm sure many of us have, you've seen that lineage between snapshots, where I have snapshot A and B and C, and if I have to delete one it has to merge down, and it takes a long period of time, and you get a phone call that your snapshots have been
32:47
around too long. FlashArray breaks that concept down and doesn't use that type of capability; it's redirect-on-write. So as new data is ingested, it's going to be appended to the actual volume inside of the array, and the pointers are going to be updated. This gives you the ability to have snapshots that aren't dependent upon each other, and there's no performance dependency upon them,
33:08
either. Inside of the array, they're just pointers to the data. And so now, if I have, you know, snapshots A, B and C, I can delete snapshot B and there's no dependency on that, if that snap's not being used. Key to the functionality of a crash-consistent snapshot: there's no I/O quiescing, so that I/O freezing Brent described doesn't happen when
33:27
I do a crash-consistent snapshot. There's also no need for write reordering inside of the array, because we respect the order of the writes as they come into the array, which means a database that is on a volume being snapshotted will always crash-recover and become a usable database, right? And so it's gonna come online.
33:46
It's going to recover the database back to a consistent point in time and bring that database back online. This capability works at the volume level, right? And so the idea is to start thinking about how we place our data into the array and onto the volumes and how we refresh that data from the volumes inside of that system.
34:05
And there's zero impact on the performance of the source system. So the system that I'm snapshotting, that production database or data warehouse that I want to take a clone from, we don't have any perf impact against that device, right? There is an Achilles heel, though, of crash-consistent snapshots, in that the unit of recovery, or the RPO, is the snapshot.
34:27
I can't recover between the snapshots; forget point-in-time recovery to restore to an exact point in time, right? And so that's where the VSS, or Volume Shadow Copy, snapshots come in. Inside of FlashArray, we integrate with the VSS framework inside of Windows, right? Which means we're going to have some software that we have to install inside of our Windows
34:47
boxes that run our SQL Servers to help coordinate that snapshot activity that Brent described, right? What that means is that I/O will be quiesced or frozen for a period of time so that we can take an application-consistent snapshot. This tech still uses redirect-on-write, so all the stuff that we just talked about from
35:02
a performance standpoint still applies. And this enables point-in-time recovery, which means I can bring back a 40-terabyte database, play back a couple of different logs, and get exactly to where I want to from a recovery standpoint. And so the idea is I'm going to use these for appropriate use cases, right? If I need point-in-time recovery, well,
35:20
chances are I want to use Volume Shadow Copy based snapshots. If I'm refreshing an environment to get access to data, well, most likely I'm going to want to use crash-consistent snapshots to get access to that data from a point in time. But on the flip side, with the VSS application-consistent snapshots, you have an ability that maybe saves you a 40-terabyte
35:38
restore, nearly instantaneously, and gets you back to that point in time. And so that's a fantastic capability to recover very quickly, right? And I'm a DBA by trade: snapshots aren't backups, but they get me really fast recovery. We'll talk about some replication technologies where we get that data replicated off onto other
35:55
devices in a second here. Additionally, our VSS extensions also integrate with third-party backup solutions, so the Commvaults and the Veeams of the world; we have the ability to work with snapshots at the array level with those platforms. And so let's talk about the process of taking a crash-
36:12
consistent snapshot. We're going to talk about some use cases, and we're going to do it in code together inside of my lab. So, to take a crash-consistent snapshot of a database or collection of databases on FlashArray, it's a pretty straightforward process. We're gonna offline the target database, the one that we want to refresh,
36:30
right. Traditionally, you'd have to do this if you were doing a backup and restore: you'd take that database offline to restore it. So we do this on FlashArray too. We'll offline the database on the target, we'll offline the volume associated with that database, we'll take a snapshot and clone of the volume, or the collection of volumes,
36:46
which we wrap up in a protection group to have a consistent backup across multiple volumes. We'll take that snapshot and clone, we'll online the volume on the target, and then we'll online the database. So this happens in about six lines of PowerShell and a few seconds, right? So the idea here is we're breaking that
37:01
dependency on having to move all of that data between systems. We can just do that inside of the array and get access to that data instantaneously. So let's talk about some common use cases that our customers are using snapshots for out in the field. If you're doing anything like Brent talked about, protection against ransomware,
37:20
instantaneous data protection: I can get really fast restores out of my array nearly instantaneously. Another core capability is dev/test refreshes, being able to clone production down to dev, multiple copies of dev, potentially across multiple versions of our application. I can do that in seconds, right? And inside of FlashArray,
37:37
that's going to be a data-reduced event, so we're not going to have to carry the weight of doing that data movement to get access to that data. It's just a copy of the map. I get access to that almost instantaneously across multiple systems, or even multiple developers if I need to. This is another big one: how many of us have had to fill out a change
37:54
management request and had to put the rollback time as multiple hours, because we know we have to restore that database to bring it back in the full weight of that database? What if you took a snapshot before, and then your rollback time is instantaneous, right? That's going to get change management requests approved a lot faster when you have that capability inside of your storage.
38:14
This is kind of a cool idea: the concept of being able to snapshot prod and attach a copy of prod to your data warehouse to perform your ETL. So now, on prod, I don't have to deal with, you know, blowing out the buffer pool, doing a bunch of table scans, the compute and networking associated with my ETL process. I can just bring that data from prod over to the data warehouse and get access to that data
38:33
and pull that data out, potentially with just straight T-SQL, really simplifying the code stack associated with that process. Another idea we could do with data warehousing is I could produce a gold copy of the data warehouse, put it out there, work on the build copy, and then instantaneously swap those with snapshots and clones, getting access to that data and not having folks wait for the
38:53
build to happen overnight. I can just swap that out instantaneously at a fixed period of time. And, as Brent described, also offloading database maintenance. Lots of folks have 24/7 shops. Who do you choose to make sad when you want to run CHECKDB, right? That's a real business decision. A lot of customers just don't run CHECKDB. So
39:11
you can snapshot prod, attach it to a secondary instance, and run CHECKDB. If it fails, then go chase the problem. The cool part about having that clone is you can then test your fix immediately, and then you can iterate on that. If you mess up, I can test the fix without having to do a full-weight restore, and once that fix is tested out against that other instance, I can apply it to prod and move on.
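A minimal sketch of that offloaded CHECKDB pattern, assuming the cloned volume has already been refreshed and the database attached on the secondary instance (the instance and database names are placeholders):

```powershell
Import-Module SqlServer

# Assumes the production volume has already been snapshotted, cloned, and presented
# to this secondary instance, with the database attached there.
Invoke-Sqlcmd -ServerInstance 'AEN-SQL-02' -Database 'master' -Query @"
DBCC CHECKDB (N'FT_Demo') WITH NO_INFOMSGS, ALL_ERRORMSGS;
"@
# Corruption surfaces as errors here on the clone, so production never feels the CHECKDB I/O.
# For very large databases, raise Invoke-Sqlcmd's -QueryTimeout so the check isn't cut short.
```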
39:31
And the other big one that Brent covered: the idea of being able to extract a table, right? I could snapshot and clone, bring a database back to prod, and just pull the rows out that I need to get access to that data. I remember once, in the beginning of my career (yes, I did this),
39:49
I torched a whole table. It took me, let's see, about 10 minutes to get almost, you know, let's say 98% of the table back. But then I had to go restore the database to get the remaining rows. I was able to pull some of the rows out of a reporting database to get them back in, but then for the ones that changed that day, I had to do a point-in-time
40:07
restore and pull those rows out, and that was a multiple-hour event to pull that back. But the idea here is reducing the complexity of your environment and reducing the compute, networking, and storage overhead to facilitate these operations, which usually are byte-for-byte copies of what we're doing against our system. Snapshots consume very little space, as Brent described.
40:28
It's a map, and so as we clone that, it's still the same physical stuff inside of the array, just pointers to it, and as data changes, that will be appended. And we'll track those changes, and that's going to be reported to you, so you can track the actual snapshot consumption over time. There's no performance impact, and traditional database restores are expensive.
40:47
They consume your time. They consume space in your data centre, having multiple copies of data laying around. And it's challenging in a restore event that my recovery is multiple hours to bring back a large database; that's really a challenging thing for a business. With regards to FlashArray specifically, you can kind of choose your
41:06
tooling for how to implement this. And so I encourage folks to build automation around these refresh techniques. Using either PowerShell or Python, you can build repeatable processes to refresh databases, and the volumes associated with databases, in code over and over again. And that's important to building good, sustainable systems.
41:25
One of the things that came up in the chat is, what can I do with these things from a data protection standpoint? Because if it's just inside of one array, that's just really fast restore; that's not backup. I can take the actual snapshot data and replicate that to other locations. I can send it to another
41:40
FlashArray. I can send it to the cloud, into any of the major object stores, in Azure or in S3. I can also take it to what's called Cloud Block Store, which is our flagship product implemented in the cloud. And so that way I can get that data off onto other platforms so that I actually do have the
41:57
ability to recover in the event of a major failure on the core system. So let's go ahead and see this in action. I want to show you all a video. Normally, I would do this demo live, but I am running on the backup laptop. As I said, I am a good paranoid DBA: I have a backup plan, and that was to record this video. And so what we have here, inside of our test
42:21
lab inside of Pure, is two SQL Servers. On the left here we see AEN-SQL-01 and AEN-SQL-02. And this database here, FT_Demo, is 3.5 terabytes. So, in a couple of lines of code, I'm gonna snapshot and clone that database and attach it to AEN-SQL-02 at the bottom
42:37
here, because I want to get access to this new big table that I just added, named BigTable. So let's walk through that process. Here we see, it's a little bit tiny on the screen, but that's about 3.5 terabytes. And let's jump into the actual code to make this happen. So remember the process diagram that we went through in the presentation?
42:56
What I'm gonna do is that process now. I'm gonna get some credential information inside of PowerShell; it's going to ask me for my passwords so that I can log into the array. We're gonna offline the database on the target system, the one I want to refresh, right? And so the code here is just ALTER DATABASE FT_Demo SET OFFLINE.
43:15
That turns that thing off. The next thing I'm going to do is offline the volume, and so that's going to go ahead and turn that volume off, because I want to refresh that volume at the array level. The next part of the process is we're going to log into the FlashArray, and then we are going to overwrite the volume on the target instance, the one that I want to refresh. And so don't
43:40
focus so much on the code here, but the process of going through this is pretty cool to see in such a simple way. So we have this cmdlet here that will go ahead and read from our source system, and it's gonna overwrite the target system. That's what's happening here: AEN-SQL-02 is going to be overwritten with a copy from prod.
43:58
Once that executes, I online the database (excuse me, online the volume, then online the database), and done, right? That data is now refreshed, and it takes just a couple of seconds. So now if I go to AEN-SQL-02 and I refresh the list of tables, that 3.5-terabyte database is now refreshed and available to me on that other instance. That
44:20
happens nearly instantaneously inside of the array, saving that 3.5-terabyte backup and restore to make it happen. So it's a pretty cool capability. When we're working with customers and we show this to them, we get, like, wow, that's going to enable a lot of scenarios: dev/test refreshes, offloaded database maintenance, ETL, all those different capabilities.
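For reference, here's a condensed sketch of the refresh just demonstrated, assuming the SqlServer and PureStoragePowerShellSDK2 PowerShell modules. The cmdlet names and parameters reflect Pure's SDK as best understood here, so verify them against the scripts in the git repo mentioned later; the endpoint, volume names, disk serial number, and instance names are the demo's or simply made up:

```powershell
Import-Module SqlServer
Import-Module PureStoragePowerShellSDK2   # verify cmdlet names against the linked git repo

$TargetSql  = 'AEN-SQL-02'
$Credential = Get-Credential              # array credentials

# 1. Offline the database we're about to refresh on the target instance.
Invoke-Sqlcmd -ServerInstance $TargetSql -Query "ALTER DATABASE FT_Demo SET OFFLINE WITH ROLLBACK IMMEDIATE;"

# 2. Offline the Windows disk backing that database's volume (serial number is illustrative).
Get-Disk | Where-Object SerialNumber -EQ 'B64FEC24A5A91F4100012345' | Set-Disk -IsOffline $true

# 3. Connect to the array and overwrite the target volume with a copy of the source volume.
$FlashArray = Connect-Pfa2Array -Endpoint 'flasharray1.example.com' -Credential $Credential -IgnoreCertificateError
New-Pfa2Volume -Array $FlashArray -Name 'aen-sql-02-data' -SourceName 'aen-sql-01-data' -Overwrite $true

# 4. Bring the disk and then the database back online.
Get-Disk | Where-Object SerialNumber -EQ 'B64FEC24A5A91F4100012345' | Set-Disk -IsOffline $false
Invoke-Sqlcmd -ServerInstance $TargetSql -Query "ALTER DATABASE FT_Demo SET ONLINE;"
```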
44:41
So I am going to stop sharing now. I guess we'll spend the rest of our time chatting and going through some Q&A, and we certainly have a whole bunch of questions. So let's see here. I've been typing madly, like what you're doing, and I'm like, okay, so now I don't have them all. Let's hit the next one.
45:17
Anonymous says, could you selectively copy production data to a data warehouse of, like, one or two databases? So what you want to do there, really, is attach the snapshot to whatever target system you're gonna be doing the copying on, and sometimes it's even like an SSIS server, and then pluck out whatever data you need from there.
45:32
But you don't want to think of it as, like, restoring parts of a snapshot. Restore the whole snapshot; it's not like it costs you extra. Take the whole thing and then let whoever is doing the data movement pick and choose what they want. Yeah. Uh, let's see here. What's next?
45:53
You want to pick one? Someone says, can we get the presentation? Yeah, I'll get the PDFs over to Pure so they can send that out. Um, oh, Marson asks, would you be able to snapshot several databases from different SQL Servers to one development environment? Anthony, I'll let you take that. That's a great question.
45:53
Yes. So, as I said, don't pay too much attention to the code, but if you looked at it, you saw that what I had was a source and a target, right? And what's happening is at the volume level. The source is going to be wherever it's coming from, and the target is going to be wherever it's going to, and those can be the same instance, those could be different instances,
46:09
and it could be one-to-many. Just as you're thinking, I could bring that data to a central location if needed. And the cool part about that: the underlying stuff, where the bits and the bytes actually live, is data reduced. So that's not going to be the full weight of those things being copied to the target. It's going to be data reduced and share those
46:25
blocks physically inside of the array, so it's not going to consume that storage. And if you want to pick out another question while I answer one... um, so I'll say, Anonymous says, we're a heavy Always On Availability Groups shop; what type of snapshot is most appropriate? So if you're running Always On Availability Groups, that most likely means that your databases need to be in full recovery model
46:49
anyway. So if they need to be in full recovery model, you're probably taking transaction log backups. You could, in theory, back up your logs to NUL, but that's generally a pretty bad idea. So think about what you're going to do in terms of failover. If you're going to fail over in order to solve a production-data-centre-down type issue,
47:11
then you have a good copy of the databases over in each Always On Availability Group replica anyway. What I'm more worried about is, how do I rewind to a specific point in time in order to give somebody a copy of the data as of last night at 10 PM, for development, for a refresh scenario, for some kind of ETL kind of thing? And I don't know that I really want to stop SQL Server from writing in production in
47:38
order to get that copy. I am all about: just do crash-consistent snapshots in a scenario like that, because all you need is a copy of the data, and call it a day. Let's see here. Anthony's typing; I'll stop typing and let him answer that. Wow. Well, this one specifically: someone asked if we're going to provide the
47:57
scripts. Yeah. And there you go, I just answered that, and I'll pop it in the chat. The scripts at that git repo cover various scenarios, um, for FlashArray and snapshots: VMFS-based snaps, which is a unique animal, vVols-based snaps, and cluster-based snaps. So there are kind of some different use cases and
48:15
implementations to achieve that end. I want to grab this one next: how do you find the sweet spot between using SAN snapshots on a highly transactional database and having to deal with the I/O stun? And the long story short is, you have to take a full backup eventually, so you wouldn't take an I/O-freezing snapshot, you wouldn't take a full snapshot, in the middle
48:36
of the day. You do that in a window that's appropriate for your business, right? And that's what I tell customers: if you are going to use VSS snaps, you're gonna take them basically when you would take a full backup, right, outside of your core window of doing whatever it is
48:50
your business does. Let's see, uh, Kevin says, would it be possible or prudent to offload full backups from a separate SAN snapshot? One of the things that I would say there is, sometimes you have business requirements where they want a copy of the full backup that they're going to go hand to an external partner.
49:14
You know, it's used for something other than SQL Server backup and recovery. Find out what the business might need around that. That might be one of those situations where you still need a full backup, like a native full backup. I'm gonna drift into another territory for a minute.
49:30
So, Oracle: can we do this with Oracle? Yes. And one of the cool things about Oracle is you can put the database in hot backup mode and take a consistent snapshot of it without any external tooling like VSS, because that just doesn't exist in that universe. Uh, so that's a core capability there. So let's see here.
49:50
Uh, Eric says, if we do the snapshots, do we have to deal with orphaned users and post-scripts? Absolutely, you still have to deal with orphaned users, because if you created logins on a SQL Server differently, with different SIDs, then that's still a problem there. It's no different than working with conventional database backups and restores.
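A quick sketch of that cleanup after attaching a snapshot copy on a different instance: the query lists users whose SIDs don't match any login here, and the ALTER USER remapping at the end uses a placeholder name.

```powershell
Import-Module SqlServer

# List SQL-authenticated users in the attached database whose SIDs don't line up with a login on this instance.
Invoke-Sqlcmd -ServerInstance 'AEN-SQL-02' -Database 'FT_Demo' -Query @"
SELECT dp.name AS orphaned_user
FROM sys.database_principals AS dp
LEFT JOIN sys.server_principals AS sp ON dp.sid = sp.sid
WHERE sp.sid IS NULL
  AND dp.type = 'S'
  AND dp.authentication_type_desc = 'INSTANCE';
"@

# Remap each orphan to the matching login on this server (placeholder user name).
Invoke-Sqlcmd -ServerInstance 'AEN-SQL-02' -Database 'FT_Demo' -Query "ALTER USER [app_user] WITH LOGIN = [app_user];"
```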
50:10
Yeah. Gary asks, are SQL DBAs ever reluctant to change? I think it's kind of that the job breeds paranoia, you know, and so people get paranoid and they say, I wrote this script in 1976 and it's worked great for me ever since. And I'm like, yeah, OK, come on, Speedy, you need to learn a little bit of new stuff
50:30
there. Let me grab: do snapshots negate the need to perform backup compression and encryption? That's a totally different animal, right? If I'm gonna take the bits and the bytes and copy them somewhere else, I'm achieving a different result. On data reduction inside of our devices: compression and
50:49
encryption generally are going to take data reduction and throw it out the window. But there are some scenarios where you can get decent data reduction out of encrypted environments if the databases are keyed the same. So, for example, if I snapshot and clone a database that's TDE-encrypted, that will data-reduce out, because the key that encrypts the thing is deterministic, so the physical structures will
51:08
be the same, right? Compression throws that out the window; backup compression throws that out the window. So someone asked over in chat (most people, when you're asking questions, folks, put them over in the Q&A part, not the chat; I just happened to glance at the chat), but someone asked, can you provide an example of how
51:25
snapshots can be used for DR testing? I'll give you a great example. So, one of my clients, we have a 20-terabyte database. We can't just magically fail it over somewhere else without repointing a whole bunch of other applications. So what we do, from a SQL Server perspective, to know that the SQL Server is ready is
51:41
we'll stand up a VM over in DR, attach the snapshots to verify that the data is there as of a current point in time, and from a SQL Server perspective, our job doing disaster recovery is over. We can tell our application folks, hey, if you want to point your DR app servers over to this, here's the IP address where you can go try to query,
52:03
and then they can test their DR side of it. But that's magical, because that's something I couldn't do without taking production down. Uh, Anthony, do you see another one you want to answer in there? Yeah. So, can you take a snapshot of a snapshot? Absolutely. If you take a database or volume snapshot
52:21
and clone it over here, inside of FlashArray specifically, that's a completely unique volume now that's accessible to the other instance it might be attached to. I could take a snapshot of that and move it again, forward. That's actually the most common scenario that we'll see for data warehouses: pull prod, build here in the intermediate instance, snapshot and
52:39
clone that to the final reporting system, right? Keith asks, Brent, are you still drinking coffee? When do you switch over to the adult beverages? Asking for a friend. Note that this is a cup. It is a Yeti, uh, insulated cup. That means it can keep hot things hot and cold things cold. And then, on to the next question.
53:00
Um, uh, let's see here. Oh, Anonymous says, will Microsoft support honour the use of snapshots, or will they toss us to the proverbial curb? Oh, man, I love this. So the way that I would think about it is: imagine that you have a five-terabyte database and you try to do backups the native way. You're screwed. It's going to take absolutely forever.
53:23
With regular snapshots, you can test them so much more. You can go attach them to other SQL Servers. You can test the kinds of things that Microsoft support will ask you to test. So, for example, if you have database corruption due to a SQL Server bug and they're like, hey, go run this command on the database:
53:43
I mean, if you've got a five-terabyte native backup, that's gonna suck, restoring five terabytes, then doing the test, and then it fails and you've got to go do it again. With SAN snapshots you can do it immediately. It makes it so much easier for Microsoft support to help test things that ordinarily would take aeons to test. So no, they love that kind of thing.
54:04
You're talking about testing: one of the hardest things I had to do in my career was change the column type for a money column in a system. Let's talk about risk, right? That sucked. And the majority of my time was code change, restore, code change, restore, run scripts, run scripts, you know,
54:23
and iterating on that process took lots of time. With what we're talking about today, that process of getting back to that initial starting point, of testing your migrations, upgrades, whatever it is, could become nearly instantaneous. So yeah, and it's a classic case where it doesn't matter how much CPU power you have. You know, you could test it on a dev box, which is like four CPU cores and 16 gigs of RAM.
54:44
You're putting the storage through a workout, you know, when you do changes like that, and it's just elegant to test when you've got fast storage and snapshots. So, I see a couple of questions around VMDKs. Uh, so the idea behind the VMDK: looking at how this would be allocated physically, we're kind of talking conceptually about snapshotting
55:04
discrete volumes to get access to the stuff on that volume. If I'm on VMFS, it sounds like you're familiar with the fact that the VMFS becomes the volume itself, and then I have virtual disks allocated within that volume. Inside of FlashArray, you would snapshot the volume that supports the VMFS, reattach that to your cluster, resignature it to have a
55:24
unique identifier, and then you pluck out the virtual disk that you want. That code is in the links that I sent to you all, and the git repo will kind of walk you through that exact process.

Brent Ozar

Founder/Owner, Brent Ozar Unlimited

Anthony Nocentino

Senior Principal Field Solutions Architect, Pure Storage

You're a DBA responsible for making sure SQL Server databases are online, backed up, corruption-free, and fast. Your databases have gradually grown in size over time, and you're starting to hit new size issues you haven't encountered before.

Nightly maintenance windows are getting smaller, you're not able to refresh your development environments quickly enough, and you're not able to run DBCC CHECKDB as often as you'd like.

You're starting to wonder - how do people with multiple terabytes of databases handle it?

In this session, Brent Ozar will explain how he came to love storage snapshots. He'll discuss how they work, why they're safe to rely on, and how to tell when they're a good fit for your workloads.
