00:01
We're having problems with the storage. It's fine. Alright, so I'm going to talk to you about the top six database challenges. How I actually arrived at this is interesting, because of a challenge I was given in December. I got a new boss, Sam, and the boss said, tell me why your job is important.
00:20
I was like, I don't know — I don't even know what it is. So as part of that, he said, OK, what is your position on databases? I'm like, what position? He said, well, tell me what, strategically, Pure Storage should do for databases. I waffled for a little while, and he eventually stopped me and said,
00:34
Andrew, you've just spoken about Amazon having a better database service, about how Azure has got a managed service. I have no idea what's going on. In a month's time, let's have the same meeting, except this time, tell me what I want to hear. And so in December I had lots of fun. I had a six-month-old child,
00:52
lots of crying, Christmas, lots of things happening — but in the back of my mind I was like, I don't fully understand every aspect of the database market, just being very honest at the time. So I did what any normal person would do: I went to Google and asked, what are the problems with databases? That was a wild journey.
01:11
And so what actually started to happen was I went to the usual suspects, the statistics sites. I started to look at what people were posting on Reddit subreddits, which was an interesting dive. Then I started to look at forums. Stack Overflow — absolutely great. If you ever want to find out what's really going on in the market —
01:28
the analysts have no idea, but everybody on Stack Overflow is really keyed in. What really happened was I arrived at a number of insights into the industry, and over the last six months I've been pairing those insights with the question: how is storage going to solve this? And I came to a key insight that is probably very obvious to everyone,
01:49
but at the time it wasn't obvious enough to me: storage is very important to every database on the face of the planet, because we invented databases to abstract away from the storage and provide better structure. You have non-persisting databases, which is great, but at the end of the day you want to do something with data,
02:05
and then you want to make sure it stays in the same place and in the right form. So let me bring you to the important part of this: who am I? My name is Andrew Sullivant. I'm a principal solutions manager with Pure Storage, working on white papers and problem statements for database administrators and how our storage can make your lives more
02:26
efficient, easier, and all those lovely, fluffy marketing statements. But at the end of the day, I want to take your time back — I want you to do things that actually matter to you instead of a lot of repetitive things whose heavy bits we can lift. So, OK, let's talk about some of the challenges I started to identify through all of this really fun research.
02:48
The first problem I found was that databases are growing. For some reason, we don't like deleting data — I don't know what it is; talking to legal might have something to do with it. But everybody wants to retain data for as long as possible, because apparently it gives you a better picture of what's going on. So data is growing, but the usage of the systems on
03:06
which the data resides — the usage of them is growing too, so demand is exceeding supply, so to speak. And so we've got a core problem: there is not infinite money in the world. Everybody who says, oh, I've got a database, I need more compute —
03:21
the first question that comes back is, how much is it going to cost? I need more storage — how much is it going to cost? And so the ability to manage scale, for both performance and capacity, is one of the chief things I kept coming up against. I did a lot of this research, and it's also been validated by asking some customers that Pure Storage already has: what
03:41
do you spend the majority of your time doing? And the answer is performance management — either query tuning or somewhere else in the stack. Capacity management is actually a problem we solved as an industry a long time ago, simply by shoving more hard drives and discs into things. Dedupe has also really helped solve that problem.
03:58
But managing scale with cost constraints — really, really big problem. Now let's take a look at the second problem I started to identify. DBAs spend a lot of their time doing tactical things where automation is not present in these environments, so you've got a lot of break-fix going on, which cannot always be automated or solved.
04:24
And so there's the challenge of maintaining consistent performance while doing a lot of tactical operations, with ever-changing demands — demand is predictable up to a point, but you can never know when something's going to break until it's broken. So the second thing I identified was: how do we maintain consistent performance in an
04:45
ever-changing world that is constantly growing? We can't just throw infinite money at it, so we've got to somehow figure out a good balance. Now, the third problem I found is actually the one I spend the most time on during my customer interviews — it keeps coming up and takes the majority of the conversation. And this is how they are essentially
05:07
accomplishing data protection, disaster recovery planning, and HA, because there are so many different ways to do it. And I actually had a conversation with somebody yesterday where they said to me, we want to use Veeam — what does Pure do specifically with Veeam that is going to integrate our VM backups and our database backups?
05:26
But then they caveated it with, we don't want to use the RMAN plug-in for any of this. And I was kind of like, that means you can't use Veeam for it. So you can see the problem statement there: you want a central place to do all of these things and accomplish them with ease, but the complexity of the business requirements is making that a little bit tough at times.
05:49
Another database challenge that came up as I dove deeper into this area was that we keep moving data around. Moving data around essentially means taking data from one format and putting it into another, or taking it from one location and putting it into another, for different use cases such as workload management — doing OLTP in one place and data warehousing in another.
06:14
But the data pipeline is becoming increasingly complex as the data sources feeding these databases grow. A really good example: you've got unstructured CSV files. We've been doing this for forty years or so — maybe not the CSV part, but we've been lifting unstructured data and putting it into a structured engine for a while.
06:33
We thought we had that down, until the analytical databases came along and we were like, oh, let's just start pumping things in and connecting them all over the place. We have created whole personas to manage these pipelines. This tells me that the overhead in this ever-emerging, ever-growing market is growing linearly with the demand to do different things very
06:54
efficiently — which, to paraphrase the gentleman's statement, is an oxymoron. It is exactly an oxymoron: we want to do more, but with very limited resources. And how are we doing that? We're ignoring the limited resources and running headlong into a lot of it. That's not to say everybody is doing that, but it seems to be a common problem. Now,
07:15
I actually really enjoyed this one, because I wrote my thesis on it. Data localization is probably going to be one of the biggest problems of, let's call it, the next decade or two, because we've created the ability to connect the entire world.
07:34
And what we're doing now is creating whole new worlds within that connected world — metaverse-style thinking is an example of that. Facebook in 2007 created an entire isolated world. And the interesting part is that politicians looked at this and said, oh, we do not like sensitive data, and we don't like our citizens' data going
07:56
everywhere. And after 2013 and the Snowden leaks, everyone went very, very heavy on privacy thinking. So what's emerged over the last few years is this: data needs to reside in specific places, and there are frameworks around it. A lovely friend from Southwest Gas right
08:16
over here was talking about it yesterday: they've got federal compliance they've got to meet — the form their data takes, all the different things around it. But on top of that, I can take a database and copy it across to another data centre anywhere in the world now, because of the emergence of cloud computing. This has created a really
08:37
interesting problem: just because we can do it doesn't necessarily mean we should, and so safeguards have been put in place. Your biggest problem is if you're a small shop, you're a DBA, and one of your managers — who doesn't really care how things happen — just says, move this over there. OK, you go and do it.
08:56
There's nothing safeguarding the legality of that data movement. Moving data within the United States is fine, but then you've got GDPR in the EU to contend with, and I'm pretty sure over, let's call it, the next decade we're going to see the emergence of even stricter requirements for what data can move,
09:15
where, and when. And so finally, migration. This actually feels like a problem that's been around for a while; it's just become more fun to deal with as we start to talk about on-premises-to-cloud migrations and unmanaged-to-managed services. We've actually created a sub-industry around moving data around in a continuous circle,
09:39
which is an entirely different talk in itself, but it's fascinating as a job-creation cycle. Now, as part of this talk, I want to tell you how Pure can solve some of these problems. What I've done is taken every single one of these problems, given them their own slide, and I'm going to talk you through how our products are
10:01
built — and the data services within those products — in a way that tries to assist with solving each problem, either directly or cooperatively with individuals or other software. So let's talk about managing scale within cost constraints. I actually had this all laid out in slides, where I was going to show you each and every single one of
10:22
the FlashArrays and the FlashBlade, and I was going to be like, oh, look at all our products. But thinking about it this morning, I was like, we've been putting these things on stage for you for two days. You know these products; it's absolutely fine. But just as a general recap, we have a whole set of arrays,
10:39
each dealing with a different price point. Now, FlashArray//E is not on this slide, because I didn't have the picture — but assume in your mind that there is a picture up there stating the use case it deals with. Essentially, we've got a product for every performance tier.
10:54
We've got products that fit different capacity tiers as well. I'm pretty sure you've got huge database landscapes — we've got one DBA in the crowd, and another fellow who's kind of a DBA but in more of a managerial position at the moment. I'm pretty sure there are a lot of databases in those landscapes which are not running hot.
11:13
That's to say, they do not need tier-one performance; they might be archive-style databases, things that are just kept somewhere. FlashArray//C or FlashArray//E: absolutely fantastic for that use case. Being more cost-effective with some of these things is solving that first problem. We need to be very conscious of how much this
11:32
stuff costs, and also the impact on long-term costs — ESG things. You can't just keep shoving arrays into your data centre when you've got limited power. We actually have a lab, and every time I ask them, oh, can you do this, they're like, you're at 96% of your power allotment. Are you sure?
11:49
And I've actually run into that kind of behaviour in two separate roles, so it's definitely a thing: you have limited resources to do infinite things — what do you prioritise? And so the ability to, let's say, consolidate with the //XL is absolutely fantastic. Say you have five Oracle databases, for example,
12:10
and they run at X performance, and you needed to buy two arrays for that. Well, the //XL could then consolidate everything into one, depending on your performance requirements — no guarantees there, of course. But the really, really important part of this is how deduplication is built into these products to provide better efficiency. I mean,
12:29
you've heard this over the last few days: we built our entire product line on flash, and it has enabled us to solve a lot of these problems in a much more efficient way. And just as Sean said earlier, it'll only get better — I've actually seen dedupe in FlashArray get better over the years.
12:46
One of the most annoying parts of this is that I cannot justify asking for more discs from my infrastructure team, simply because we're not running the arrays as heavily as we could. I've taken a 100-terabyte database, put it on a 100-terabyte array, and at 3:1 dedupe I'm only using a little over 30 terabytes.
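To make that arithmetic concrete, here's a minimal sketch. The 3:1 ratio is just the example figure from above — real data-reduction ratios vary by workload:

```python
def physical_usage_tb(logical_tb: float, reduction_ratio: float) -> float:
    """Physical capacity consumed after data reduction.

    reduction_ratio is the N in an N:1 dedupe/compression ratio.
    """
    if reduction_ratio <= 0:
        raise ValueError("reduction ratio must be positive")
    return logical_tb / reduction_ratio

# A 100 TB database at a 3:1 reduction ratio consumes ~33.3 TB,
# leaving roughly two thirds of a 100 TB array free.
used = physical_usage_tb(100, 3)
free_tb = 100 - used
```

So on a 100 TB array, the database's footprint drops to roughly a third of its logical size — which is exactly why the array never looks as full as the DBA expects.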
13:02
It takes a little bit of the geekiness and the fun out of just filling up arrays. It means you have to work a bit harder, but we'll get there; it's all right. And finally, every single one of these products comes with a quality-of-service component. So if you're doing a lot of consolidation and you're also trying to look at, well, how can I be very cost-effective — quality
13:22
of service being on from day one is really important, because you don't need databases knocking each other out. There's a fairness policy across the entire array, but more importantly, within the array you can limit things down — so there are a lot of levers that can be used. So,
13:39
for example, if you've got nine databases that need to run hot and one database that runs sporadically hot but doesn't need to be treated as critical, you can limit that one down to prioritise the other nine. This is an incredibly good lever for solving cost-constraint and scale-constraint problems without essentially having to buy a whole different array.
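The lever works roughly like this toy model — names and numbers are invented, and on a real array you'd set per-volume bandwidth or IOPS limits rather than run Python:

```python
def effective_iops(demands: dict[str, int], limits: dict[str, int]) -> dict[str, int]:
    """Apply per-volume IOPS caps: each volume gets min(demand, limit)."""
    return {vol: min(iops, limits.get(vol, iops)) for vol, iops in demands.items()}

# Nine databases that need to run hot, plus one bursty non-critical one.
demands = {f"db{i}": 10_000 for i in range(9)}
demands["batch_db"] = 50_000       # sporadically hot, not critical

limits = {"batch_db": 5_000}       # cap only the noisy neighbour

shaped = effective_iops(demands, limits)
```

The nine hot databases keep their full demand; only the capped volume is throttled, freeing headroom without buying another array.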
14:00
Now, how do we maintain consistent performance? Well, we have an entire platform for that. The ability to provide consistent performance is entirely dependent on your ability to predict what's going to happen. Within Pure1 you've got that — I believe it's the single most fantastic thing about Pure1, but that's my opinion.
14:19
It's the ability to do forecasting. We take all of your telemetry, we collect it all up, and what are we doing? We're shoving it up there, and then we're providing you with a lot of understanding of what is potentially going to happen.
14:35
And there's a little window somewhere in there — for existing customers, go and find this window; it's the single most fun thing you'll ever do. What it says is: in 40 days you're going to run out of performance; in 50 days you'll run out of capacity. That is incredibly useful forecasting, because it allows you to prepare to solve the
14:52
first problem while also looking at how you're going to avoid a consistency problem across your landscape. And if you're in a very large landscape with thousands of databases and, let's say, five arrays sitting there, you definitely want the surety that things are going to stay up where they need to stay up.
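That kind of forecast is, at its simplest, a linear extrapolation over telemetry history. A naive sketch — this is not Pure1's actual model, and the sample numbers are invented:

```python
def days_until_full(history_tb: list[float], capacity_tb: float) -> float:
    """Naive linear forecast from daily usage samples.

    Fits average daily growth over the window and extrapolates to
    capacity exhaustion. Returns inf if usage is flat or shrinking.
    """
    if len(history_tb) < 2:
        raise ValueError("need at least two daily samples")
    growth_per_day = (history_tb[-1] - history_tb[0]) / (len(history_tb) - 1)
    if growth_per_day <= 0:
        return float("inf")
    return (capacity_tb - history_tb[-1]) / growth_per_day

# Ten days of samples growing 1 TB/day, now at 60 TB on a 100 TB array:
usage = [51 + d for d in range(10)]      # 51 .. 60 TB
days = days_until_full(usage, 100)       # 40 days of runway left
```

Real forecasting uses richer models and performance (not just capacity) telemetry, but the payoff is the same: a number of days of runway you can plan against.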
15:12
All right. How do we ensure database safety? Honestly, in a number of ways. Volume and file-system snapshots: an interesting question came up yesterday in an AMA session — are snapshots backups? — and my comment as part of that AMA session was: taking a snapshot is not, by itself, a backup. A snapshot is something that can be
15:34
cooperatively used. There are certain things around it — such as moving snapshots off the array — that let them be used as backups. But we're still using backup tools like RMAN, SQL Server native backups, and ISVs, because they suit a purpose that is still required. But snapshots can be used as recovery points when they need to be used.
15:55
So within the array — without buying any expensive ISV, without worrying about what your DBA is doing at that point in time — snapshots can be used as a recovery point. And what's even more important is how SafeMode snapshots integrate into that cooperative cycle. So you take a snapshot of your database. Let's assume you're also doing backups on the
16:17
side. At the end of the day, the snapshot combined with SafeMode then adds a peace-of-mind layer, because you have multiple ways to recover in the event of absolute disaster — and having multiple ways to recover in the event of absolute disaster is infinitely better than having no way to recover at all.
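A toy copy-on-write model makes the "a snapshot is not a backup" point concrete. This illustrates pointer semantics in general — it is not how FlashArray is implemented:

```python
# A volume is a mapping of block -> data; a snapshot just pins the current
# block pointers rather than copying any data.
class Volume:
    def __init__(self, blocks: dict[int, str]):
        self.blocks = dict(blocks)
        self.snapshots: list[dict[int, str]] = []

    def snapshot(self) -> int:
        # Metadata-only operation: record pointers, copy no data.
        self.snapshots.append(dict(self.blocks))
        return len(self.snapshots) - 1

    def write(self, block: int, data: str) -> None:
        self.blocks[block] = data      # snapshots still see the old pointer

    def restore(self, snap_id: int) -> None:
        self.blocks = dict(self.snapshots[snap_id])

vol = Volume({0: "header", 1: "rows-v1"})
snap = vol.snapshot()
vol.write(1, "rows-v2-corrupted")
vol.restore(snap)                      # instant recovery point
# But the snapshot lives on the same "array" as the volume — lose the array,
# lose both. That's why snapshots complement backups rather than replace them.
```

The last comment is the whole argument: recovery points within the array are instant, but off-array copies are still what make it a backup.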
16:36
Now, I love protection groups. Protection groups are fantastic — the best thing since sliced bread. Just from a 30,000-foot view, the ability to put everything into one place and say, go and take a snapshot of everything contained herein, is infinitely better than, oh, I need to make sure all of this is correct. So the array itself, in its native form, is already taking care of:
16:58
how do I ensure a database spread across multiple volumes — or multiple databases spread across multiple volumes — are all consistent with one another? Protection group snapshots: best thing since sliced bread. (Milk might be good too — you're right.) All right.
17:15
Another resiliency possibility within the array is ActiveDR — continuous replication. For the infrastructure people in the audience who have Postgres databases and MySQL databases: those have semi-synchronous replication. This is scratching the same itch in a much more efficient way, because we're doing it at the
17:37
storage layer. We have the ability to near-synchronously replicate what's going on — continuous replication. As the change goes in, it tries its absolute best to get a copy to the destination array. If the destination array goes out of sync for too long, we'll just take a snapshot, re-baseline it, and then keep rolling forward.
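The re-baseline behaviour just described can be sketched as a toy simulation. The ticks, link budgets, and lag thresholds are all invented — this is the general pattern, not ActiveDR's actual algorithm:

```python
def replicate(num_ticks: int, arrivals_per_tick: int,
              link_budget: int, max_lag: int) -> tuple[int, int]:
    """Simulate a replication link; returns (blocks at destination, re-baselines)."""
    src = dst = rebaselines = 0
    for _ in range(num_ticks):
        src += arrivals_per_tick              # new writes land at the source
        dst += min(src - dst, link_budget)    # ship what the link can carry
        if src - dst > max_lag:               # fallen too far behind:
            dst = src                         # snapshot re-baseline, catch up wholesale
            rebaselines += 1
    return dst, rebaselines

# A link carrying 1 block/tick while 2 blocks/tick arrive will periodically
# re-baseline instead of replaying an ever-growing backlog.
dest_blocks, n_rebaselines = replicate(10, arrivals_per_tick=2,
                                       link_budget=1, max_lag=3)
```

The design choice the toy captures: when continuous shipping can't keep up, jumping to a consistent snapshot bounds the lag, at the cost of a coarser recovery point during the catch-up.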
17:56
This is really good for scenarios where you don't necessarily need full synchronous replication but you want to ensure you've got recovery points that are continuously replicating for different databases. As I said, for those familiar with MySQL and Postgres semi-synchronous replication: it's scratching the same itch, just much more efficiently, simply because we are only moving the data from one
18:19
array to the second array after it's been thin-provisioned and deduped. So you're moving much less between the arrays. You're taking an operational function from the database engine layer, dropping it down to the storage, and gaining a lot of efficiency for a very similar scenario. It does not replace primary-secondary clustering scenarios,
18:39
but it does provide you with a level of uptime in the event of one site going down, and it can be done over much longer distances. Finally, our favourite toy: ActiveCluster. OK, I'm saying a lot of things are the best thing since sliced bread — let's just say the array itself is the best thing since sliced bread.
18:59
ActiveCluster is synchronous replication, but it's also read-write on both sides, so it provides you with great simplicity for a storage HA scenario. Combining ActiveCluster with application-level availability gives you that absolute certainty if you're audited by your legal department — or whoever actually does these audits — on: are you highly available,
19:25
and are we going to crash the entire company if this goes down? ActiveCluster is absolutely fantastic because it provides that extra uptime for the storage layer while providing the simplicity for the application layer to stay up as well. Combining ActiveCluster with something like Oracle Data Guard or Oracle RAC — let's say Oracle RAC — gives you the peace of mind that you're going to have availability at
19:51
the application layer and the storage layer. I also believe — Ron, correct me if I'm wrong — it's Maximum Availability Architecture, MAA... P? Is that correct? OK, there's no P, just MAA. Maximum Availability Architecture is a very well-known thing in the Oracle world. And one of the nice things about having 30 or 40 years of database engineering is that
20:12
these databases all copy from one another, so you could probably take the architecture of MAA and drop it into a different database scenario. All right, let's talk about the functional-ownership black box, because we have infrastructure people in the crowd. I'm going to ask the infrastructure people this question: how many times has the DBA come to you folks and said,
20:32
my query is running slow — and you've just been like... So, what is it? There we go, we got "weekly." Anyone else? We got a quantitative reply. I was hoping everyone would be like, yes, they're really annoying. But you know what? We've got enough people there telling us the queries are running slow.
20:49
They're just pointing fingers. The database people have no idea what's happening at the storage layer; their view of it is "the query is running slow," not "my IOPS are slow." And so what tends to happen is this really,
21:01
really, really interesting scenario of because you've got black box personas of infrastructure, people and S admins and you've got application people. Sometimes they just don't have insights into each other's world. So the times of resolution is a lot less than it should be. How do we solve this open metrics?
21:20
So my colleague Anthony Nocentino put together something really interesting. He used the OpenMetrics exporters for FlashArray — they also exist for FlashBlade — and he put together this dashboard. This dashboard shows what a SQL Server is doing in the same place as what the storage is doing. And what we're able to do with a lot of this is
21:43
correlate behaviour. If we're able to see that the storage is running at 100% and the database is waiting X amount of time, you can correlate the problem directly to the storage. You can screenshot it and send it to your database folks and say, yes, we're wrong — or you can send it back to
22:03
them and say, no, it's a badly written query. This helps everybody feel a lot calmer about the situation and look at it as cooperative behaviour, instead of a "how do we point fingers and just get it to go faster" behaviour. And I really, really liked it because
22:19
it's actually scratching an interesting itch, which is that people don't necessarily want to spend a lot of time solving problems; they just want to get back to valuable pieces of work. And I firmly believe that this is the future of databases: every time we're able to solve that problem, we make the data itself more valuable, and data engineering becomes what our jobs will be going
22:44
forward, for database administrators et cetera. Now, Portworx also has a metrics scenario. I've not dived very deeply into it, but it does provide metrics at the volume layer. And the really nice thing about containerization is that because everything is so close together, you can pull metrics and provide essentially a very similar outcome
23:07
as here, where correlation starts to happen. How do we solve the legal problems? The truth is, we actually don't, because we're not lawyers. What we do do is provide a number of ways to see where everything is. So what I got my Pure1 colleagues to do is pull a map of,
23:30
well, tell me where everything is — what are your geolocation policies? And what I was actually thinking was: if we've got the ability to tell where an array is, and we've got a legal framework around it — Pure1 has an API where all of this data can be pulled — it's very easy to start creating reporting scenarios around: where is data?
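As a sketch of the kind of reporting that becomes possible — array names, locations, and policies here are all invented, and this is not an actual Pure1 API call:

```python
# Toy data-residency report: join volumes to array locations and flag
# anything hosted outside its allowed regions.
arrays = {"array-fra": "EU", "array-dal": "US"}

volumes = {
    "crm-db":   {"array": "array-fra", "allowed": {"EU"}},
    "sales-db": {"array": "array-dal", "allowed": {"US", "EU"}},
    "hr-db":    {"array": "array-dal", "allowed": {"EU"}},   # misplaced!
}

def residency_violations(volumes: dict, arrays: dict) -> list[str]:
    """Volumes whose hosting array sits outside their allowed regions."""
    return [name for name, v in volumes.items()
            if arrays[v["array"]] not in v["allowed"]]

violations = residency_violations(volumes, arrays)
```

The point is the join: once array location and per-dataset policy live in queryable form, residency auditing is a one-liner rather than a spreadsheet exercise.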
23:52
And if you can correlate where data is to where the volumes are and where the storage is, you can use that as part of your reporting and auditing. I'm pretty sure at some point the storage side and the database side are all just going to be collapsed into one big policy — and I'm pretty sure database-as-a-service offerings already have this.
24:12
That's conjecture. Now, how do we help you migrate to modern platforms? We have those three data protection scenarios I talked to you about earlier, which also serve as data mobility scenarios: ActiveCluster, snapshots, and ActiveDR. Nothing says I can't move stuff around using a snapshot.
24:36
Snapshots are absolutely fantastic because they're very efficient — remember, they're just pointers to existing data — and it's instantaneous to create one, instantaneous to recover from one. I've never actually run the test of: create a 100-terabyte database, take a snapshot,
24:52
erase it, create another 100-terabyte database, and see how long it takes to recover. But I'm pretty sure, even in that scenario, it's as fast as you can click a button or run an API call. More importantly, they're also mobile, so you can move stuff from one location to another. And remember, data movement is only done, very efficiently, on post-dedupe data,
25:12
and it also checks: do I need to copy this? Because it might already exist on the target — so, low-bandwidth-style scenarios. And this lets us do cloud migration. Cloud Block Store is fantastic because you have a common operating environment between Cloud Block Store and FlashArray, and what you can do there is essentially just snapshot the database.
25:32
Migrate it over for whatever use case, bring it up, and use it on the target. This is not only limited to cloud-to-cloud migrations: if you have a system-copy process for your Oracle databases, your SQL databases, et cetera, snapshots are absolutely fantastic for that, because they're infinitely more efficient — you've decoupled space from time.
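"Decoupling space from time" can be sketched in a few lines: a full copy moves every data block, while a snapshot-style clone copies only the block map — pointers, not data. A toy model, not FlashArray internals:

```python
# A full copy touches every block of data; a clone touches only metadata.
def full_copy(blocks: list[bytes]) -> list[bytes]:
    return [bytes(b) for b in blocks]        # moves every byte

def snapshot_clone(block_map: dict[int, int]) -> dict[int, int]:
    return dict(block_map)                   # copies pointers, not data blocks

data = [b"x" * 1024 for _ in range(10_000)]  # stand-in for a big database
block_map = {i: i for i in range(10_000)}    # logical block -> physical block

copied = full_copy(data)                     # cost scales with data size
cloned = snapshot_clone(block_map)           # cost scales with metadata size
```

The clone's cost is tied to the size of the map, not the size of the data — which is why the "copy" of a huge database is effectively instantaneous.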
25:52
A 100-terabyte database copies as fast as you can click with a snapshot, whereas if you have to run a backup and come back, that's, well, data movement and computation. FlashBlade also comes with a number of data services as well — to make sure we're talking about all of the products — because there are certain databases that work really,
26:11
really well on FlashBlade, and also certain use cases that work really well on FlashBlade. We've got snapshot replication — snapshots of file shares — and we've got object-based replication, for either higher resiliency or the ability to have something available at another location. I actually had an interesting use case.
26:33
I was constantly backing up to a FlashBlade object store, and I had replication going to an AWS S3 bucket. And then I said to myself, let's take that backup — without changing anything at the storage level — and, on the AWS side, restore it to whatever is native there. And it worked.
26:50
It was fantastic. So you've got a number of different ways to architecturally implement data movement within that. And Portworx — one of the really fantastic new-age-of-computing scenarios — also has a lot of data-management and replication technologies within it that allow us to move between different clouds and different environments,
27:13
assuming containerization is present in all of them. All right, we've come to the end of my lovely presentation. I thoroughly hope you enjoyed my very monotone voice.