Episode 3

Published on:

28th Feb 2024

Is DevOps Bullshit? An Interview With Cory O’Daniel, Author Of The “DevOps Is Bullshit” Blog Post

DevOps was conceived as a way to bring developers and operations together, but has it really worked? In this episode, Dave Williams sits down with Cory O’Daniel to discuss his controversial blog post “DevOps Is Bullshit.” Find out why Cory sees platform engineering as a solution. Discover its potential as an antidote to the unmet promises and inherent challenges within the current DevOps landscape. Tune in now!

Love the show? Subscribe, rate, review, & share!

Guest: Cory O'Daniel, CEO at Massdriver

Transcript
Intro:

You're listening to the Platform Engineering Podcast, your expert guide to the fascinating world of platform engineering. Each episode brings you in-depth interviews with industry experts and professionals who break down the intricacies of platform architecture, cloud operations and DevOps practices. From tool reviews to valuable lessons from real world projects to insights about the best approaches and strategies, you can count on this show to provide you with expert knowledge that will truly elevate your own journey in the world of platform engineering.

Dave:

Hi, I'm Dave Williams, CTO of Massdriver, and I'm here with Cory O'Daniel, CEO of Massdriver. Thanks for taking the time, I really appreciate it.

Cory:

Yeah, thanks for having me.

Dave:

So I want to talk about your article in the Massdriver blog. You criticize the current state of DevOps. You claim it's an unholy beast of division and tunnel vision. Can you elaborate on your personal experience that led you...

Cory:

Yeah, I think those are two very different thoughts that I put together because they sound good together, but I'll start with the unholy division, right? So like the whole initial concept of DevOps was always to bring together developers and operations in this culture. And what's funny is like, you can hop on Hacker News and you'll find people talking about how, “Hey, we did that well.”

But the reality is, is most companies haven't, right? And we have this thing that we say time and time again, where it's like, “Oh, tools won't solve the problem. It's a cultural problem. We have to fix it.” It's like, well, we've been at this for 13 years and most companies still haven't figured out DevOps. And so much of the fact that like we have DevOps teams now, like we've really just rebranded these operations engineers.

And I feel like you do have this division and these are people that can get along well around the water cooler. But when my product manager wants me to finish a feature this week and I've ended up in this weird world of ticket ops where I have to ask somebody else to do some work for me so that I can do my job. Like that's where the division really is.

And you still see this in a lot of companies and you see the animosity building up between the teams that I think kind of makes that cultural DevOps movement even harder to achieve because the operations teams are so understaffed.

Dave:

Yeah. I mean, that's an interesting point. And it's funny when you say DevOps teams, like I don't even know that the idea of a DevOps team was ever in that like original idea, right?

It was kind of like with tooling, one person can be both.

Cory:

Yeah. It really felt like the initial pitch. And I feel like over the years, I feel like there's been a little bit of revisionist history as to what it originally was, but it made sense at a very specific point in time, right?

Like if you rewind to:

It was possible. We didn't have the weight of compliance on us. We had very simple machines that we ran very small applications on, but in a world of a global audience, people demanding really high uptime, even for non-critical services, it becomes really difficult for an engineer looking at the 200 services of AWS to handle all that, keep that in their head.

Also trying to make sure that their product manager is happy and you're delivering features that the business needs.

Dave:

Yeah. It's interesting you talk about that complexity. Like what role do you see platform engineering play in kind of mending that loop or solving this problem?

Cory:

Yeah. So, I mean, this is probably another like mildly hot opinion, but one of the things that I've always felt was problematic in software development as a whole is that you do have software developers that have a specialty. You'll meet that person who's worked at e-commerce company after e-commerce company.

And I think that's great. And when we start to think about platform engineering, it's very similar. There's a domain of knowledge that needs to be understood there for that person to do their job well, and that's the cloud, right?

So I think one of the really important things for a platform, somebody who's interested in getting into platform engineering is having that deep knowledge of infrastructure, cloud services, and also application development and treating that domain of expertise as your product.

Dave:

The question I have is like, how hard is that going to be to find in the market? What are some of the challenges that companies are going to face as they kind of take that on?

Cory:

It's hard to find. It's definitely hard to find. And I think that's why you're seeing a proliferation of different services, us included, that help people get there quicker by offering this buy option.

And one of the difficulties of getting there is if you look at the Stack Overflow survey over the past three years, the number of people with cloud operations experience relative to the number of software developers has gone down. Three years ago it was 10%. You could hit that one in 10.

One operations person for every 10 engineers. This year it's 6%. It's not that we're losing those people.

I mean, sure, some are probably retiring. We're just making engineers faster today. We're not just getting engineers out of universities.

We're getting them out of bootcamps and there's great engineers coming out of there. But we have more people joining the industry. We have more companies that are becoming software companies, traditional old companies that are now moving their services to the cloud.

We're getting a lot more people, a lot more companies on the cloud, but cloud operations is still very much an on the job training, right? And so the hard part is getting that expertise. And then when you look at operations engineers, there's a good subset of those people that aren't software developers, right?

They've got great knowledge of how networks and Kubernetes work. They've worked in data centers, but they might not be software engineers. So finding that right talent is definitely going to be hard because a lot of it is in FAANG, right?

And so finding the right engineers that can build this platform is going to be difficult. And so I think one of the things we need to do as an industry is be a bit less opaque about operations. We're not going to get there unless we get these people.

So getting engineers that you're bringing on early, working with operations engineers so they can start to share some of this knowledge of how these systems work rather than having a team of, let's say three ops people that kind of have all this knowledge and everybody else is just asking for services through tickets. Like we got to share that information and get some of this on the job training a bit earlier. And I don't think we're going to see it come out of schools, right?

It's very vendor specific. I can't teach you AWS at a university. And then you graduate four years later and all the services have changed.

Dave:

Expanding on that, it's going to be a challenge to get to platform engineering. I think over the last decade or so, everybody's kind of bought into DevOps and they're not exactly there either. What's the core difference?

Like if you're kind of halfway into DevOps, like how do you get to platform engineering? Why would you even want to go to platform engineering? Like what are the advantages?

Cory:

The number one thing for you as an operations engineer is that burnout. We do have a higher burnout rate than software engineers. Our turnover is much faster, right?

And so I think some of the key things there is, are you working in a company you want to be at long-term, right? Do you love the product? And if so, I think you need to communicate to your team how you can start to add value as an operations engineer.

You don't see a lot of product, and I say this in the article, you don't see a lot of product managers running around high-fiving people because they built a fast autoscaler. Like a lot of the work that we do as operations engineers isn't visible until something's broken. And so we need to be able to communicate how we can add value to the business.

And I think numbers rule, we got to back it out to cash. So how much time are your engineers spending fiddling with this? How backlogged is the average engineer waiting for your smaller team of operations folk to get them the things that they need, right?

How comfortable are we in compliance? How far are we behind on our Vanta dashboards, right? Gathering those things and say, if we start to take some time to address some of the debt that we have and consolidate some of these systems that we've built, we can start to move our way towards platform engineering.

It's not going to be something you do tomorrow. It's something you have to build on over time, but you've got to start dedicating that operations team to building and figuring out how to build a product, not just solving tickets for people. That's going to require software engineers on that team.

It's going to require possibly a product manager, right? Or at least somebody acting in that role.
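As a rough illustration of "backing it out to cash" (a sketch, not something walked through in the episode), the cost of engineers waiting on operations tickets can be estimated from a few inputs. The class name, field names, and every figure below are hypothetical assumptions; real numbers would come from your own ticket tracker and finance team.

```python
from dataclasses import dataclass


@dataclass
class TicketOpsInputs:
    """Hypothetical inputs; replace with numbers from your own organization."""
    engineers: int                      # application engineers who file infra tickets
    tickets_per_engineer_month: float   # average tickets each engineer files per month
    blocked_hours_per_ticket: float     # hours an engineer is stalled per ticket
    loaded_hourly_cost: float           # salary plus overhead, per engineer hour


def monthly_blocked_cost(inputs: TicketOpsInputs) -> float:
    """Estimate the monthly cost of engineers waiting on operations tickets."""
    blocked_hours = (
        inputs.engineers
        * inputs.tickets_per_engineer_month
        * inputs.blocked_hours_per_ticket
    )
    return blocked_hours * inputs.loaded_hourly_cost


if __name__ == "__main__":
    # Made-up example: 40 engineers, 3 tickets each, 4 blocked hours per ticket.
    example = TicketOpsInputs(
        engineers=40,
        tickets_per_engineer_month=3,
        blocked_hours_per_ticket=4,
        loaded_hourly_cost=100.0,
    )
    print(f"Estimated monthly cost of waiting: ${monthly_blocked_cost(example):,.0f}")
```

A single number like this is crude, but it gives an operations team something concrete to put next to the cost of dedicating people to building a platform.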

Dave:

So it sounds like speed is kind of at the heart of like the advantage you get from switching to this like platform engineering mindset, right? Like moving from kind of the ad hoc scripting and automation of things to like some kind of a self-service. Can you share some insights on how organizations can like practically begin to move into that?

Cory:

Yeah. I think one of the key things is, and I think a lot of organizations are unfortunately missing this today, is starting to lean into IaC very early. It's going to be impossible to get to platform engineering with some scripts that set up a cluster or doing click ops, right?

We need to be able to create declarative definitions of our infrastructure, whether that's Terraform, Pulumi, et cetera, and start to think about the use cases of this business, right? So how many teams do I have? How many different clusters am I running?

What are the similarities between these teams and what are things that are important for us to start to codify that they shouldn't have to worry about, right? If your organization has to worry about CCPA or HIPAA or GDPR, getting a lot of that baked into those infrastructure bundles where they can. I think that is the first step, like starting to codify those things that are painful for engineers and that are frequently requested of operations engineers.
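One way to picture "baking compliance into an infrastructure bundle" is a small sketch like the one below. It is an illustration only, not Massdriver's implementation or any real tool's API: `DatabaseRequest`, `DatabaseSpec`, and `render_spec` are hypothetical names, and the retention values are placeholders, not actual regulatory requirements. The application team supplies a short request, and the platform fills in the security settings so the team never has to think about them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatabaseRequest:
    """What an application team asks for (hypothetical fields)."""
    name: str
    size_gb: int
    handles_health_data: bool = False


@dataclass(frozen=True)
class DatabaseSpec:
    """Full declarative definition the platform would pass to an IaC tool."""
    name: str
    size_gb: int
    encrypted_at_rest: bool
    publicly_accessible: bool
    backup_retention_days: int
    tags: dict = field(default_factory=dict)


def render_spec(request: DatabaseRequest) -> DatabaseSpec:
    """Expand a short request into a spec with compliance defaults applied."""
    # Placeholder policy values; a real platform team would set these
    # with its own security and compliance people.
    retention_days = 35 if request.handles_health_data else 7
    tags = {"managed-by": "platform"}
    if request.handles_health_data:
        tags["compliance"] = "hipaa"
    return DatabaseSpec(
        name=request.name,
        size_gb=request.size_gb,
        encrypted_at_rest=True,      # always on; not a developer choice
        publicly_accessible=False,   # always off; not a developer choice
        backup_retention_days=retention_days,
        tags=tags,
    )


if __name__ == "__main__":
    print(render_spec(DatabaseRequest(name="orders", size_gb=50)))
```

The point of the design is that the request type has no field for encryption or public access at all, so the secure setting is the only one a developer can get.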

Dave:

In the Microsoft for Startups blog, you touch upon the insatiable hunger for software and the kind of like rapidly expanding software engineering workforce. How does platform engineering kind of change the trajectory of those software engineers?

Cory:

I think the funny thing with platform engineering, right? So like my argument in that article, the first article of that series was that DevOps is bullshit. And I think that where we are today, it most definitely is.

I think the original idea made sense. I think that if we can execute platform engineering well, DevOps ceases to be bullshit, right? The idea of you build it, you run it is where we want to get to, right?

Our application developers, they understand the services they need. They understand how Kafka works. They understand Postgres.

They understand their applications and their business domains. That is build it and run it. They should be able to get that Kafka instance or some SNS, SQS, EventBridge, whatever it is.

If they can get that quickly and focus on their business, they are doing DevOps. We just happen to have this platform internally, right? And so I think that is one of the keys there.

And so I think as those engineers start to actually be practicing DevOps, what I think changes about their careers as the engineers on the platform is they get to start to hone their craft more. They get to focus more on the business and adding value to that business rather than kind of having these interrupts all day long where they're trying to figure out how to do a DevOps task or change a config or fix a state file in a Terraform pipeline. Things that add zero value to the business, but are necessary.

They'll be able to focus on only the things that are necessary.

Dave:

That's a really interesting point. I think the context switching of kind of the DevOps mindset too, especially in the single developer who's doing both, right? Switching from Terraform back to your app, back to Terraform, back to your app, you know, to your observability software and so on and so forth.

And I think like, is there a world where like the same developer is working on the platform and the product at the same time? Or are these like two separate tracks, do you think?

Cory:

I feel like they've probably got to be two separate teams or at least divided in effort, right? I mean, if you're trying to deliver business value, you have to be focusing on the product, the product that the business is selling, not the product that you're selling to your own engineers. And so I think that clear delineation is important.

And I think that will allow your developers that are building product to focus on that without kind of blurring the lines. I think one of the things that's really hard about the way that we've done DevOps is we don't have, you know, as software developers, we want abstractions. We want these nice interfaces between things.

But in DevOps, it's very easy to get the line blurry of like, am I in infrastructure or am I in an application? It's like, you obviously know because you're coding, but you'll see people kind of shift infrastructure concerns into their applications. And if you can create these interfaces where these things connect together and you don't have the ability to kind of blur those lines, I think that's important.

And I think you'll see the same thing in business. Like you don't want to have your e-commerce concerns starting to blur into your platform, right? You want it to be solely about delivering software quickly and in a secure, compliant way with the minimal overhead possible to your engineers.

And so I think that clear delineation is very important.

Dave:

That sounds right. I mean, it brings up kind of like an interesting thing where it's like, so you're creating, I mean, I think the big difference between platform engineering and DevOps is like that product kind of mindset. Like in your experience, have you encountered issues with like the prioritization problem when you're not thinking in that product mindset?

So like in DevOps, it's kind of like a list of like security tasks that are all kind of high priority versus like, I know the needs of my customer and I need to meet those needs.

Cory:

When dealing with the priorities, particularly in a platform, I think there's obviously a lot of ways for a team to prioritize their work, but I think most importantly is trying to figure out that biggest component that can shift the most amount of complexity out of the hands of your developers. Like trying to find that core system or core pattern that exists across your two, three, four teams, get that codified and out of the way as quickly as possible. Now, the catch there is you also have to balance that with security and compliance, right?

And that's one of the things that I think, in a lot of companies, is not getting as much attention as it should. To be able to get that security and compliance into your platform earlier, I think is also really important. So I think trying to figure out how to prioritize what's going to be more efficient for your developers in their day to day versus what security and compliance things have we cut corners on because either we didn't know or we didn't have time or it didn't matter, like getting those into the platform early because those security and compliance constraints can affect the way that you're doing deployments down the line and it can have a pretty big impact. So I think those are important cornerstones of really good platform engineering that people should prioritize alongside the work that saves engineers the most amount of time.

Dave:

Yeah, it's crazy. It's almost like the most shift left you could ever be on security and compliance is like in the product process where you're kind of weighing that against like time saved doing something versus the amount of money lost if we have a breach.

Cory:

Yeah. And I mean, it's funny, it's like, I feel like we're always saying shift left and we're shifting a lot of the accountability left, but not that expertise. Right.

And it's like even looking at the number of people that have deep security experience, again, going back to the Stack Overflow survey, it's something around like three to 6% of engineers have deep security experience. Right. So let's say you get somebody on your team who has really deep security experience and you've got three other teams that are delivering software.

What's the most effective way of getting that person in and increasing the security of your business? Is it giving them to your e-commerce API team or is it taking that person over to your ML team or is it getting that person plunked into your platform engineering team where they can kind of act as a force magnifier for, sorry, a force multiplier for security, being able to implement those security details in the platform and deliver pipelines for maybe doing security scanning as software is going out rather than just having them work with a specific team that you think the biggest problem is on.

Dave:

I think that's like an awesome kind of vision of the future. And I think like I want to wrap up with kind of like your general thoughts about like what the world looks like when we actually succeed in making this shift from DevOps to platform engineering.

Cory:

We don't have enough operations and security engineers on the planet to operate and secure all the software that we're putting in the cloud today. And there's a lot of teams that we've even met in trying to do sales at Massdriver that say security is not a revenue driver. I mean, security is not a revenue driver for anybody but a security company.

Right. And so how do we get more stable, secure, compliant systems? Like that's the future that I'm most concerned with.

Everything is moving online. Right. And I think that platform engineering is the way that we get there.

Operations and security are very much on the job training. You can do a little bit of security training at the university but they're not telling you and teaching you how to do like deep penetration testing. You can do a little bit of operations at university maybe learning about Docker but they're not going to teach you about 400 different cloud services and how to weave them all together.

Right. We don't have enough of these people. And if we can magnify the experience of these people through platform whether that's a platform that you buy or it's a platform that you build internally or whether it's a platform that's open source through something like CNCF I think that is the future.

More software, faster, more secure software, more compliant software and engineers that are actually happy in their jobs like solving business problems and not dealing with the tedious day to day of what we now consider DevOps.

Dave:

A world where software eats the world and doesn't shit all over it. I love it.

Cory:

Yeah. That would be fantastic but currently we're shitting a lot.

Dave:

Cory, I really appreciate you taking the time. It was really great to talk to you in here.

Cory:

Yeah. Thanks for having me and remember DevOps is still bullshit.

Outro:

Thank you for listening to this episode of the Platform Engineering Podcast. Have a topic you would love to learn more about? Let us know at cory@massdriver.cloud. That's C-O-R-Y at M-A-S-S-D-R-I-V-E-R dot cloud. Catch you on the next one.

About the Podcast

Platform Engineering Podcast
The Platform Engineering Podcast is a show about the real work of building and running internal platforms — hosted by Cory O’Daniel, longtime infrastructure and software engineer, and CEO/cofounder of Massdriver.

Each episode features candid conversations with the engineers, leads, and builders shaping platform engineering today. Topics range from org structure and team ownership to infrastructure design, developer experience, and the tradeoffs behind every “it depends.”

Cory brings two decades of experience building platforms — and now spends his time thinking about how teams scale infrastructure without creating bottlenecks or burning out ops. This podcast isn’t about trends. It’s about how platform engineering actually works inside real companies.

Whether you're deep into Terraform/OpenTofu modules, building golden paths, or just trying to keep your platform from becoming a dumpster fire — you’ll probably find something useful here.