Episode 45

Published on:

18th Mar 2026

Infrastructure as Code's Hidden Problem with Pavlo Baron

Terraform drift, state wrangling, and a growing “tools for tools” stack are still daily work for many platform teams - despite a decade of DevOps talk and cloud maturity. Why does ops automation so often feel like it needs babysitting?

Pavlo Baron breaks down where Infrastructure as Code tends to break down in real organizations: manual drift management, low-level state complexity, and a lack of practical abstractions that let developers self-serve without inheriting the entire ops burden.

The conversation digs into what a more use-case-driven approach could look like - where teams can choose when to enforce desired state, when to accept emergency changes, and how to build “guardrails” that reduce mistakes without slowing delivery.

Pavlo also explains why type safety and constrained interfaces matter (especially as AI starts generating more code and infrastructure changes), and why the future of platform engineering depends less on slogans and more on systems that reduce toil.

Guest: Pavlo Baron, Co-Founder and CEO of Platform Engineering Labs

Pavlo Baron is Co-Founder and CEO of Platform Engineering Labs, who are crafting tools to remove the toil from the operations work, with a current focus on infrastructure. He is a veteran in the space, having served in all kinds of roles throughout his career that spans more than 35 years. Previously, he was co-founder, CTO, and major inventor at an observability startup, Instana, that was acquired by IBM in 2020. Pavlo is a frequent conference speaker and author of several books.

Pavlo Baron, X

https://pavlobaron.medium.com/

https://github.com/platform-engineering-labs

https://www.linkedin.com/company/platform-engineering-labs

https://x.com/plateng_labs

https://bsky.app/profile/platform.engineering

https://mastodon.social/@plateng_labs

https://www.youtube.com/@plateng-labs

Links to interesting things from this episode:

  1. The Pkl Primer
  2. formae
  3. formae quick start
  4. "10+ Deploys Per Day: Dev and Ops Cooperation at Flickr"
  5. “Where everyone is responsible, no one is really responsible.” Albert Bandura
  6. JPL “Visions of the Future”
  7. “Fallout: New Vegas”
Transcript
Cory:

Welcome back to the Platform Engineering Podcast, I'm your host, Cory O'Daniel, and today I have Pavlo Baron, co-founder and CEO of Platform Engineering Labs, the company behind formae, a new open source Infrastructure as Code tool that we're definitely going to spend some time talking about.

But Pavlo has been in the game for decades. You can see it from the sweet beard. You can see it from the sweet beard, dude. That's decades of experience right there. Like, that's the resume. Just the beard. I ain't got the resume right now.

But he's been in the space forever. He recently was building Instana, sold to IBM. Yeah. And we want to talk about all things AI, IaC adoption. Very excited to have you on the show.

So Pavlo and I met at KubeCon. I was just perusing the booths and I saw a very cool booth background. That was actually what brought me over there. I was like, "Oh, this booth background's awesome. Who designed this?" And then I was like, "Wait a second, I recognize your name from somewhere." And we had already had the podcast scheduled, but it was nice to meet you in real life. Welcome to the show.

Pavlo:

Thanks for having me.

Cory:

So, for everybody who doesn't know, we've rescheduled this three times. Fate does not want this conversation to happen, but I'm glad you made it for, I think, the fourth attempt at recording. And hopefully a tree doesn't fall on my house or anything weird.

So, Pavlo, can you just tell us a bit about your background? What led you to Instana, what the IBM experience was like, and how you decided to start moving towards building a new IaC tool?

Pavlo:

Oh, yeah. Yes, okay. Thank you for calling me old.

Cory:

Sorry, I have beard jealousy, dude. I can't grow a beard on one side of my face. So whenever I see a good one, it's just like. It just gets me.

Pavlo:

Yeah, I understand that. It's great on this end.

Well, I mean, I'm in the business for many, many years, as you already pointed out. And I grew up coding. I mean, I started coding when I was like, 14. It was all assembly back in the day and then at some point, I moved on to C, writing assembly code, and then I moved on to other stuff, still writing assembly code. So this is all in the past.

I was with all kinds of companies because I'm naturally curious and I don't want to stick with one single business domain for too long. That's why I kept hopping... well, not really hopping, but staying for a few years here and there, going into insurances, you know, all this fintech - before it was called fintech - banks, retail, all these kind of things just because I wanted to know how it works.

Then naturally, with a growing experience, you get more visibility and more responsibility. They start calling you more like a lead, or then they start calling you architect, et cetera, et cetera. I'm naturally curious, as I said, this means that I actually never wanted not to do things. Like the worst thing for me to happen is to be an ivory tower architect and not... you know, just to do stuff with my brain.

So I actually grew up coding. I never gave up coding. And that's just. For me, there's no difference between, well, developing and then running stuff. I mean, I'm one of those who are actually willingly going on calls. Wherever I go, I always was very close to the Ops. We're helping them with doing their stuff - even not being part of them - because that's...

I think this belongs to an overall profile. But maybe I'm just weird enough to do that.

Well, how did Instana happen? I was with a consulting company back then. I was traveling everywhere, also traveling the world a lot, being at many conferences, giving talks at QCons, and was at some point involved with the Erlang community - that I really like, but I don't have enough time anymore. Well, and I needed to evolve.

So I was coding in the hotel room at night. I was very curious about all this big data stuff. I wrote books about stuff like that, like big data, Erlang, etc. Authored four books, have to admit, very far back. Yeah, I still get paid for them, but...

Cory:

Nice.

Pavlo:

That's a different part of life.

At some point I started digging deep into how to actually process telemetry data on streams. And that was just a hobby, like figuring out how all this stuff works, what technology do you need. Because back then I was a massive supporter of the NoSQL movement. So I dug into all kinds of alternative technologies, data stores, ideas, distributed systems, all of these things that really satisfied my curiosity.

And then I started building something that the CEO of the company then saw at a conference and said, "Oh yeah, we are APM experts by the way. How about we apply that onto APM because it feels like that's the next step in the evolution?" And then, well, we firmed out and started Instana, got some initial funding and I hired a few people and developed this over a course of not even six years, I think. Well, plus the time that I was hacking alone in the hotel room.

Yeah, we made it into a company with up to like... I don't know, I don't remember really... 200 people maybe. And then, well, IBM asked us to join because they liked the technology, they liked the vision, they liked the IP. And yeah, we decided to exit.

And then I stayed with IBM because, well, I was in a strategic space. The automation space at IBM is very diverse and complex. As I said, I'm naturally curious. I'm not sticking with names or whatsoever. The domain was exciting. I learned a lot. I met a few people whom I actually really admired. I stuck my nose into things that I would never have access to otherwise, like real hardware and stuff. That's, wow, exciting.

Well, at some point I just said, "Hey, it's time to start something different."

Cory:

Yeah.

Pavlo:

This whole AI space started... well, it was not really hyped yet. OpenAI did not publish ChatGPT back then, but something was in the air. It was visible.

So I started working and exploring and figuring out some tech. And then they basically killed all of it at once with ChatGPT. Like, literally all of it. All of it. But when I saw the news, I was like, "Fuck."

Cory:

Yeah, it was just like Watson was just kicking ass on, like, Jeopardy, and then all of a sudden, ChatGPT was in everybody's hands.

Pavlo:

Right, right, right, right, right. I was like, "Man, that's a... How is this even possible?"

I really wanted to go after knowledge workers and help them, you know, because I was seeing that people are wasting time Googling, and all of it is so freaking, you know, complicated and slow and really manual. So there must be a lot of automation that you can get where you can get rid of stuff you don't want to do.

That was what I was working on.

Cory:

Where did the idea for formae originate? So first tell us a little bit about formae and then tell us where the idea came from.

Pavlo:

Right. Yeah. You can cut together half of what I just said.

So formae, that's an interesting story by itself. So my co-founder, Zachary Schneider, he was my first hire in the US when I was Instana CTO. I basically was doing pretty much all of the Ops and at some point I needed sleep. So I was looking through my old contacts and I said, okay, who is... because we started doing international business and you can imagine, this is a thing that was running 24/7, like, it doesn't matter what time zone you're in. And then suddenly you have American users while you are still having the team in Europe. I live in Munich, Germany, by the way.

I was looking at the map and said, "Okay, the middle of the country seems to be Austin, Texas. It seems to be a tech hub." I was talking to everybody I knew and then I found a guy who knew a guy who was just... they exited with Boundary to BMC and he was not happy with BMC at this moment already. So I chatted him up and he said, "Oh yeah, I can take this over, I'll do the Ops side." Yeah, we ended up hiring him and he stayed with us almost up until the end.

I mean, we're really friends and we kept talking all the time. I mean, multiple petabyte in a short period of time ingress that was installed, that Instana was processing, that is quite a challenge. It's technically definitely not easy. So we had to make choices that we throw overboard, then made different choices, et cetera. We kept rolling and adjusting, et cetera, et cetera. Now through all of that time, we kept discussing.

By the way, when we went apart and he went into a few startups himself, he ultimately was part of the core iCloud team at Apple. His last station before we started this company here was Apple.

Now we kept talking about, "Why is this that we have to work with low level tools that don't help us?" Like literally everything in the Dev space is so shiny and innovative and makes unnecessary work magically disappear, go away. Like optimization over optimization and everything in the Ops space is just so freaking low-level that you have to basically build Rube Goldberg machines over and over to solve a very simple problem.

So we were talking, we were like, "How can we change that? Like, what do we need to do to make it go away? We're tired. We don't want to do any of that."

I mean, everybody in this field you talk with, when you really get closer to them, they will tell you, "Man, I want to check it all in and go garden. I'm fed up, I'm tired. This is just too much."

On one side, there's Ops pressure... we will talk about that, I guess... but then on the other side, the tools are just like basic. Right? Well, and then at some point we've heard that IBM acquired HashiCorp.

Cory:

Big news. It's breaking news, everybody, if you haven't heard it. Sorry.

Pavlo:

Well, it's... I mean, yeah, we already are more than a year in.

Cory:

Yeah, yeah.

Pavlo:

But then I called him and said, "Hey, Zach, I mean, you know that when an acquisition like that happens, the market opens for new opportunities and new players who actually want to change something. And we're not talking about incremental changes. We're talking about fundamentally redefining how that works. That was cool a decade ago, but actually cannot keep up with the, you know, with the rate of change that is happening right now with all this diversity, multi clouds, hundred thousands of technologies. It's completely different from the time where they started working on Terraform."

So yeah, that's how we started formae.

Cory:

This is an area I'm very interested in, obviously. I think we all are, right? When you're saying, like, how it really feels like Ops never really got the shiniest of tools. I think our shiniest tool is Kubernetes, right? And it's a lovely tool, but it is not shiny.

But it is funny, right? Because our jobs are just so fundamentally different than software developers. And people outside of our space, they see the keyboard and they're like, "Ah, you all just write software and deal with computers." But like, our job is so like, "hey, handle arbitrary requests for things", people go and create till kingdom come, and when things break sometime in the future, that's us again. We're just this sandwich... like what's inside of it? Who knows?

You know, it's like, lay down the bread, some software developers do some stuff for a while, something breaks and it's, I guess, our problem. Should be theirs, but, you know, it depends on your DevOps maturity model, right? So like, the job's just so different and it does feel hard to get good tooling in the space.

And so, you know, as you're thinking about just the gamut... because I feel like there's just so many places in the Ops, DevOps and Platform world where you can go in and make the tooling better.

What was it... where you were like, "It's the IaC, it's Terraform"? Because I agree, like Terraform, all IaC has not overwhelmed people with successful adoption stories, right? We're still stuck, like low 30ish percent IaC adoption.

And it's crazy, everybody listening to the podcast is like, "Wait a second, we do it at work." It's like, yeah, you might use the tool, but have you fully adopted it? Is your organization running on it? Or do you have a project on it, or some projects, or some percent of your infrastructure?

Most organizations are not as far ahead as the organizations that have platform engineering teams. There's still so many teams that just have an Ops team and they're still in a data center where they've built some stuff. And it's just like getting to the cloud and getting to IaC is just not even on their agenda.

So now everything has been just rushing to the cloud for a decade and now, seemingly faster with AI, this becomes even more important. Right? Especially when we start talking about how do we arbitrate what me and the AI are even talking about. Are we even talking about the same thing? I need a network. What does that mean to AI?

And so I think it's really important that we nail IaC. But what was it about the IaC tool in particular that you're like, "It's Terraform. It's the way that we think about Infrastructure as Code and provisioning and reconciliation, et cetera. That's the problem. And these are the main areas that formae is going to attack to fix it"?

Pavlo:

First of all, the reason why we called the company, not formae, but Platform Engineering Labs is actually already showing our broader vision. Platform engineering from my point of view is... you can call it whatever, it doesn't matter... the point is that after a whole decade almost of DevOps cultural discussions, I personally think that it's a tooling problem we're fighting. We're not fighting misalignment between people. We can't call them lazy just because they have to slow down in order to do their job.

I mean, that cultural thing, I don't buy it. Been there, saw a lot. But the thing is, when you look at how Terraform was created, it followed the same standard of kind of... it feels like it is a UNIX tool from the early seventies. That's the thing. Everything happens on your machine. It doesn't have a backend. You have to babysit that. You have to exactly tell this thing what to do.

And this whole thing that is happening with AI is actually already going completely in a different direction - you don't need to babysit anything anymore. You need to establish guardrails. Depending on your job, you're letting it go more and more. But also completely without AI, there's no point in inventing a term called drift if you can solve it. Why would I always be dogmatic and say, "Oh, my infrastructure drifted because my code is not the same as my infrastructure"?

We think it's completely wrong because it needs to be use case oriented. And that's what the early UNIX tools actually did not do. I mean, okay, I'm a big fan, but there's a limit, there's a line. I think that right now we need to think everything in use cases, because that's how actually successful solutions win. If they understand what the user actually needs. It doesn't matter that you have 350,000 command line flags, you need to nail that one.

Speaking of use cases, the use case clearly is sometimes it is okay to roll the whole thing, sometimes it is okay to enforce from your code into reality, but sometimes it's not. And sometimes you just want a little tiny patch and no matter how hard you try to avoid ClickOps, it's going to happen.

Now who are you to decide that what has been done at night to fix a problem with duct tape is not a valid change? Probably it is. So what people kept doing now is manually catching up on this freaking drift all the time, maintaining this illusion of ownership of the reality, while the very same tool does not support you because it doesn't have anything active running that can catch up on any changes.

So that was the first thing that we said, we need to change that. This whole manual state management, all of this stuff that you keep yourself busy with, that doesn't matter at all.
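The choice Pavlo describes here, sometimes enforcing what the code says and sometimes accepting what was patched by hand, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not formae's implementation or API: `DriftPolicy`, `Decision`, `reconcile`, and the example property names are all hypothetical, and real infrastructure state is far richer than a flat dictionary.

```python
from dataclasses import dataclass
from enum import Enum

_MISSING = object()  # marks a property that is absent on one side


class DriftPolicy(Enum):
    ENFORCE = "enforce"  # declared code wins; push it back into reality
    ACCEPT = "accept"    # reality wins; fold the manual change into the code


@dataclass(frozen=True)
class Decision:
    key: str
    declared: object
    actual: object
    action: str


def reconcile(declared, actual, policies, default=DriftPolicy.ENFORCE):
    """Return (updated declared state, list of per-property decisions)."""
    updated = dict(declared)
    decisions = []
    for key in sorted(set(declared) | set(actual)):
        want = declared.get(key, _MISSING)
        have = actual.get(key, _MISSING)
        if want == have:
            continue  # no drift on this property
        policy = policies.get(key, default)
        if policy is DriftPolicy.ACCEPT:
            # Adopt the out-of-band change instead of reverting it.
            if have is _MISSING:
                updated.pop(key, None)
            else:
                updated[key] = have
        decisions.append(Decision(
            key,
            None if want is _MISSING else want,
            None if have is _MISSING else have,
            policy.value,
        ))
    return updated, decisions


# Hypothetical example: two properties were changed by hand overnight.
declared = {"instance_type": "t3.small", "min_replicas": 2}
actual = {"instance_type": "t3.large", "min_replicas": 5}
policies = {"min_replicas": DriftPolicy.ACCEPT}  # everything else is enforced

updated, decisions = reconcile(declared, actual, policies)
for d in decisions:
    print(f"{d.key}: {d.action} (declared={d.declared!r}, actual={d.actual!r})")
```

The point of the sketch is that the policy is chosen per use case rather than fixed by the tool: the replica count changed during an incident is kept, while the instance type is rolled back to what the code declares.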

I mean, how can you even call that an Infrastructure as Code solution if you end up working in the freaking state JSON all the time instead of working in your HCL? How is this even possible? It's not code. That's a technical data exchange format, right?

So long story short, we said, "Yeah, we'll go tackle this one." Because we think that the Infrastructure as Code approach is absolutely important, but we see that, you know, when you try to sell it like Pulumi does with this whole let's go after developers and give them the full blown Python, TypeScript... that's not how I know Ops either. Ops people don't code. I mean, they don't really do full blown programming. The script is probably the maximum they want to go with. So yeah, we saw the whole landscape and we said, "Okay, the last word in IaC is not spoken." Are we the last word? We don't know, but we need to change a few things fundamentally to just redefine it for now (for 2025 back then) and the future.

Cory:

I agree with something you said and I feel like it's one of these things that people get so testy on when you talk about it on the LinkedIn. But when you go back to that OG deploying 10 times a day Flickr video... the one that everybody hails, like, this was the beginning of DevOps, this conversation. It was great, it's a great talk, we'll link it in the show notes if you haven't seen it, very old, like 2006 or something, maybe even before that, 2003, something like that... but it's funny, when they're going through this and they're talking about, like, hey, we got to do this, like, this DevOps thing that's going to get coined, the first thing they talk about is the tooling. It's the first thing they talk about. They don't talk about the culture and this and that.

They talk about, "Hey, software engineers, there's this crazy thing called Git, and there's this crazy thing called CI/CD." And people are like, "What are you talking about?" You know, in 2003, we didn't know any of this stuff. In 2003, it was just shell scripts for names, right? And they show up with these ideas of technology that are, like, growing at this point in time. And, like, this stuff is key to introducing change effectively, efficiently, and making sure it's not broken.

That was the idea, was we can actually take some tooling and treat it as the API between teams - starting to sound a little like platform engineering - and let teams have access to stuff. And what's funny is that was such a grand vision.

It's just like, "Hey, we should use some technology to create a barrier between these teams so people can get what they need and do what they need without breaking and stuff." And then we went. And we're like, "Well, it's all... it's actually... it's cultural. That's why your company is not succeeding it's because you're not culturing right." And the reality is you can't walk into a boardroom with a banner and be like, "Guys, we did it. We're Ron Paul DevOps emojied, we're DevOps now."

That whole pitch of like, "Well, you guys just aren't doing the culture part right." It's like, yeah, or maybe my company's not spending the time and effort to buy some tools and let us experiment with tools. And you might say, like, "Ah, that's the cultural bit. You're not doing it." It's like, yeah, but the cultural bit there is getting to the tools that lets you figure out how to embrace the culture.

And so I've always just thought that that was such a wild... "Well, you got to start with the culture." It's like, "No, man, you got to get some shit in there that works and create some bandwidth for yourself. That's what you gotta do."

Pavlo:

Yep.

Cory:

The culture does not fucking matter until like a year out. Like, you see some progress, you're like, "Hey, how do we get people excited about the bandwidth that we've created here?"

Now you can go make yourself a culture, but until then, if you just sit in culture land all day long, you're not going to get anywhere. And like, that was the wild part, is like, our tooling hasn't changed much, right?

And like, I'm seeing conversations now on LinkedIn and Reddit where people are like, hey, like, you know, whether or not you buy into the AI generation, whoever's listening, like, there are people that are generating just tens of thousands of lines of code a day and they're reaching the limits of CI/CD, they're reaching the limits of Git, and people are starting to think, like, are there other tools that we should be considering instead of these things that we've just been using for 20 years? Right?

And so the tooling landscape has been so slow in our space and so far behind, and we're finally at a point where I feel like the world's changing enough, where we might be able to get some breathing room for Ops to consider their tooling, considering how this goes.

Looking back at old tools, we're taking decisions and debt and problems from 20 years ago, and we're saying, "Things still work the same" and it's like they don't. We had servers then, we went to VMs, we went to Cloud, we went to Docker, we went to Serverless, and now we're just horking stuff into LLMs. Like, the world changes. And our tooling has been pretty much band aids that we've patched around for the last 20 years as Ops folk.

Host read ad:

Ops teams, you're probably used to doing all the heavy lifting when it comes to infrastructure as code: wrangling root modules, CI/CD scripts and Terraform, just to keep things moving along. What if your developers could just diagram what they want and you still got all the control and visibility you need?

That's exactly what Massdriver does. Ops teams upload your trusted infrastructure as code modules to our registry. Your developers, they don't have to touch Terraform, build root modules, or even copy a single line of CI/CD scripts. They just diagram their cloud infrastructure. Massdriver pulls the modules and deploys exactly what's on their canvas. The result?

It's still managed as code, but with complete audit trails, rollbacks, preview environments and cost controls. You'll see exactly who's using what, where and what resources they're producing, all without the chaos. Stop doing twice the work.

Start making Infrastructure as Code simpler with Massdriver. Learn more at Massdriver.cloud.

Cory:

As you're working on formae, like, let's say you meet a team and they started adopting Terraform. Probably plenty of the audience here today either successfully have done it or they're in the process of it, right? And they're running into that thing where they're just like, "God, this is the worst part of Terraform." Like, what is that worst part that formae has come in and said, "Hey, you know what, you don't have to worry about that over here. Like that's gonna go away. Adopting this whole stuff is gonna be just much easier for you." What is that pain?

Pavlo:

Chasing the drift. I mean there's a whole complex of problems in Terraform. How you resolve drift.

It has to do with, you know, constantly catching up on your own state. It's even having to care about where the state is. Okay, well you can go with a paid offering and everything, but the point is you end up knowing too much detail. And the second thing is this thing is you-centric. I mean, some people with a big ego like it, but that's not how teams actually succeed.

When you end up doing all the work, nobody else can support you. This is a problem, right?

Unfortunately, this overall complexity is left to the top notch guys to resolve and to eliminate this drift or to fix the freaking state file. Whatever is going on. But there's too much manual doing on a very low level of things that some people don't even want to understand, some people can't understand. Some people don't care about.

That you-centric thing. I think this is something that really needs to fundamentally change.

The other aspect is, may I say the second thing that really bothers us?

Cory:

Yeah, I want the whole list, baby.

Pavlo:

Yeah, we don't need to go into the whole list because it's going to be too much. But the second thing really is that, you know what I said a few minutes ago with the Rube Goldberg machine? It's not that you're buying Terraform. In order to make basic use cases work, you're buying a whole bunch of tools that you need.

Cory:

You certainly are.

Pavlo:

You already mentioned this initial DevOps discussion that already came up with a few tools that you need. But the tool sprawl is massive.

You want to do the right job, you need to use something for your templating, then you need to understand the storage details so your state doesn't disappear overnight, you need to add all of these fat layers on top of the bones that are so minimalistic that it almost hurts.

So that was also the thinking that Zach and I, we had by the time when we started it. We said, "Hey, how can I reduce this massive amount of tools to a minimum to get the job done?"

Okay, it's going to hurt because you probably need to let go of a few of them. And in some cases we probably, in the early days, we will not be able to replace some of them for that particular use case. This is fine. But even if I manage to get rid of two or three of them, that's already great.

And this is how we ended up also, you know, making the choice for the language that is in formae, which is Apple's Pkl. Which is, you know, good enough to do some programming, but it doesn't feel like you're doing full blown TypeScript. And on the other hand, it's completely schema safe, right? So you don't make stupid mistakes.

So yeah, this is the number one. I mean there's so many, so many other things on the list, but I would say this drifting and the tools problem is definitely something that needs to be addressed.
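To make the "schema safe" point concrete, here is a rough Python stand-in for what a constrained, typed resource definition buys you. It is not Pkl and not how formae defines resources; the `Bucket` class, its fields, and the allowed region list are hypothetical and chosen only to show a mistake being rejected before anything reaches a cloud API.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Hypothetical closed set of allowed values for this sketch.
Region = Literal["eu-central-1", "us-east-1"]


@dataclass(frozen=True)
class Bucket:
    name: str
    region: str
    versioning: bool = False

    def __post_init__(self):
        # Constraint on a string field: type and length are checked up front.
        if not isinstance(self.name, str) or not (3 <= len(self.name) <= 63):
            raise ValueError(f"name must be a string of 3-63 characters: {self.name!r}")
        # Closed set of values: a typo in the region fails here, not at deploy time.
        if self.region not in get_args(Region):
            raise ValueError(f"unknown region: {self.region!r}")
        # A flag must be a real boolean, not a string such as "yes" or "no".
        if not isinstance(self.versioning, bool):
            raise TypeError(f"versioning must be a bool: {self.versioning!r}")


ok = Bucket(name="logs-prod", region="eu-central-1", versioning=True)

try:
    Bucket(name="logs-prod", region="mars-1")
except ValueError as err:
    print(f"rejected: {err}")
```

The same idea applies whether the definition is written by a person or generated by an AI assistant: a narrow, typed interface turns a whole class of mistakes into immediate errors.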

Cory:

Yeah, the tool sprawl is interesting because, you know, I feel like in business and the markets, like you have this concept of bundling and unbundling, right?

Over the years there's like, "Hey, I got cable and now I've got the option of buying 7,000 channels" and then it all kind of coalesces back into a Netflix, right? And like we've seen this happen over and over and over again.

But it just feels like... and I don't know if it is just how different organizations are, if it is how just different people's multi cloud strategies are or whatever it is... but there is something that I feel like just drives Operations, DevOps and Platform Engineers into the anti-bundling world.

They're just like, "There is no way in hell..." Like everybody wants a pitch like, "Oh, my thing is a single pane of glass." But like people are just like, "I don't want a single pane of glass... I mean I do want a single pane of glass. I just want 8,000 paintings behind it that are all kind of collaged together." And it's just like, "Well, if you want a single pane of glass, you want 8,000 things." Right? And so you see this idea of like the portals and whatnot trying to bring all these tools back together so you can see it all in one place. And it's one of those things where it's just like, "Why don't some of these tools just coalesce? Like, why don't they come together? Like, why don't..."

It drives me nuts. I feel like every resume, every job description I see, it's just a smattering on the Ops side of just like, here's... not just buzzwords... but here's a hundred buzzword technologies and buzzword-like products.

The job is so hard because you can do the exact same job, go to another company and they've picked 45 different tools.

And it's like if I'm a React developer or I'm a Node.js developer and I go from one company to the other and they use Vue here and they use React here. Like, sure, like a framework that I use for rendering changes, but like the event loop doesn't change, building my software doesn't change, package and NPM doesn't change. Like, I've got this ecosystem I can behave and exist and work in and I've got one thing and it's great.

And then you get in Ops and it's just like, "Well, we got YAML over here, we got HCL over here, and then we've got this like weird thing in the middle for like our CI/CD over here, and then we've got the CI/CD system, the Git system, we got this..." It's just like there's so many moving parts that it just feels like the job.

Pavlo:

Yep.

Cory:

It's just destined to always be gluing shit together. And I know that's like the joke in our industry, but like, the reason it feels that way is because for whatever reason, everything is just so fragmented and unbundled.

Pavlo:

Well, but this is the original idea behind Unix, right? I mean, you make little tools that do one job and then you marry them together. I mean, this is fine. I don't think that this needs to go away, but there's a natural... I mean, well, you can write off a few of these things to the personal ego. We have all egos in the world. I mean, I'm a freaking diva myself.

The point here is that, you know, the operations job... and I'm not excluding platform engineering or, you know, DevOps or SRE... I mean, there is a part of the job that is Ops. I mean, you need to make sure that your shit works. And this is a special job and it has nothing to do with software development per se, although you can use the methods to improve or you know, to sleep better to some extent, but at the end of the fucking day, it's all duct tape.

I mean, that's what it is. Now, the feeling that you constantly need to have as an Ops guy is that you are in the freaking full control. And that's why the single pane of glass does not really resonate well until you can really use this freaking thing as a database. Because you know, in the worst of all cases, you will have to bypass the shiny tool that you're using and solve the problem by hand. If you can't, you're fucked. That is the nature of the job. And that is why people actually go to, well, as low level as possible, because everything that comes on top comes with shiny UIs and all this stuff that was like, "Man, I'm looking at that and it's like, okay, in the worst of all cases when this thing is not available, what do I do?" What is it that I can do to fix this thing if that SaaS solution is not available to me? Or decides to misbehave? Or whatever. Which brings us back to AI that has much more potential of doing stuff like that than anything that is hard coded if you wish.

Yeah, we go with our mentality, with our philosophy as far as saying, "Okay, yeah, it's not necessarily better tooling, but it understands what you need." This is our goal. We want to implement a few things that follow the use case rather than just giving them a freaking roll of duct tape to just do whatever they want.

Because I think that more shit is landing on the desk of the Ops people and that's definitely happening more and more and more and more with all this sprawl of infrastructure that is happening. You kind of need to start thinking more in systems that you kind of combine rather than in low level hacks.

Cory:

Yeah, it's funny too because like thinking about AI, I would love to talk a bit more about like how you're using it, not only how you're using it, but like, like how you're thinking about it, you know, as a co founder and leader of a company.

But it's funny, like, we know that things work better with good context, right? And like at the end of the day looking at a YAML file, not a ton of context there, right? I love YAML, but we all joke about like how crappy of a representation that is. The Norway problem, right? All that jazz. But it's funny, like just the more constraint and refinement and just concreteness you can give around your prompt, the better the results are, right?

And like, you know, looking at Terraform, we have a type system in Terraform now. One of the things you mentioned was how types work in formae. And I feel like we had this like big swing over like the past twenty years where it was like, you look back in the nineties and it's all about type systems. And then like early two thousands, like Rubies and Nodes and the Pythons and like types were just gone for a while in software. It was just pure chaos, right? I mean, yeah, you had objects, but like we didn't really have a type system.

Now you're seeing languages start to embrace type systems again. We have TypeScript, we have Ruby 3 with Sorbet and all this other stuff. And type systems are coming back because it turns out that they're very helpful in getting our ctags to work, and autocomplete to work, and everything else.

And so in this age where we're going to be working with machines... and I feel like a few weeks ago I wouldn't have said that. I feel like I'm there, I've seen it. Machines, they're going to be writing code with us and we got to figure out how to deal with that. I'm not somebody who's like, "Yeah, I don't have to write code anymore." We have to figure out how to deal with that. This is a huge fucking problem. It's great, but it's a huge fucking problem at the same time.

But one of the things I think is going to be key is embracing types. And looking at our IaC tools, that is something that can set one apart: being able to give use cases and use types to guide and almost converse with these machines, like, "Hey, this is what I want from the cloud." And it just gives you a sentence. You're like, "I don't know if that's actually the interpretation that you got." I want a public network. What does that mean? Right? Like, ah, now if you show me the config, I know what that means.

And so to be able to start talking in like types and things that it can understand and not mix up the idea of a string and a string that comes from a set - very different things. Strings that can come from anywhere versus it has to be one of these three. Right?
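To make the "string versus string from a set" distinction concrete, here is a minimal Python sketch. It is not code from the episode, and the `NetworkExposure` type and its three values are hypothetical. The point is only that a constrained type rejects a typo at the boundary, where a plain string would let it through.

```python
from enum import Enum


class NetworkExposure(Enum):
    """The only three values a caller may choose from (hypothetical names)."""
    PUBLIC = "public"
    PRIVATE = "private"
    ISOLATED = "isolated"


def parse_exposure(raw: str) -> NetworkExposure:
    """Convert free-form text into one of the allowed values, or fail loudly."""
    try:
        return NetworkExposure(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(member.value for member in NetworkExposure)
        raise ValueError(f"{raw!r} is not allowed; expected one of: {allowed}") from None
```

A human or an AI assistant that produces "publc" gets an error naming the three valid choices instead of a silently wrong config.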

And that is just where I see this being very difficult. And I see people struggling with this. It's like, "Oh, well, I'll do this to make myself some YAML." And it's like there's not a ton of structure there. You might have JSON Schema or OpenAPI or whatever, like behind your Kubernetes manifest, but like most YAML doesn't have a schema that is visible. It's just like it's in a README someplace. And that's just going to be hard.

So like, how does the type system... and you said it's based off of Pkl by Apple... so how does the type system differ from something like a Terraform type system? And I'd also be curious, what's that learning curve like for an Ops person jumping from an HCL or YAML to a Pkl?

Pavlo:

That's a great question. I mean, we are maintaining something called the Pkl Primer. Basically you learn Pkl in 15 minutes and like 16 little steps in the tutorial.

I mean, it can be complicated if you really start doing stuff that we are doing with Pkl, because we're using Pkl a lot for even generating Pkl ourselves, et cetera, et cetera. So we're really stretching the ecosystem massively. But on the upside of things, on the user side of things, you don't need to do any of that.

And what we actually came up with is also, if you don't want to do this, you can define in your Pkl, that we by the way call forma... that's why formae, formae is plural of forma... and forma is a piece of code that you use to talk to us. And this is the piece of Pkl code.

So you can define a few arguments that are your interface to that particular forma. And people who are using it on the other side are not breaking out. They just have these three, four parameters. And it's still not YAML - I mean, you cannot make a stupid mistake there. You have to provide these values no matter what. And these values internally follow constraints, as you said. Like, you know, they come from a set, whatever.

We can have these discussions for hours, but the point is that when you go to AWS and I don't remember what region it was, or multiple regions, but there's like gaps in the availability zone letters.

Cory:

Oh yeah, that's fun. That's a fun one.

Pavlo:

How the fuck am I supposed to know that when I'm using YAML, right? I mean this stuff breaks all the time. Why is this a problem at all? Okay, Amazon made a decision not to have d, but you as a user should not actually be delivered to that, right?
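As a rough illustration of the availability zone point, the sketch below encodes which zone letters a region offers so that a missing letter is caught before anything is sent to the cloud. It is an assumption-laden example: the region name `example-region-1` and its letter list are made up and are not real AWS data.

```python
# Hypothetical data: a made-up region whose zone letters skip "d".
ZONE_LETTERS = {
    "example-region-1": ("a", "b", "c", "e"),
}


def zone_name(region: str, letter: str) -> str:
    """Return the full zone name, rejecting letters the region does not offer."""
    letters = ZONE_LETTERS.get(region)
    if letters is None:
        raise KeyError(f"unknown region: {region}")
    if letter not in letters:
        raise ValueError(
            f"{region} has no zone {letter!r}; available: {', '.join(letters)}"
        )
    return f"{region}{letter}"
```

In plain YAML the gap only shows up as a failed deployment. With the constraint in the type or the lookup, it shows up when the value is written.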

So what I'm saying is that the learning curve as a user is simple because you end up defining objects with properties. You can do some coding... I'm not a big fan of loops and conditionals that have been added later to Terraform. I think they're a pain in the ass. But it's a question of taste... but you know, you can do stuff in that language, in Pkl, that still is not the full blown functional programming where you can encode the whole freaking world in one line, but you end up having a native loop and it feels like you are rendering what you're actually doing.

I mean, many people are using Pkl without us. I mean, this is a renderer. Effectively, you make your YAML as a target format without making stupid mistakes. How awesome is that, right?

Cory:

Oh, that's interesting.

Pavlo:

Yeah, that's how it was created and it's being used all over the place.

People who come to us, they keep telling us, "Oh, we're already using Pkl for our YAML stuff, we don't make any stupid mistakes there anymore." So there's more stuff like that. I mean, CNCF has KCL inside of it.

I recently suggested by somebody over caffeine or whatever the name was, there's a different language that is also similar there... The whole idea is that, I mean what we really need to solve for everybody are exactly two things: Don't keep making the same stupid mistakes that you've been making for the last few decades without any improvement and do whatever you can to eliminate the freaking toil of your day by day existence. Because it's already hard enough and you keep adding shit to your own desk, to the pile that is on it, just because your tools don't want to support you properly, because they have never thought about what you actually want to do. What is it that you want to achieve.

Speaking of the freaking drift, what you really want is some kind of a... well, no-brainer system of record if you wish, where ideally everything is represented, no matter where the change came from. Because I mean, really, I need to give up every time I hear that somebody is fighting ClickOps. Man, oh my God, you know how much money cloud providers actually spent on their own consoles and everything? We're freaking excluding them completely out of the equation, although they sometimes are useful. Anyway, debatable, but the point is there is a use case where you can enforce your idea of infrastructure. There's a use case where you can't. You need options. Options are all about use cases. That's what you want. If you don't have a choice, you will end up piling tools on top of each other that are ultimately a Rube Goldberg machine.
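One way to picture "you need options" for drift is a reconcile step that takes a policy as input. The following is a minimal sketch under assumptions, not a description of how formae or any other tool works. The names `DriftPolicy` and `reconcile` are hypothetical, and a resource is reduced to a flat dictionary of properties.

```python
from enum import Enum


class DriftPolicy(Enum):
    ENFORCE = "enforce"  # declared state wins; out-of-band changes are reverted
    ACCEPT = "accept"    # out-of-band changes are absorbed into the record


def reconcile(declared: dict, observed: dict, policy: DriftPolicy) -> tuple[dict, dict]:
    """Return (new_record, changes_to_apply) for one resource's properties."""
    drifted_keys = {
        key
        for key in declared.keys() | observed.keys()
        if declared.get(key) != observed.get(key)
    }
    if not drifted_keys:
        return dict(declared), {}
    if policy is DriftPolicy.ENFORCE:
        # Push declared values back out; None means "remove this property".
        return dict(declared), {key: declared.get(key) for key in drifted_keys}
    # ACCEPT: keep what is running and update the record to match it.
    return dict(observed), {}
```

With `ENFORCE`, an emergency console change is rolled back on the next run. With `ACCEPT`, the same change becomes part of the record, so the system of record stays complete no matter where the change came from.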

Cory:

Yeah, it's funny too, like, thinking about the shared responsibility model of Ops. Like, this one's always... I don't know who said the quote, I remember, Gosh, I've heard it, heard it so many years ago. I don't know the exact... I'm gonna botch this quote and I should probably... I'll Google it and put it in the show notes. But it's something along the lines of when everybody's responsible, no one's responsible.

Pavlo:

Oh yeah, that's great. Yeah, yeah, I've heard that.

Cory:

Meanwhile, we have the shared responsibility model.

It's like, okay, but everything seems like... when you think about the developer at the other end of that shared responsibility model, I feel like they're the one that pays the cost of it. I mean, we pay the cost too, because we're like, we have to support it, but like, that's our job. Right?

And what I mean by they pay the cost of it is just what you're talking about with ClickOps, right? Like, they found a way that worked to get the desired change they needed from the cloud.

Pavlo:

Right.

Cory:

And you walk around, you slap them on the hand, you say, "No, you don't do it that way. You go to this Git repo over here, not yours, this other one, and you put some Terraform here." Or whatever. And it's just like, okay, well, so it's not really shared responsibility so much as like you making a decision and putting some of the work off on me. Like, that's what you did. It's not the symbiosis of Dev and Ops, it's not where we're coming together culturally and we're saying, "We're going to solve this problem together." Because the Dev said, "I just want to click this thing." And you said, "No, that's not the way we're going to do it."

Pavlo:

Right.

Cory:

And it's just like that is so fundamentally at odds... I don't think it's hard to learn something like Terraform or HCL or even Pkl... like that's not the developer's problem. It's not learning the language, it's learning the cloud.

These developers program in four or five different languages all day long. Like they have no problem learning a language. It's that no matter what the language is, whether it's Pkl or HCL or whatever, if you just say, "Hey, this is your responsibility, developer," you plop them an EKS cluster, VPC, or hell, let's just say it's Postgres... well half of that API is operational plane, half of it's developer. Shared responsibility model? The shared responsibility is right in the API call and that's what makes it hard. If I'm a developer, I'm like, "I need Postgres 16 for my app." Like that's the extent of my requirements - Docker install, Postgres 16, I'm developing locally. But now for Prod, I have to think through all of the phases of the operational lifecycle of this server and costs and availability. It's just like, well that's a lot for me. The developer's just like, "I need Postgres 16."

The shared responsibility again was me taking on your job to make a bad decision that's going to create more work for you.

Pavlo:

Right.

Cory:

I feel like that's one of the things that like has got to change in this new world of AI. There's going to be a lot more people doing ClickOps. Guess what? AI is going to be doing ClickOps. It's not going to be clicking, it's going to be doing random ass API Ops. You know what I mean?

And we got to figure out how to rectify that. So the problem gets exacerbated in this near future that we have. And you know this is just like, hey, you have to do it the way the Ops team wants.

I think we've gotten to the point where it's like, we need to rethink how we go about packaging up our offerings, whether it's a developer or whether it's an AI that's doing something. If we as Ops want to have that control and not just be inundated with random things to fix, we need to be able to say, "Hey, here is an experience that you can have and you can do whatever you want with it because I've protected it. I've hidden the availability from you. I've hidden the backup schedules and encryption options from you. You just tell it what you want."

Pavlo:

Exactly.

Cory:

Whether you're telling the AI, or whether you're inputting it and doing some ClickOps somewhere, or whether you're filling it out in the Pkl file, like you're not seeing the entire world, you're seeing the small footprint that the organization wants you to see. Non negotiables hidden.

Pavlo:

Yeah, I'm with you there. You asked me for a whole list, I didn't want to give you that, but this is number three, the level of detail and the lack of abstraction that is going on.

This is why I'm hoping that platform engineering is going to take off. Because one core idea is that a few people who are more on the low level detail side of things are preparing abstractions for the ones who don't care.

I'm a developer, if I'm curious enough, I can go into the platform engineering team and do whatever the heck I want. But most of the developers, let's agree, they just need to do their job. The job is to write applications that are doing some business stuff. That is the job. The other job is to prepare services for them.

I can tell you, you gave me this example, I don't even agree that the developer needs to know they need a Postgres 16 because who gives a fuck?

The point is your job as a platform engineering team is to give them a service called database that will be something like T-shirt size, have a T-shirt size parameter, team name... I don't know, like green or whatever... from which they will extract enough information and derive low level detail that they can keep changing behind the scenes. Updating and nobody else sees them. That is the whole point.
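The "service called database" idea can be sketched in a few lines of Python. This is an illustration under assumptions, not formae or Pkl code: the sizing table, instance class names, and storage numbers are placeholders, and the Postgres 16 default simply reuses the example from earlier in the conversation.

```python
from dataclasses import dataclass
from enum import Enum


class Size(Enum):
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"


@dataclass(frozen=True)
class DatabaseSpec:
    """Low-level detail the platform team owns and may change at any time."""
    engine: str
    engine_version: str
    instance_class: str
    storage_gb: int
    owner_team: str


# Hypothetical sizing table; real values would come from the platform team.
_SIZING = {
    Size.SMALL: ("small-instance", 20),
    Size.MEDIUM: ("medium-instance", 100),
    Size.LARGE: ("large-instance", 500),
}


def database(size: Size, team: str) -> DatabaseSpec:
    """The whole developer-facing interface: a T-shirt size and a team name."""
    if not team.strip():
        raise ValueError("team name is required")
    instance_class, storage_gb = _SIZING[size]
    return DatabaseSpec(
        engine="postgres",
        engine_version="16",
        instance_class=instance_class,
        storage_gb=storage_gb,
        owner_team=team.strip(),
    )
```

A developer calls `database(Size.SMALL, "green")` and never sees the engine version or instance class, so the platform team can change either one without touching any caller.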

That is the beauty of... well, what I think, I mean of course I'm biased, but that is the beauty of what we're building. We allow you to abstract that for the others. Like they can abstract that with Pkl, but there's even no obligation to do so because our CLI effectively turns parameters defined in your forma into command line flags. And you have to provide them when you apply. You have to do it and you need to give them values that are constrained. How freaking awesome is that? Because you don't care about all these levels of details.
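The idea of declared parameters becoming required, constrained command line flags can be approximated with Python's standard `argparse` module. To be clear, this is an analogy and not the formae CLI. The program name `apply-database` and both flags are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Expose two declared parameters as required, constrained flags."""
    parser = argparse.ArgumentParser(prog="apply-database")
    # The value must be one of the listed choices; anything else is rejected.
    parser.add_argument("--size", required=True, choices=["small", "medium", "large"])
    # No default on purpose: the caller has to say which team owns this.
    parser.add_argument("--team", required=True)
    return parser
```

Calling `build_parser().parse_args(["--size", "small", "--team", "green"])` succeeds, while leaving out `--team` or passing `--size huge` makes `argparse` print a usage error and exit before anything is applied.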

But that is number three because that is kind of a problem that needs understanding, right? The other two that I mentioned, they're already there. We already actually hate them.

Cory:

Yeah.

And I'll say, anybody who's listening, and Pavlo's talking about these abstractions and use cases and kind of like limiting the access to what's inside the IaC. If you look at the Terraform Public Registry, you'll see these modules with like fifty, sixty, seventy something inputs and they have defaults... that's great, you can do whatever you want... but what I've really seen with many teams is when you look at something where it's like, "Hey, it's got a bunch of defaults and so it works out of the box," that's not how people sit down and use it. They don't sit down and just go, I executed that thing and I didn't put any inputs in and it worked.

They look at every single input and they're trying to figure out, "Like, what should I put here?" There is decision paralysis even with defaults. And the problem with that is you get to Googling and then all of a sudden you start thinking you should be putting values places, and now you got a problem.

Pavlo:

Well, now you're not Googling it, you're not Googling this anymore - sorry to interrupt you. Now you just throw it into your Claude Code and hope it works. Right?

Cory:

Right. So that's the thing, it's wild.

What I've seen that works very well... and I think that you can adopt this, like, use case or like higher level abstractions fairly easily. And there's a really good baby step. And it's something like... maybe it's just the team name, right? Like, team name is actually a good one. It's like, "Hey, what team owns this?" You can figure out the cost centers from that, instead of making somebody remember all their cost centers, right?

You, as the person that wrote the module, know what class of infrastructure it is. Is it data? Is it compute? And you know the team. You can say, I'm going to go look up the cost center in a database someplace so that people don't have to remember, "I've got a cost center this for data", "I've got a cost center that for QA," and make an input in the right thing.
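The cost center lookup described here might look like the sketch below. It is a hypothetical example: the in-memory table stands in for "a database someplace", and the team name and cost center codes are invented.

```python
from enum import Enum


class InfraClass(Enum):
    DATA = "data"
    COMPUTE = "compute"


# Hypothetical lookup table standing in for a real cost center database.
COST_CENTERS = {
    ("payments", InfraClass.DATA): "CC-1001",
    ("payments", InfraClass.COMPUTE): "CC-1002",
}


def cost_center_tags(team: str, infra_class: InfraClass) -> dict:
    """Build the tags a module attaches, so callers never type a cost center."""
    try:
        cost_center = COST_CENTERS[(team, infra_class)]
    except KeyError:
        raise LookupError(
            f"no cost center on file for {team}/{infra_class.value}"
        ) from None
    return {"team": team, "cost-center": cost_center}
```

The module author fixes `infra_class` inside the module, so the only thing the caller ever supplies is the team name.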

Dude, that's something no one gives a shit about, except for the CFO, that you put on everybody, right? And it's like, that's one of those things you can just hide.

Pavlo:

Right.

Cory:

Another thing. Like looking at databases, instead of making people think about availability zones and all this shit, it's like, "What's the SLA of your service?" And do it as an enum - ninety, ninety five, ninety nine plus.

As the person who's making the Postgres module or whatever, it's like I can say, "Hey, you know what? If you have a service that's 90% uptime, I'm gonna throw that bitch in one zone. We're gonna keep it cheap."
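A minimal sketch of the SLA enum, assuming a module author who maps each tier to a zone count. Only the "90% goes in one zone" rule comes from the conversation. The zone counts for the other two tiers are assumptions made for illustration.

```python
from enum import Enum


class Sla(Enum):
    NINETY = "90"
    NINETY_FIVE = "95"
    NINETY_NINE_PLUS = "99+"


# Only the 90% -> one zone rule is from the conversation; the rest is assumed.
ZONES_FOR_SLA = {
    Sla.NINETY: 1,
    Sla.NINETY_FIVE: 2,
    Sla.NINETY_NINE_PLUS: 3,
}


def placement(sla: Sla) -> dict:
    """Translate the one question the caller answers into placement settings."""
    zones = ZONES_FOR_SLA[sla]
    return {"zone_count": zones, "multi_zone": zones > 1}
```

The caller answers one question about their service, and the availability zone decisions stay with whoever maintains the module.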

And what's crazy is you can start to unlock the self service that goes fast. And what I see people hang on to all the time is this... this thing that is like rot, it's rot and I feel like it's early, SOLID-principles rot of like DRY. Like DRY is the principle that just needs to fuck off because it just ruins so much stuff. People will be like, "Well, I have a Postgres module, so I don't want to make another one." And it's like, "It's fine to make another one."

If you have one that 80% of your team is just like, "This thing's fucking great," that's 80% of requests that aren't bothering you for anything. You have all the time in the world to find this one teammate who's like, "Hey, that one doesn't do what I need. I need this one parameter that's at odds with what you've done over here." It's like, great, we have a special use case. That right there, that's the DevOps happening. That's the collaboration and the shared responsibility. Like figuring out how to work through something together and make an offering. And you might even find that that thing this developer's asking for, maybe it does eventually make sense in that 80% module. And guess what? You can open a PR and add it when you make that decision.

But I feel like so many people just stop and grind to a halt. They're like, "How do I make this RDS module (database engine aside) handle every case possible to make everyone happy?" And it's like, you're going to spend months working on that module. No one's gonna give a shit about 97% of the inputs to it.

Start with something dumb and simple. Codify your practices into it. And like, give yourself some time to deal with the dumb bullshit later.

Pavlo:

I totally agree with you. There's this middle land, it's really... I don't know why it's not popular... that's exactly what you're actually saying. You don't build in that you're wasting time, your time and everybody else's time. You need to understand what they need and how far they want to go. The non-functional requirements that you mentioned.

Honestly, in my whole life, whenever I saw that a project starts and they didn't understand NFRs beforehand, I fucking run away. Because I mean, dude, I mean there is nothing like architecture emerging on the way because you're gonna fuck it up. And you will have to redo the whole thing if you didn't think about the most important aspects.

This is how we need to do it everywhere. But it doesn't mean that you build for all possible use cases for all possible users on planet Earth, if you know that this thing is going to be Mickey Mouse size, like... I don't know, it's going to have like 10k requests a year. But you need to understand that number.

Cory:

Mickey Mouse, that's pretty funny.

Oh my gosh, dude, I had fun talking to you today. I'm glad we finally got to talk. I'm glad no trees fell over. Dude, I appreciate the three times that you showed up and we were unable to record... sorry, two reschedules and one botched one.

Where are you going to be next? Like what conferences are you going to be at? Where can people come and check out formae and Platform Engineering Labs?

Pavlo:

Oh yeah, so if people want to check it out then it's very easy, the URL is Platform.Engineering.

That's great. This is awesome. We love it. Yeah, we have a lot of material out there.

So I'm going to the US very soon, actually next week already and I'm going to stay on the West Coast for quite a while. There's a lot of people to meet and discussions to do.

Conferences... don't look at the conferences at the moment, because we kind of found this flow with the team... we're a small team and we kind of found this flow where we can use a lot of AI to help us develop real quick. What a great time to live in, man. I mean when you're running a company now, Oh my God, this is amazing, really. So we're really focused on our roadmap. There's a roadmap that we think that we want to give the community as innovation. But there's also official, you know, expanding support to that and this and this. So it's turning already into constant cases of caring about people who are using it.

We also recently launched the... well, our idea of how builders should build around Infrastructure as Code. So it's not only your code, but extensibility. So basically, writing a plugin with us is already absolutely easy without AI. But when you use AI assistants, it's quick and cheap and completely safe - that's important.

So we see that, like, a few people are already building a Proxmox plugin. I mean, okay, we didn't deliver the marketplace yet. So this means they don't see each other yet. Yeah, that's a pain in the ass right now a little bit. Working... we got to improve that. I mean, small steps. It's a lot of work.

So conferences. We'll look around. We're not going... I don't even know, did it already happen in Amsterdam...the KubeCon?

Cory:

I think it just ended. I think it happened. I don't know. It'll definitely be over by the time this episode airs.

Pavlo:

Yeah, definitely, yes. Yeah, we didn't go there. I'm looking forward to going to Salt Lake City at the end of the year again and meeting with the people. And we're going to have great visuals again because, man, you can create visuals with your AI these days that are awesome. It is great. And, you know, if you have some interesting idea that's also attractive for everybody, it's fun.

Cory:

It is fun. So I think on your homepage, you actually have the graphics from... was it on the homepage? I could have sworn I saw them somewhere.

Pavlo:

Oh, yeah, yeah. We have them all over the place. We use this spaceman. Well, the core idea is that, you know, you are kind of in a... you don't really need to put that in the official episode, but our spaceman on the Mars surface that is completely devastated, that is our picture of the IaC state right now.

Cory:

I was going to say the aesthetic is a mix of JPL's visions of the future... if anybody's ever seen these visions of the future, I'll put a link in the show notes, it's beautiful art... but it's like an intersection of that and like "Fallout: New Vegas". That's what I saw. I was just like, "Oh, these are like my two favorite aesthetics in one." And then I came up and you and Zach were there... it was like the greatest day ever.

Pavlo:

Yes, man. Because, you know, people in the responsible positions now are all about this stuff. We grew up loving steampunk. We grew up loving astronauts. We grew up loving planets you will never visit in your whole life. All of it is great. It's amazing. And I mean, I'm happy that I can use AI to quickly generate something that has a message in it, you know.

Cory:

Awesome. Well, I'll include show note links to... you said that there's a Pkl Primer that you guys maintain?

Pavlo:

Yeah I can send you all of that stuff, like a few of them, and feel free to link whatever you want.

Cory:

Awesome, cool. I'll drop those in the show notes. Again, thanks so much for coming and apologies again for the reschedule.

Pavlo:

No worries.

Cory:

Thanks for tuning in today. We'll catch you next time and if you have any questions, if you're interested in being on the show, feel free to reach out at Cory@Massdriver.Cloud.

About the Podcast

Platform Engineering Podcast
The Platform Engineering Podcast is a show about the real work of building and running internal platforms — hosted by Cory O’Daniel, longtime infrastructure and software engineer, and CEO/cofounder of Massdriver.

Each episode features candid conversations with the engineers, leads, and builders shaping platform engineering today. Topics range from org structure and team ownership to infrastructure design, developer experience, and the tradeoffs behind every “it depends.”

Cory brings two decades of experience building platforms — and now spends his time thinking about how teams scale infrastructure without creating bottlenecks or burning out ops. This podcast isn’t about trends. It’s about how platform engineering actually works inside real companies.

Whether you're deep into Terraform/OpenTofu modules, building golden paths, or just trying to keep your platform from becoming a dumpster fire — you’ll probably find something useful here.