From DevOps to 'Vibe Coding': Gene Kim on AI-Assisted Development and Platform Engineering

Episode Description

What if you could turn a five-year software project into a one-month endeavor? Gene Kim, co-founder of IT Revolution and author of The Phoenix Project, reveals how AI-powered Vibe Coding is transforming the way developers work.

Kim shares insights from his upcoming book about how developers are achieving unprecedented productivity, including how his co-author produces 12,000 lines of production-ready code daily using AI assistance. But it's not just about speed - learn how this approach enables developers to tackle previously impossible projects and explore larger design spaces.

From DevOps evolution to practical AI implementation, Kim discusses:

  • What Vibe Coding really means and how it differs from traditional development
  • Real examples of AI accelerating development without sacrificing quality
  • Common pitfalls to avoid when implementing AI in your development workflow
  • How AI is making developers more ambitious rather than replacing them
  • The critical role of testing and feedback loops in successful AI implementation

Whether you're a seasoned developer or a tech leader wondering about AI's place in your development workflow, this conversation provides practical insights into the future of software development.

Episode Transcript

Welcome back to the Platform Engineering Podcast. I'm your host, Cory O'Daniel. And today I have with me Gene Kim of IT Revolution, author of The Phoenix Project and The Unicorn Project, and of the upcoming book Vibe Coding, which we're going to get into here in a bit. Gene, welcome to the show. I have been waiting so long to get you on here. Thanks for joining me today.

Cory, so great to be here. Congratulations on all your achievements. And my gosh, I was mentioning beforehand, I so much enjoyed the interview you did with Solomon Hykes on the early days of Docker and what he's working on. What a killer. That was so awesome.

Yeah, that was a fun episode because like it was when I was first starting the show out and it was when I was learning that like… it's funny because people ask me this all the time because I have people on the show and like how did you get this person? I'm like you would not believe that behind the scenes, when you email people, they're just people and they'll just come on your show. 

So it's just like… the amount of people that have been on my show where I literally just send them a cold email. Like Mitchell Hashimoto, I was just like “Hey, I'm like kind of behind this OpenTofu thing but do you want to come on my podcast?” And he's just like “Yeah, sure.” and I was just like, “Oh, that was way easier than I thought.” So if you're listening and you've got a show that you're putting together and you want to get somebody on it, just send them an email. And they say yes sometimes. 

It was amazing to have Solomon on and I was very surprised that he was just like, “Yeah, I'll totally do this.”

Love it.

I texted a bunch of my friends over the past couple of days once we had the date finalized, told them you were coming on the show, and I just got inundated with questions that people wanted me to ask you. So we're going to weave this in throughout the day, but the one I've got to start with to stay true to myself is - Gene, you're like one of the main voices in DevOps. What in the hell has happened over the past decade? I feel like there were guardrails to guide us all in the right direction, and yet so many alternative definitions of DevOps have arisen. What has happened and how do we deal with it?

The Evolution and Challenges of DevOps

Yeah, I guess it sort of becomes mainstream, is what happens. And so your post, "DevOps is Bullshit", I had a lot of people text me about it. 

Sorry.

I guess they're like, “Gene, what do you think?” And I thought, it's part of the evolution of the market. And I guess if you were to ask me kind of what did I think? I think it started from the very beginning about, you know, what is the Venn diagram between DevOps and Agile and Infrastructure as Code and Platform Engineering? And we can argue till the cows come home about which is a subset of the other. To what extent do they overlap? Are they disjoint? And I think so many people believe that DevOps is really about what happens when you have two silos that don't communicate enough, and when they don't, you end up with catastrophic outcomes that are not good for Dev, not good for Ops, not good for the organizations that we serve.

The State of DevOps research, which remains one of the things I'm most professionally proud of - that's the work that I did with Jez Humble and Dr. Nicole Forsgren that went into the Accelerate book - said, “You know what? It's not just a banner or a poster on the wall.” It's actually made up of cultural norms and technical practices and architectural practices that enable independence of action and fast feedback. And you know, no matter what sort of system you work in, you want independence of action. You want fast feedback on your work and you want an environment where it's safe to tell bad news. In the context of platform engineering to make developers more productive, or in a service, you know, in thousands of services… I think they're all really the same. And so for me, I just find myself having a lot of equanimity about what you call it, as long as we achieve a sense of shared goals and a general agreement on a direction that's better than others.

Anyway, so what happened? I'm sorry. That's a long way of giving my point of view. What happened?

So one of the things I got to do from 2019 to 2022 is to work with Dr. Steven Spear. He was the author of the famous 1999 paper called “Decoding the DNA of the Toyota Production System”. I remember reading that in 2001 and thinking that this is the most amazing thing I've ever read. So I took an executive education workshop from him in 2014. We wrote the Wiring the Winning Organization book together. And the question we set out to answer was, “What's in common between DevOps and Agile and manufacturing?” You would think we'd have nothing to talk about at all, right? Because what we have in common would be very small. I come from software, he comes from hardware. Turns out the papers we've read, the books we've read, the research we admire were almost all the same. And so, you know, that's what we really tried to do in that book. But there was one talk that he gave at the conference I run - it was called the DevOps Enterprise Summit, now called the Enterprise Technology Leadership Summit. We're on year number 11 - and he described what happened in Lean, where before it was all about continuous learning, continuous experimentation, and it turned into a set of tools. Like the three S's, the kanban, the andon cord.

So what seems to happen in any movement is that in the beginning, it's all about the research problem and then by the end it's all about the tools. 

And the commercialization.

And so I think some of that also happens, right? And I think it’s just a part of any industry succeeding, right? And so if I were to choose that route versus it being irrelevant and facing obscurity, I would definitely choose the first, right?

Yeah, but the thing that I think that's pretty interesting that I see… I'd love to know like in your research and the companies you've worked with and people you've spoken to, like what you're seeing in this space as well… Like there's so many orgs today that I meet with that are struggling to adopt a lot of DevOps practices. I mean, there are still big organizations that are struggling to adopt even just like foundational things like CI.

Yeah. 

Which just seems like madness but I mean, a lot of these organizations have software that's existed beyond a lot of the terms and tools and practices that we use today. We're still in a relatively young industry. But you see these teams… they've done DevOps, they have full DevOps teams, they're still dealing with the same struggle. What are you seeing as the biggest hurdle for them to get it? To be able to get past the struggle? They have the teams with the names.

Right.

Like they have the initiative, they have the drive, but they're right back in like ticket Ops and this animosity between the teams. Like what do you think is the biggest hurdle that trips people up despite the fact that they're like actively moving that direction?

Yeah, great question. I think that's kind of the phenomena you see in any movement, right? You have a name, you have an ideal, right? But then it gets sort of cargo-cultured and it's easy to become cynical about that. 

But I don't find myself cynical about it at all because we know what the goal is, right? The goal is, you don't want CI/CD for its own sake, you want it so that you can deliver feedback to developers in the daily work. And you want sort of the product build, test and operations to happen within seconds, worst case minutes… hours, if you really need to. And you can say that's a very tactical issue that only the “DevOps infrastructure engineering people” have to deal with. But no, that actually predicts how quickly you can give feedback to developers in their daily work, because you want to have them learn about mistakes within minutes not weeks, months or quarters. 

What seems to happen in any movement is that in the beginning, it's all about the research problem and then by the end it's all about the tools.

So what's the measurement? It's the code deployment lead time. So the goal is not CI/CD, the goal is to have fast feedback as measured by… one of the measures is fast code deployment lead times - so between build, integrate, test, deploy, and customers hopefully saying thank you. The other one I think is independence of action. I love the… famously, right, Amazon… They went from an organization that was, in the early days in the late 90s, doing hundreds of deployments a year. And then as the product complexity grew, as they went from two products to 35 different product categories, it went to tens of deployments per year. Most deployments didn't finish. What's interesting is you have thousands of engineers and even small changes required weeks or months, because you have to communicate, coordinate, synchronize, prioritize, blah, blah, blah… not with just your team but potentially every other engineer in the organization.

Everyone knows the story, right? But what I learned working with Steve Spear is that the term for that is that no one had independence of action. That no one could do what they wanted to without having to communicate, coordinate with everybody else in the organization. And so of course, the story goes… the famous Jeff Bezos memo, famously chronicled by Steve Yegge, who I've been talking with every day for the last year on the Vibe Coding book… I mean, it shows that the effect was to recreate hard modular boundaries in the system to regain independence of action. So every team could build, test and deploy on their own. And that's how you go from, you know, tens of deployments a year to 136,000 deployments per day. 

Right. 

And I think it's just a magnificent example of what it is to really unleash and liberate teams to do what they want and need to do. And by the way, just real quick, and just to show how nuanced the problem can be… you have Amazon Video, who kind of went in the opposite direction and said, “You know what, we found that we kind of over-partitioned the system and now we were spending all our time basically in transport.” Copying data from one bucket to another, right? And so, they glued it back together into a monolith and they reduced costs by 90%.

Figuring out how to partition systems, I think, is a job of the leader. None of it is easy. DevOps is certainly maybe a part of the answer, but more importantly, are you doing the hard thinking to figure out what you need to do to achieve the goals of the system?

I love this. I think one of the things that I personally feel like I've seen missing in many teams is like the idea of DevOps is not a goal. It is the idea of like Kaizen and continuous improvement, right? Like you don't finish. You don't hit a finish line like, “Yeah, we did it.” Just like you were talking about with Amazon video collapsing it down into a monolith - like that was them getting to the point where they've realized something about what technical or social… like within the org… has changed.

Yep.

And we have to continue to make this thing easy to work with and easy to deliver. I think that is one of the key pieces that… and maybe this is where the tools versus culture problem comes in… it's like, “I have this tool. I've got to get to it.” And it's like, “Well, why? Why do you have to get to this tool? What is the problem you're trying to solve?” How does this make things better versus just giving you a thing to have - “I've got CI”? What are the benefits of it?

I think one of the key things is how do you market? As an engineer, a DevOps person, as an Ops platform, whatever… the person that types on the keyboards doing the magic compute things - how do you market that to the rest of the org so that they understand it? 

Yeah.

I think that's something that we've, as an industry, at least as a practitioner in the industry, we've never been fantastic at. It's marketing and championing why we're doing things to the rest of the org. Which I think makes the rest of the org probably feel a lot more comfortable with what we're doing. They can start to understand it, measure it, understand the impact of it, not just hear like buzzwords flying out of us, right?

Absolutely. It turns out that humans are just not really good at this part, right? And you will take some solace in the fact that like, someday there's going to be someone who writes the blog post, “Platform Engineering is Bullshit.” It's like, “No, no, you missed the point.”, right? 

I've got a draft. So everybody heard it here. It's not just us. It's all of us.

It's all of us.

It's all humans. Yeah.

From The Phoenix Project to Modern Tools

So The Phoenix Project was originally published in 2013, right? 

Yeah, 2013.

So I love doing this, going back and looking at the timeline of how we got to where we are. And it's funny because I remember where I was when I started reading The Phoenix Project.

Hahaha

I do this weird thing, I like to read big technical books while I'm on a treadmill at the gym - like I'm not reading a romance novel. I've got a hardback and I'm just like, “Let's learn some stuff.” I remember my wife, or girlfriend at the time, was making fun of me for it because we're both at the gym and I had The Phoenix Project in front of me.

It's funny because it's one of those like foundational books, but it's so recent. Like it probably feels like forever ago to you. But when we think about like the timeline of the cloud. Like we're looking back and it's… I think it's 2004 Amazon launches its first AWS service.

That's right.

It's not that long ago. It was SQS, right? Like it's not the EMRs, the EKS, the Redshifts, like all the stuff that we have today. And then around… was it 2006 or so?... I think that was when the Flickr video first hit the Internet. Right. The Flickr talk of…

Oh right, 2009. 

Was it 2009?

Yeah. Allspaw/Hammond.

Yeah, yeah, right. And so it's like we're very recent in this idea of DevOps versus like 50 years of software. We're just starting to get like, “How do we organize delivering our software?” It's essentially like looking at the timeline because 2013… which is not long ago. I mean, I was old then. I'm older now, but I was old then… but then it's like Kubernetes comes out in 2014. Docker comes out in 2013, I think after The Phoenix Project - certainly after the effort you put into writing it, right?

That’s right.

Then after Kubernetes, this might be… for people who aren't looking at the timeline, this might be mind blowing… after Kubernetes comes out, this little thing comes out called Lambda. 

Right.

In my mental history of this, I'm like, Lambda has been around forever and Kubernetes showed up like seven years ago, but Kubernetes hit the scene first.

In fact, I was at PuppetConf when Google gave the keynote talk announcing Kubernetes. I was like, “What just happened?” Yeah, totally.

Was the entire room like, “What are you calling this thing?”

No, actually it was really interesting because like that was when Docker Swarm was a thing. By the way, we were talking beforehand, I saw Nick Stinemates give the Docker demo at ChefConf in 2013, and in 2015 I was there when the Google team announced Kubernetes. Yeah, so these were heady days for sure.

Yeah. And so The Phoenix Project comes out before this, right? And then boom, just within the next two years, major shifts happen in the way that we think about even packaging software - in a Lambda versus just slobbing some code onto a VM someplace, right? As an author - somebody who's spent a ton of time researching this space, writing in this space before these technologies hit, and now that they're mainstream and Lambdas and Kubernetes are the way most people are delivering software today in the cloud - how does that make you think about… if you were writing The Phoenix Project today, does that shift anything? Because it's interesting - they're tools, but they're so foundational to the way that we write software.

Do these tools present shifts in the way you think about the characters in The Phoenix Project versus at the time when this stuff wasn't readily available?

Yeah, that's a great question. In fact, we released the first volume of The Phoenix Project graphic novel last year. And that was super interesting because it was my first time kind of really studying that book in some number of years. And what really struck me was that I wouldn't change anything. 

In fact, I'm not sure if I could write that book today. Like one of the things that I was kind of marveling at was just the calamity of what happened in Parts Unlimited that led to Bill quitting. It wasn't just the Phoenix Project rollout. At one point, the inventory management systems went down. Basically, every number that was being presented to the board, they didn't know what the account balances or values were. I was sort of marveling at that because back then, I was spending a lot of time in the audit community. That's kind of the world they live in. In fact, I have the Institute of Internal Auditors International Professional Practices Framework book. But it's been a long time since I've read that. I was kind of marveling at how well constructed that scene was. And off the top of my head, I can't tell you exactly what systems need to go down to prevent you from knowing your cash, accounts receivable, inventory… right? Basically every number you're supposed to report on. And I'm like, “Wow, that was pretty awesome.”

Yeah.

I wouldn't change much at all. In fact, some even more concrete evidence is in The DevOps Handbook. Dr. Nicole Forsgren helped deliver a second edition of it and almost nothing changed. It was all additions. A couple of… like literally a handful of sentences changed, just because not very many people are using CFEngine these days - so we kind of dropped that out. But the actual practices are somewhat timeless.

Actual practices are somewhat timeless.

Yeah.

So I guess, as an author, you want to write books that have a shelf life of at least a decade. And actually, this is a real challenge working on the Vibe Coding book with Steve Yegge, where remote agents weren't a thing. We kind of speculated about them, but they literally started shipping two weeks ago. And so we're trying to make a book that has a shelf life of at least, say, five years, and I would say the space is changing so much more quickly.

In fact, just to tell you a story. We were talking about using AI assisted coding and we were using it to help generate parts of the draft. You know, “Just take these 20 minute interviews of each other and generate three to five paragraphs.” I think we burned 70 million tokens in the generation of that book.

Woo!

Just helping kind of generate candidate drafts. So that's 3000 LLM hours in 30 days.

Yeah.

So incredible. And we're like stack ranking. It's generating 30 pages of drafts, and we just highlight the parts that we like. We kind of cross out the parts we don't like to concretize a reward signal. And we're like, “Man, the winners are GPT 4.5, Gemini 2.5.” And way down at the bottom of the leaderboard but part of the leader pack is 3.7 Sonnet and we're like, “Man, why is it doing so poorly?” And Steve goes, “Yeah, because it's 43 days old.” I'm like, “Oh man, life moves so fast in the AI space.”

Anyway, just again, the challenge is how do you stick to sort of principles and practices that will remain timeless, even though the underlying technology is changing so rapidly. I've never seen anything like it. And I just have to say, I've never had as much fun as I'm having like right now. What a time to be in the game.

It is. I mean, I would love to dig into Vibe Coding and AI a bit as well. 

It's funny. So I'm of the belief that I don't think AI is going to take our jobs away. I think an engineer using AI is going to take your job away, is what it is.

Right.

It's one of those things. It's like, “Will it replace me?” It's like, “If you think it will, it will. If you don't think it will, it won't.” That's the key. It's a bit of a paradox. 

If you think it will replace you, it will. If you don't think it will, it won't. That's the paradox.

I’ve started using AI daily in code generation, but I also use a lot of tried and true practices to lead that. And I feel like I get really good results where some of my friends are using it and they're like, “Eh, it generates garbage code.” 

Yeah, right.

But it's like, I do TDD and I use my tests as my prompts. So I write a test and I'm like, this is what I'm expecting from the system. And it's a lot more concrete than fuzzily trying to describe something.

All test plans are specifications, but not all specifications are tests, right? And so you just nailed like one of the really important practices, 100%.
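
A minimal sketch of what tests-as-prompts can look like in ExUnit (module and function names are hypothetical, not from Cory's codebase): the test pins the expected behavior down concretely, and the module below it is the kind of implementation you'd then ask the assistant to generate until the test passes.

```elixir
# Tests-as-prompts: the test is the concrete specification.
defmodule Registration.SlugTest do
  use ExUnit.Case, async: true

  test "slugifies display names for profile URLs" do
    assert Registration.Slug.from_name("Gene Kim") == "gene-kim"
    assert Registration.Slug.from_name("  Cory   O'Daniel ") == "cory-odaniel"
  end
end

defmodule Registration.Slug do
  # Candidate implementation: lowercase, strip punctuation, join words with dashes.
  def from_name(name) do
    name
    |> String.downcase()
    |> String.replace(~r/[^a-z0-9\s]/u, "")
    |> String.split()
    |> Enum.join("-")
  end
end
```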

Yeah. And then using things like your ADRs. How does your code get to the way… like why did you make these decisions? It gives it a context that isn't there when you're just one-offing, like, “Hey, I need you to do something.” 

Understanding Vibe Coding

In this vibe coding era, though, where we're seeing… especially with the book, I'd love to learn a bit more here… is this more targeted towards engineers, people that are familiar with software, or people that are looking for a no-code tool to be able to develop a business idea? Which I think is also sound and powerful - to be able to empower those people that don't have experience writing software to be able to create software. I think that's very, very important.

But I'd be curious, like, who are you targeting with the book? And also with Vibe Coding and people that are not familiar with engineering, like, can they expect the same results? 

Yeah.

Or do you expect that they'll be able to get the same results as an engineer that is extremely familiar with software?

Yeah, so we're targeting developers. You mentioned the famous Allspaw/Hammond Flickr presentation where they shocked the world by describing how they're doing 10 deploys a day every day, right? And you and I were probably part of the people who were fainting in the aisles, just thinking that this is like unthinkable, irresponsible, reckless, maybe even immoral, right? I mean, it was just crazy, right? Because most of the world is doing a deploy, maybe every quarter, probably more likely once a year, right?

Mm-hmm.

We now know from the State of DevOps research that the only way to be reliable is to do smaller deployments more frequently. And so I think the equivalent right now is just watching Steve Yegge work. So he's my co-author. I mean, he's the person who chronicled the famous Jeff Bezos memo - thou shalt only communicate by APIs. He spent 20 years at Amazon and then Google. He's routinely generating 12,000 lines of code per day. That's his commit rate. That's production ready. This is the 35-year-old game that he's been working on called Wyvern. And, you know, to get to 12,000 lines of production-ready code, you know, he's reviewing 100,000 lines. So just like in the writing game, right? The practices are so isomorphically identical, right? Seinfeld said, “Comedy is a game of tonnage,” and I think writing is too.

The only way to be reliable is to do smaller deployments more frequently.

So the 70 million tokens is really about creating candidate drafts. So for the three to five paragraphs that make it in, we reviewed 30 pages. And so for Steve Yegge to be committing 12,000 lines of code, he's reviewing 100,000 lines of code. And so I think it turns coding into a search problem, just like comedy and writing are often a search problem. And so the goal is, I think… we define vibe coding as anything besides writing code by hand.

And so Dr. Erik Meijer, he was part of the Visual Basic movement, he owned the C# initiative, he created the Hack programming language at Facebook and Meta…

Oh, yeah.

So the compiled PHP… I'm sorry. No, that's HipHop… Hack was the strongly typed version of PHP that any PHP programmer could use. So he said, “The days of writing code by hand are coming to an end.”

I mean, he's been such an inspirational figure, right? And for someone who, I would say, would rank among the top 10 programming language designers on the planet… for him to say that, right? After spending his entire career in programming languages and theorem provers and so forth, and all the work he did on Haskell, to say, “You know, coding by hand is probably not going to be a thing in five to seven years.” I mean, it just resonates so deeply with me.

Sorry for expounding on this, but I just found like, it's just so addictive, so fun. It makes us work faster. You can do things alone. You can be more ambitious about the things you can build. And you can explore so many more options and do things that you never could have done before, right? 

You can explore so many more options and do things that you never could have done before.

For me, it was like struggling to move things to, say, GitHub Actions without having to learn, you know, its obscure YAML syntax. It was just incredible. And that was like an hour's worth of work. That would have been three, four days, especially with the feedback cycle so slow.

Especially when you get to that point where you're like… it's one thing to write software from scratch. Like, “Hey, computer, magic me something where users can register.” You've asked a lot. In a short few words, you've asked for a lot of software. But these boring, tedious, plumbing bits of our job, it's like… Sorry. There's one person who's about to get extremely offended when I say this… there's no joy in making a GitHub Action. Nobody's like, “This is awesome!” And there's certainly no joy in configuring one - [sarcastically] like, “This is the best part of my job. I've been waiting for this all day long.”

Ha ha ha ha ha!

Especially if you've got some Bitbucket Pipelines and you're like, I've got to move those to GitHub Actions. Nobody's like, “This is great. I've been dying. This is… man, order a pizza. I'm staying up all night doing this.” That is one of those perfect places of just like, “We just need the thing to work, this is making $0. We need to get back to writing software.” That is one of the places where I think it's so useful.

That's where I've really seen it be extremely powerful - things where you have a concrete system. There's a schema and a way that Bitbucket Pipelines work and there's a way that GitHub Actions work. This is a real easy mapping versus “Make a user registration system”, which seems concrete but is extremely vague in implementation.

So those kinds of one-shot miracles, right? Those are kind of great for YouTube videos. But I mean, you know, I think for me, the life-changing part was for the book. For 15 years, Trello has been my way of note-taking for books, you know, collecting notes.

Trello.

Trello, yeah, absolutely. I think it came out in 2013. So it's like 12 years.

Oh, yeah.

So I've written so many programs that give me a different front end, and recently I wrote a tool to basically say, “All right, take the article and summarize it. If there's a YouTube video, download the transcript, summarize it. And then find out which part of the book would potentially be a good place to put it as a citation of evidence.”

Yeah.

So I couldn't get… I had to use the Trello attachments API for the first time and I couldn't get it to work. And so 45 minutes in, I'm in Claude Code and I'm copying and pasting and like, “Does it work?” And it would tell me what to type, right? And I became the bottleneck and I'm just so tired of copying and pasting from one window to another. And finally out of frustration, I said, “You run cURL. I'm not going to run it for you, you run cURLs.” It's like, “I can't.” It's like, “Okay, write a program that does what cURL does.” 45 seconds later, you know, it tells me, “Oh, I got it working. I tried six different iterations and I got it working because I'm using the OAuth header pattern.” Whatever that is, right? And so for me, that was life changing.
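
As a rough illustration of the “write a program that does what cURL does” move - a hypothetical reconstruction, not the actual generated code - in Elixir the one-off could be as small as a script around Erlang's built-in :httpc. The card id is a placeholder, and the Authorization header is the OAuth-style key/token pattern from Trello's docs; treat the endpoint details as assumptions.

```elixir
# Hypothetical sketch: attach a URL to a Trello card without shelling out to cURL.
{:ok, _} = Application.ensure_all_started(:inets)
{:ok, _} = Application.ensure_all_started(:ssl)

key   = System.fetch_env!("TRELLO_KEY")
token = System.fetch_env!("TRELLO_TOKEN")
card  = "abc123"  # hypothetical card id

auth = ~s(OAuth oauth_consumer_key="#{key}", oauth_token="#{token}")
attachment = URI.encode_www_form("https://example.com/notes.html")
url = ~c"https://api.trello.com/1/cards/#{card}/attachments?url=#{attachment}"

# POST with an empty body; the attachment is passed as a query parameter.
{:ok, {{_http, status, _reason}, _headers, body}} =
  :httpc.request(:post, {url, [{~c"authorization", String.to_charlist(auth)}], ~c"application/json", ""}, [], [])

IO.puts("#{status}: #{body}")
```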

To actually have it see what it did, get feedback on its own work, and be able to iterate on itself. I mean, does that change like what programming is? Anytime you have like an API you just like iterate until it works and you don't really care in most cases, like what shape the API is. 

I did it again for Google docs where I'm trying to add text into this thing, right? I can't get the shape of the JSON right. It's like, “Make it work.” 

Yeah.

Incredible. Like who would want to go back to iterating by hand? I remember the first time I saw a correct Kubernetes YAML file, like a deployment file, and it was so different from my mental model of what correct YAML files look like.

Yeah.

Oh, man. Like, there's this entire category of problems that we just shouldn't be working on anymore, right?

Yeah, for sure. There was one that I saw in the past few days. At my day job, we've been doing a vast amount of refactoring. We're actually going through a very similar Amazon Video scenario.

Oh!

We're using a ton of cloud services and we're collapsing it all back into our monolith. And it's kind of around like, we have customers that want to self-host our product and it's all built on top of AWS stuff. 

But we have this one test that like rears its head every once in a while and it's flaky. I swear to God it only shows up when something has to get out right now.

Yeah.

It's like, “I'm unrelated to what's going on, but I just want to remind you that I'm here.” And it'll just fail 800 times in a row. And then if you ever sit down to try to fix it, this test never flakes. And so I spent hours looking at it yesterday. I was like, “I'm gonna sit down, I'm gonna find this thing, I'm gonna fix this one flaky test that just blows up our builds every once in a while.”

Right.

And it's a very important test. The scenario that crops up never happens in prod. I cannot find… for the life of me, I cannot figure out what this is. I'm staring at it for hours. I've got OTel traces on it, coming up with nothing. I've got Logger output and our tests just dumping stuff. Like, “What is going on here?” And then finally I was like, “Okay, I've got tests for all this stuff. The test is flaky.” And I pretty much just opened up Cursor, I put in the three files - there's the API where it comes in, there's the business logic, and there's where I write it to the database - like, it has to happen in here, I can see the path this goes. AI was like… there's this one lazy part where we just decided not to use a struct, we were just using a random map… and lo and behold, in my test, there's this one case that it goes through where I misspelled one of the keys in the map.

Ugh. 

And it was just like, dude, the amount of builds that have failed because of this.

Ha ha ha ha ha!

It's like this random chance in the map where… it's like we randomized something in the test. I'm looking at it like, “What do you mean I misspelled it? I spelled it right.” And it's one of these words that I misspell all the time. And it was my misspelling. I'm like, “Argh!” But it was just like, I literally spent hours and AI was just like, “Yo, you don't know how to spell.”

That's incredible.

This has been in the code base for like nine months, 12 months, just randomly it will fail a build. It's like, okay, that's cool. Thanks AI.

I had a similar issue in my Trello front end that I built. I hadn't touched it in two and a half years and there's a specific case where if I move the thing, it brings up a dialog box, and sometimes the move gets sent to the wrong list. For two and a half years, I've been living with this. And I had the same thing - I'm going to sit down. And I just documented all the cases, and then had it generate all the logs. And then, as soon as I hit the error, I just dumped all the logs into Claude Code and it found an eerily similar error. I used Clojure, which is famously a dynamically typed language. And so those errors… I had exactly the same error, a misspelled field that turned into a nil.

Yeah. 

Exactly the same problem.

Yeah, sorry NoSQL fans. Type that stuff. 

And so what I did after that was, I made specific getters for that, right? Because this is a known vulnerability of dynamically typed languages. Just make it so that this category of error can't happen again, because I know there are other landmines out there that are just going to blow up on me sometime. Yeah, 100%.

Yeah, this was one of those times where I was like, I know why I did this a year and a half ago - I was just being lazy. I was like, “I'm going to make a struct and I'm going to pull this into the struct so it never happens again.” 

What language is it in?

Elixir, which is a derivative of Erlang. 

Yep, so also not statically typed.

Yeah.

Awesome language.
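
A minimal sketch of the fix Cory describes, in Elixir with hypothetical field names: a bare map lets a misspelled key slip through silently, while a struct turns the same typo into a compile-time error.

```elixir
defmodule Shipment do
  defstruct [:id, :recipient]  # hypothetical fields
end

# Plain map: the typo silently creates a different key...
shipment = %{id: 1, recipent: "gene"}
# ...and reads of the intended key quietly return nil, flaking a test later.
nil = shipment[:recipient]

# Struct: the same typo refuses to compile, roughly:
# %Shipment{id: 1, recipent: "gene"}
# => ** (KeyError) key :recipent not found
```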

Yeah, I think once a week I make a social post somewhere about… there's this tool called Dialyzer, which is like an after-the-fact type checker.

Yeah.

It's always right, but it's never generous about being right. It's extremely rude. And it's also very just like… it just describes things in alien language. It's like, “What does that mean to an actual human?” It's like, “You'll figure it out.” And it's like, “Eh, okay.”
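
For context, a sketch of the after-the-fact checking Dialyzer does (a hypothetical module; Dialyzer's real output wording differs): given a @spec, it flags call sites that can never succeed.

```elixir
defmodule Pricing do
  @spec discount(integer(), float()) :: float()
  def discount(cents, rate), do: cents * rate

  # Dialyzer reports, roughly: "The call Pricing.discount(<<_::24>>, 0.1)
  # breaks the contract (integer(), float()) :: float()" - right, and rude.
  def broken, do: discount("100", 0.1)
end
```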

So back to the Vibe Coding book. I feel like this is a hard book to write, given there's just so much changing so fast. But it's funny, I've started to look into vibe coding. I'm trying to figure out if I'm doing it already. Is TDD-based testing vibe coding? I don't know. It's a weird vibe, if so. 

Are you all experimenting a lot with the speech-to-code? Is that vibe coding, or is vibe coding just the use of these tools and agents?

We call Vibe Coding anything besides typing in code manually by hand. And so obviously it's a very expansive definition, sort of like DevOps. We chose a very broad and generous umbrella. But yeah, a hundred percent. 

The way we're talking about it is… I think there are kind of two parts. One part talks about how these things work, in sufficient detail to understand what the failure patterns are. Like one of them is just explaining that AIs are bad at instruction following in general, and two, that this gets amplified when you have context saturation. And so whenever you give it a bunch of tasks, the AI will, in its attempt to be helpful when you say fix a bunch of tests, as it starts running out of output context window, just start sort of half-assing things and say, “Yes, I fixed all your tests,” and maybe have deleted 30% of them. Or in Steve Yegge's case, 80% of them. He actually learned weeks later via a colleague.

The easiest way to fix a flaky test is delete it.

No, totally, right? Which is why it requires just so much vigilance. And the problem is that it was like 35 commits after the fact that a colleague went, “What happened to all the tests?”

Oh, yeah. 

So just being aware of that… even just concretizing it has made me a better prompter. Even just pointing out what you get when you compare the conversational prompting you use in vibe coding versus prompt engineering… just making the analogy that the prompts when you're vibe coding are more like texting a friend who's trying to help you, versus prompt engineering, which is like emailing your lawyer who's actually suing you.

[Cory laughs]

Everything is fraught with consequences, needs to be tested, I mean, it's rigorous engineering on its own. 

The prompts when you're vibe coding are like texting a friend who's trying to help you. Prompt engineering is like emailing your lawyer who's actually suing you. 

And then we kind of break down: here are the practices you need in your inner loop, middle loop and outer loop to prevent, detect, and correct problems. I mean, it's just been… I can't tell you how much I learned in this. And if you don't do this, you end up with these… we came up with four or five examples of when vibe coding goes wrong. You know, creating codebases where basically you've lost all modularity and it's incomprehensible, impossible to understand, let alone… impossible to change, right?

Yeah.

Which actually happened to me in this workbench tool I use. What can go wrong when… you know, Steve found himself with 80% of his tests deleted silently, discovering it weeks later. Just being aware of how AI will backdoor every… you know, for anything you're trying to build, there's an infinite solution space of how to do it, and AI will find the weirdest ways to implement your solution. Just to get concrete, for this writer's workbench I wrote… I just decided to write a spinner status window in a Swing screen - just the easiest way I could figure out how to do it - and left to its own devices, it will create new windows, it will write directly to the screen, right? And I'm just trying to tell it, “Use only the five functions I gave you. Don't open a window. Don't write directly to the thing. I need you to follow these functions so that I can take a screenshot of it.”

Yeah.

Anyway, so just being hyper aware of how you have to constrain the solution space to get the outcomes that you want. I mean it's just… what an amazing time to be alive. Right, Cory?
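
One way to picture that constraint (a hypothetical sketch, not Gene's actual code): pin the allowed surface down as an explicit interface, so the rule you keep repeating in prompts also lives in the code the agent is working against.

```elixir
# "Use only the five functions I gave you," expressed as an Elixir behaviour.
defmodule Workbench.StatusWindow do
  @callback open(title :: String.t()) :: {:ok, pid()}
  @callback set_status(window :: pid(), message :: String.t()) :: :ok
  @callback advance_spinner(window :: pid()) :: :ok
  @callback set_progress(window :: pid(), percent :: 0..100) :: :ok
  @callback close(window :: pid()) :: :ok
end
```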

It is. I mean, Steve producing 12,000 lines of code a session is pretty impressive. But I think that's… You know, it's funny. Time to be alive - yeah, for sure. When you think about AI taken to its logical extreme, I think there are a lot of really interesting and scary problems that get presented, but day in, day out, it's one of these tools that I kind of look at very similarly to how we saw CI when the term started getting popularized. And the first time I heard TDD, very similar, right?

I was like, “I don't want to do that. I like writing software. I don't want to write tests.” And there's still so many engineers I meet today that are like, “I don't like writing tests.” It's like, it's still just software. 

Yeah, yeah.

It's writing software too. And I think, as soon as you realize that doing it gives you more time and makes you more effective, it becomes a very interesting tool. Just like CI, it's like, “Oh, it's such a pain in the ass to set up these Jenkins builds.” But as soon as you do, you're not doing this manual labor anymore. And like to me, that's kind of what I've seen coding assistants as. It’s like, there's a lot of this stuff that I do that I don't want to do or parts that I don't necessarily care about. And if I can get a machine to do it accurately, it saves me a lot of time. 

At the end of the day, I've only got so many of these hours to put in in a week before I go do something else. It's like, every ounce of productivity I can squeeze out of myself by using one of these things, I'm looking for it.

I love it. In fact, I mean, that story so much resonates with me. There’s a story in the book about how Steve's been working on this game for 35 years and he had such huge dreams and aspirations, you know, way back when, but he said it got too big. He said to him it felt like… he worked on Android Studio and he felt like it became like Android where there's thousands of bugs, you know, thousands of things that wanted to be done, but he said it felt like pushing a dead elephant down the road. It was too hard.

Yeah.

He sort of like mentally gave up. And then Claude Code comes out and he said he couldn't sleep that night because all he could think about was suddenly things were now within reach. To get to 12,000 lines of production code a day, he has five Claude Code and AMP sessions open - Sourcegraph Amp. So he's burning $300 to $500 of tokens per day, but for him, it's worth it because it brings kind of these goals, dreams, and aspirations within reach. 

We're both in our 50s. And he said, “There's only so many more five-year projects we can work on. So you have to be very, very picky.” But now, those five-year projects are potentially one year, five months, maybe a month.

Yeah.

And so it just changes the math of like what you choose to work on. You can be so much more ambitious and there are all these things that you just wouldn't have done, but now you can do them within hours. I mean it's just life changing, right?

It changes the math of what you choose to work on. It's life changing.

I mean, not just for startups - think organizations. So many orgs have their skunkworks projects and all this other stuff. I feel like now the ability for, not just an engineer, but even maybe a product manager to MVP something really quick…

Yeah. 

Like, can we get something out that even communicates what we're trying to build? Like what I'm thinking in my project manager head versus like what I can communicate to an engineer. And can I get any traction on it before we spend a bunch of time making a production version?

Holy Cow.

How many businesses have just been knifed by that? We have this idea… it’s like, “Okay, you’re just going to invest a few million dollars into it to see if it works.” 

100%

But now it's like, sit down and pop it out in a day and be like, “This is cool. Customers are interested in it. Let's spend some time with some engineers to make it resilient, better, scalable, et cetera.”

Oh my gosh, yeah, let me just riff with you there. So we called it good FAAFO - faster, ambitious, alone, fun, and optionality: the ability to explore a bigger design space. And I think alone is one of the most interesting ones, right? A story actually came from Steve Yegge a year ago, when I first met him. He said his Head of AI wrote a classifier model in basically a day. Had that happened in 2024, that would have been a summer intern project for six weeks.

Yeah.

So really great for that Head of AI, right? Not so great for that summer intern who would have had an amazing experience. 

Essentially when you can do things alone, there are two taxes you have to pay. One is the coordination tax, right? You've got to schedule together, prioritize together, right? Then it gets worse as you add more teams that need to be involved. The second one is that it's still difficult to read other people's minds. So there's this upfront investment to describe what you want. And so for that product manager to go into bolt.new or to Claude and say, “Hey, give me three columns with two windows, three buttons here,” and just have it create that on the fly and give a working prototype that has all the actions designed in - that is a high-fidelity specification that takes you a lot further than, say, a Figma diagram or a PowerPoint slide.

Yeah.

That's amazing, and there are all these examples that I love. In fact, one person described it as the drift. This is Dr. Daniel Rock at the Wharton School. He was part of the authorship team behind the OpenAI jobs report - this amazing study. But he said, “We built a GitHub app. Tuesday, I learned what one is. By Thursday, we will have one running.” He called it the drift, like in the movie Pacific Rim, where you have two pilots controlling a giant mech and all you need is to mind-meld to control this thing. Basically, he went into Claude and said, “Help me write a specification to write a GitHub app.” He roped in his Chief of Staff, who's a designer by trade. This markdown specification kept on getting larger and larger. Eventually they roped in their senior engineer and it got to the point where that person could just write the tests and start wiring it in.

It reduced the cost of coordination and reduced the cost of like trying to read someone else's mind. I mean, it's like… it changes the shape of organizations for sure.

AI Implementation and Testing Practices

Yeah, the read-someone-else's-mind part, I think, has another interesting implication. And that is, I think, no matter what your organization's size, there's… you know, calling it legacy code is one thing, but there's code that you're not familiar with. It's funny, when we think about software, from Node.js to Erlang to Ruby to C, you can say that these different programming languages exist for their features' sake. But I feel like a lot of programming languages exist because we want to express our ideas in more interesting ways. At the end of the day, to a computer, our programming languages don't matter at all. To it, it's zeros and ones.

So looking at somebody else's old code… when I write code and when I write my tests, I generally think about somebody else who has zero familiarity with this part (whether they're a new employee, or this is a part of the stack that's been stabilized, become legacy, and you don't touch very often)… how can I write my code in a way that's not just documentation, but where the code helps explain what I'm doing?

Yeah.

And what I've found the AI is great at is just giving you a TL;DR - like an ELI5 on a section of code. Not even the whole codebase. Like I'll pop open a module, you know, maybe it's a domain function, and I'm like, “What is the business purpose of this and what quirks are there to how it works?” Having it spelled out - this line is doing this, the reason it does this is that, I can see there's some debt over here - it's just like, oh my gosh, this is so much better than staring at a block of code, drawing a picture, trying to reason about how it's all connected together. If I had a HUD that would just do that on code I haven't seen in a while, it's like, “Oh, this would be fantastic.”

100%. In fact, like you, I've chosen kind of a more obscure programming language to do all my work in. So you chose Erlang, no… Elm?

Elixir

I'm sorry. Elixir built on top of Erlang. I chose Clojure. That's also on top of the JVM. At first I felt like, am I going to be at a disadvantage because there's just a lot less training data on Clojure code? I have not found that to be a problem, but you know, it's not unreasonable to think that, you know, maybe you choose languages based on just how much training data there is on it. That's maybe one thing. 

To a computer, our programming languages don't matter at all.

The other thing I just found really exciting… and this reminds me of the “[blank] is bullshit” thing… is just how in the beginning people react to kind of these amazing new technologies. Whether it's cloud, DevOps, CI/CD…

One of the people who was reading the book described how (I have his permission to share the story, I just don't necessarily want to make it worse) he had this open source project that he was using that he wanted to do a certain thing. And of course the open source project owner said, “Don't have any plans to do it. PRs welcome.” He doesn't know Python. So, you know, for the first time, he decides, “All right, let's see if we can create that functionality.” And it's like a 20-line diff. He submits it, has it merged. He can now solve his problem. And then he writes somewhere that he used AI to do it. And suddenly someone outs him. Says, “You should have told the maintainer” and blah, blah, blah. In the pull request, the history actually got force-pushed… the change was taken, but the fact that it was him who contributed it was erased.

What's so interesting is - what was the lightning rod? What drew the vitriol? It's like, “Oh, he didn't even know Python. And yet he had the audacity to offer this PR that worked.” And it's like, “Oh, should you or should you not tell the maintainer?” It was like, “Oh my gosh, I just never…” I was just… It was mind-expanding, because never could I have conceived of a world where you would get yelled at for using AI to help solve a problem.

Yeah, I mean, I maintain quite a few projects. I feel like sometimes you get people asking for stuff where you're like, “I don't see the value in that” or “Maybe that doesn't fit here, but I haven't given you the right interface so that you can make something of your own.” There are reasons not to bring that into your repo. I mean, if you're just letting stuff in that you don't think is valuable, that's crazy.

But to be like, “Oh, this is valuable. Let's definitely take it and let's keep it. But how dare you write it the way you wrote it.” It's like, wait until you find out I'm not doing this in Vim.

Yeah. Right, exactly.

That sounds like that's very benevolent dictator-ish behavior right there. 

So I think we're going to put that in the book.

Yeah, that is funny.

I think what I loved about it was that here was someone… he was able to do something that would have been otherwise out of reach. He solved a problem that was really important to him. And there's this little downside of the way some people react, like, “Oh, you wrote a book using a word processor instead of a manual typewriter? Oh, how dare you!”

How dare you? Technology, get it out of here. I don't like the technology at all. 

By the way, just to riff with you, right? You can also reject the diff if it wasn't up to your standards. But he said, “Great work.”

Yeah, like who cares? If it adds value, it meets your coding standards, didn't break the build, like get it in here. 

That’s right.

So the second thing you were talking about - Clojure, Elixir, like the lack of big training sets. Like I see that crop up in Elixir from time to time. I know some other Elixir folks listen to the podcast, I'm not sure if they've had the same issue. What I actually see is like there was some stuff that changed pretty significantly between a couple different versions. It's very much like, “Hey, it's just treating it like it's an older version.” It's like the code works but it's not like idiomatic Elixir. So you'll see stuff like that and it's like, “Eh, it works.” And my linter or formatter will usually pick it up, I see them fight over it a little bit. 

But the thing that I see more with lack of training data, and this is something that I'm thinking about quite often, is we have tons of training data on software.

Hmm.

Now, we don't know why. We haven't labeled this stuff. Something crawled GitHub and it's like, “Oh, this is the way it is.” And it's like, “Well, why is it that way?” We don't have ADRs for every single like..

Hahaha

It's like, why? So you get some crappy code sometimes. Like the lowest common denominator, maybe the average, is what we're getting into these models. Which is great, if it works. We can always make the code better and more readable for humans. What matters to me on the first pass is that the test passes.

Right.

I can make the code look better through any other tool. And as AI gets better, we do more linting, it gets trained on the linted code, the code gets better. But the thing that I get asked a lot as a business owner in this space is like, “Oh, how is AI assisting DevOps? Is it going to take the jobs away from operations engineers and sysadmins?” 

The crazy thing is we're not training AI on prod. We don't train it on our architecture diagrams, the metrics, the usage of these systems. I mean, there are people that are starting to do this within their own orgs, but there's not an LLM that takes production into account. And so I've actually seen where this is harder - it's harder to train, it's harder to use it day in, day out. I was trying to respond to a Reddit post this morning, because somebody was like, “How are you all using this in DevOps?” And I was like, “Well, I do Dev and Ops. I use it more on my Dev side, actually very rarely on the Ops side. Maybe I'll use it for generating some OpenTofu code.” But like, it generated the code, but it's still not thinking about the use case that I have. That's the hard part.

It's very easy for it to be like, “Some Terraform? This is what it looks like.” It's like, “Yes, but um… does that deal with the scale? Does that deal with cross-region stuff?” There's so much reality to production that's not in these models.

There's so much reality to production that's not in these models.

You know, as somebody who's working in this space with like the Vibe Coding book, somebody who's researched and done a ton of work on the Ops side, like I would love to know, are you seeing people that are starting to address this? Is this still one of those kind of unsolved problems in this space? And do you think there's a reality to having an LLM that's trained on production systems?

Everything you said is true. You know, the first time someone talked about, “Oh, hey, convert this Puppet/Chef script into Terraform,” I'm like, “Oh, holy cow.” Because it's…

Worked.

It’s a classic problem that is very tractable. And I think a lot of legacy code conversion follows a similar pattern. I totally agree with what you're saying in terms of like, you need a reward signal that says this pattern is better for these production scenarios. 

Yeah.

I saw a talk from Paige Bailey when she presented at the conference last year, where Google is actually doing that. They actually kind of closed the loop: they went from code generation to telemetry to power usage in the data center. And they actually found, if you do it this way, it's actually better because you use 12% less power when you're running across hundreds of thousands of clusters or whatever. So that's super interesting. But that's Google, right?

I have no doubt that someday that will be available to all of us, not just kind of these very magical places like Google.

That one's really interesting. So getting that production data in - I've started playing with it a little bit. And so again, back to Dev and Ops: I use it more on my Dev side, but from an operational point of view, one of the things I've given Cursor the ability to do is… we lean heavily into OTel... So I actually have Jaeger running locally for all my tests. So as tests run, I have spans. And so I don't do breakpoints anymore. My philosophy personally is - if I don't understand it during tests, I'm not going to understand it in prod.

Yeah, 100%.

And so that's kind of what leads my use of telemetry and adding events in telemetry and attributes and whatnot. So one of my Cursor rules is: you can add whatever OpenTelemetry you need. And I have an MCP server that it can actually fetch traces from.

Awesome.

And so like I actually watch it. Like it'll add some traces. It'll try to implement my test. It'll fail, add some traces to figure stuff out. And then I can see it like pulling these traces back. 

Magical, right?

It's starting to get an idea of… it's not production yet, but it's like production-ish flows through the data. And I feel like it writes better code and it does very good OTel when it starts to take these kinds of rules into account.
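
As a rough sketch of what that can look like in the code a test exercises (not Cory's actual setup; span, attribute, and event names are hypothetical), the Elixir opentelemetry_api package wraps the logic under test in a span that later shows up in Jaeger:

```elixir
# Hypothetical instrumentation: the span and its events become the trace the
# assistant can fetch and inspect after a failing run.
require OpenTelemetry.Tracer, as: Tracer

Tracer.with_span "registration.create_user" do
  Tracer.set_attributes(%{"user.plan" => "trial", "request.source" => "test"})

  # ... run the business logic under test here; a failure or a slow call
  # shows up on this span ...

  Tracer.add_event("user.persisted", %{"db.table" => "users"})
end
```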

And just to concretize something you said: that closing of the loop, where it can see what it's doing and actually can iterate by itself… once you experience that, it's really difficult to go back to doing it by hand.

Yeah, it is. One of my diehard Cursor rules though is you are not allowed to ever change tests. So Steve's got to add that one. 

I know, exactly right.

But that's the thing, that is the software I actually care about, right? Like that's what I've found in my career. I care way more about writing my tests than the code that is running. Like that makes money, but this gives me assurances that that's gonna make money.

100%. Can I add one caution to what you said? One of the sections we have in the book is that AIs are litterbugs. I mean, they will make so many little test scripts for themselves, they will add so many debug statements that it can actually flood… there's this point where it is actually degrading its own performance because it's flooding its output with debug statements. And so that increases context saturation. So one of the practices we recommend is actually going through and having it remove all of the extraneous logging…

Oh, yeah?

Because the more stuff that you fill up the context window with, it actually impedes the real problem-solving that has to be done. 

Vibe Coding Book

So going back to the book - is it already available, or is it coming out in the fall, or is it pre-order right now?

Yes, late summer or fall, we're kind of debating that right now. We have a draft manuscript out to early reviewers. Gosh, I would be happy to send you one if you would be willing to look at it and give me your…

Oh, yeah. Yes!

On the condition that you give me candid feedback.

No, I would love it, because I want to figure out how to do more of this more efficiently. I feel like this is probably one of your challenges in writing the book, and one of the challenges in learning this stuff: there are just so many terms. I remember when people were making a huge deal about RAG, like six months ago - haven't heard it mentioned since that week when it was big. I'm like, do I need to go spend some time learning about that or not? I don't know.

I would absolutely love to review the book and I will use it as a guide in like my own education over the next couple of weeks. I set aside every Friday afternoon for like trying to stay on top of all the trends in AI. So yes, send it my way. I'd love to.

Right on. In fact, yeah, it's available for pre-order on Amazon. I'm just so thrilled with the feedback we're getting - even from really experienced people saying there were things that were explained in the book that actually helped them. It's been the most fun thing I've ever gotten to work on.

I'm very excited. I genuinely appreciate you coming on the show. I know it took a while to get you on here. I'm so glad I had the time. I feel like I could bug you for hours. 

No, to be continued, absolutely.

This has been a very fun talk. What else have you got going on? Where can people find you on the internet?

I'm @RealGeneKim on most of the platforms and, yeah, I'm on LinkedIn. So that's probably the best way to reach me. I'm trying to get into the mode of writing more about what I've learned on the vibe coding adventure. And all I can say is, I feel like I've had the privilege of working on some really fun projects - being early in the DevOps movement, helping chronicle the journeys of large, complex enterprises - but I've never had as much fun as right now.

Just being able to create so many tools that I'm using in my daily work, that would have been impossible without AI. So at the conference coming up in September, we're going to have some of the people I admire most showing what large-scale pilots look like as we get to, you know, 700 developers at Adidas, 3,000 developers at Booking.com.

Yes!

And actually working with the team at DORA to kind of resolve the DORA anomaly, which they found last year, where they found that the more AI you use, the worse your throughput and stability got. And when I read that, I'm like, “Oh, my gosh.” It's been so great to pull together this team to really try to figure out, “All right, how do we resolve this anomaly?” Anomalies are some of the sources of the biggest scientific discoveries, like the speed of light and Mercury's precession in 1911.

So just to try to show that there are conditions that need to be in place that leaders must create to really get the 2x, 10x improvement that we feel like we've experienced. 

So if you don't have independence of action through architecture, if you don't have fast feedback loops, if you don't have tests, you're going to have a bad time.

Yeah, well, I'm looking forward to it. And I just got to say, again, thanks so much. I feel like a lot of your work has set amazing goalposts and then guided people towards them. And I feel like if more people read your work, I feel like we'd be in a better place DevOps-wise. It's been awesome having you on the show, and I'm very much looking forward to Vibe Coding.

Right on. Cory, it has been a genuine treat talking about like everything from Platform Engineering to DevOps to Vibe Coding. Yeah, so many fun adventures still to come. 

Awesome, thanks so much.

Featured Guest

Gene Kim

Author, Researcher, DevOps Enthusiast, Co-founder of IT Revolution