Navigating AI And Platform Engineering With Amey Patil

In this episode of the Platform Engineering Podcast, Cory O’Daniel speaks with Amey Patil, Head of Platform Engineering at Google Ads and Google Analytics. They discuss the evolving landscape of platform engineering, the integration of AI and ML, and the human factors essential for leading successful platform teams. From managing legacy code migrations to fostering a culture of innovation, Amey shares insights and strategies that are vital for both seasoned professionals and newcomers to the field. Tune in to gain valuable knowledge and stay ahead in the dynamic world of platform engineering.

Love the show? Subscribe, rate, review, & share! https://platformengineeringpod.com/

Guest: Amey Patil, Head of Platform Engineering at Google Ads and Google Analytics

Amey is a seasoned technology leader with a proven track record at Google and VMware. He excels in driving innovation and spearheading high-performing engineering teams. Currently serving as the Head of Platform Engineering for Google Ads and Google Analytics, he oversees infrastructure supporting multiple high-revenue products, leading a global team.

Transcript

Intro: 00:00

You're listening to the Platform Engineering Podcast, your expert guide to the fascinating world of platform engineering. Each episode brings you in-depth interviews with industry experts and professionals who break down the intricacies of platform architecture, cloud operations, and DevOps practices. From tool reviews to valuable lessons from real-world projects to insights about the best approaches and strategies, you can count on this show to provide you with expert knowledge that will truly elevate your own journey in the world of platform engineering.

Disclaimer: 00:45

Please remember that the opinions expressed in this interview are those of the guest and do not reflect the stance of the company they currently work for.

Cory: 00:53

Thanks for listening to this episode of the Platform Engineering Podcast. I'm your host, Cory O'Daniel. Today, I have Amey Patil, the head of platform engineering at Google Ads and Google Analytics. Amey, thanks for joining us today.

Amey: 01:03

Thank you for having me on the show.

Cory: 01:05

Could you tell us a little more about your background and your journey into platform engineering?

Amey: 01:11

Absolutely. In my current role, I ensure the productivity of various ad products. I'm the platform engineering lead for Google Ads and Google Analytics. Google Ads encompasses a broad range of products, including Search Ads 360, Display & Video 360, Apps, Creative Studio, Campaign Manager, and Google Analytics. They all fall under the ads portfolio of products. Within Google, platform engineering is often referred to as core productivity, developer productivity, or engineering productivity. These terms are used interchangeably. The users of my products include in-house developers, program managers, product managers, UX designers, and data scientists, essentially anyone involved in pushing the core product to external customers. So that's my current role.

Cory: 02:04

Awesome, very cool. I've always wondered, especially at a company as big as Google that has a cloud product, do your teams tend to use your own cloud products, or do you use Google's internal systems?

Amey: 02:19

We use Google's internal systems. Actually, we do not use Google Cloud products. We have our own version of GCP, which is a massive-scale infrastructure running parallel to the external GCP. Many solutions available on GCP are developed and tested internally by Googlers, which contributes to the excellence of GCP products. So while we use the same underlying technology, we do not use the same APIs or infrastructure directly.

Cory: 02:54

That's very cool. What I've seen a lot in the space, as far as the people getting into platform engineering, is a lot of DevOps professionals almost being rebranded as platform engineers or trying to take that next step in the DevOps maturity model towards platform engineering. Internally, with your teams, who do you see joining the platform engineering teams that you run? Are they coming from more of a DevOps background or more of an engineering background?

Amey: 03:24

It's a mix. Within Google, we've had folks from pure development, classic test engineering, quality assurance, DevOps, and system administration. The kind of work expected for this role within Google requires understanding these various domains to be really effective. So within my group, I have folks across the spectrum.

Cory: 03:51

Very cool. And it sounds like many people have been in that engineering customer service realm where they're already working with engineers as their customers.

Amey: 04:00

Yes, we've even had folks who were data scientists because many of our products are very ML-forward, or now particularly AI-forward. People who understand MLOps are also on the team. To understand MLOps, they often have a background in ML development, either as a data scientist or as an engineer who supported data scientists. So it's quite a broad spread.

Cory: 04:26

Yeah, and I feel like a lot of what we do as platform engineers has evolved from DevOps towards platform engineering. Many bespoke systems built internally at numerous companies are becoming core parts of platforms and are available in more open-source tools. We're seeing many platform teams incorporating other engineering disciplines, bringing in security and compliance as part of the platform. It's exciting to hear your team is bringing in data scientists and ML as part of the internal platforms you're building. Can you tell me about a project where data science and ML have impacted the platform?

Amey: 05:14

Absolutely. You touched on an important point. Besides skills and expertise, domains like security and privacy are crucial at Google. In one of my recent projects, we supported serving YouTube ads. Part of this involves understanding how models are created and the best way to serve them. We have strict latency expectations, so it involves determining when to serve a larger or smaller model based on whether the recommendation engine needs a filtered list or an ordered list. This requires understanding the product from a data science perspective, including the model's intent when recommending the next video and personalizing ads. For instance, we might need to adjust the model for a particular country based on its specific needs. So, we need to understand the product and the model, and then do the platform work to make that happen.

Cory: 06:42

What kind of challenges have you seen? Many different teams face challenges like bringing old-school operations engineers into platform engineering. They may have a ton of Linux administration experience but haven't done traditional software development outside of scripting. That's one burden of coming from that side of the world. On the other hand, I've seen engineers working on the product side of an e-commerce platform move to the platform team, where they need to quickly pick up a whole world of cloud knowledge to be effective. What are some of the biggest problems that ML and data scientists run into when joining and contributing to a platform team?

Amey: 07:26

I feel that adopting the mindset of a platform engineer for an ML-based team has a few nuances. In essence, the concepts are pretty much the same. You want to bring up systems, enable developers to do their development faster, and enable data scientists to test their models quickly. It's crucial to identify if a model isn't behaving as expected and ensure they have the datasets they need and are able to filter, root cause, and debug issues. For instance, they might find that the data used for training doesn't match what the model actually sees in production, so they need to look at skew testing.

We need to enable data scientists to develop, test, and troubleshoot their models and ensure a smooth transition to production. Concepts like canary releases and feedback loops apply here as well, like increasing traffic from 1% to 5% based on model performance. If someone understands these basic concepts, it's easier to apply them when evaluating problems. However, technical nuances arise, such as using a model repository instead of an SCM to host the code. Despite these differences, the overall concepts remain the same. Thankfully, with the advent of AI and code generation, people mainly need to understand the overall concepts, since code generation takes care of much of the implementation, which reduces the worry about this shift.
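To make the canary idea concrete, here is a minimal Python sketch of a traffic ramp like the 1% to 5% step described above. The stage list, the single `quality` metric, the tolerance, and the function name are illustrative assumptions, not Google's actual rollout logic: the model advances one stage while it holds up against the baseline and rolls back to 0% as soon as the feedback loop shows a regression.

```python
from dataclasses import dataclass

# Hypothetical ramp schedule: percentage of traffic routed to the new model.
RAMP_STAGES = [1, 5, 25, 50, 100]


@dataclass
class ModelMetrics:
    """Aggregate quality metric (higher is better) observed at the current stage."""
    quality: float


def next_traffic_percent(current: int, candidate: ModelMetrics,
                         baseline: ModelMetrics, tolerance: float = 0.01) -> int:
    """Return the next traffic percentage for a canary model (illustrative only).

    Advances one stage if the candidate is at most `tolerance` worse than the
    baseline; otherwise rolls back to 0% so all traffic returns to the old model.
    """
    if candidate.quality < baseline.quality - tolerance:
        return 0  # Feedback loop reports a regression: roll back.
    if current not in RAMP_STAGES:
        raise ValueError(f"unknown ramp stage: {current}")
    idx = RAMP_STAGES.index(current)
    # Once fully rolled out, stay at 100%.
    return RAMP_STAGES[min(idx + 1, len(RAMP_STAGES) - 1)]
```

For example, `next_traffic_percent(1, ModelMetrics(0.90), ModelMetrics(0.90))` moves the model from 1% to 5% of traffic, while a candidate scoring 0.70 against a 0.90 baseline is rolled back to 0%.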

Cory: 09:24

You've been in the space for quite a while, right? You've been at Google for a while, and before that, you were at VMware. You've seen many changes in platforms over the past decade. With the recent surge in AI and the attention it’s getting, do you think this is a trend we'll see for many platform teams, or is it more specific to Google Ads, incorporating data science and ML into the platform team?

Amey: 09:53

I've been getting this question a lot lately. Google has been an AI-first company for a long time, with AI often referring to ML. Generative AI using large language models is an evolution we're seeing now. I'm fascinated by this shift in technology; it's opened up unlimited possibilities. I see it impacting various stages of the product development lifecycle, from ideation to launch, and every touchpoint. Every team at Google is considering how GenAI can enhance their product, surface, or user experience, making them more productive and adding value.

In the context of Google, platform engineering encompasses much more than classic infrastructure as code. It includes documentation, generating PRDs or design documents, creating UX mocks, code generation, infrastructure as code, test frameworks, monitoring, and debugging. In every area, GenAI is making a meaningful difference, not just in ease of task execution but also in productivity and quality for engineers.

Cory: 11:24

Yeah, I've worked with quite a few companies building out internal platforms, and I've talked to many platform engineers over the years. Since all these companies building platform teams are deploying software to the cloud and to containers, you would think that many of the platforms are similar, especially with technologies like Kubernetes.

However, when you talk to different platform teams at various companies, particularly across different industries, you see very different approaches to platform engineering. Sometimes it's just adding stuff on top of Kubernetes, and sometimes it's something home-baked. So, you see a lot of these systems that aren't as similar as one might assume from outside the platform engineering world.

When it comes to AI, I feel like almost every decision point and feature in a platform could find a place for AI. For a team that might be AI-strapped, without the budget of Google or a dedicated AI team, how do you find the right place to start using AI, given its flexibility and potential to be used across the stack?

Amey: 12:52

Yeah, I think that's a very valid point. Right now, GenAI specifically, or even AI in general, isn't cheap. It requires significant resources, both in terms of the cost of running GPUs and TPUs and the skill set of engineers who can capitalize on these capabilities. Google has provided many of its teams with this luxury, and I'm thankful for it. However, I see a lot of this being commoditized soon.

[…] computing evolved in the early 2000s. […]

I see a similar trend playing out for AI. You may have a few major players in the overall space, but many others will build on top of these, selling at a lower price by targeting smaller verticals, eventually making AI accessible to teams that can't consume it today.

Cory: 14:30

Before we hit record, you mentioned that your team had recently used generative AI to migrate a legacy code base. Could you dive a bit deeper into some of the lessons learned with using AI to refactor legacy code?

Amey: 14:51

Absolutely. Let me paint a picture of what we needed to do. Ads is a 20- to 22-year-old business now, and you can imagine there is tons of code, possibly tens of millions of lines or more, much of which is legacy. We have engineers who have moved on, and it's a massive, complex system. When I say complex, I probably can't emphasize enough just how complex it really is. It requires constant maintenance to stay up-to-date, especially to match current trends. We have tons of competition in this space, and to deliver at the pace needed for success, we must keep upgrading our technology. One project involved migrating our query and storage layers, which again consisted of millions of lines of code. Around that time, GenAI had gained momentum, and we decided to give it a go on this migration instead of starting with smaller problems like bug fixes.

Cory: 19:46

I mean, I assume it is moving faster than just a bunch of people refactoring it, right?

Amey: 19:51

Yeah, I mean, I think it's definitely showing a lot of value in terms of savings. Once everything really comes together, the entire orchestra will allow us to make a lot of changes. It's very similar to back in the day, when we had manual QA with 100 test engineers testing the website. Then came Selenium, enabling front-end testing, and suddenly the same work could be done by ten engineers. Similarly, with code generation, tasks that might take 15 units over two years can now be done by three or four. It will take an initial six months to get the system to the point where they can click through and run with it. Similar to automation, where you invest time to build the framework and write test cases, the CI pipelines can then run tests frequently, providing value to the product and enabling multiple daily or weekly releases.

Cory: 20:49

Yeah, it's interesting that you pointed out that you kind of carry the bugs forward, right? Because I think one of the interesting things in GenAI around code is that it's one thing for GenAI to see an actual software bug, like an off-by-one error or a loop starting at one instead of zero, but it's different with business logic, which it doesn't necessarily understand. We've got these models trained on mountains of code, but they're not trained on the context between two git diffs. Or maybe something that looks like a bug is actually an aspect of the business, right?

I worked for an e-commerce company where you could only put one item in the shopping cart. In any other e-commerce system, you might think something was broken, but that was the way the product worked. That's pretty interesting. Have you seen anything in the AI space that incorporates context, like the history of git commits or business requirements, to help generate better code? Not just the code, but also considering the business aspects. Is anyone working on that today, or have you seen any examples of it?

Amey: 22:18

Yeah, I mean, as I was referring to in the beginning, Google has its own world of infrastructure in that context. We have our own source control technology as well, our own Bitbucket or GitHub equivalent. So we do have a fair amount of data about what readable code looks like and what the style guide we want looks like. When someone writes a particular piece of code, we have review comments on the PRs and CLs about things like whether it's efficient, so we do have that additional information. However, to your point, that information is vast, and it is unlabeled. That's the whole point of LLMs. So I think with time, we'll be able to refine it further and make it much more valuable.

This is a discussion right now. If you follow any of the tech papers people are submitting, it's about giving additional context. For example, when I am writing a test case for a particular piece of code, I also want to evaluate my test document and my PRD alongside the code because it requires all three in the mix to generate a meaningful test case. So I think it's not as good as we would like it to be today. But things have been changing at a pace that I've not seen before. There are so many minds on it, and just because of the access to infrastructure, time, and energy, I wouldn't be surprised if it's being solved live right now, somewhere, by someone figuring it out.

Cory: 24:02

Yeah. What do you think as far as traditional people that you'd see on a platform team—operations, software developers—with how fast AI is moving? What implications do you think there are for the skill sets of those traditional platform team members? And how should those engineers and teams prepare themselves for the changes coming with GenAI?

Amey: 24:28

Yeah, I think you're absolutely right. I think GenAI is definitely poised to reshape the landscape of platform engineering, and the skills required will need to evolve alongside it. I think coding, which has been the highlight for the last, I don't know, 20 to 25 years or even longer, won't be so much the focus in the future.

I think the focus will be on problem framing. I think the code generation piece will get a lot more refined with time, so the premium will be on defining the problem effectively and then understanding what the clear instructions would be, given that problem and the output we want to get out of it.

So I think the human piece will still be required, but the mentality will need to shift from "Would I be able to code it?" to "What needs to be done?", assuming that the code will largely just happen. It won't replace human judgment, though. I think that overall judgment and the overall human-in-the-loop aspect will still remain. And I think there will be more of a need to collaborate across disciplines. Engineers will need to work with data scientists and domain experts, so you will need a much broader understanding of the overall problem at hand. And I think outside of tech, they will need to embrace continuous learning. The curiosity needs to be there, and an open mind that there is something to be done differently, essentially shifting from a code-centric mindset to a systems mindset.

There is, of course, value in upskilling yourself with data and AI fundamentals. There is a whole slew of new jargon and acronyms right now. I don't suggest, obviously, becoming an expert in all of them, but if you're a platform engineer and you don't know what CI is, I think that's a problem. Very similarly, you need to know what RAG is, and you need to understand the core concepts around LLMs.

Cory: 26:24

To start becoming familiar with the terms and potentially the operating models of those. But we don't necessarily have to go back to university for more math classes, right?

Amey: 26:35

No, absolutely not. Absolutely not. In fact, if I recall correctly, I think Sam Altman or even the folks in... I feel they... I mean, there are going to be people who would be needed to solve those kinds of problems, but not everyone's going to be building them. Not everyone needs to know how Git functions, but you need to know what Git does. So I think it's a very similar place. I mean, when I saw Git, it was magic for me that multi-GB-size repositories can be condensed into a single key. But I don't know how it did it. But I can still make a lot of progress from it.

Cory: 27:17

Yeah. Awesome. Well, on the other side of the world from AI are obviously the people that are building it, right? And the platform teams. So these teams are going to be around for quite a while, it sounds like. We're going to have to run these AI systems. We're going to be building software on top of them. As for AI itself, I think it's probably going to make us more efficient engineers, and if anything, for corporations, progressing faster means more money. So I don't fear AI in the long run. I see it as a tool that I'll be using. But we do have all these teammates that we work with.

What's interesting about DevOps is that I'm not sure how familiar you are with the DORA report, but the DORA report over the past few years hasn't looked great. In the last DORA report, 50% of respondents focused on DevOps were still struggling to reach that high level of maturity where you could start to do platform engineering and really reap the benefits.

You've managed high-performance teams throughout your career at VMware, Cisco, and Google. What are the most common challenges for platform teams at scale, and how do you address those problems to maintain productivity and morale? I feel like we're always drowning in tools to adopt, and now there's just a whole swath of them coming at us from the world of AI. I would love to know what the most common challenges are that you see in managing those teams.

Amey: 28:47

Yeah, I think I've got multiple answers, actually. I want to say on this that, yes, things are evolving really quickly, so there's lots of new stuff. I also look at that as a very positive sign. I always love it when there is innovation or something new coming into the industry that I am in, which means people are interested and people are developing. I don't know if there's a lot of innovation happening in the railroads or in coal mining, and you can imagine those industries stagnating. So I've always looked at that as a very positive sign for my industry that people are focusing on it and building stuff on top of it. It gives us an opportunity to learn and upskill ourselves.

To your point about people being nervous about losing jobs, I agree with you. I don't see GenAI taking our jobs. It is an enabler, empowering us to solve higher-caliber problems that were not possible before. So it depends on how well we take on the challenge thrown our way. The additional time we get from these tools will give us more creative windows for tackling problems we could not think about before, just because we were short on time, for example. So I think it's a mindset that anyone in tech, or in any industry really, should adopt: if there's any disruption coming, embrace that disruption and capitalize on it by adopting it and upskilling yourself.

Within my teams, we definitely face feature fatigue because of these new tools. People are like, "Hey, can you implement this? Can you build a tool for making that particular feature possible?" And there's always a new shiny feature or tool to build or to adopt. But how we combat it is by putting prioritization at the top. We are very ruthless with what we want to take on. As with almost every company, we have our own ratios. I'll not speak about Google, but the general industry number is essentially ten to one. For every ten developers, you have one platform engineer, so we need to be very mindful about what we're taking on to ensure that we keep delivering. We also chunk our workload. We make sure that we're making small, incremental changes, something that adds value. People feel that they have things under control. We also celebrate those victories along the way: not just the big milestones or the final deployment, but the milestones along the way. And we are also not afraid to use the word "no." I think saying "no" is powerful if it doesn't align with the priorities.

We are very clear in our communication with our partners and the consumers of the solutions we build about why we are not making a decision today, or why we decided today not to build something. And then, as I said, we use the time and efficiency gains we've made to carve out time for innovation, so that people can take those new tools that are available, the ones we said no to, and learn about them, so that when the absolute need does arise, they are in a position to hit the ground running.

So I think ruthless prioritization, small iterative work, feeling comfortable saying no, and protecting time for innovation would be key ways for any leader, their platform engineering team, or any tech team to embrace these kinds of disruptions in that particular space.

Cory: 32:26

Yeah, I think two things you said there are pretty key. One is celebrating those milestones. I feel like so many teams get caught up in the idea that they can't celebrate until the big win. And sometimes that win doesn't come, but you might have gotten 80% of the way there, and you never reflected on the milestones where you saw progress toward what you were trying to achieve. I think that is really key for morale, keeping employees enthusiastic about what they're working on.

The other really interesting thing you said is "no," being able to say no. I feel like I've seen so many teams struggle with that, especially platform teams. What's interesting is that when you look at, let's say, a team of 20 developers working on an e-commerce platform, there may be a number of people working on that platform who don't really care about what the product is selling. They like writing software. When you meet platform engineers, they always seem very excited about what they do. They're building software for software developers. Their customers are their own people, which makes them a very hard group to say no to, because you're already in a role that's very much a service role. You're working on a product that you're excited about, with a customer that you truly understand.

What are some tips for being able to say no to a new feature or a new tool? And, like, how do you handle that internally at Google?

Amey: 33:56

Yeah, I mean, I think the onus of saying no falls on both the leader and the team that leader is leading. There needs to be psychological safety within that group, where people can have candid conversations about the value of a particular feature or product, either with their partners or with their users. I think that is the number one thing that enables that outcome to begin with. But there also need to be anchor points that tie us to the goal we are solving for.

I mean, like almost every business and every team, we are working towards certain outcomes and certain goals. Say, for example, there is a particular group within my organization that's focusing on making things a lot more reliable. If a request doesn't fit that agenda of making the product more reliable, then for that particular cycle, it just doesn't fit. So defining clear north stars clears people's minds about how to evaluate the problem at hand.

Sometimes I see platform engineers have the temptation to first find the how and then the why or what. And I think that always plays against them because they feel they've discovered, oh, you know, like, "Hey, I've got Terraform". And then let's say there is a new solution, which is, I don't know, a bottle. And they're like, "Oh, the bottle looks cooler. Let me implement a bottle and see where it all goes. Can I not use Terraform moving forward?" So that takes away from the North Star, the anchor that everyone's moving towards. So I think the psychological safety within the team to speak their mind and having clarity on the goal that we're working towards will make those kinds of decisions easier. Then from there, I think Google specifically, and even in my past companies, data should be fuel. I mean, curiosity is a spark that needs to consume that data and then come up with an outcome and use data alongside determining why this is adding or not adding enough value that the teams should be spending the amount of energy that they are on building that particular piece. So I think a combination of these three.

Cory: 36:15

That's what you were saying earlier about the new shiny baubles that can kind of distract you from the work at hand, right? Like a new tool.

Not sure if you follow the DevOps subreddit on Reddit, but it's funny. I feel like every couple of days I see somebody on one of two sides of the fence. Somebody is like, "We've got too many tools, we're using everything, and we have so much stuff that we can't figure out how to do anything anymore." And then there are teams that are like, "We haven't had any new tooling in forever, and I would love to start bringing new tools into my organization." Seeing both of these complaints at the same time in the same forum is always very funny to me.

I think one of the key things you hit on there is that it's not just that the tool exists and you can use it, but why you'd use it, and what value it brings to the team. I think that's really key.

Amey: 37:08

Why, and what value? And why at that time specifically. Let's say, for example, someone's using TypeScript and they want to move to, let's say, Dart or JavaScript. Maybe there is value in migrating, and maybe it should be done. But is it worth that migration today, given that everything's already running? It may add, let's say, I don't know, x amount of productivity to the entire group, but what is the opportunity cost that we actually let go of?

So I think the timing piece of it digs into the opportunity cost that people probably sometimes don't evaluate as well. Where else would we be able to put in the time?

Cory: 37:50

Yeah. And I feel like opportunity cost is one of those things that's very hard to think about, right? Especially if you're not coming from a business mindset. You're working with these tools all day, day in, day out. You're building features, so why not this one?

Being able to bring it back to that and like kind of embodying that idea with your team to make them be thinking about what is the opportunity cost of doing this right now, I think is really, really important to making sure that you're doing the right things for your product and that you're doing them in the correct order. So with teams like yours, especially on products like Google Ads that are high-revenue products, how do you foster a culture of innovation within these teams where you have something that's working well, it's generating revenue, and we want to continue to innovate it? We want to make it faster, we want to make it cooler, and we want to make it generate more revenue. How do you do innovation when you have a product like Google Ads that is just so high in usage and such a revenue driver for the business?

Amey: 39:01

Yes, Google Ads, because of the percentage of revenue it brings to Google and the data it works with and the number of engineers impacted, is a high-pressure group.

I always say this, though: innovation and pressure can be best friends and not sworn enemies. In fact, I feel many times that innovation comes from a high-pressure situation as well. So in terms of how we cultivate innovation and how we ensure that we are progressing along, I think the first mindset thing would be embracing calculated risks.

I feel there has to be a risk involved, but how much can you pad it to ensure that it's calculated? So, for example, do you have smart failures when things get deployed? And if it doesn't work, can it roll back automatically? Can it scale by itself?

So, creating a safe space to experiment conceptually and also technically allows innovation and development to happen much more fluently within the Google Ads infrastructure. Thankfully, that kind of setup is there, so people feel comfortable making changes at the frequency and scale at which Google functions. And I think cross-pollination is key. I think innovation, or at least large innovation, happens at the intersection of multiple disciplines.

So if the platform engineering teams do not work in silos, and they connect with their partners and with the data scientists, then there are multiple minds looking at the problem from different perspectives. That ensures the project being carved out is a much more foolproof, well-thought-out solution, which builds a lot more confidence when we are taking on something that is high-risk. There is also, as I was speaking about earlier, using data as well as having the psychological safety to feel comfortable saying when things are failing. People are sometimes nervous to say, "Hey, I kind of messed up. It's actually not working out." People, you know, keep going because they don't want to look like they failed. And that's a recipe for a much larger disaster.

So building the kind of culture within the group where people are comfortable, and in fact eager, to share their failures early, so that others don't make the same mistakes, goes a long way when we are dealing with these high-revenue, high-pressure situations. Information gets shared quickly and frequently, both among the team members solving the problem and externally. Embracing all of these aspects will, I think, harden the team over time. Then, as the organization matures, confidence rises, these situations feel like less pressure, and it becomes business as usual.

So if I imagine a brand-new team facing these challenges, these are some of the techniques they could use. If attrition is low and the team sticks together, you'll end up with a much stronger and more confident team making a larger impact.

Cory: 42:19

Awesome. Thank you for that. How big is the Google Ads platform engineering team?

Amey: 42:24

I'm not sure I can give out exact numbers, but I would say it's in the hundreds.

Cory: 42:30

Oh, wow. Just for the platform engineering team?

Amey: 42:33

Yeah.

Cory: 42:34

Oh, very cool. Very cool.

Amey: 42:36

And that's a small section of it. I mean, you do have some specialized platforms that have their own specialized engineers for those platforms as well.

Cory: 42:46

Yeah. And so with your teams, let's say a team member finds a new tool that they think is a good fit, and it's the right time. Like, do individual team members have the ability to just kind of bring that in and start using it? Or do you have a formal, like, RFC process where your platform engineering teams kind of look at the tools before deciding to move forward?

Amey: 43:08

Yeah. Yeah. You know, as you ask this question, it reminds me of my first day at Google.

When I walked in, it felt less like a company and more like a university campus. I remember thinking, "Hey, this already feels very different. Why are people playing volleyball in the middle of the day?" It had a very chill vibe. I thought maybe it was just the non-work pieces that were university-like. But as I got into the system and saw how Google functions as a company, I was blown away by how grassroots it is. And it's actually a double-edged sword, because we see Google push out similar, competing products. For example, we had Google Chat and Google Hangouts, and then Google Meet, all solving nearly the same problem, but not quite as a single unified product.

We can compare that with Apple, which has a grand ecosystem strategy: if a product is going out externally, it needs to be vetted across the entire ecosystem. Google clearly doesn't function that way. And if you apply that blueprint of how Google approaches external-facing products, you can see how much of an explosion there is when you look at internal products with internal users.

We do encourage a lot of our engineers to think, innovate, and solve problems that no one else has solved. Many times you'll actually see duplication, but that's a risk we're willing to take as a company, because we feel we'll find the diamond once in a while, and that will be totally worth it. So engineers are encouraged to innovate. And to make that innovation really work, they need to wear multiple hats.

So a platform engineer at Google, or at least in Google Ads, is essentially like a CEO. They're like, "Hey, I've got a great product idea," but it doesn't stop there. You need to figure out your adoption strategy, because people are already using a similar technology and you want them to move to yours.

So why would a user switch? You need a sales pitch, and you need to figure out your growth strategy. You also need to figure out who will contribute, so there is some kind of recruitment involved.

If you are not a manager who can fund the project, but you are really convinced of the idea, you can get people to participate through their 20 percent time. We have an internal job portal where people say, "Hey, I've got this great idea. I think it's going to have a big impact on the product, and I need some help," so people can be recruited for the 20% project. It's a very unique company, at least compared to the companies I've been at, and I've worked for seven companies in the past.

I've never seen that kind of culture elsewhere, where anyone, from an entry-level L2 to L3 engineer all the way up to a VP, feels empowered to come up with what they think is the best solution to a problem and then just go do it.

Cory: 46:15

That is awesome. I love that. So, with all of your experience in platforms, here's what I'm curious about. I feel like the definition of platform engineering, the traditional definition, has started to formalize over the past year or so.

What do you think is missing that we need to start incorporating into platforms, whether they're at Google, platforms we're buying like different PaaSes, or platforms we're building internally at other organizations on top of Kubernetes or directly on top of the clouds? What is missing from platform engineering today that you think is going to be important in the future?

Amey: 46:54

Yeah, I think compared to classic platform engineering, it needs to understand that the platform isn't really limited to code, infrastructure, deployment, and monitoring. At least within Google, and at VMware when I was there, but mainly at Google, a developer probably spends, and I'm just throwing out a number here, approximately 15% to 20% of their time actively authoring code.

Everything beyond that could be conversations, discussions, or bug evaluations. There could be on-call work, like debugging something. There may need to be a report on, let's say, an incident that happened in production, where you justify how that bug got to production.

So there's documentation writing. All of that is required to make the product successful, and it all depends on an underlying platform. Now the platform suddenly expands beyond an IDE, a source control manager, Kubernetes, and so on; it starts to include documentation, approval systems, and an issue tracker. And I feel that if platform engineering teams look at the entire ecosystem of tools, all the surfaces teams use to make their product successful, that is actually the platform. That's something I feel they need to think about. Because if I get paged for an incident in production and I'm responding to it, I need to be able to tie that incident to a CL and figure out which test cases ran around it. I need to have a debugging journey there.

Then, from there, I probably need to do some code authoring, and then there's the review process. So you can imagine the number of surfaces they have to bounce across, and that context needs to keep flowing. So that ecosystem needs to be a bit broader. That's one. The second aspect, on a very different tangent, is how teams build solutions. I've noticed that the primary focus has always been either speed or quality: I can get you to do your stuff faster, or I can help you do it with higher quality. And because we feel we can cut some corners, we don't focus on ease.

We aren't asking, "Okay, can I make this a better experience?" We feel that the user is an internal engineer sitting right next to us, so we think, "Let me just get what I have right now out there." But we're not thinking with a true product mindset: it's not just about speed; it's also about repeatability, about users coming back and having a pleasurable experience so they want to use the product more. If it's a really painful tool to use, people may not use it as often, and eventually that defeats the entire purpose.

So I think the additional dimension of thinking from an interaction perspective and an ease perspective will also make a platform engineering team a lot more successful. And I feel that's currently missing on many teams.

Cory: 50:16

And that's the bit right there: your user is sitting five desks down from you. You have lunch with them, but you don't typically think of them the way you think of the customer you never see, the one who runs their credit card when you're selling a product. And the way you present your product is probably very different in those two scenarios. But DevX matters, right? You're going to get users internally on your systems, and they're going to love or hate those systems. If they love them, you're probably going to get a lot more use and a lot more useful feedback than if somebody's like, "I don't want to use this thing."

Amey: 50:51

Yeah, you end up walking over to their desk and saying, "Hey, by the way, click here, click here," just to get them onboarded really quickly. You might already realize the onboarding experience is poor, and you cut certain corners. Then, when the user isn't in the room, the process becomes really painful. You can really see that mindset play out when you don't know who your user is.

Cory: 51:11

Beyond technical skills, what personal attributes do you believe have contributed most to your success in leading platform engineering teams?

Amey: 51:20

I think in general, curiosity. I think curiosity is fundamental to success, especially in these changing times. Curiosity in terms of: why is this functioning the way it has been functioning for the longest time? Why do we need to do this particular thing? Why is this particular trick needed, and why couldn't it be solved at the base level? So I think people should ask "why" a lot more often and not take things for granted, and not assume that because something was built by the architect and the group, it must be a really well-thought-out solution and that's just what it is. It takes having the courage to ask a question and being curious to begin with. And finally, curiosity fuels empathy, because once you are curious, you go down that rabbit hole of figuring out why, you connect with people, and you empathize with their problems. Maybe they say, "Actually, I really had to do this quickly, so I took this corner over there," or, "This particular server with this particular chip is not best designed to solve these kinds of problems. At least that was the case five years ago; if there's a new chip now, maybe the answer is a little different." So you understand their thought process: it's not only the dimension you're thinking about, but probably some other aspect you haven't considered. So I think curiosity and empathy together have helped me do my work as a platform engineering lead for a while.

Cory: 52:55

That's really interesting, because curiosity and empathy, right? That leads to a good culture. And I feel like when you see a lot of organizations that have traditionally struggled with DevOps, it's rooted in just not nailing that culture, and not getting that initiative born and moved forward internally. And maybe we just need a lot more empathy to get there.

Amey: 53:24

I think once people start becoming a lot more empathetic, you'll see other behaviors come into play. For example, I've seen some teams say, "Oh, this doesn't come under my jurisdiction," or "This is not my work." But as you start empathizing, the work is no longer the discussion; the problem is. So you say, "Okay, but it's still our problem," and you start tackling it. That mindset shift happens, and then teams become a lot more effective.

Cory: 53:52

Yeah. Awesome. Well, we're coming up on time. I really appreciate having you on the show today. Just one last question for you: Where can people find you online?

Amey: 54:02

Yeah, I've been doing a poor job of keeping a good social presence compared to the number of people that are doing it nowadays. I think LinkedIn would be my best way to connect. I don't post a lot over there, but if at all, LinkedIn would be the best way.

Cory: 54:14

Awesome. Amey, thanks so much for coming on the show today. I really appreciated having you on, and I look forward to talking to you again in the future.

Amey: 54:22

Thanks a lot, Cory. I really enjoyed the conversation, and I really liked the questions. Thank you so much for taking the time.

Cory: 54:28

Yeah, thanks so much.

Outro: 54:34

Thank you for listening to this episode of the Platform Engineering Podcast. Have a topic you would love to learn more about? Let us know at cory at massdriver.cloud. That's C-O-R-Y at M-A-S-S-D-R-I-V-E-R dot cloud. Catch you on the next one.

Episode 8

12th Jun 2024
