LIVIN' ON THE EDGE PODCAST

Gareth Rushgrove on Kubernetes Tooling, Platforms, and Engineering Security

Ambassador Labs · LOTE #15: Gareth Rushgrove on Kubernetes Tooling, Platforms, and Engineering Security


About

In this episode of the Ambassador Livin’ on the Edge podcast, Gareth Rushgrove, Director of Product Management at Snyk, discusses the state of Kubernetes tooling, the role of application platforms and how they should be designed and managed, and the importance of engineering security.

Episode Guests

Gareth Rushgrove

Director of Product Management at Snyk

Gareth works remotely from Cambridge, UK, helping to build interesting tools for people to better manage infrastructure and applications. He currently works at Snyk, on developer-first security tooling. He has previously worked for the UK Government Digital Service, focused on infrastructure, operations and information security, as well as at Puppet and Docker. When not working he can be found curating the DevOps Weekly newsletter, hiking or reading a good book.

Be sure to check out the additional episodes of the "Livin' on the Edge" podcast.

Key takeaways from the podcast included:

APIs must be designed as user interfaces, both to provide the most value to end users and to be easy for developers to consume.

Many organisations are currently achieving good results by using Kubernetes as the foundation for their platform and configuring deployments via YAML files.

For engineers working close to the K8s community, it is easy to believe that the developer experience and configurability can and should be “better”, but sometimes the simple approach (using YAML) can be very effective.

The use of cloud native buildpacks can provide a lot of value, especially when integrated seamlessly into language-specific frameworks and workflows. For example, recent Spring Boot releases include buildpack support while hiding unnecessary configuration details from users who want to use the simple defaults.

Micro-PaaSes such as Rancher Labs’ Rio are showing promise in providing “just enough” platform. They have the potential to strike a good balance between developer-focused affordances and usability.

Treating an organisation’s platform as a product can provide advantages. Understanding the customers of a platform, identifying what is actually required (versus what would be interesting to build), and prioritisation of work are vitally important skills for platform product owners.

The Kubernetes-focused continuous delivery tooling space is embracing composability -- e.g. GitHub Actions, Argo Workflows, the GitOps Toolkit -- and although sharing components provides a lot of value, the industry is still mostly at the “copy and paste” stage. Work on standards in this space is at an early stage.

Infrastructure as code (IaC) and configuration code are not always treated the same way as application code within a CD pipeline, but it is important to apply the same best practices, such as linting, (static) security analysis, and acceptance testing.

Conftest helps engineers write tests against structured configuration data. Using Conftest you can write tests for your Kubernetes configuration, Tekton pipeline definitions, Terraform code, Serverless configs or any other config files. Conftest uses the Rego language from Open Policy Agent for writing the assertions.
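Conftest policies are written in Rego, conventionally as `deny` rules that emit one message per violation. To keep the examples here in a single language, the sketch below mirrors that idea in Python; it is an analogue, not Conftest itself. It assumes a Kubernetes Deployment that has already been parsed from YAML into plain dictionaries, and the two rules (require `runAsNonRoot` and resource limits) are illustrative choices rather than a recommended baseline.

```python
from typing import Any

# Illustrative Python analogue of a Conftest `deny` rule set (not Conftest or Rego).
# Assumes the manifest has already been parsed from YAML into plain dicts.


def deny(manifest: dict[str, Any]) -> list[str]:
    """Return one message per policy violation; an empty list means the manifest passes."""
    violations: list[str] = []
    if manifest.get("kind") != "Deployment":
        return violations  # these example rules only apply to Deployments
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        # Rule 1 (example): containers must explicitly refuse to run as root.
        if container.get("securityContext", {}).get("runAsNonRoot") is not True:
            violations.append(f"container {name} must set runAsNonRoot: true")
        # Rule 2 (example): containers must declare resource limits.
        if "limits" not in container.get("resources", {}):
            violations.append(f"container {name} must set resource limits")
    return violations


# A hypothetical Deployment that breaks both rules.
EXAMPLE = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {"name": "web", "image": "nginx", "resources": {"requests": {"cpu": "100m"}}}
                ]
            }
        }
    },
}

for message in deny(EXAMPLE):
    print(message)
```

In a pipeline, a non-empty result would fail the build, which is the role `conftest test` plays for real Rego policies.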

It is important for all engineers to be aware of security. Scanning container images, application dependencies, and application code is table stakes.

Baking security checking processes into developer tools so that code and configuration are automatically and continuously analysed will provide the fast feedback that engineers require.
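As a rough illustration of that kind of fast feedback, the sketch below checks pinned dependency versions against a list of advisories and returns an exit code that a pre-commit hook or CI step could act on. The advisory data, package names, and IDs are all hypothetical; real scanners such as Snyk draw on curated vulnerability databases and handle far more version formats than this.

```python
from dataclasses import dataclass

# Hypothetical advisory data; a real scanner would pull this from a vulnerability database.


@dataclass(frozen=True)
class Advisory:
    package: str
    fixed_in: str  # first version that is no longer affected
    advisory_id: str


def parse_version(text: str) -> tuple[int, ...]:
    """Parse a plain dotted version such as '1.4.0'.

    Assumes numeric parts only and versions of equal length (no '1.4' vs '1.4.0').
    """
    return tuple(int(part) for part in text.split("."))


def scan(dependencies: dict[str, str], advisories: list[Advisory]) -> list[str]:
    """Report every pinned dependency older than the version that fixes an advisory."""
    findings: list[str] = []
    for advisory in advisories:
        installed = dependencies.get(advisory.package)
        if installed is None:
            continue
        if parse_version(installed) < parse_version(advisory.fixed_in):
            findings.append(
                f"{advisory.package} {installed}: {advisory.advisory_id} (fixed in {advisory.fixed_in})"
            )
    return findings


def main(dependencies: dict[str, str], advisories: list[Advisory]) -> int:
    """Exit-code style result so a hook or CI step can fail fast on any finding."""
    findings = scan(dependencies, advisories)
    for line in findings:
        print(line)
    return 1 if findings else 0


ADVISORIES = [Advisory("examplelib", "1.4.0", "EXAMPLE-0001")]
exit_code = main({"examplelib": "1.2.0", "otherlib": "2.0.0"}, ADVISORIES)
```

Running the check on every commit, rather than only before a release, is what turns it into the continuous feedback described above.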

Transcript

Daniel (00:03):

Hello everyone. I'm Daniel Bryant and I'd like to welcome you to the Ambassador Livin' on the Edge Podcast, the show that focuses on all things related to cloud native platforms, creating effective developer workflows, and building modern APIs. Today I'm joined by Gareth Rushgrove, Director of Product Management at Snyk. Gareth is a deep thinker in the developer tooling space and I always enjoy chatting with him about the evolution of developer workflows, Docker, Kubernetes and platforms, and the role of standards within the software development industry.

Daniel (00:31):

Today I'm keen to pick his brains on all things continuous delivery, Kubernetes tooling and security. I also wanted to understand what he thought about treating application platforms as products and managing them accordingly, as I know he's learned a lot from his journey from engineer to product owner. If you like what you hear today, I definitely encourage you to pop over to our website. That's www.getambassador.io, where we have a range of articles, whitepapers and videos that provide more information for engineers working in the Kubernetes and cloud space. You can also find links there to our latest releases, such as the Ambassador Edge Stack, and also our open source Ambassador API Gateway, and our CNCF-hosted Telepresence tool too.

Daniel (01:08):

So hi Gareth, welcome to the podcast. Thanks for joining us today.

Gareth (01:10):

Thanks for having us.

Daniel (01:11):

Could you briefly introduce yourself for the listeners please, and share a recent career highlight as well?

Gareth (01:16):

Yeah, so I'm Gareth Rushgrove. I'm currently one of the product directors at a company called Snyk. It's basically security tooling for developers, and I'm on the product side today. Previously I worked at Docker on the product side, and at Puppet on the engineering side. I worked for the UK government as well for a bunch of years as part of GDS, mainly doing operations, infrastructure, security, wandering around causing trouble.

Daniel (01:42):

Nice.

Gareth (01:43):

I'm involved in a load of open source projects as well. Most recently, I guess, I'm quite actively involved in the Open Policy Agent project. I run one of the sub-projects, called Conftest. And, in my increasingly vanishingly small spare time, I send out the DevOps Weekly newsletter, which I've done for 10 years, which is a long time to send an email every Sunday.

Daniel (02:06):

That is. I definitely appreciate it, and I know thousands of people do as well; it's a firm favorite of mine among newsletters. Awesome stuff, Gareth. So the traditional first question on the podcast is, can you share your worst developer experience, or worst DevLoop? And when I mention that, I'm talking about having the idea, coding, testing, deploying, releasing, and verifying. You don't have to name names, but can you share the most horrendous experience you've bumped into?

Gareth (02:32):

Yeah. Luckily you gave me a heads up about this and I had a think. It's going back a while, and I definitely won't name names, but there was a certain web service that my employer at the time was looking to integrate with. And we wanted to do basically a spike. We wanted to go, "Okay, how does this work? Let's build a prototype," then say, "It does these things, and here's an account of how we would integrate it." So we were trying to build something that was more throwaway than anything else.

Gareth (02:59):

And it was back in the days when SOAP seemed like a good idea, and I went, "Oh, that's fine." We were mainly writing Ruby. I'd done a bunch of Ruby, and a bunch of languages at different jobs before, and they said, "All right, yeah, we've got a web service. It's all good. You can just integrate with it."

Gareth (03:19):

And they sent some documentation, and I was like, "Woah, this is professional." And then they sent the WSDL files. So for those that haven't had the horror, SOAP was originally the Simple Object Access Protocol. They dropped the acronym because people kept pointing out that it was not simple.

Daniel (03:38):

Yes.

Gareth (03:38):

WSDL was an attempt to basically say, okay, SOAP is just for machines; WSDL can be this format on top, which you use to describe the service. It was the Web Services Description Language. And again, all of this was XML. They sent across a zip file with, I think, somewhere in the region of 120 separate WSDL files to describe this API. And again, the idea with web services was that none of this mattered. You just took those files, generated a bunch of code, and then you could use it in any language you wanted. That was the idea. They didn't tell us what they used to write it in; they just sent us a bunch of WSDL files. I tried a bunch of take-WSDL-files, generate-code type tools, and every one of them was like, "No, this is crazy."

Gareth (04:29):

This is 120 separate WSDL files, hundreds and hundreds of megs of XML, and nothing was having it. In the end I just went back to them and said, "What did you use to generate these?"

Gareth (04:43):

And they were like, "Oh, it was some Java framework."

Gareth (04:45):

And so I was like, "Okay, I'll use that."

Gareth (04:49):

So it was all horrifying. What the API was doing was horrifying, and how it was doing it was horrifying. It was very much just exposing a ginormous object set without any affordances or documentation or understanding. It was like, "Oh no, we just clicked 'export web services' somewhere," and it was just horrifyingly painful.

Daniel (05:17):

I've definitely bumped into a few of these in my career, Gareth, and I think with an API you get an insight into the designer's brain, don't you? When I was at a contracting company called OpenCredo, we bumped into some horrible APIs that were purely designed by techies without the end user in mind. I'm sure I've done it as well, right? But-

Gareth (05:30):

Yeah, again, I think this wasn't designed. This was really just the idea of API as implementation. And I think that's what happens when people don't think of APIs as user interfaces.

Daniel (05:44):

Yeah.

Gareth (05:45):

It's like, "Oh no, I get an API for free from my object model." That's not true if you want to build a good API that's nice to use. Then throw in an awful lot of XML, and throw in this idea that you could do language interop via WSDL. In practice, for non-trivial cases, this only worked if you put the same tech on both sides.

Daniel (06:11):

I can definitely relate to this. I cut my teeth on SOAP and had many a horrible experience, but today I'm most keen to pick your brains around Kubernetes tooling. As we just said off mic, you and I regularly bump into each other at conferences. I always enjoy hearing what you find interesting and what your latest hot takes are, and unfortunately, with conferences all being canceled or going virtual, it's not quite so easy to catch up. So, as a first question, what do you think about the current state of Kubernetes tooling as it stands now?

Gareth (06:40):

I think there have been a lot of things that have matured, and new ideas springing up. There's still very much a search for a holy grail around packaging or management or process. A lot of that is happening, but there's maybe a better realization that actually a lot of users of Kubernetes are getting a lot done with the basics.

Daniel (07:07):

Yeah.

Gareth (07:07):

It's always true that if you're very close to something, it's very easy to see the things that could improve. If you've been around the Kubernetes community for five years, you know what's moved and what hasn't, and you can see these opportunities around high-level abstractions. But they also make things more complicated, assuming you need to learn all the layers. And actually a lot of people get a long way with literally just the Kubernetes configuration files.

Gareth (07:36):

There's a lot of organizations that have adopted Kubernetes. They're just writing the raw configuration files and they're like, "No, this is great."

Gareth (07:42):

They're not comparing it with some hypothetical possibility. They're comparing it with what they were doing before. And I think it's easy to lose sight of that jump when you're like, "No, but things could be better," and you need both of those two things. So I think there's a bunch of work going on just on the good fundamentals. It's been great seeing the evolution of the Buildpacks project-

Daniel (08:12):

I like that too, yeah.

Gareth (08:13):

As a good example. I think there's always been this interesting potential there, but some of the story was maybe told ahead of the technology. I think with Paketo, the distribution of specific buildpacks... I don't think it's about telling people to use Pack.

Gareth (08:33):

I'm much more interested in it simply being built into high-level frameworks. So with Spring, if you generate a Spring project now, it will generate you a Maven project that will build images just using buildpacks. No one needs to care about the implementation details. It's not about building a better Docker build tool or getting cross at the Dockerfile. You're not going to dislodge the Dockerfile with another build syntax file; it solves a really good problem. What other, higher-level problems can you solve? Buildpacks as a route to that, I think, are really interesting. There's still a load of folks working on deployment-related tooling, often building platforms on top of things in Kubernetes. So there's the work that Knative has been doing, and Rio from the Rancher folks.

Gareth (09:27):

It will be interesting to see how that space evolves. It's very early to jump in now; I think it's for people who want to geek out about the tools, but it will be interesting to see how things like that stand out. There's even the work Docker is doing, simply taking the Docker interface everyone uses anyway and saying, "Oh no, you can run that on..." Like the ECS support they launched today, and they talked about ACI as well. These sorts of remote services. And again, that's not so much about your enterprise running large, complex multiroute applications as, "No, I just want to run some compute somewhere." I think simple use cases are often missed in the rush to complex, bigger problems.

Daniel (10:17):

Yeah, that makes sense. Yeah.

Gareth (10:19):

I think some of the hype has died away a bit. At one point maybe there was a, "Oh, it's all about coming up with new meta-languages and DSLs." And I think a bunch of that hard work is being done in those spaces now, without the "Oh, and you should use this." I'm a big fan of CUE because I'm a DSL nerd, and I think there are some really interesting tools that can be built on top of it. But shouting about CUE and telling everyone to jump towards using it directly is actually wrong for most users. It's the language, not the framework. And I think it's the frameworks that come along next that are going to be the most interesting parts.

Daniel (11:04):

Yeah. Right. A lot of folks I chat to, both in my work at Datawire and on the podcast, are looking to build a platform, or sometimes buy one. And Heroku comes up as their classic example. I've used Heroku with Ruby on Rails, loved it, made it super simple. Cloud Foundry, similar thing. I think I've seen you talk about the need to treat the platform as a product: if you're building a platform internally, you need to treat it as a product, and I don't see that much. So could you explain to listeners what you mean by treating the platform as a product?

Gareth (11:34):

Yeah, I guess there are a number of different ways of looking at that, but one is, what do you mean by product? Often it's the whole thing: something that actually solves a collected set of problems, but also, frankly, something that you're buying. That's what differentiates it from a project. It's not just about the technology; it's about the whole wrapper around that. And there's that idea of it being something that development teams are buying. Now, in some cases they might be forced to buy it. I think that's sometimes true when you're working with an external software vendor as well, throughout the DSU: you're paying for it, but you're forced to use that one tool, that one issue tracker or that one source control system, by some higher-level power.

Daniel (12:22):

Yeah.

Gareth (12:22):

Or you have choice. And I think some organizations basically say, yeah, there's a happy path.

Gareth (12:31):

Like, "Look, we've solved all these problems. You get all this for free." Or you have to deal with it all yourself, and by that we mean you need to deal with monitoring and logging and metrics and everything else. "We have standards you need to meet, but you can knock yourself out, or have them for free as part of our platform." So I think some of that is about giving a sales pitch to your developers, rather than having a platform that you have to use whether you like it or not, because then there's not a lot of incentive to make it good. If developers have a choice whether to use your internal thing or not, then you need to get good at selling it, and you need to get good at talking to them about what their problems are.

Gareth (13:09):

And I think it forces good practice around prioritization. So I moved from mainly working on the engineering side to now mainly working in product roles. And I think a lot of people think that product and product roles and products are all about the ideas. Actually, if you're deep into an area and that might be building developer tools or Kubernetes platforms or whatever. Ideas are easy. Even, if you're deep in it, you could probably come up with an awful lot of them off the top of your head. If you talk to a bunch of users, which you should do, you'll come up with even more.

Gareth (13:46):

And then you go, "Wait a minute. Unless someone gives me a hundred people to do all these things in a short space of time, I have to prioritize." And there's the forcing function of prioritizing problems, rather than building some mythical, all-singing, all-dancing, do-everything platform. It's like, what problems are you trying to solve? Different organizations struggle with different problems for different reasons. And treating it as a product, to me, is often about going and finding your users. Finding out what they're... Not saying, "Do you want X?"

Daniel (14:23):

Yes.

Gareth (14:24):

The classic product management thing of if I go to one of our customers, for example, and say, "Oh, would you like Snyk to do X?" The answer will be yes irrespective of X, pretty much.

Daniel (14:37):

Yeah. Yeah.

Gareth (14:37):

Unless it's make tea or coffee, they'll be like, "Well that would be weird." But everyone likes stuff.

Daniel (14:46):

Yep. Yep.

Gareth (14:47):

But there's no tension there. So you say, "Here are two things, which one do you want?" And they'll be like, "I want both." It's like, "You get one. Which one do you want?" Introducing scarcity and forcing prioritization is, I think, a big part of treating things as a product. And when you're building up a platform, that's really important, because I've seen some people talk about the upfront cost of building out that sort of platform and say, "Well, yeah, look, this is many people over many months. How do you get value quicker?" The answer isn't "People, work quicker," and in lots of organizations it isn't necessarily "Pile more money in upfront" either. I think with that comes, okay, what are the problems we can solve without getting too carried away talking about a platform?

Gareth (15:37):

You can oversell things as well. I think if you treat building out your internal platform as a project that starts and ends, it's very easy to basically discover that you've spent six months with six people and you're not at the end. And everyone's like, "Well, no, we don't have money for you to spend three years building something." Versus saying, "Okay, our job is to serve that audience. Yes, there's some minimum viable thing for certain types of service but what are we trying to do."

Gareth (16:09):

So I think having people thinking about the product side of platforms for internal teams is really useful.

Daniel (16:15):

I like that a lot, Gareth. I was chatting to Matthew Skelton and Manuel Pais, who wrote the Team Topologies book, a fantastic book I'd recommend to listeners, and they talk a lot about the minimum viable platform. If it's an EC2 instance and deploying some code onto it, you're good to go, yeah? And that is, in enterprise organizations, almost heretical. We've got to have the all-singing, all-dancing platform, right?

Gareth (16:37):

Yeah. There's always that thing of the closer you are to something, the more you see the downsides and also the potential. That was probably what drew me to Kubernetes really early, like five years ago when it came out: the API.

Daniel (16:54):

Yes.

Gareth (16:56):

It wasn't, "Ah, I can build a bunch of applications on this right now," or "I can solve these concrete problems with it right now." I only had some of those. It was the potential around the API, and the potential for the API both to be consistent at a low level, so you have that access, and for us to build high-level abstractions on it. I think we're starting to see some of that potential become reality now. But yeah, why are you interested in something? And appreciate that the reason might not mean you should use it straight away, even if you're interested in it straight away.

Daniel (17:36):

Very nice, Gareth, yeah. So, trying to tie together some of the ideas around platform as a product and something you mentioned earlier around deployment being an interesting area at the moment: I'm seeing a lot of folks creating composable continuous delivery frameworks. You've got GitHub Actions; I see you regularly tweeting about that. You've got Argo. I was chatting to Stefan from the Weaveworks team a while back, and they're doing the GitOps Toolkit, which is super interesting, I think. This notion of composability seems to be becoming a thing. Do you think that's where the future of continuous delivery is? Picking and assembling components as you want?

Gareth (18:07):

I think it's what's happened with CI configuration, and also other types of configuration: application configuration, software packaging, security to a degree. A bunch of this responsibility is shifting to application teams. Before, you just wrote some code and other people dealt with it. Now we're seeing that you write the code, you write the CI config, you write the packaging with Dockerfiles and whatnot, and you write the configuration that deploys it. Again, that used to be someone else's job. All of this is in service of "you build it, you run it," that principle at a high level that it's better to have multiple people doing things but decoupled, versus everything coupled to something that slows you down. So with that happening, you're seeing CI config go from one team of experts to a bunch of non-experts at the edges. That's nicely decoupled; these teams go at different paces.

Gareth (19:13):

On the other hand, some of them do it really well. And some of them do it really badly. Because it's hard. It's a new domain. Everyone can copy and paste the hello world example. But nearly no one, I mean people don't talk about refactoring their CI configs very often. You just copy and paste something that works and start using it. And the hello world examples, again, these are all the same problems you see in software code.

Daniel (19:39):

Yeah.

Gareth (19:39):

So take SQL databases. People go like, "Oh wow. Yeah, I can query the SQL database. I've got this query. I've learned enough SQL to get this thing. Great."

Gareth (19:50):

Did you add indexes? Is it performant? "I don't know." The answer is no, it's not, because the performant, indexed version was more complicated than the hello world version and you didn't get there yet.
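The indexing point is easy to see with Python's built-in `sqlite3` module. The sketch below uses a made-up `users` table and asks SQLite for its query plan before and after adding an index on the lookup column; the exact wording of `EXPLAIN QUERY PLAN` output varies between SQLite versions (for example "SCAN users" versus "SCAN TABLE users"), so treat the strings as indicative.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's query-plan description for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row has four columns; the last one is the human-readable detail.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

lookup = "SELECT name FROM users WHERE email = ?"
params = ("user42@example.com",)

before = query_plan(conn, lookup, params)  # hello world version: full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
after = query_plan(conn, lookup, params)   # indexed version: direct search
row = conn.execute(lookup, params).fetchone()

print("before:", before)
print("after: ", after)
print(row)
```

Both versions return the same row; only the plan changes, which is why the unindexed query looks fine until the table grows.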

Gareth (20:00):

You just haven't got there yet. Look at all the GitHub Actions out there doing non-cached pulls, the amount of stuff being pulled down that you don't need to pull. You're testing the same thing. Stop it. Just cache it. So there are all these patterns there as well. And so, yeah, I think that starts with sharing ecosystems. I'd love to see more composability and sharing, but you sometimes then end up with things becoming centralized. Do you put the smarts in the middle and build things to distribute, or do you actually just make people better at the edges? And the answer is a bit of both. Unfortunately, a lot of this work is also very implementation specific. You mentioned Argo, GitHub Actions, Tekton. I maintain a couple of GitHub Actions and Tekton pipeline tasks for a couple of projects, like Conftest, which I mentioned before.
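A common way to "just cache it" is to key the dependency cache on a hash of the lockfile, so an unchanged lockfile means a cache hit and any edit produces a fresh key; in GitHub Actions this is typically done with the `actions/cache` action and the `hashFiles()` expression. The Python sketch below models that keying idea only; the directory layout and the `fake_install` stand-in are hypothetical, and this is not how `actions/cache` is implemented.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cache_key(lockfile: Path, prefix: str = "deps") -> str:
    """Same lockfile contents -> same key; any change to the lockfile -> a new key."""
    digest = hashlib.sha256(lockfile.read_bytes()).hexdigest()
    return f"{prefix}-{digest[:16]}"


def restore_or_install(lockfile: Path, cache_dir: Path, install: Callable[[Path], None]) -> bool:
    """Return True on a cache hit; on a miss, run `install` into a new cache entry."""
    entry = cache_dir / cache_key(lockfile)
    if entry.exists():
        return True  # hit: skip the slow download entirely
    entry.mkdir(parents=True)
    install(entry)  # miss: do the expensive work once for this lockfile
    return False


downloads: list[str] = []


def fake_install(target: Path) -> None:
    """Hypothetical stand-in for a real dependency install step."""
    downloads.append(target.name)
    (target / "installed.txt").write_text("pretend dependencies\n")


results: list[bool] = []
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    lock = root / "requirements.lock"
    cache = root / "cache"
    lock.write_text("examplelib==1.4.0\n")
    results.append(restore_or_install(lock, cache, fake_install))  # miss: first run
    results.append(restore_or_install(lock, cache, fake_install))  # hit: nothing changed
    lock.write_text("examplelib==1.5.0\n")
    results.append(restore_or_install(lock, cache, fake_install))  # miss: lockfile changed

print(results, len(downloads))  # [False, True, False] 2
```

A production version would also need to cope with a failed install leaving a partial cache entry behind, which this sketch ignores.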

Gareth (21:05):

They're very similar. They're annoyingly similar.

Daniel (21:09):

Oh, interesting, right. Between the two formats.

Gareth (21:16):

There's not an affordance advantage. There's not a model advantage. They're basically just the same. They're basically parallel development and different words on the same things.

Daniel (21:28):

Yeah. Right.

Gareth (21:30):

And without that push towards standards, you end up basically evolving in parallel and repeating a lot of work in separate ecosystems, for no additional value. Well, no additional value to end users. There's value from the point of view of the-

Daniel (21:47):

Value capture.

Gareth (21:48):

Yeah basically from the platform capture side.

Daniel (21:51):

Yeah.

Gareth (21:52):

So I'm not sure that gets fixed, but I'd love to see more standards emerge there. But again, that takes time. It's hard to come about. The Tekton project is probably the folks who've been talking most about that.

Gareth (22:05):

Again, they're not as much trying to build the whole, they're trying to build a stack that other people can more easily build these types of tools on top of.

Daniel (22:13):

Jenkins X used Tekton for example.

Gareth (22:14):

Yeah, Jenkins X uses it, as does Relay from the Puppet folks.

Daniel (22:19):

Of course, we did a blog post with them recently, yeah.

Gareth (22:20):

There are some really interesting integrations going on around the Knative sort of space. So yeah, they're building primitives for you to build these bits on, and I think there are some conversations going on there around standardized descriptions that can be used across multiple implementations. The GitLab folks also started talking more recently about this problem of multiple CI systems. So I'd like to see something. I'm not sure it will happen that quickly, but I definitely think there's a sharing thing, and sharing starts, for the most part, with copy and paste.

Daniel (22:56):

Yes.

Gareth (22:57):

And we are at the copy and paste... So I think we've moved on from handwriting everything, to the copy and paste stage of sharing. The question is, can we move to something more... Like if you think of the evolution of programming languages: I started out writing a load of PHP, and frankly you just wrote it all from scratch. Then along came pastebins and Stack Overflow and other things, and you moved into this-

Daniel (23:28):

Yeah. Yeah.

Gareth (23:28):

And then came along package managers. And I flipped over to Ruby and Python which I suppose had started with more of a package management concept.

Daniel (23:37):

Yeah. Yeah.

Gareth (23:38):

By the time Ruby became popular, RubyGems was there.

Daniel (23:41):

Big deal, right? Yeah.

Gareth (23:42):

When JavaScript became popular, npm was not there. When PHP became popular, Composer was not there. It's easy to forget that all those things didn't always exist.

Daniel (23:56):

Yeah.

Gareth (23:56):

So I think there's slightly more than that, I mean, it's like GitHub Actions, you can reference actions from other repos and other bits and pieces, but it's not quite at the stage where there's the same non tool specific package concepts. It would be interesting to see if that is the direction configuration goes. So from a composability and reuse perspective, I think config is definitely there. There's a bunch of things to learn from the configuration management space.

Daniel (24:20):

Yes.

Gareth (24:21):

And what happened with the Puppet Forge? What happened with Ansible Galaxy? What happened with Chef Supermarket? Both good and bad. Fundamentally they collected a bunch of content that was reusable and reused by a lot of people. But I think if you talk to them (I was at Puppet, and I know the other folks really well), they all ran into certain limiting factors when it came to sharing highly abstract configuration.

Gareth (24:48):

So there are some interesting challenges around just applying the same sort of package management approaches to configuration in general. But there's a lot of value there around CI. I mentioned buildpacks, and the Dockerfile is an interesting example. The Dockerfile is super valuable; buildpacks basically take a bunch of, I guess, easy-path options for certain specific but very common setups. So yeah, you've got a basic Spring app: the buildpack will work until it doesn't, and then you can flip to something like a Dockerfile. The question is, can we do that for CI setups as well? So rather than trying to solve them generically from the CI side, can you solve them from the "no, I get to be opinionated about the framework" side? So let's solve it for Spring. Let's solve it for Rails. Let's solve it for Django.
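Cloud Native Buildpacks decide whether they apply to an app during a detect phase, typically by looking for marker files such as a `pom.xml`. The Python sketch below models the "easy path until it doesn't work, then flip to a Dockerfile" flow Gareth describes; the detector list, buildpack names, and the rule that an explicit Dockerfile takes precedence are hypothetical choices for illustration, not the buildpacks lifecycle or the `pack` CLI.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical detectors: (buildpack name, check that inspects the source tree).
DETECTORS: list[tuple[str, Callable[[Path], bool]]] = [
    ("java-maven", lambda src: (src / "pom.xml").exists()),
    ("python-pip", lambda src: (src / "requirements.txt").exists()),
    ("ruby-bundler", lambda src: (src / "Gemfile").exists()),
]


def choose_build(src: Path) -> str:
    """Pick how to build `src`: an explicit Dockerfile wins, else the first matching buildpack."""
    if (src / "Dockerfile").exists():
        return "dockerfile"  # the team has deliberately stepped off the easy path
    for name, detect in DETECTORS:
        if detect(src):
            return f"buildpack:{name}"
    raise ValueError(f"no buildpack detected for {src}; add a Dockerfile")


with tempfile.TemporaryDirectory() as tmp:
    app = Path(tmp)
    (app / "pom.xml").write_text("<project/>")
    first = choose_build(app)   # 'buildpack:java-maven'
    (app / "Dockerfile").write_text("FROM eclipse-temurin:17\n")
    second = choose_build(app)  # 'dockerfile'

print(first, second)
```

The same shape could apply to CI: an opinionated, framework-specific default pipeline, with an explicit hand-written config as the escape hatch.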

Gareth (25:45):

I think there's maybe an infrastructure person who comes at it from a "no, I want to solve a CI problem" angle. It's like, no, back to that platform as a product: as a product person, I want to solve the end user problem. And the end user isn't just "a developer"; they're a .NET Core developer using C#. Okay, let's go solve that very specific problem. And I feel like we can often flip a bunch of these things around to say, "Well, let's solve the specific problem that individual developers have," rather than solving the I'm-a-CI-nerd problem, because we're talking about a world where we've shifted that out, where we're providing a platform to developers. And in lots of cases that doesn't mean all developers. It means the developers I'm working with.

Daniel (26:38):

Yeah, I like that Gareth. No, I like that a lot.

Daniel (26:40):

I'm keen, in the final few minutes we've got here, to switch gears a little bit, because I'd really appreciate your take on some of the security topics. With your background and the work you're doing at Snyk, security is front and center. The folks I chat to on a daily basis are totally aware of things like image scanning, and totally aware of things like cluster access; RBAC in particular pops up. But that sometimes is the extent of the threat model I hear about. What do you think engineers should focus on in addition to those kinds of things?

Gareth (27:08):

Yeah. And this touches on some of the topics we've just been discussing as well. A really common way in, which you see in the wild with things like the Capital One incident, isn't necessarily the first-party applications that people are building. Yes, they can be insecure; yes, they can be compromised. But one of the issues there is that they often end up being quite bespoke. Your application was bespoke, so the attack is somewhat bespoke, and yes, you're looking for commonalities like SQL injection at scale and whatever. But there's lots of other common ground outside your first-party applications, for example in how you've configured your cloud environment. We've shifted a bunch of this from being a specialist's job, again, to development teams. Development teams are writing Terraform code; they're writing CDK or CloudFormation.

Gareth (28:01):

They're writing Kubernetes configuration files. And all of that allows you to stand up and provision and configure infrastructure at scale really quickly. It also allows you to make mistakes really quickly. And I think often we treat configuration differently to code.

Daniel (28:23):

Yeah.

Gareth (28:24):

In that we treat it as something like, "Oh no, that's already a solved problem. We just go from config to, like, the deployment just happens." Whereas code goes through this battery of tests: we'll run unit tests, we'll run acceptance tests, we might run some dynamic security scanning. We will run a battery of things against it asking, "Should we deploy this?" And your configuration often goes, "Is it valid? Ship it." And again, I think it makes sense, in that we've just had a lot longer to adjust to these shifts. From a code standpoint, we have good, mature testing tooling, linting tooling, static analysis tooling.
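To make the contrast concrete, configuration can go through the same kind of test gate as code. The following is a minimal sketch, assuming the Kubernetes manifest has already been parsed into a Python dict (for example from YAML); the function name `check_deployment` and its two rules are illustrative choices, not taken from Snyk, Conftest or any other tool.

```python
from typing import Any


def check_deployment(manifest: dict[str, Any]) -> list[str]:
    """Return human-readable policy violations for a parsed Deployment manifest."""
    problems: list[str] = []
    # Deployments nest the pod spec under spec.template.spec.
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        # Example rule 1: every container should declare resource limits.
        if not container.get("resources", {}).get("limits"):
            problems.append(f"{name}: no resources.limits set")
        # Example rule 2: privileged containers get broad access to the node.
        if container.get("securityContext", {}).get("privileged") is True:
            problems.append(f"{name}: runs privileged")
    return problems


# A pytest-style test, so the config faces the same gate as application code.
def test_web_deployment_meets_policy() -> None:
    manifest = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "web"},
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": "web",
                            "image": "example/web:1.0",  # placeholder image
                            "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
                        }
                    ]
                }
            }
        },
    }
    assert check_deployment(manifest) == []
```

Run under a test runner in CI, a change that drops the limits or flips `privileged` on fails the build, rather than the config passing on "Is it valid? Ship it."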

Gareth (29:07):

Because developers have been writing that code for a long time, those tools have become part of the practice. Should they be used for your very first program? Yeah. Should professional programmers be using most of those things, and do professional programmers know about them, without the excuse of "I didn't know about that"? Absolutely. Software packaging. And you mentioned image scanning. Software packaging is something that, again, used to be in the ops domain. With the Dockerfile it definitely shifted to be more of a developer-domain problem. And yeah, when people first started using Dockerfiles, they weren't scanning their images.

Daniel (29:44):

Oh yes.

Gareth (29:44):

But that's become much more common practice now, again with tools like Snyk, and the rest, making it easier. I would say knowledge of that is now there, and adoption is increasing.
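At its simplest, image scanning means comparing the packages installed in an image against advisory data. Here is a minimal sketch of that matching step, assuming the installed packages and advisories are already available as plain Python data. The package names, advisory IDs and the `fixed_in` field are hypothetical, and this is not how Snyk or any particular scanner is implemented.

```python
from typing import Any


def parse_version(version: str) -> tuple[int, ...]:
    """Parse a plain dotted version such as "1.2.13" into (1, 2, 13)."""
    return tuple(int(part) for part in version.split("."))


def vulnerable_packages(installed: dict[str, str], advisories: list[dict[str, Any]]) -> list[str]:
    """Flag installed packages older than an advisory's fixed-in version."""
    hits: list[str] = []
    for adv in advisories:
        version = installed.get(adv["package"])
        if version is None:
            continue  # package isn't present in this image
        # Tuple comparison orders (1, 2, 9) before (1, 2, 13), unlike string comparison.
        if parse_version(version) < parse_version(adv["fixed_in"]):
            hits.append(f'{adv["package"]} {version} is below {adv["fixed_in"]} ({adv["id"]})')
    return hits
```

Real scanners also have to handle distribution-specific version schemes (epochs, release suffixes, letters such as `1.1.1w`), which this plain integer parsing deliberately ignores.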

Gareth (30:00):

So where people aren't doing it yet, it's not because they don't know about it. It's more, "Oh yeah, we haven't prioritized that yet." But I think configuration is something that has shifted to developers more recently. Dockerfiles shifted over a long period of time; we're just at an earlier stage with the shift of all of this configuration stuff to developers, and the tooling exists on the edges. But generally speaking, it's tooling nerds like me who work on things like Conftest. It's not yet at the point where most developers know about the existence of those tools, or necessarily about the existence of the problems.

Daniel (30:43):

Yeah.

Gareth (30:43):

So when you point out the problems, they're like, "Oh right. Yeah. What do I do?" Then you might ask, "Okay, well, what tools exist?" You also start finding similar patterns: static analysis, linting, unit testing and acceptance testing.

Gareth (30:55):

Well yeah, it turns out you can apply the same things, but the tooling isn't as mature. And so sometimes the answer is, "We're not going to do this yet," because it's a high barrier to entry, it's a large cost for what we decide is... And ultimately, the direction of travel is that we make that easier to adopt. We make it better known and better understood. And I think the configuration testing space is sort of interesting, both from an "is this going to work?" angle, and also from an "is this, when I ship it, going to cost me a load of money?" angle. Cost analysis is often done after the fact. Some of these tools tell you after you've spent a thousand dollars, as opposed to going, "This will spend a thousand dollars a day," and then you go, "Wow, great. It just saved me spending a thousand dollars a day."

Daniel (31:51):

Yeah.

Gareth (31:51):

You spent a thousand dollars first, and then it told you afterwards; it's basically a monitoring system. Imagine tools that shift that to: you wrote this code and it goes, "Yeah, that's going to cost you this." And that's flagged in my tests, because my tests said, "This should never cost more than $10." And then you look and go, "Oh right. I see what I did there."
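A cost check of the kind Gareth imagines can run as an ordinary test against the planned configuration, before anything is provisioned. This minimal sketch assumes a hypothetical price table and instance sizes, and reads "never cost more than $10" as a daily budget; real figures would come from the provider's pricing data.

```python
# Hypothetical on-demand prices in USD per hour (illustrative only).
HOURLY_PRICE_USD = {"small": 0.02, "medium": 0.10, "large": 0.40}

# Assumption: "this should never cost more than $10" is read as per day.
DAILY_BUDGET_USD = 10.0


def estimated_daily_cost(instance_groups: list[dict]) -> float:
    """Estimate the daily cost of a planned config before it is deployed."""
    return sum(
        HOURLY_PRICE_USD[group["size"]] * group["count"] * 24
        for group in instance_groups
    )


def test_planned_config_stays_under_budget() -> None:
    planned = [{"size": "small", "count": 4}, {"size": "medium", "count": 2}]
    # 4 * 0.02 * 24 + 2 * 0.10 * 24 = 6.72 USD per day
    assert estimated_daily_cost(planned) <= DAILY_BUDGET_USD
```

Changing the two `"medium"` instances to `"large"` would push the estimate to about $21 a day and fail the test, which is the "Oh right, I see what I did there" moment arriving before the money is spent.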

Gareth (32:19):

Security, I think, is similar. There's actually a good understanding of what well-provisioned cloud environments look like, with the benchmarks work that CIS has done, and just community good practice. What does good look like? But generally there's not a lot of taking that from those domain experts and turning it into tools that developers can then adopt. They're still tools for specialists.

Gareth (32:43):

There are tools for after the fact. Very little of it shifts to being something that a developer has feedback on.
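Turning a piece of benchmark guidance into developer feedback can be as small as one function run against the infrastructure code. This sketch is loosely inspired by CIS AWS Foundations Benchmark guidance that security groups should not allow ingress from `0.0.0.0/0` to remote administration ports. The rule format is a simplified, assumed dict loosely modelled on a Terraform `aws_security_group` ingress block (`from_port`, `to_port`, `protocol`, `cidr_blocks`), not the output of any real parser.

```python
from typing import Any

# Remote administration ports: SSH and RDP.
ADMIN_PORTS = (22, 3389)
WORLD = "0.0.0.0/0"


def open_admin_ports(ingress_rules: list[dict[str, Any]]) -> list[str]:
    """Flag ingress rules that expose admin ports to the whole IPv4 internet."""
    findings: list[str] = []
    for rule in ingress_rules:
        if WORLD not in rule.get("cidr_blocks", []):
            continue
        # Protocol "-1" means all protocols, and therefore every port.
        all_ports = rule.get("protocol") == "-1"
        for port in ADMIN_PORTS:
            if all_ports or rule["from_port"] <= port <= rule["to_port"]:
                findings.append(f"port {port} open to {WORLD}")
    return findings
```

A fuller check would also cover IPv6 `::/0` ranges, but even this much moves the finding from an after-the-fact audit to the moment the change is written.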

Daniel (32:51):

Interesting.

Gareth (32:53):

I think it's that feedback that is really interesting and useful. And you see that with lots of tools. Lots of tools start as specialist tools, used by a few people, in a sort of slow, batch cycle back to fixing things. Like testing, software testing.

Daniel (33:07):

Yeah.

Gareth (33:08):

Someone else did software testing. They had specialist tools, and you found out a week later that your test had failed, and then you had to fix something. And someone said, "What if we write tests as part of the code, and get feedback as we're writing?"

Daniel (33:21):

Yeah, fast feedback.

Gareth (33:21):

It's easy to go, "Yeah, that's obvious," but that wasn't obvious; that was revolutionary. And I think it was revolutionary fundamentally because the feedback cycle suddenly turned from something that was long to something that was really tight.

Gareth (33:39):

And I think configuration is at that point where we've shifted a bunch of it to developers, but we haven't yet built the tools that provide a really tight feedback cycle for iterating on that config. So yeah, that's definitely something I'm interested in generally. I think there's a bunch of interesting academic work there as well around config testing, but there's also loads of low-hanging fruit. We're doing a bunch of work now at Snyk where, as well as importing application projects and detecting Java, JavaScript and Python vulnerabilities, if you import repos with Helm charts and Kubernetes config files, and soon Terraform files, we'll flag up a bunch of problems there. And again, that's all at the source code end; it's all in the developer domain.

Daniel (34:26):

Like accidentally exposing ports or doing something daft, which we've all done, right? It would pick that up.

Gareth (34:31):

Right. Some of it's, like you say, daft, but often it's that we've been given these really powerful tools, not always with knowledge of best practices, and again, there's that copy-and-paste problem.
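Daniel's example of accidentally exposing ports is the kind of problem a source-level check can catch. This minimal sketch flags a few common Kubernetes patterns that expose ports beyond the cluster network: `LoadBalancer` and `NodePort` Services, `hostNetwork`, and container `hostPort` bindings. It assumes the manifests are already parsed into dicts, and the function names and message wording are hypothetical; it does not reproduce any Snyk rule.

```python
from typing import Any, Optional

# Workload kinds whose pod spec lives under spec.template.spec.
WORKLOAD_KINDS = {"Deployment", "StatefulSet", "DaemonSet", "ReplicaSet", "Job"}


def _pod_spec(manifest: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the pod spec for Pods and common workload kinds, else None."""
    kind = manifest.get("kind")
    if kind == "Pod":
        return manifest.get("spec", {})
    if kind in WORKLOAD_KINDS:
        return manifest.get("spec", {}).get("template", {}).get("spec", {})
    return None


def exposure_findings(manifests: list[dict[str, Any]]) -> list[str]:
    """Flag common ways a manifest exposes ports beyond the cluster network."""
    findings: list[str] = []
    for m in manifests:
        ref = f'{m.get("kind")}/{m.get("metadata", {}).get("name", "<unnamed>")}'
        if m.get("kind") == "Service":
            svc_type = m.get("spec", {}).get("type", "ClusterIP")
            if svc_type in ("LoadBalancer", "NodePort"):
                findings.append(f"{ref}: type {svc_type} may be reachable from outside the cluster")
            continue
        pod_spec = _pod_spec(m)
        if pod_spec is None:
            continue
        if pod_spec.get("hostNetwork"):
            findings.append(f"{ref}: hostNetwork shares the node's network namespace")
        for c in pod_spec.get("containers", []):
            for p in c.get("ports", []):
                if "hostPort" in p:
                    findings.append(f'{ref}: container {c.get("name")} binds hostPort {p["hostPort"]}')
    return findings
```

Some of these settings are legitimate in context, so a team might treat the output as warnings to review rather than hard failures.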

Gareth (34:43):

And I mean, there's always been this question of should hello world examples have all the security features on?

Daniel (34:50):

Yeah.

Gareth (34:50):

And from a "security says no" standpoint, the answer has always been yes: "Let's delete all the hello world examples on the internet and make sure they're all secure."

Gareth (35:02):

And the obvious answer is, "No, because then no one would learn to program."

Daniel (35:04):

Yes.

Gareth (35:04):

And democratization of programming is probably more important. Sorry. But that leaves you with this challenge: okay, you've got people who can program, and in this case it might be with Kubernetes configs or Terraform code. How do you take them from there to programming securely? And I think, again, tooling helps there. Education helps there, and talking about these things helps there. Because we've seen that get much better on the professional Java developer side, or the professional JavaScript side, or wherever it might be.
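For a sense of what "all the security features on" adds to a hello world example, here is a minimal sketch that builds a single-container Pod manifest with common Kubernetes hardening settings enabled. The field names are standard Pod and container `securityContext` fields; the image `example/hello:1.0` and the resource limits are placeholders. Note that `runAsNonRoot` only works if the image runs as a non-root user, and `readOnlyRootFilesystem` breaks apps that write to their own filesystem, which is exactly the extra learning burden the discussion is weighing.

```python
import json


def hello_world_pod(image: str) -> dict:
    """Build a minimal Pod manifest with common hardening settings switched on."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "hello-world"},
        "spec": {
            "containers": [
                {
                    "name": "hello",
                    "image": image,
                    "securityContext": {
                        "runAsNonRoot": True,               # kubelet refuses to start it as UID 0
                        "allowPrivilegeEscalation": False,  # block setuid-style escalation
                        "readOnlyRootFilesystem": True,     # root filesystem mounted read-only
                        "capabilities": {"drop": ["ALL"]},  # no Linux capabilities
                    },
                    "resources": {"limits": {"cpu": "100m", "memory": "64Mi"}},
                }
            ]
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, e.g. `kubectl apply -f hello.json`.
    print(json.dumps(hello_world_pod("example/hello:1.0"), indent=2))
```

Compared with the usual two-line container spec, roughly half of this manifest is security settings, which illustrates both why hello world examples tend to omit them and why tooling that flags their absence later is useful.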

Daniel (35:42):

Fantastic Gareth, thank you. I'm conscious of time now. Just very quickly, what are you looking forward to over the next 12 months? What's interesting in your world?

Gareth (35:48):

Well, we were just talking about events. It would definitely be nice to meet up with a bunch of people in person in different places.

Daniel (35:56):

In real life, right?

Gareth (35:58):

And I think that probably goes without saying, but yeah, I think that will definitely be a highlight. I think the first few conferences and events back in our space will be interesting, because there's always that mix of content and topics and the people. I think those first few events are going to weigh quite heavily towards the people side.

Daniel (36:21):

Just catching up, right?

Gareth (36:22):

Yeah, and that human contact is important. It's easy to say everyone knew that, but I think there's going to be a renewed understanding of it.

Daniel (36:32):

Well said, Gareth. I'm totally looking forward to catching up with yourself and a bunch of other folks in person: high-bandwidth communication. Very much looking forward to it. Thanks for your time today, Gareth. As usual, a fantastic amount of knowledge has been dropped there. Really appreciate it.

Gareth (36:44):

Yeah. Always good to chat.
