Michael Hausenblas on Customer Focus, Building App Platforms, and Kubernetes Access Control
Be sure to check out the additional episodes of the "Livin' on the Edge" podcast.
Key takeaways from the podcast included:
- A well-integrated developer experience that allows engineers to code, test, and verify across a range of environments from local to production allows for business hypotheses to be tested rapidly and safely.
- Engineers should seek to understand the essentials of the business context in which they work. Developing knowledge of key performance indicators (KPIs) and business constraints allows engineers to design appropriately and to instrument their applications effectively.
- Developing a focus on the customer provides many benefits. The scalability and performance of systems is vitally important, but designing this should not be done at the expense of meeting customer needs in a timely fashion.
- The focus on providing value to “customers” does not only apply to end-users; the customer can also be your fellow engineer or business analyst. If you are a platform engineer, your customers are the application engineers and QA teams.
- Frameworks like Kubernetes have exposed developers to many operational concepts. Most developers these days understand ingress, networking, and application runtime lifecycles to some degree.
- Many developers want a simple Heroku-like platform-as-a-service (PaaS) experience for delivering applications. Kubernetes can provide the foundation for this, but the majority of teams adopting this technology create additional tooling to support the concept of delivering “applications” (a higher-level concept than Pods or Services).
- Managing the software supply chain is vitally important. Code, dependencies, and deployment artifacts should be scanned and verified as secure. Using approaches such as RBAC with Kubernetes also provides control and auditing capabilities.
- The Open Policy Agent (OPA) framework is an interesting and flexible solution to defining fine-grained security policies, and when combined with technology such as the OPA Gatekeeper, this can be used to augment RBAC.
- Open standards provide good abstractions and integration points that support interoperability between systems and tools. Judicious use of these can lead to the building of extensible and flexible systems that can be scaled effectively.
- Future developer tooling will most likely focus on a function-as-a-service (FaaS)-like experience, such as AWS Lambda, and provide easy integration with other cloud service building blocks, such as machine learning (ML) APIs.
Transcript
Daniel (00:03):
Hello everyone. I'm Daniel Bryant. I'd like to welcome you to the Ambassador Livin' on the Edge podcast. The show that focuses on all things related to cloud-native platforms, creating effective developer workflows and building modern APIs.
Daniel (00:14):
Today I'm joined by Michael Hausenblas, a product developer advocate in the AWS container service team. I've chatted with Michael in the community for many years now from days when we were both working on Apache Mesos-powered systems, all the way through to us now working on Kubernetes in the cloud world.
Daniel (00:28):
Today I was keen to pick Michael's brains around the topics of customer focus, that's internal customers as well as end users. Also, I was keen to understand his thoughts around building PaaS-like platforms on Kubernetes and the use of RBAC and Open Policy Agent to implement Kubernetes access control.
Daniel (00:43):
If you like what you hear today, I definitely encourage you to pop over to our website. That's www.getambassador.io, where we have a range of articles, white papers and videos that provide more information for engineers working in the Kubernetes and cloud native space. You can also find links there to our latest releases, such as the Ambassador Edge Stack, our open source Ambassador API gateway, and also our CNCF-hosted Telepresence community tool, too. Hello, Michael, and welcome to the podcast.
Michael (01:07):
Hello there. Thanks for having me, Daniel.
Daniel (01:10):
Could you briefly introduce yourself, please, and share a recent career highlight?
Michael (01:14):
All right, I'll do that. So my name is Michael, and I'm a product developer advocate at Amazon, working in the AWS container service team on all kinds of compute services there: ECS, EKS, ECR, App Mesh. Well, there are so many highlights, it's really hard to pick one, but I think the highlight in the sense of the community is making really great progress towards being online, and all the events and all the meetings and so on are moving online, and I hope that this kind of stays like that, to be honest, from my perspective.
Daniel (01:55):
Awesome stuff. So I've got to ask the obligatory question with every Ambassador podcast we do, Michael. Could you share with us your best developer experience or maybe the key components you think make a good developer experience, and sort of from the idea stage, to the coding, to the testing to the deploying, releasing, observing, what do you think is the most important thing, perhaps folks aren't talking about today?
Michael (02:16):
Right. This is an example that's, believe it or not, already more than 20 years old. And I picked it not because I don't want to talk about more recent things, but it really stuck with me, and I'm still looking for this kind of experience. Back then it was called SilverStream. And I think it was bought by NetApp.
Michael (02:38):
It was essentially, if you remember the time when we started building web apps. And it was this kind of, should it be written in JavaScript or in Java, like a Java applet? And so what they did there was really end to end, very nicely integrated, so you essentially would highlight or draw your UI, your forms and whatever. And then you would also model your relational database there, it would connect these fields, and then you would be able to say, deploy it as a JavaScript-based application or as a Java application or Java applet.
Michael (03:13):
And it had all these elements, it's just like full stack without claiming to be full stack. You have full control over how to actually deploy it. So do you want it this way or that way? You didn't have to choose up front. And you could do everything in terms of testing, and so on, stages, have a preview of how it would work, iterate on that. And I think the main point there was that it would allow you to focus on the business logic rather than deep diving into JavaScript or the database, or this or that. It was really this end-to-end integrated system that allowed you to very, very quickly iterate and be very, very productive in getting applications up. And I mean, again, think of it. It was like 20 years in the past, 1999 or even earlier, and this is something where even nowadays we have all these advances in terms of libraries and whatnot, but still, this kind of integrated end-to-end development environment that allows you to quickly iterate, I haven't seen that. Maybe I'm missing that stuff because I'm so much focused on the infrastructure bits that I don't see all these beautiful things that have been built on top of it, but I remember I was really impressed.
Daniel (04:35):
I like that. I've noticed in my career there's been a fragmentation, effectively. We've all gone hyper-specialized. The full stack stuff you mentioned is still super popular, but the actual components of how we deliver software have become specialized. Infrastructure here, front end here, back end here for some reason. Kind of weird, right?
Michael (04:51):
Right.
Daniel (04:52):
How important do you think it is for developers to understand the business context they're working in? And my pitch here really is there's actually quite a lot to learn these days for full stack. Yeah. There's business awareness, there's coding. And then there's all the infrastructure stuff, which you and I have worked on for quite some time now. How important do you think it is for folks to balance all their knowledge of business, dev and infrastructure?
Michael (05:14):
If you, as a developer, understand the business as your customer, and you want to make your customer happy, then it's clear your drive is really to give the business something that works, rapidly, that fulfills their requirements, et cetera. So the more I understand, not necessarily the business itself, but the business needs, and can translate those into certain features or whatever, the better I can actually serve my customers, and the happier my customers are. And this is where, if we look, it doesn't matter if it's a conference or a white paper or whatever, we are celebrating, and I mean, all of that, like engineering excellence, how nifty we came up with a protocol, or how scalable something is, and that is all super important, right? I mean, absolutely it has to scale. It has to be secure. It has to be performant, et cetera.
Michael (06:06):
But at the end of the day, if it doesn't work, if it doesn't fulfill the needs of business, then what good is it, right? If I can scale a trivial report or whatever. Yeah. It's very trivial to scale something that doesn't do anything. Then you report that I've scaled it up to 50,000, 100,000 instances, that's true. But you know, how many customers have you actually served? How much business do you actually support with that?
Michael (06:34):
And that is where business metrics come in, and that's, for example, a concrete thing. If you think about monitoring or whatever, and you come up with metrics, like let's say the CPU utilization of a box, right? Let's say 80%. What does it really tell you? Versus a business metric that says, well, in this application, which might be a distributed application, we have done 50,000 transactions per second.
Michael (07:02):
Not only is this high-level, business-oriented metric much more meaningful, it has a direct impact. It tells you, well, you can attach a dollar value or whatever onto it. I can say, how much was my three lines of code that I added there, or very often removed there, how much did that impact the actual revenue or bottom line? How much easier it is to talk about actual revenue, or customers, or however you want to measure it in terms of business metrics, rather than, yes, we have shaved off five milliseconds there. Again, don't get me wrong. I'm not saying that it's not important to be fast and performant or whatever, but put it into the context of how that impacts your customer, how much happier they are or how much faster they can do things or whatever. That is my main ask, I guess, or my main vision.
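To make Michael's point about business metrics concrete, here is a minimal sketch of instrumenting a Go service with a business-level counter using the Prometheus client library. The service, endpoint, and metric name (`shop_orders_processed_total`) are invented for illustration; the idea is simply that the dashboard ends up showing transactions rather than CPU percentages.

```go
// Hypothetical sketch: exposing a business-level metric (orders processed)
// alongside the usual runtime metrics, using the Prometheus Go client.
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// ordersProcessed counts completed business transactions, not CPU or memory.
// The metric name and label are illustrative, not from the episode.
var ordersProcessed = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "shop_orders_processed_total",
		Help: "Total number of orders successfully processed.",
	},
	[]string{"payment_method"},
)

func handleOrder(w http.ResponseWriter, r *http.Request) {
	// ... business logic would go here ...
	ordersProcessed.WithLabelValues("card").Inc()
	w.WriteHeader(http.StatusAccepted)
}

func main() {
	http.HandleFunc("/order", handleOrder)
	http.Handle("/metrics", promhttp.Handler()) // scraped by Prometheus
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

A counter like this can be plotted as a rate (orders per second) and, as Michael suggests, tied directly to revenue, which CPU utilization never can be.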
Daniel (07:59):
Yeah. Well, Michael, I think the customer obsession, I've seen the customer focus with Amazon always comes through, and I respect that a lot. And it just, it makes sense, right?
Michael (08:06):
Just to make sure: customer does not necessarily only mean the end customer. If I'm working with someone, colleagues, they can be my customers. It's really all about understanding that someone else is not just a colleague or whatever. They are my customers. And that's the kind of mindset.
Daniel (08:26):
Yeah. I think a lot of it's about empathy, as well, isn't it?
Michael (08:27):
Yes.
Daniel (08:28):
Like sort of realizing that you're not the customer. Even us as developers, we're not always developing stuff for ourselves. We have to bear in mind what other folks want, yeah? I like it a lot.
Daniel (08:37):
So going back to the stuff you and I do a bit more of now in terms of the infrastructure. We've set the context, and the business problem and the customer is super, super important. But we love our tech, yeah? And I know you've got some great experience here. So I'm kind of keen to dive into that. How important is it for the average developer, do you think, to become operationally aware?
Michael (08:55):
I think for every developer, no matter if you're more focused on the front end or more the back end, more system related, more application related, or if you're on the full end-to-end stack there, it is super important to be at least aware. I'm not necessarily saying that you have to have hands-on experience. It's great if you get that opportunity, but don't worry if you don't yet. But at least be aware, because A, it's much, much easier, it's much smoother to work together if you, again, it comes back to empathy, comes back to being aware of what is going on on the other side. And unfortunately it's still the case, I think, I don't have hard data to back it up, but I remember a really good talk, it was someone who made the point that we are incentivizing and focusing on creating new stuff, which is what developers do. Right? You crunch out new features and whatever, and not necessarily keeping stuff running. And that is what typically the ops role is about.
Michael (10:03):
Whereas really, if you write something, and that's not deployed and it's not running, it doesn't actually do anything good. On its own, writing a new feature, whatever it is, it's really no good. And I think that's in terms of incentives, that's there. If you align these incentives, and I think ops roles already did that way more than dev roles, in terms of ops roles picking up programming languages, understanding what developers are doing, and so on. And developers nowadays with things like Kubernetes, for example, which I think contributed a lot to this kind of democratization, getting operational contexts and concepts out there in the first place, making pretty much everyone aware of, what is a load balancer? What is a deployment? I mean, if you had asked your average developer 20 years ago, what is a deployment? I'm not even sure if they would have heard about it. Nowadays, they have at least probably come across a Kubernetes deployment. They might not know the details, but they know the concept.
Michael (11:01):
And I think all this education and being aware of what these concepts are, in the long run, just makes the whole thing much better, much smoother. It's like this whole DevOps idea based on mutual respect, mutual empathy, and being aware of what the other role is doing. And also helping out with that, preparing for that. If you think about it, if I, as a developer, am aware of what metrics do, then I can perfectly relate to that. I can instrument my code, and I can make it much, much easier to actually improve the operational experience there.
Michael (11:41):
Versus my focus is on handing over a WAR file, and once that's there, I'm done. That's my job. And in certain environments, if you think about serverless functions, serverless environments like Lambda, I mean, there is not much infrastructure there after all. So who's going to be on call? Very likely more on the developer side. You might have dedicated people, but given that there is no infrastructure, who's got the wider picture, right? So maybe over time, this operational knowledge or awareness or whatever is more and more important for developers after all.
Daniel (12:21):
Some great thinking points there, Michael. Diving in a little bit to the underlying platform. You mentioned there, like Lambda. If we sort of bring it up a step to the sort of like modern PaaS, you could even argue that Lambda and stuff like that is a component of a modern PaaS. What do you think the workflow looks like for a modern PaaS? Should it be just that simple Heroku style, git push master and it deploys, or do you think there are more steps needed for the variety of workloads we have?
Michael (12:50):
I don't think that there is the ideal PaaS, or the one that rules them all. We have seen many, many different shapes and forms be successful. OpenShift has its success. Cloud Foundry has its success. Lambda and others, Heroku, et cetera. And all of them brought something to the table.
Michael (13:11):
I personally think that, in the context of Kubernetes at least, it supports a really great way of making sure that you have a very clear, explicit, declarative description of what should be there. Having everything recorded, having a review cycle there, and then doing the actual deployment automated through an agent that listens to a repo.
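That agent-listening-to-a-repo workflow is the GitOps pattern Michael returns to later. As a toy sketch only, assuming git and kubectl are on the path and a config repo is checked out locally at a made-up location, the core loop might look like this; real tools such as Flux or Argo CD do this far more robustly.

```go
// Toy sketch of a GitOps-style agent: periodically pull the declared state
// from a Git repo and apply it to the cluster. Paths and interval are
// placeholders; this is an illustration of the idea, not a production tool.
package main

import (
	"log"
	"os/exec"
	"time"
)

func sync(repoDir string) error {
	// Pull the latest declared state from the repo.
	if out, err := exec.Command("git", "-C", repoDir, "pull", "--ff-only").CombinedOutput(); err != nil {
		log.Printf("git pull failed: %v\n%s", err, out)
		return err
	}
	// Apply it; kubectl reconciles the declared resources against the cluster.
	out, err := exec.Command("kubectl", "apply", "-f", repoDir+"/manifests").CombinedOutput()
	log.Printf("kubectl apply: %s", out)
	return err
}

func main() {
	const repoDir = "/var/lib/gitops/config-repo" // hypothetical local checkout
	for {
		if err := sync(repoDir); err != nil {
			log.Printf("sync error: %v", err)
		}
		time.Sleep(2 * time.Minute)
	}
}
```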
Michael (13:34):
But in terms of PaaS, what I've seen as a pattern now, for many different areas, is that you very often have an internal team that uses something like Kubernetes, for example, as a basis. And they build, it might not be a full-blown PaaS, but some kind of middleware that shields the developers to a certain extent from raw resources like pods and deployments and services, whatever. They might introduce things like an application concept, and then applications that are internally defined as whatever makes sense in this context.
Michael (14:11):
On the one hand, it might have to do with portability across different environments, on premises and in the cloud, but it's also there to make developers more productive. So I've seen this pattern; very often it's not called a PaaS, but I would argue that it's PaaS-like functionality, middleware that is then written. It's very often a very thin layer, could be a bunch of bash scripts, could be something else, but something that shields the powerful, but very expressive, lower-level infrastructure like Kubernetes away from the end users.
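A hedged sketch of what such a thin middleware layer might look like: an internal Application type, invented here for illustration, that a platform team expands into the raw Kubernetes Deployment developers would otherwise write by hand (a Service, ingress, and so on would be generated the same way).

```go
// Hypothetical sketch of the thin "middleware" layer Michael describes: an
// internal Application concept that is translated into lower-level Kubernetes
// resources. All field names are invented for illustration.
package platform

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Application is the higher-level concept developers interact with instead of
// writing Deployment/Service YAML by hand.
type Application struct {
	Name     string
	Image    string
	Replicas int32
	Port     int32
}

// ToDeployment expands an Application into the lower-level Deployment resource.
func ToDeployment(app Application) *appsv1.Deployment {
	labels := map[string]string{"app.kubernetes.io/name": app.Name}
	return &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: app.Name, Labels: labels},
		Spec: appsv1.DeploymentSpec{
			Replicas: &app.Replicas,
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  app.Name,
						Image: app.Image,
						Ports: []corev1.ContainerPort{{ContainerPort: app.Port}},
					}},
				},
			},
		},
	}
}
```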
Daniel (14:46):
Yeah. I don't think we found that perfect abstraction yet, have we? I am with you. The deployments, the services in Kubernetes totally level up the game. But I think for some developers that just want to write code, it's still a bit too much detail with YAML sometimes. Right?
Michael (15:01):
Yeah, absolutely. Absolutely.
Daniel (15:03):
Tricky one. But I like your pitch about the middleware layer, because I've built my fair share of that. You and I go back to the Mesos days. We've seen that for quite some years, to be fair. One of the things I think a lot of folks struggle with is the security around this space. And I've seen you do some interesting talks, like WeWork's team, Open Policy Agent, a bunch of things around this. Could you talk to some of the security challenges around this kind of middleware layer of the PaaS-like experience?
Michael (15:29):
Right, right, right. So on the one hand you have what usually is lumped under supply chain management. So essentially the idea that for every artifact, in the simplest case, let's say a container running in a container orchestration system like Kubernetes, you are able to say exactly who produced that artifact, who created that image, who, for example, if we have a GitHub setup, merged that pull request that actually led to this deployment that runs that container in a pod.
Michael (15:59):
So at any point in time, you have full visibility and can trace back who created what artifact and put it into production, for example. And then also, if you have some abnormality going on there, an anomaly like, I don't know, you might be DDoSed, or you have a hacker in there, or whatever, you can say, and that actually is a great foundation for machine learning, that can say, well, usually in this cluster, in this namespace, whatever, this is the usual pattern, and now I see something different. Maybe we should alert somebody or whatever.
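One way to picture that traceability, as a purely hypothetical sketch: stamp each Deployment with the commit, pull request, and image digest that produced it, so a running pod can always be traced back to its origin. The annotation keys below are invented, not an established convention.

```go
// Hypothetical sketch: recording provenance on a Deployment so that a running
// pod can be traced back to the change that put it into production.
package provenance

import (
	appsv1 "k8s.io/api/apps/v1"
)

// BuildInfo is what a CI pipeline would know at deploy time.
type BuildInfo struct {
	CommitSHA   string // e.g. output of `git rev-parse HEAD`
	PullRequest string // e.g. URL of the merged pull request
	ImageDigest string // e.g. sha256 digest of the pushed image
}

// Annotate records the provenance on the pod template, so describing a pod
// shows exactly which change produced this workload.
func Annotate(d *appsv1.Deployment, b BuildInfo) {
	if d.Spec.Template.Annotations == nil {
		d.Spec.Template.Annotations = map[string]string{}
	}
	d.Spec.Template.Annotations["example.com/commit"] = b.CommitSHA
	d.Spec.Template.Annotations["example.com/pull-request"] = b.PullRequest
	d.Spec.Template.Annotations["example.com/image-digest"] = b.ImageDigest
}
```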
Michael (16:35):
So automation is definitely a big part of that. And a lot of what I've been dealing with and seeing is compliance. And very often people think of financial institutions having to stick to certain regulations. There are many, many of them, especially in Europe. But that's not the only case. That's an important one, but it's not the only one. You have examples, Metal for example, doing the Kubernetes API deprecation check using OPA, essentially saying which API has been deprecated where. There are so many rules, if you think about it, that you can take and make explicit, and with that use OPA's Rego to express them. For example, think of a very simple setup in GitOps where you say a deployment has to have certain labels and has to conform to something, or there must be a Helm chart, there must be certain things present or whatever. And now you can write a bot looking at the repo, that looks at the pull request and, fed with or equipped with these OPA rules, can actually review this pull request before a human even looks at it and say, well, it does not conform to whatever business or other compliance regulations there are, and reject it, or maybe even automatically merge it if all rules have been followed.
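As a minimal sketch of that kind of label check, assuming a CI bot written in Go: the Rego policy and the sample manifest below are invented, and a real setup would more likely enforce this via conftest in CI or Gatekeeper's admission webhook, but the shape of the rule is the same.

```go
// Sketch of evaluating a Rego rule ("a Deployment must carry a team label")
// from Go using OPA's rego package. Policy and input are illustrative only.
package main

import (
	"context"
	"fmt"

	"github.com/open-policy-agent/opa/rego"
)

const policy = `
package ci.labels

default allow = false

# Allow only if the Deployment has a non-empty team label.
allow {
    input.kind == "Deployment"
    input.metadata.labels.team != ""
}
`

func main() {
	ctx := context.Background()

	query, err := rego.New(
		rego.Query("data.ci.labels.allow"),
		rego.Module("labels.rego", policy),
	).PrepareForEval(ctx)
	if err != nil {
		panic(err)
	}

	// A manifest as it might arrive in a pull request (decoded from YAML/JSON).
	manifest := map[string]interface{}{
		"kind": "Deployment",
		"metadata": map[string]interface{}{
			"labels": map[string]interface{}{"team": "payments"},
		},
	}

	results, err := query.Eval(ctx, rego.EvalInput(manifest))
	if err != nil {
		panic(err)
	}
	fmt.Println("allowed:", results.Allowed()) // true only if the label is set
}
```

A bot like the one Michael describes would run this check over every manifest touched by a pull request and comment or block the merge accordingly.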
Michael (18:06):
So think of it like this: if this and this is missing, or if that and that is the case, checks which are currently either hard-coded in your code or usually done by a human can be externalized and represented as OPA rules. And then it's just a matter of where in the process you apply and enforce these rules, very early on, more on the build side of things, or later on.
Michael (18:36):
And very often things that happen in a running cluster also have to be checked. I mean, it's nice if you can prove that your image has been scanned and signed off by the right person and everything, but there might be something happening while it is running in the cluster. So you still need to keep an eye on that, and you might have a different set of rules for that case as well.
Daniel (19:01):
Well, that's something I've bumped into. With Ambassador, we get folks in the OSS Slack asking us, "Do you support OPA?" We've also seen it from the Istio perspective in the service mesh layer. I'm sure you might bump into it with App Mesh and so forth. I do like the runtime pitch with OPA. We've kind of got authentication sorted, but authorization, particularly at the service-to-service level, is not really there yet. Have you bumped into any of the uses of OPA in that space?
Michael (19:24):
So I'm a big fan of RBAC in Kubernetes myself. I run the RBAC.dev advocacy website, which collects RBAC good practices and material and recipes, et cetera. But I also see the limitations of RBAC, and that is that it cannot represent all the necessary things.
Michael (19:44):
For example, I want to be able to say, in this group, someone has access to this namespace during office hours from this region. And you just simply cannot say that in RBAC. Or also after the fact, when something has happened, could be after an incident or whatever, there's definitely interest in applying that. I am not sure to what extent these kinds of policies would be done in core Kubernetes. PSPs, pod security policies, might be an area where Kubernetes is open to potentially using OPA for that. But to me, the point is OPA is a very, very generic framework.
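For contrast, here is a hedged sketch, using the Kubernetes Go API types, of what RBAC can express: a group's access to one namespace. The group, namespace, and verbs are placeholders. What is notably absent is any field for office hours or source region, which is exactly the gap Michael describes OPA filling.

```go
// Sketch of a namespace-scoped RBAC Role and RoleBinding built with the
// Kubernetes Go types. Names are placeholders for illustration.
package rbacexample

import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// PaymentsDevAccess grants the payments-dev group read/write on core workloads
// in the payments namespace, and nothing more: no time windows, no regions.
func PaymentsDevAccess() (*rbacv1.Role, *rbacv1.RoleBinding) {
	role := &rbacv1.Role{
		ObjectMeta: metav1.ObjectMeta{Name: "payments-dev", Namespace: "payments"},
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{"", "apps"},
			Resources: []string{"pods", "deployments", "services"},
			Verbs:     []string{"get", "list", "watch", "create", "update", "patch"},
		}},
	}
	binding := &rbacv1.RoleBinding{
		ObjectMeta: metav1.ObjectMeta{Name: "payments-dev", Namespace: "payments"},
		Subjects: []rbacv1.Subject{{
			Kind:     rbacv1.GroupKind,
			APIGroup: rbacv1.GroupName,
			Name:     "payments-dev",
		}},
		RoleRef: rbacv1.RoleRef{
			APIGroup: rbacv1.GroupName,
			Kind:     "Role",
			Name:     "payments-dev",
		},
	}
	return role, binding
}
```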
Michael (20:24):
Maybe currently the most popular use case is around Gatekeeper, which is very, very closely, tightly integrated with Kubernetes, but there are other things than Kubernetes out there. There are other concerns. What I like most about OPA is that it's a generic framework. It's just like a programming language. It doesn't force you to do exactly this one thing. You can express any kind of constraints or policies or whatever with it. And then the question is really just where and how do you enforce it, in which stage, and do you automate something, do you immediately shut something down, or do you alert a human, and the human then has to take some kind of action based on that outcome?
Daniel (21:06):
You mentioned a couple of times there open standards, Open Policy Agent, for example. How important do you think organizations like the CNCF, the Cloud Native Computing Foundation, and the CDF, the Continuous Delivery Foundation, are? I know Amazon is heavily involved in open source, too. How important do you think these open standards, open frameworks are for driving the community forward?
Michael (21:26):
Absolutely. I think that's super important, and I've noticed that way earlier, going back to 2004, '05, '06, when I started being part of the W3C and IETF, then the Apache Software Foundation, and it was always the same idea.
Michael (21:43):
If you do have an open standard, nowadays these open standards seem to be almost the default. Everyone goes, oh yeah, sure. That makes sense. But that was not always the case. For a very long time you would have formats, specifications, whatever, that were dominated and owned by a certain entity or a certain company. And I think that these open standards really, really are super helpful, because now as a vendor, you can differentiate by having a faster or better or whatever implementation, but there's interoperability there. You do have, if everyone sticks with that standard, a core interoperability, making it easier for people to swap out things. And that's always a good thing. You can compete on being more secure, feature rich or whatever, but you do have these guarantees, to a certain extent, of interoperability and being able to move stuff around.
Daniel (22:39):
Very nice. Very nice. I do like the interoperability pitch, it makes a lot of sense. Based on your experience, have you got any advice for how folks should pick some of these core components? We've talked about, say, building a platform, building a PaaS, building a good developer experience. There are some advantages to picking open things. There are some advantages to going sort of maybe more closed for certain areas. Have you got any advice on where people should invest their time, and also probably invest their money, for certain parts of this experience?
Michael (23:09):
My high level advice is always the same, and that is look at what makes you productive. If you are faster and better off with a, let's say virtual machine, and that is all you need, and that is all you care about, why not, right? You can do many, many of the things on a single virtual machine. You don't need to go into containers. You don't need to go into functions.
Michael (23:34):
Look at what makes you productive, and then, coming back again to the initial discussion about business and business metrics, what makes you faster and more productive there. And then, I don't think that it's like an all or nothing. It's not like, hey, if you want to use containers, then you have to go all in and you have to use service meshes, and you have to use X, Y, Z. Everything, right? Pick what makes sense for you. Pick what helps you deliver things faster, making your customers happy, and not necessarily where the hype is, because the hype, every couple of months there's a new hype cycle somewhere, and something peaks.
Michael (24:13):
And if you're chasing that, A, you will always be behind, because by definition you're chasing something, and B, unless it was developed in your organization or under your control, you don't own it. You're a visitor there, you are a customer there, and you can adapt it. You can wonder how you can best use it in your own mind, but you don't control it. You don't necessarily dictate certain features or whatever in that environment. So I would always focus on what makes the team, or me if I'm an individual, most productive, and then get the most out of it.
Daniel (24:49):
Yeah. Sound advice, Michael. That's a voice of experience there. I can definitely tell. So wrapping up, Michael, the final couple of questions. What do you think the developer experience is going to look like over the next five years, say? We're definitely seeing the rise of Lambda and platforms, sort of function as a service. Kubernetes might sort of go into the platform a bit more, I've heard a bunch of folks say. What do you think we'll all be doing in five years time when we're building software?
Michael (25:13):
So there are certain good practices or patterns that, if you look around, are pretty much standard now; they're kind of like expectations. Think about playgrounds, for example. Pretty much all languages or environments nowadays have a playground. OPA, for example, has a great playground. You can just try out something very, very quickly, get an idea. You're not forced to believe what a vendor says and whatever. You can just try it out yourself. You can very, very quickly form an opinion and get an idea. Is that thing a fit? It doesn't necessarily answer certain questions in terms of scalability and how well it integrates or whatever, but it gives folks a first taste. If I go there and it's like, oh, that doesn't really feel right, that doesn't feel like the way I plan to use things. Versus, oh yeah, okay.
Michael (26:10):
It's kind of like a litmus test. It's like, okay, I can imagine working with that. Okay, now I need to invest more. So these kinds of playgrounds, or whatever you want to call them, to quickly get a feeling for a certain language or environment or whatever it is.
Michael (26:25):
The other thing that I believe we will see more and more is abstractions getting higher and higher. So this, what we call in AWS the undifferentiated heavy lifting, in terms of looking after a single VM or even a single cluster, that goes away, and then you are focusing on higher-level things, so more on the business logic. And also I think you'll really see machine learning building blocks used, essentially, as parts, rather than doing it from scratch, just using them. I recently put together a serverless demo called [Nodeless 00:27:04] that essentially uses one part there to do some recognition of scribbles or words or whatever.
Michael (27:13):
And if I compare that with doing it myself, rather than using an API, I would need to be able to pick the right algorithm or the right set of algorithms. I would effectively need to be a data scientist myself, or whatever. In this case, I just trust the API.
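To show the Lego-brick idea rather than Michael's actual demo, whose internals are not described here, this hedged sketch hands an image to a managed text-detection API via the AWS SDK for Go; the file name and region are placeholders.

```go
// Hedged sketch of "machine learning as a building block": no model selection,
// no training, just a call to a managed service (Amazon Rekognition here).
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/rekognition"
)

func main() {
	// A scribbled note captured as an image (placeholder path).
	img, err := os.ReadFile("scribble.png")
	if err != nil {
		log.Fatal(err)
	}

	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("eu-west-1")}))
	svc := rekognition.New(sess)

	// The service does the heavy lifting: detect any text in the image.
	out, err := svc.DetectText(&rekognition.DetectTextInput{
		Image: &rekognition.Image{Bytes: img},
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, t := range out.TextDetections {
		fmt.Printf("%s (%.1f%% confidence)\n", aws.StringValue(t.DetectedText), aws.Float64Value(t.Confidence))
	}
}
```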
Michael (27:30):
So I don't really see, not within five years anyway, ML simply doing everything for us, where you give it some data and it pops out some end-to-end application. But kind of like Lego bricks, we just take parts and use them where it makes sense and where it helps us a lot to build our applications. So these kinds of higher levels of abstraction, not dealing with machines but focusing on the business logic, plus pulling in the machine learning APIs where it makes sense. Yeah. And that again gives us results faster and more reliably.
Daniel (28:07):
I like that a lot, Michael. Yeah. Great stuff. Well, it's been awesome talking today. If folks want to follow your work a bit more closely, where's best? The Twitter, LinkedIn? You mentioned a few websites there as well?
Michael (28:16):
Yes. Twitter is definitely usually the most up-to-date place, LinkedIn as well. I hang out on Slack fairly often, Kubernetes, CNCF, a couple of places. I do have a website, which maybe gets updated two or three times a year. It gives you an idea, but not necessarily the most recent one. So yeah, Twitter is probably best.
Daniel (28:39):
Great stuff. Well, thanks for your time today, Michael. I really enjoyed the conversation.
Michael (28:41):
All right. Thanks a lot for having me, Daniel.