Securing Azure Kubernetes Service

One of the fallacies people fall for when they start using public cloud offerings is that they get a fully secure solution out of the box. People seem to think this about everything from email to infrastructure in Azure. Yes, doing nothing at all with what you get from a cloud vendor is probably better than setting up the same thing on your own and not doing anything about security, but it is still your responsibility to decide how secure you need to be.

No matter what you run, or where, you need to be conscious of your security posture.

When running Kubernetes clusters in Microsoft Azure, you get a lot of freebies out of the box. The images on the nodes are kept up to date, you don’t have to deal with the control plane or etcd backups, and of course you can easily upgrade your entire cluster to a new version of Kubernetes. But other than that, you still have Kubernetes running and you have to secure it.

In this post I want to highlight what I think is at least a good starting point for securing AKS, but security is a never-ending project. Remember that the more security you want, the more time, effort and money go into it. If you need any more help, feel free to reach out to me on Twitter or LinkedIn.

Azure Policy

This one is so important, and a great “Azure-like” security implementation. There are two built-in policy initiatives (groups of policies) and, at the time of writing, over 40 different policies for Kubernetes. You can easily enable Azure Policy for AKS on cluster creation, as well as afterwards. Read more about how to use this add-on in the Microsoft docs.

These policies work by utilizing Open Policy Agent (OPA) through a tool called OPA Gatekeeper, which integrates with the Kubernetes admission controller and evaluates requests against the policies before resources are created or changed.
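To make the admission idea a bit more concrete, here is a minimal Python sketch of the kind of check a policy performs on an incoming pod. It is only an illustration: Gatekeeper policies are not written in Python, the function names are hypothetical, and "deny privileged containers" is used here simply as an example of a rule.

```python
from typing import Any, Dict, List, Tuple


def find_privileged_containers(pod: Dict[str, Any]) -> List[str]:
    """Return the names of containers in a Pod manifest that ask for privileged mode."""
    spec = pod.get("spec") or {}
    # Init containers are checked too, since they can also request privileged mode.
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    violations = []
    for container in containers:
        security_context = container.get("securityContext") or {}
        if security_context.get("privileged") is True:
            violations.append(container.get("name", "<unnamed>"))
    return violations


def admit(pod: Dict[str, Any]) -> Tuple[bool, str]:
    """Hypothetical admission decision: reject the pod if any container is privileged."""
    violations = find_privileged_containers(pod)
    if violations:
        return False, "privileged containers are not allowed: " + ", ".join(violations)
    return True, "allowed"
```

A real policy makes the same kind of decision, but it runs inside the cluster as part of admission, so a non-compliant resource can be denied or flagged before it is ever created.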

Going back to the Azure policies, you really should go through and see what is available. I usually suggest starting with the two available initiatives and deciding which is the best fit for the level of security you are aiming for, then looking at the other policies. The two initiatives are called the baseline and restricted standards, names that reflect the level of security they offer.

I have gone into a little more detail on the baseline standard and how to keep your clusters compliant in this blog post, and another is coming for the restricted one. The concept is the same for both; the restricted standard just has a couple of extra policies.

Restricted access to the Kubernetes API

By default, the Kubernetes API is open to the Internet if you create an AKS cluster. In production clusters you should not allow everyone to access the API from everywhere, and the first step is to restrict which IP ranges are allowed.

You can follow the guide at docs.microsoft.com to see how you can do this through the Azure CLI (az), or set the api_server_authorized_ip_ranges attribute if you are deploying through Terraform.
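If you are unsure what an authorized range actually lets through, a small Python sketch of CIDR allow-list matching can help. AKS enforces the restriction itself; this only illustrates the semantics. The function name is hypothetical and the ranges below are documentation addresses, not recommendations.

```python
import ipaddress
from typing import Iterable


def is_ip_authorized(client_ip: str, authorized_ranges: Iterable[str]) -> bool:
    """Return True if client_ip falls inside at least one of the CIDR ranges."""
    address = ipaddress.ip_address(client_ip)
    # strict=False accepts ranges written with host bits set, e.g. 203.0.113.7/24.
    networks = [ipaddress.ip_network(cidr, strict=False) for cidr in authorized_ranges]
    return any(address in network for network in networks)


# Example allow-list: an office network plus a single build agent (assumed values).
EXAMPLE_RANGES = ["203.0.113.0/24", "198.51.100.10/32"]
```

Checking a few addresses against the list you plan to use is a cheap way to avoid locking out your own pipeline or your own workstation.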

Integrate RBAC with Azure AD

This is a very important one, and one that I feel should absolutely be the de facto standard when running AKS. Instead of dealing with creating users and roles yourself, offload this to Azure AD and make it part of that governance model.

Integrating with Azure AD is relatively easy, and can be done by following the steps described here. The only thing you really need is a group that can be used to give cluster admin access, and that’s about it. However, creating a cluster that is integrated into Azure AD or updating a cluster to use Azure AD means that you cannot go back.
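Once the cluster is integrated, other Azure AD groups are typically given access through ordinary Kubernetes RBAC bindings, with the group's object ID as the subject name. The Python sketch below builds such a ClusterRoleBinding manifest as JSON. The function name, the binding name and the all-zero object ID are placeholders, and I have assumed the built-in view ClusterRole as a read-only example.

```python
import json
from typing import Any, Dict


def group_cluster_role_binding(
    group_object_id: str,
    cluster_role: str = "view",
    binding_name: str = "aad-group-view",
) -> Dict[str, Any]:
    """Build a ClusterRoleBinding that grants cluster_role to an Azure AD group."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": binding_name},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": cluster_role,
        },
        "subjects": [
            {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "Group",
                # Placeholder: replace with the object ID of your own Azure AD group.
                "name": group_object_id,
            }
        ],
    }


manifest_json = json.dumps(
    group_cluster_role_binding("00000000-0000-0000-0000-000000000000"), indent=2
)
```

The resulting JSON can be saved to a file and applied like any other manifest, which keeps group access in source control alongside the rest of your cluster configuration.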

Handling secrets

When running applications and services in Kubernetes, we need to create, use and manage secrets. The normal way of doing this is keeping them in etcd through the resource with the imaginative name of secret. However, these secrets are available for all to see if they have the right permissions, and they are stored as base64-encoded text, which is an encoding rather than encryption. Obviously, one should make sure that only certain people can read these, but even then it’s not a really great way of managing secrets.
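To see why base64 offers no protection, consider this short Python sketch. The helper names and the sample value are made up for illustration; the point is only that anyone who can read the stored value can reverse it without a key.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way it appears in the data field of a Secret manifest."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding. No key or password is involved."""
    return base64.b64decode(encoded).decode("utf-8")


stored = encode_secret_value("s3cr3t")
recovered = decode_secret_value(stored)
```

Here recovered is identical to the original value, which is why read access to secrets needs to be treated as access to the plaintext.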

This has been a discussion for a long time, and one of the solutions that the Kubernetes community has come up with is allowing people to use the standard called Container Storage Interface (CSI) with the Secrets Store CSI Driver. So instead of mounting secrets in pods as a volume from etcd, we could mount secrets from any external store that has a provider for the driver. For instance, Microsoft is developing the Azure Key Vault provider for CSI. Read more on how to enable this in your cluster on docs.microsoft.com.

This is still in preview, and there are other options out there, but looking at how much the community is behind this way of handling secrets, it’s hard to think that it won’t be the norm going forward. This same setup can be used for HashiCorp Vault, which I talk about further down.

Integrate with Azure Monitor, Log Analytics and Container Insights

Azure comes with some pretty powerful monitoring tools, and the integration for AKS is very good. There is a lot of information that you can access through both metrics and logs, and if you enable container insights with your cluster you can even scrape Prometheus metrics without installing Prometheus. Obviously, you can run your own Prometheus if you want to but this gives you the option not to.

Monitoring AKS is a big topic, and for now I just have to urge you to read up on it on docs.microsoft.com. I will write more on this topic, or even an entire series on it, but for now just know that this is something you need to get very familiar with and that there are decisions to be made.

Third-party tools

There are a lot of security tools that can be run with Kubernetes, and the list is growing. Some you deploy in the cluster and some interact through pipelines, or even directly through the CLI. It is worth taking a look at what is available in the community and what people use to make sure that both the runtime environment and the delivery toolchain are secure.

Even though the tools that I mention each solve a certain problem, and are great at that, there are several tools doing the same thing and you might find another that fits your needs better. This short list could go on forever, so I just picked some that I have been working with lately that I think people should know about.

kube-bench

The Center for Internet Security (CIS) does a lot to help secure IT, and one thing they do is collect their research in so-called benchmarks. These benchmarks come in the form of a document with patterns and recommendations, as well as how you can check that your cluster adheres to these recommendations. Going through all of this can be a long process, so the fine folks at Aqua Security made the tool kube-bench. This tool is not something that you run continuously; it’s more of a way to check that you are still doing things right, based on the CIS benchmarks.

You can find the tool here, where they also have a quick start guide. But since we’re specifically discussing AKS here, I recommend that you read up on the tool and then check out how to run the tool directly on the worker nodes here.

Kured

The Kubernetes Reboot Daemon (Kured) is a tool created by the fine folks at Weaveworks to help with a problem we have had for ages: rebooting after certain patches. We want most of our processes automated, and when running AKS we get automatic security patching of our worker nodes. Sometimes, as we all know, these patches require a reboot, and that is something that AKS doesn’t do for us.

Kured works by looking for a certain file that gets created on servers that need a reboot, and reboots that server based on how you want that done. You could make sure that certain nodes don’t get rebooted if they have a particular pod on them, that they only reboot at a certain time, and more.
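The decision described above can be sketched in a few lines of Python. This is not Kured's code, just an illustration of the logic under my own assumptions: the function and parameter names are hypothetical, and the sentinel path is assumed to be /var/run/reboot-required, which is where Ubuntu nodes usually signal that a reboot is pending.

```python
from datetime import time
from pathlib import Path

# Assumed sentinel location; adjust if your node image uses a different file.
DEFAULT_SENTINEL = Path("/var/run/reboot-required")


def in_window(now: time, start: time, end: time) -> bool:
    """Return True if now is inside the window, including windows that cross midnight."""
    if start <= end:
        return start <= now <= end
    return now >= start or now <= end


def should_reboot(
    now: time,
    window_start: time,
    window_end: time,
    blocking_pods_present: bool,
    sentinel: Path = DEFAULT_SENTINEL,
) -> bool:
    """Reboot only if the node asks for it, nothing blocks it, and the window is open."""
    if not sentinel.exists():
        return False  # No pending reboot on this node.
    if blocking_pods_present:
        return False  # A pod we do not want to interrupt is running here.
    return in_window(now, window_start, window_end)
```

The real tool also drains the node and makes sure only one node reboots at a time, which this sketch leaves out.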

Kured can be found on GitHub, and a detailed walkthrough can be found on docs.microsoft.com.

Falco

This one comes from the good people at Sysdig originally but has been donated to the CNCF as a project. Falco deals with runtime security through policies, and is very customizable.

Falco can be set up to alert when certain policies are triggered, giving you a heads up if something is going on inside of your pods. We have our Azure policies that we have defined, but there are always exploits that can circumvent some of our security policies, and this is where Falco can save our butts. I highly recommend reading up on it at falco.org.

It’s worth mentioning that Azure Defender for AKS also has runtime security, but right now it seems to only work if you use the AKS-engine to host your own cluster or on Azure Arc enabled clusters. For native AKS support, we’re waiting for them to create a daemonset for it. See GitHub issue #2268 for more information and to keep track of changes.

HashiCorp Vault

Keeping secrets safe is one of the fundamental tasks for anyone in IT. Most cloud platforms have their own tool for this, but I want to give HashiCorp Vault a shout-out here. Not only does it do what the other tools do, but it has some great features for dynamic secrets, a big compatibility list, and an API-driven design that makes it very accessible at any stage of development and operations.

Just like Azure Key Vault, HashiCorp Vault can integrate into AKS through the Secrets Store CSI Driver. You can read about it here.