Kubernetes is a powerful tool for orchestrating containerized services, applications, and workloads. While Kubernetes has become widely deployed over the last several years, security has lagged behind, and best practices are only now beginning to coalesce. According to a 2021 Red Hat survey, 55% of respondents had to delay deploying a Kubernetes application into production due to security concerns; nearly a third of those concerns arose during runtime.
Even more notable is that 94% of respondents experienced one or more security incidents within their Kubernetes environments–nearly every respondent to the survey had recently experienced a Kubernetes security incident. This is clear evidence that more guidance is needed to help prevent such security vulnerabilities. While not intended to be comprehensive, we present 8 tips and tricks to level up the security of your cluster.
8 Security Best Practices for Kubernetes
1) Use Version Control for Configuration Files Related to Deployment and Services
Using version control allows for the implementation of change approval processes as a means of improving the cluster’s stability and security. This also provides a convenient log of who made changes, catalyzing communication by making it easier to reach out to editors and determine why changes were made. Even in blameless cultures, there are benefits to encouraging these conversations including the catalyzing of knowledge transfer.
Nearly 59% of respondents to the same Red Hat survey stated that they had detected a misconfiguration in their Kubernetes environments within the last 12 months. Pre-commit hooks that check for misconfigurations enforce infrastructure-as-code best practices locally, stopping misconfigurations and other vulnerability-inducing components, such as hardcoded secrets, before they are committed; branch protection rules provide the same guardrails server-side by gating what can be merged. Together, these help adhere to the overarching best practice of shifting security testing left by making security checks part of a developer's workflow within PRs.
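As a concrete sketch, a pre-commit framework configuration can wire such checks into every commit. The hook repositories below are real open-source projects, but the pinned versions are illustrative assumptions, not specific recommendations:

```yaml
# Hypothetical .pre-commit-config.yaml wiring security checks into commits
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0            # pinned version is an assumption
    hooks:
      - id: check-yaml     # reject malformed Kubernetes manifests
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0           # pinned version is an assumption
    hooks:
      - id: gitleaks       # block hardcoded secrets before they are committed
```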
2) Consider Isolating Kubernetes Code from Feature Code in Dedicated Repositories
In organizations where DevOps and Development are distinct roles, separate repositories for Kubernetes code and feature code can be a key measure for separating duties for security and compliance purposes. However, the best placement for Kubernetes declaration files and Dockerfiles depends on the specific needs of the organization; this tradeoff is discussed further in the final note below. Helm is a popular package manager for Kubernetes that describes resources in templates referred to as charts. Such package managers make it significantly easier to template and version a complex Kubernetes application, and thus work well with source control management systems.
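For illustration, a Helm chart template parameterizes fields such as the replica count and image tag through values.yaml, so each release is versioned and reviewable in source control. The names below are hypothetical:

```yaml
# templates/deployment.yaml (hypothetical chart excerpt)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-web
spec:
  replicas: {{ .Values.replicaCount }}   # set per environment in values.yaml
  selector:
    matchLabels:
      app: {{ .Release.Name }}-web
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-web
    spec:
      containers:
        - name: web
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
```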
3) Conduct Readiness and Liveness Probes as Part of Regular Check-Ups
Readiness and liveness probes serve as health checks for a Kubernetes cluster and help make services more robust. A readiness probe ensures a pod is initialized before directing traffic to it; requests do not reach the service until the probe reports that the pod is up and running. By default, Kubernetes starts sending traffic as soon as the process inside the container begins running, so a readiness probe can be used to hold traffic back until the application is fully initialized. This is especially useful when defining contingency behavior for pod version upgrades: if a new version of a pod fails to load and consequently fails the readiness probe, the old version keeps running, eliminating the need for a manual rollback.
For the sake of explanation, imagine we have a configuration for our Pod that reads as follows:
```yaml
# ../pods/probe/sampleProbe.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: sample-probe
  name: sample-probe-test
spec:
  containers:
    - name: sample-container
      image: k8s.gcr.io/busybox
      livenessProbe:
        exec:
          command:
            - cat
            - /tmp/healthy
        initialDelaySeconds: 5   # time until first probe
        periodSeconds: 5         # elapsed time between probes
```
In this example, a probe runs every 5 seconds, but the first probe does not fire until 5 seconds after the container starts. To view events recorded as part of a probe:
kubectl describe pod sample-probe-test
This will return output indicating the initialization status of the container, as well as provide output as to whether probes have failed.
A readiness probe may sometimes be insufficient for determining whether an application is prepared to serve requests. Long-running applications can enter a broken state, and sometimes the only remedy is a restart. A liveness probe determines whether the application is still running and can be used to recover applications in a broken state: if the liveness check fails, the kubelet restarts the container, and the application runs again in the freshly restarted container.
To recap, the difference between the two is that readiness probes determine whether a pod is initialized and ready to receive traffic, whereas liveness probes check whether a running pod is still serving requests properly. Both types of probes work especially well in conjunction with logging.
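To complement the liveness example above, a readiness probe can be declared on the same container spec. This sketch assumes the application exposes an HTTP health endpoint at /healthz on port 8080:

```yaml
# Hypothetical readiness probe: traffic is withheld until /healthz succeeds
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5   # time until first probe
  periodSeconds: 5         # elapsed time between probes
```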
4) Audit logs regularly as part of efforts to identify potential vulnerabilities in the cluster
Your audit.log file has many stories to tell, but only if the cluster has audit logging enabled. To adhere to best security practices for Kubernetes, ensure your cluster has audit logging enabled to allow for these logs to be retained. Checking these logs regularly helps identify potential vulnerabilities or threats within the cluster.
Specific events to log may also be defined, which is particularly helpful for monitoring events or API calls that may indicate compromise, especially authentication failures which may indicate that an attacker is trying to use stolen credentials. Managed Kubernetes providers can provide access to this data; their consoles come with capabilities including notifications for authentication failures. Documentation for Kubernetes auditing may be found within the monitoring, logging, and debugging documentation; this contains guidelines for creating audit policy that may be passed to kube-apiserver.
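As a sketch of such a policy, the file below logs metadata for access to secrets (never their contents), full request/response detail for pod changes, and drops everything else; the resource choices are illustrative:

```yaml
# Hypothetical audit policy, passed to kube-apiserver via --audit-policy-file
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  - level: Metadata             # who touched secrets, but never their contents
    resources:
      - group: ""
        resources: ["secrets"]
  - level: RequestResponse      # full request/response detail for pods
    resources:
      - group: ""
        resources: ["pods"]
  - level: None                 # ignore everything else
```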
However, it is imperative to mask sensitive data contained in audit logs, as this data may include passwords, SSH keys, IP addresses, and other information; the risks resulting from data leaks or account compromise are extremely high. Proactively masking data to obscure and anonymize sensitive elements is the best way to prevent personally identifiable information from being exposed within logs. It also helps balance regulatory requirements for retaining audit logs against the exposure risk that grows with retention time. Masking audit logs at the source further mitigates this risk.
5) Minimize included parts
Including extraneous pieces in a container is bad practice and can lead to vulnerabilities being included in production environments. Components such as debugging tools are useful to attackers and nonessential for containers in production environments–such tools are perfect examples of components to exclude from a container in production.
Minimizing included components starts with using minimal base images. Smaller images decrease the time required to build the image, increase pull speed, and reduce the chance of security issues because only the required packages and libraries are included. Alpine images are often recommended: they are roughly 10 times smaller than typical full-distribution base images and allow granular control over which frameworks are added to run the application.
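For example, a multi-stage build lets the final image carry only the compiled artifact on a minimal Alpine base; the Go toolchain, package path, and image tags below are illustrative assumptions:

```dockerfile
# Hypothetical multi-stage build: compile in a full toolchain image,
# ship only the binary on a minimal Alpine base
FROM golang:1.21-alpine AS build
WORKDIR /src
COPY . .
RUN go build -o /bin/app ./cmd/server   # hypothetical package path

FROM alpine:3.19
COPY --from=build /bin/app /bin/app
USER nobody                             # avoid running as root
ENTRYPOINT ["/bin/app"]
```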
6) Utilize namespaces to organize your Kubernetes cluster
One Kubernetes best practice for improving security and unlocking additional capabilities is using namespaces to partition a Kubernetes cluster. Namespaces help isolate one team's workloads from others working on the same cluster; without this separation, accidental interference and overwrites by otherwise well-meaning teammates become possible. Separate namespaces should be created for each team and each stage, such as development, testing, and production, as this reduces the chance of unintended crosstalk.
In addition, namespaces allow for resource limits to be defined for a pod. This helps prevent DoS situations caused by unchecked resource scaling. These rules are called resource quotas and can be assigned to namespaces via the command:
kubectl create -f ./<resource_quotas>.yaml --namespace=<namespace_name>
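The referenced quota file might look like the following; the specific limits are illustrative:

```yaml
# Hypothetical resource_quotas.yaml: caps aggregate resource use per namespace
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
spec:
  hard:
    requests.cpu: "4"        # total CPU requested across all pods
    requests.memory: 8Gi
    limits.cpu: "8"          # total CPU limit across all pods
    limits.memory: 16Gi
    pods: "20"               # maximum number of pods in the namespace
```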
Utilizing namespaces enables more granular cluster control, and is especially powerful in its ability to enable role-based access control.
7) Enable role-based access control
Role-based access control helps prevent nefarious actors from easily reaching sensitive points in the Kubernetes cluster. These settings can be applied per namespace, allowing for strict segmentation between namespaces. Kubernetes provides configurable objects such as Roles and ClusterRoles that can be used to define security policies. Roles and ClusterRoles enumerate the actions that may be taken on each resource, but do not state which users have permission to perform those actions. That is the responsibility of RoleBindings and ClusterRoleBindings, which link roles to identities such as users, groups, and service accounts. Together, these abstractions form the core of role-based access control.
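As a minimal sketch, the Role below grants read-only access to pods in a single namespace, and the RoleBinding attaches it to an identity; the namespace, names, and user are hypothetical:

```yaml
# Hypothetical Role: read-only access to pods in the "development" namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: development
  name: pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
# RoleBinding: links the Role to an identity (user name is hypothetical)
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: development
  name: read-pods
subjects:
  - kind: User
    name: jane
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```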
A key component of using role-based access control to enforce security best practices in Kubernetes is careful oversight. Permissions granted through Roles and RoleBindings should be regularly audited to ensure stale privileges do not remain active; under-regulated permissions increase the chances of successful attacks or unintentional damage to a deployment or its data. You can check whether an identity has permission to create pods with this command:
kubectl auth can-i create pods --as=<identity>
Auditing permissions may seem labor intensive, but it is much less effort than reconciling a data breach or a compromised cluster resulting from a lack of effective role-based access control.
8) Ensure the production cluster is running on the latest stable version of Kubernetes
Obvious as it may seem, running the latest stable version of Kubernetes is the simplest way to improve cluster security. New releases contain security patches, additional features, and updates, all of which help reduce vulnerabilities. A reliable source of information on Kubernetes security vulnerabilities is the CVE Details database.
A Note on IaC Code
There is room for organizational choice regarding the placement of Kubernetes declaration files and Dockerfiles. Placing IaC code in a separate repository allows least-privilege policies to be enforced, reducing exposure in case of a breach, and enables separation of duties in a way that meets compliance needs such as those prescribed by ISO 27001. It also allows code and configurations to be rolled back independently and makes it easier to match changes to a running environment. However, placing IaC alongside feature code allows for better reproduction of production environments when fixing bugs, and CI/CD tools are usually set up to work with infrastructure-as-code definitions contained in the same repository as feature code. This grouping also simplifies rolling back to previous versions, a benefit from the perspective of version control simplicity.
Ultimately, this choice needs to be made early in the organization’s life and consistency should be enforced to reduce the chances of poor practices being employed in the development life cycle.
Kubernetes provides many options to assist deployment. Adhering to these best practices for Kubernetes improves service uptime, yields better reliability, and builds a more solid foundation. It is also recommended to integrate security processes into the continuous integration pipeline and to automate them wherever possible.
How Cycode Can Help with Kubernetes Security
Cycode provides complete visibility into enterprise DevOps tools and infrastructure. Our Infrastructure as Code capabilities include comprehensive security scanning that prevents escaped secrets, environmental drift, code leaks, and other common issues, all within flexible, developer-friendly workflows. Once integrated into developer workflows, each commit and PR is scanned for issues such as hardcoded secrets or potential misconfigurations, and alerts are issued appropriately. Remediation is available both through a GUI and within the PR.
Cycode helps establish strong governance over all points of the IaC lifecycle by providing a cross-SCM inventory of all users, contributors, teams, organizations, and repositories in your organization; this governance extends into providing more oversight into changes made to code as a means of further protecting key code. Cycode also helps you automatically audit access privileges to identify and reduce excessive, unused privileges, and implement separation of duties. Furthermore, Cycode helps ensure that strong authentication and secure development practices are in place. This helps apply security best practices for IaC code when using Terraform, Kubernetes, YAML, ARM, and CloudFormation.
Want to learn more?
A great place to start is with a free assessment of the security of your DevOps pipeline.