

Mastering Kubernetes Pods Troubleshooting: Advanced Strategies and Solutions

Kay James
January 4, 2024 | 6 min read

Kubernetes (K8s) deployments often pose challenges from various angles, including pods, services, ingress, non-responsive clusters, control planes, and high-availability setups. Kubernetes pods are the smallest deployable units in the Kubernetes ecosystem, encapsulating one or more containers that share resources and a network. Pods are designed to run a single instance of an app or process and are created and disposed of as needed. Pods are crucial for scaling, updating, and maintaining apps in a K8s environment.

This article explores the challenges faced with Kubernetes pods and the troubleshooting steps to take. Some of the error messages encountered when running Kubernetes pods include the following:

  • ImagePullBackOff
  • ErrImagePull
  • InvalidImageName
  • CrashLoopBackOff

Sometimes you do not encounter any of the listed errors and yet still observe that your pods fail. Before debugging any Kubernetes resource, it is essential to understand the API reference, which explains how the various Kubernetes APIs are defined and how the objects in your pods and deployments work. The documentation is well defined under API reference on the Kubernetes website. When debugging pods, select the Pod object from the API reference to get a detailed explanation of how pods work; it defines the fields that make up a pod, i.e., apiVersion, kind, metadata, spec, and status. Kubernetes also provides a cheat sheet that collects the most commonly needed kubectl commands.
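
You can also read the same field documentation straight from your cluster with kubectl explain, for example:

➜ ~ kubectl explain pod
➜ ~ kubectl explain pod.spec.containers
➜ ~ kubectl explain pod.status

Each command prints the schema and description of the requested field, which is often faster than switching to the website while debugging.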

Prerequisites

This article assumes the reader has the following:

  • Kind installed for scenario demonstrations (a sample cluster setup is sketched after this list)
  • Intermediate understanding of Kubernetes architecture
  • Kubectl command line tool
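
The node name k8s-troubleshooting-control-plane that appears in the outputs below suggests the examples were run on a local kind cluster created roughly like this:

➜ ~ kind create cluster --name k8s-troubleshooting
➜ ~ kubectl cluster-info --context kind-k8s-troubleshooting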

Kubernetes Pods Error - ImagePullBackOff

The error is shown for three different reasons:

  • Invalid Image
  • Invalid Tag
  • Invalid Permissions

These scenarios arise when the image name or tag you specified is wrong, or when you do not have permission to pull the image from its repository (as with private repositories). To demonstrate, first create a working nginx deployment:

➜ ~ kubectl create deploy nginx --image=nginx
deployment.apps/nginx created

Once the pod is running, get the pod name:

➜ ~ kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
nginx-8f458dc5b-hcrsh   1/1     Running   0          100s

Copy the name of the running pod and get further information about it:

➜ ~ kubectl describe pod nginx-8f458dc5b-hcrsh
Name: nginx-8f458dc5b-hcrsh
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m43s default-scheduler Successfully assigned default/nginx-8f458dc5b-hcrsh to k8s-troubleshooting-control-plane
Normal Pulling 2m43s kubelet Pulling image "nginx"
Normal Pulled 100s kubelet Successfully pulled image "nginx" in 1m2.220189835s
Normal Created 100s kubelet Created container nginx
Normal Started 100s kubelet Started container nginx

The image was pulled successfully. Your Kubernetes pod is running without errors.

To demonstrate ImagePullBackOff, edit the deployment and specify an image that does not exist:

➜ ~ kubectl edit deploy nginx
    containers:
    - image: nginxdoesntexist
      imagePullPolicy: Always
      name: nginx
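
If you prefer not to open an editor, the same change can be made non-interactively with kubectl set image, for example:

➜ ~ kubectl set image deployment/nginx nginx=nginxdoesntexist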

The new pod does not deploy successfully:

➜ ~ kubectl get pods
NAME                     READY   STATUS         RESTARTS   AGE
nginx-5b847fdb95-mx4pq   0/1     ErrImagePull   0          3m40s
nginx-8f458dc5b-hcrsh    1/1     Running        0          38m

Describing the new pod shows the ImagePullBackOff error:

➜ ~ kubectl describe pod nginx-6f46cbfbcb-c92bl
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 88s default-scheduler Successfully assigned default/nginx-6f46cbfbcb-c92bl to k8s-troubleshooting-control-plane
Normal Pulling 40s (x3 over 88s) kubelet Pulling image "nginxdoesntexist"
Warning Failed 37s (x3 over 85s) kubelet Failed to pull image "nginxdoesntexist": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/library/nginxdoesntexist:latest": failed to resolve reference "docker.io/library/nginxdoesntexist:latest": pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed
Warning Failed 37s (x3 over 85s) kubelet Error: ErrImagePull
Normal BackOff 11s (x4 over 85s) kubelet Back-off pulling image "nginxdoesntexist"
Warning Failed 11s (x4 over 85s) kubelet Error: ImagePullBackOff
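
If the pull fails because of missing permissions rather than a bad image name, a common fix is to create an image pull secret and reference it from the pod spec. A minimal sketch, using placeholder registry credentials and a secret name (regcred) chosen for this example:

➜ ~ kubectl create secret docker-registry regcred --docker-server=<your-registry-server> --docker-username=<your-username> --docker-password=<your-password>
➜ ~ kubectl patch deployment nginx -p '{"spec":{"template":{"spec":{"imagePullSecrets":[{"name":"regcred"}]}}}}'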

Kubernetes Pods Error - Image Pulled but the Pod Is Pending

Whenever you run K8s in a production environment, administrators allocate a ResourceQuota to each namespace according to the requirements of the workloads running in it. Namespaces are used for logical separation within the cluster.

When the ResourceQuota cannot accommodate the minimum requirements of the application in a pod, the image is pulled, but the pod stays pending. In the example below, create a namespace called payments:

➜ ~ kubectl create ns payments

namespace/payments created

Create a ResourceQuota with the relevant specifications:

➜ ~ cat resourcequota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 4Gi

Assign the resource quota to the payments namespace:

➜ ~ kubectl apply -f resourcequota.yaml -n payments
resourcequota/compute-resources created

Create a new deployment within the namespace with the resource quota restrictions:

➜ ~ kubectl create deploy nginx --image=nginx -n payments
deployment.apps/nginx created

Despite the deployment being successfully created, no pods exist:

➜ ~ kubectl get pods -n payments

No resources found in payments namespace.

The deployment is created, but there is no pod in the ready status, none up-to-date, and none available:

➜ ~ kubectl get deploy -n payments
NAME    READY   UP-TO-DATE   AVAILABLE   AGE
nginx   0/1     0            0           7m4s

To debug further, describe the nginx deployment; the conditions show that the pod failed to create:

➜ ~ kubectl describe deploy nginx -n payments
Name: nginx
Namespace: payments
CreationTimestamp: Wed, 24 May 2023 21:37:55 +0300
Labels: app=nginx
Annotations: deployment.kubernetes.io/revision: 1
Selector: app=nginx
Replicas: 1 desired | 0 updated | 0 total | 0 available | 1 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: app=nginx
Containers:
nginx:
Image: nginx
Port: <none>
Host Port: <none>
Environment: <none>
Mounts: <none>
Volumes: <none>
Conditions:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
ReplicaFailure True FailedCreate
Progressing False ProgressDeadlineExceeded
OldReplicaSets: <none>
NewReplicaSet: nginx-8f458dc5b (0/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 10m deployment-controller Scaled up replica set nginx-8f458dc5b to 1
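
The ReplicaFailure condition indicates that the ReplicaSet was blocked when it tried to create the pod. Describing the ReplicaSet named in the output above usually surfaces the exact rejection message, for example:

➜ ~ kubectl get rs -n payments
➜ ~ kubectl describe rs nginx-8f458dc5b -n payments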

Further analysis of the Kubernetes events, sorted by creation time, reveals that the resource quota prevented the pod from being created:

➜ ~ kubectl get events -n payments --sort-by=.metadata.creationTimestamp
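
To confirm how much of the quota is available, and to unblock the deployment, you can inspect the quota and then give the deployment explicit requests and limits that fit inside it. A minimal sketch, with resource values chosen for this example:

➜ ~ kubectl describe resourcequota compute-resources -n payments
➜ ~ kubectl set resources deployment nginx -n payments --requests=cpu=250m,memory=256Mi --limits=cpu=500m,memory=512Mi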

Kubernetes Pods Error - CrashLoopBackOff

This error occurs when your image is pulled successfully and your container is created, but its runtime configuration fails. For example, a working Python application might try to write to a folder that does not exist or that it lacks permission to write to. The application starts and then runs into an error; if your application logic panics, the container stops. Kubernetes restarts it, and after repeated failures the pod goes into CrashLoopBackOff. Eventually, you observe that the deployment has zero ready pods, i.e., one pod exists, but it is not running and reports a CrashLoopBackOff error.
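
The quickest way to see why the container keeps crashing is to read the logs of the previous (crashed) container instance and the pod's events. For example, with a hypothetical pod name:

➜ ~ kubectl get pods
➜ ~ kubectl logs <pod-name> --previous
➜ ~ kubectl describe pod <pod-name>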

Liveness & Readiness Probe Failure

A liveness probe detects whether your pod has entered a broken state and can no longer serve traffic; when it fails, Kubernetes restarts the container for you. A readiness probe checks whether your application is ready to handle traffic. The readiness probe ensures that your application has pulled all the necessary configuration, for example from a ConfigMap, and started its threads; only after this process is your application ready to receive traffic. If your application runs into an error during this process, it also ends up in CrashLoopBackOff.
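
Probes are declared on the container in the pod spec. A minimal sketch, assuming a hypothetical HTTP service that exposes /healthz and /ready on port 8080:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5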

Get to Troubleshooting!

This article provides an overview of troubleshooting techniques for Kubernetes pods. It addresses common errors encountered while deploying pods and offers practical solutions to resolve them. It also points to the reference pages and cheat sheets that are vital for understanding how Kubernetes works and for identifying and resolving issues effectively. By following the guidance presented in this article, readers can enhance their troubleshooting skills and streamline the deployment and management of their Kubernetes pods.
