A k8s Node is an object that represents a machine (VM, bare metal, etc.) where workloads run. To be able to run workloads, nodes require the following components:

  • kubelet - A daemon that ensures that the Pods assigned to the node are running, including their containers.
  • kube-proxy - Maintains the necessary network rules (e.g. iptables) on the node so that traffic addressed to Services reaches the right Pods. This way it enables communication between all Nodes via Services.
  • container runtime - Responsible for managing the execution and lifecycle of containers (e.g. containerd, CRI-O). It is what enables running containers on Kubernetes.
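As a quick sanity check, these components can be inspected on a running node. A sketch assuming a systemd-based machine with containerd as the runtime (service names vary by distribution):

```
# On the node itself:
systemctl status kubelet      # the kubelet daemon
systemctl status containerd   # the container runtime

# From anywhere with cluster access:
kubectl get nodes -o wide     # includes each node's container runtime and version
```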

Node to API communication

Kubernetes makes use of various authentication and authorisation mechanisms (e.g. HTTPS, PKI, tokens, service accounts) for intra-cluster communication. The default kubelet-to-kube-apiserver communication uses PKI for authentication, authorisation, and encryption of HTTP traffic.

Each node must be provisioned with the control plane’s root CA certificate (to verify and encrypt communication) and a client certificate (used by the kubelet for authentication and authorisation) — both signed by the same CA.

This mTLS setup ensures that kubelets can securely communicate with the control plane, even in environments exposed to the internet.

If a Pod needs to communicate with the API server, it does so using a Service Account: Kubernetes injects the API server’s root certificate and a Bearer Token into the Pod for authentication. The Pod then sends its requests to the default kubernetes Service (in the default namespace), whose virtual IP is redirected (via kube-proxy) to the HTTPS endpoint of the API server.
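This can be seen directly from inside a Pod: the Service Account credentials are mounted at a well-known path. A sketch (the mount path and Service DNS name are standard; the rest is illustrative):

```
# Inside the Pod:
SA=/var/run/secrets/kubernetes.io/serviceaccount
TOKEN=$(cat $SA/token)

# Call the API server through the default "kubernetes" Service, verifying its
# serving certificate against the injected root CA:
curl --cacert $SA/ca.crt \
     -H "Authorization: Bearer $TOKEN" \
     https://kubernetes.default.svc/api
```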

Registering a Node

Nodes can register with the kube-apiserver in two ways:

  • automatically (self-registration)
  • manually

For self-registration, the kubelet is started with the following options:

  • --kubeconfig. Path to a kubeconfig file containing credentials for authenticating against the API server.
  • --register-node=true (default behaviour)
  • --cloud-provider (optional). Used when running k8s in a cloud environment; the cluster must then run a cloud-controller-manager that integrates with the underlying provider’s API in order to obtain the Node’s metadata (e.g. name with zone info inside, the VM’s memory, etc.)
  • --register-with-taints. Register the node with the given list of Kubernetes Taints (comma separated <key>=<value>:<effect>).
  • --node-ip. Optional comma-separated list of the IP addresses for the node. You can only specify a single address for each address family (IPv4 or IPv6). If no IP is specified the kubelet will use the node’s default IPv4 address.
  • --node-labels. Labels to add when registering the node. Useful when used with the NodeRestriction admission plugin (see Kubernetes Admission Control)
  • --node-status-update-frequency. How often kubelet posts its node status to the API server.
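Putting the flags together, a self-registering kubelet invocation might look like this (all paths and values below are illustrative, not defaults):

```
kubelet \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --register-node=true \
  --cloud-provider=external \
  --register-with-taints=dedicated=gpu:NoSchedule \
  --node-ip=10.0.0.5 \
  --node-labels=example.com/rack=r42 \
  --node-status-update-frequency=10s
```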

Manual Node registration

To register Nodes manually, run the kubelet with the --register-node=false flag and create the Node objects yourself. The API server then checks the validity of each created object, and if the kubelet and all required services on the machine are healthy, the Node is marked as “Ready”.
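A minimal manually created Node object might look like this (the name worker-1 is illustrative and must match the name the kubelet uses):

```
kubectl apply -f - <<EOF
apiVersion: v1
kind: Node
metadata:
  name: worker-1
  labels:
    kubernetes.io/hostname: worker-1
EOF
```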

Name uniqueness

As with other objects, name uniqueness is important. When you need to modify a Node object or upgrade the underlying machine, it is recommended to reregister the object in the API server (either manually or automatically). This is necessary because k8s implicitly assumes that two objects with the same name are the same instance; in the case of a Node this means the same CPU, storage, memory, etc. If an object is modified and the name remains unchanged, conflicts could arise since the API server doesn’t know how to reconcile the two “identical” objects. Thus, first remove the object and then register it again.

Node controller

The node controller (like any other controller) is part of the kube-controller-manager binary. It is responsible for 3 primary tasks regarding Node objects:

  1. Assigning a CIDR block to each Node when it is registered (if CIDR assignment is enabled).
  2. Keeping its internal list of available Nodes up-to-date. The Node controller constantly communicates with the cloud-controller-manager (when running in the cloud) about the list of available machines. When a VM is deleted, the Node controller removes the relevant object.
  3. Monitoring the Nodes’ health. There are 2 mechanisms through which Nodes send heartbeats to the node controller:
    1. The object’s .status field.
    2. The Node’s Lease object (see Kubernetes Leases).
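Both heartbeat channels can be observed via the API (worker-1 is an illustrative node name):

```
# Heartbeat 1: the Ready condition in the Node's .status field
kubectl get node worker-1 \
  -o jsonpath='{.status.conditions[?(@.type=="Ready")].lastHeartbeatTime}'

# Heartbeat 2: the node's Lease object in the kube-node-lease namespace
kubectl get lease worker-1 -n kube-node-lease -o yaml
```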

Note

By default the controller checks the health of the nodes (through the heartbeats) every 5 seconds (this can be changed with the --node-monitor-period option on the kube-controller-manager). If a node becomes unreachable, the controller sets the Ready condition in the Node’s .status field to “Unknown”, then waits for 5 minutes before requesting the first API-initiated eviction of the Pods on the node.

Note

There is an eviction rate limit which by default is set to 0.1 nodes/sec (--node-eviction-rate), i.e. Pods are evicted from at most one node every 10 seconds. This prevents excessive workload migrations that could overload the remaining nodes.
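The two knobs mentioned in these notes map directly to kube-controller-manager flags; shown here with their default values:

```
kube-controller-manager \
  --node-monitor-period=5s \
  --node-eviction-rate=0.1
```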

Sources: