Kubernetes Design Concepts and Distributed Systems
Analyzing Kubernetes’ design philosophy helps us understand the system better and use it more effectively to manage distributed deployments of cloud-native applications. It also lets us learn from its experience in distributed system design.
Layered architecture
Kubernetes’ design and functionality follow a Linux-like layered architecture, as described below:
1. Core (nucleus) layer: Kubernetes’ core functionality, which exposes external APIs for building higher-level applications and provides a pluggable application execution environment
2. Application layer: deployment (stateless applications, stateful applications, batch tasks, cluster applications, etc.) and routing (service discovery, DNS resolution, etc.)
3. Governance layer: system metrics (such as metrics for infrastructure, containers, and networks), automation (such as auto-scaling, dynamic provisioning, etc.) and policy management (RBAC, Quota, PSP, Network Policy, etc.)
4. Interface layer: kubectl command line tool, client SDK, and cluster federation
5. Ecosystem: the large ecosystem of container cluster management and scheduling tools built above the interface layer, which falls into two categories:
- Kubernetes external: logging, monitoring, configuration management, CI/CD, workflow, FaaS, OTS applications, ChatOps, etc.
- Kubernetes internal: CRI, CNI, CVI, image registries, Cloud Provider, cluster configuration and management, etc.
Kubernetes core technical concepts and API objects
API objects are the units of administration in a Kubernetes cluster. Whenever Kubernetes adds support for a new feature or technology, it generally introduces a corresponding API object for managing that feature. For example, the API object for replica sets is ReplicaSet (abbreviated RS).
Each API object has three main categories of attributes: metadata, specification (spec), and status.
Metadata identifies the API object. Every object has at least three metadata fields: namespace, name, and uid. Objects can also carry any number of labels, which are used to identify and match objects. For example, you can use a label named env to distinguish deployment environments: env=dev, env=testing, and env=production mark services for development, testing, and production respectively.
The specification (spec) describes the desired state that the user expects the distributed system in the Kubernetes cluster to reach. For example, the user can set the number of pod replicas expected by a Replication Controller to 3. The status describes the system’s current, observed state; for example, the number of pod replicas actually running may be 2. In that case, the Replication Controller’s logic automatically starts a new pod, working to bring the replica count up to 3.
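The behaviour described above is usually called a reconciliation loop: the controller repeatedly compares the desired state (spec) with the observed state (status) and acts on the difference. The Python sketch below is a deliberately simplified model, assuming a controller that sees nothing but a plain list of pod names; the `reconcile` function and the generated pod names are hypothetical and leave out everything a real controller does (watching the API server, owner references, retries).

```python
from __future__ import annotations

import itertools

# Hypothetical counter used only to give newly "started" pods unique names.
_ids = itertools.count(1)


def reconcile(desired_replicas: int, running_pods: list[str], base_name: str) -> list[str]:
    """Return the pod list after one reconciliation pass.

    Simplified model: starts pods while too few are running and removes
    pods from the end of the list while too many are running.
    """
    pods = list(running_pods)  # never mutate the caller's "observed state"
    while len(pods) < desired_replicas:
        pods.append(f"{base_name}-{next(_ids)}")  # start a new pod
    while len(pods) > desired_replicas:
        pods.pop()  # stop a surplus pod
    return pods


# spec: 3 replicas; status: only 2 pods running.
print(reconcile(3, ["web-a", "web-b"], "web"))  # ['web-a', 'web-b', 'web-1']
```

Running the same pass again once three pods exist changes nothing, which is the property that makes this style of control robust.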
All configuration in Kubernetes is set through the spec of API objects; that is, users change the system by declaring its desired state. This is one of Kubernetes’ key design concepts: operations are declarative rather than imperative. In a distributed system, declarative operations are robust, because it does not matter if an operation is lost and retried or runs more than once. For example, setting the replica count to 3 produces the same result however many times it runs, whereas adding 1 to the replica count is imperative, and running it more than once produces the wrong result.
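To make the idempotence argument concrete, here is a minimal Python sketch that models the cluster state as a dictionary (an assumption for illustration; `apply_declarative` and `apply_imperative` are hypothetical helpers) and applies each kind of operation twice, as might happen when a client retries a request it believes was lost.

```python
def apply_declarative(state: dict, replicas: int) -> dict:
    """Declarative: 'the replica count should be N'. Idempotent."""
    return {**state, "replicas": replicas}


def apply_imperative(state: dict, delta: int) -> dict:
    """Imperative: 'add delta to the replica count'. Not idempotent."""
    return {**state, "replicas": state["replicas"] + delta}


initial = {"replicas": 2}

# Simulate the same operation being delivered twice (e.g. a client retry).
declared = apply_declarative(apply_declarative(initial, 3), 3)
imperative = apply_imperative(apply_imperative(initial, 1), 1)

print(declared["replicas"])    # 3: same as running it once
print(imperative["replicas"])  # 4: the retry over-counts
```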
Pod
Kubernetes has many technical concepts and a correspondingly large number of API objects. The most important and fundamental one is the Pod.
A Pod is the smallest deployable unit for running an application or service in a Kubernetes cluster, and it can contain multiple containers. The Pod design lets the containers in a Pod share a network address and file system, so they can cooperate simply and efficiently through inter-process communication and file sharing. Support for multiple containers per Pod is one of Kubernetes’ most fundamental design concepts.
For example, suppose you run a software repository for an operating system distribution: one Nginx container distributes the software, and another container synchronizes it from the upstream repository. The images for these two containers are unlikely to be developed by the same team, yet together they provide a single microservice. In this case, each team develops and builds its own container image, and at deployment time the images are combined into one microservice that serves external clients.
The Pod is the basis of every workload type in a Kubernetes cluster and can be thought of as a small robot running in the cluster. Different types of work require different types of small robots.
Workloads in Kubernetes currently fall mainly into long-running services, batch jobs, node daemons, and stateful applications. The controllers for the corresponding small robots are Deployment, Job, DaemonSet, and PetSet, each of which is introduced later in this article.
Replication Controller
The Replication Controller is the original form of replication in Kubernetes. It’s being replaced by Replica Sets, but it’s still in wide use, so it’s worth understanding what it is and how it works.
A Replication Controller is a structure that enables you to easily create multiple pods, then make sure that that number of pods always exists. If a pod does crash, the Replication Controller replaces it.
Replication Controllers also provide other benefits, such as the ability to scale the number of pods, and to update or delete multiple pods with a single command.
You can create a Replication Controller with an imperative command, or declaratively, from a file. For example, create a new file called rc.yaml and add the following text:
apiVersion: v1
kind: ReplicationController
metadata:
  name: soaktestrc
spec:
  replicas: 3
  selector:
    app: soaktestrc          # pods with this label belong to this controller
  template:
    metadata:
      name: soaktestrc
      labels:
        app: soaktestrc      # must match the selector above
    spec:
      containers:
      - name: soaktestrc
        image: nickchase/soaktest
        ports:
        - containerPort: 80
Most of this structure should look familiar from our discussion of Deployments; we’ve got the name of the actual Replication Controller (soaktestrc) and we’re designating that we should have 3 replicas, each of which is defined by the template. The selector defines how we know which pods belong to this Replication Controller.
Now tell Kubernetes to create the Replication Controller based on that file:
# kubectl create -f rc.yaml
replicationcontroller "soaktestrc" created
Let’s take a look at what we have using the describe command:
# kubectl describe rc soaktestrc
Name: soaktestrc
Namespace: default
Image(s): nickchase/soaktest
Selector: app=soaktestrc
Labels: app=soaktestrc
Replicas: 3 current / 3 desired
Pods Status: 3 Running / 0 Waiting / 0 Succeeded / 0 Failed
No volumes.
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------------- -------
1m 1m 1 {replication-controller } Normal SuccessfulCreate Created pod: soaktestrc-g5snq
1m 1m 1 {replication-controller } Normal SuccessfulCreate Created pod: soaktestrc-cws05
1m 1m 1 {replication-controller } Normal SuccessfulCreate Created pod: soaktestrc-ro2bl
As you can see, we’ve got the Replication Controller, and there are 3 replicas, of the 3 that we wanted. All 3 of them are currently running. You can also see the individual pods listed underneath, along with their names. If you ask Kubernetes to show you the pods, you can see those same names show up:
# kubectl get pods
NAME READY STATUS RESTARTS AGE
soaktestrc-cws05 1/1 Running 0 3m
soaktestrc-g5snq 1/1 Running 0 3m
soaktestrc-ro2bl 1/1 Running 0 3m
Next we’ll look at Replica Sets, but first let’s clean up:
# kubectl delete rc soaktestrc
replicationcontroller "soaktestrc" deleted
# kubectl get pods
As you can see, when you delete the Replication Controller, you also delete all of the pods that it created.
Replica Sets
Replica Sets are a sort of hybrid, in that they are in some ways more powerful than Replication Controllers, and in others they are less powerful.
Replica Sets are declared in essentially the same way as Replication Controllers, except that they have more options for the selector. For example, we could create a Replica Set like this:
apiVersion: apps/v1          # extensions/v1beta1 ReplicaSets are no longer served by current clusters
kind: ReplicaSet
metadata:
  name: soaktestrs
spec:
  replicas: 3
  selector:
    matchLabels:
      app: soaktestrs
  template:
    metadata:
      labels:
        app: soaktestrs
        environment: dev
    spec:
      containers:
      - name: soaktestrs
        image: nickchase/soaktest
        ports:
        - containerPort: 80
In this case, it’s more or less the same as when we were creating the Replication Controller, except we’re using matchLabels instead of a plain label map. But we could just as easily have said:
...
spec:
  replicas: 3
  selector:
    matchExpressions:
      - {key: app, operator: In, values: [soaktestrs, soaktest]}
      - {key: tier, operator: NotIn, values: [production]}
  template:
    metadata:
...
In this case, we’re looking at two different conditions:
- The app label must be soaktestrs or soaktest
- The tier label (if it exists) must not be production
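The semantics of these operators can be sketched in a few lines of Python. This is a simplified model of how a selector is evaluated, written for illustration only (the `matches` function is hypothetical, not part of any Kubernetes client library): all terms are ANDed together, `In` requires the label to be present, and `NotIn` also matches pods that lack the label entirely.

```python
from __future__ import annotations


def matches(labels: dict[str, str], selector: dict) -> bool:
    """Evaluate a label selector against a pod's labels (all terms are ANDed)."""
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        key, op = expr["key"], expr["operator"]
        values = set(expr.get("values", []))
        present = key in labels
        if op == "In":
            ok = present and labels[key] in values
        elif op == "NotIn":
            ok = not present or labels[key] not in values  # absent label passes
        elif op == "Exists":
            ok = present
        elif op == "DoesNotExist":
            ok = not present
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True


selector = {
    "matchExpressions": [
        {"key": "app", "operator": "In", "values": ["soaktestrs", "soaktest"]},
        {"key": "tier", "operator": "NotIn", "values": ["production"]},
    ]
}
print(matches({"app": "soaktestrs", "environment": "dev"}, selector))  # True: no tier label
print(matches({"app": "soaktest", "tier": "production"}, selector))    # False
print(matches({"app": "other"}, selector))                              # False
```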
Let’s go ahead and create the Replica Set and get a look at it:
# kubectl create -f replicaset.yaml
replicaset "soaktestrs" created
# kubectl describe rs soaktestrs
Name: soaktestrs
Namespace: default
Image(s): nickchase/soaktest
Selector: app in (soaktest,soaktestrs),teir notin (production)
Labels: app=soaktestrs
Replicas: 3 current / 3 desired
Pods Status: 3 Running / 0 Waiting / 0 Succeeded / 0 Failed
No volumes.
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------------- -------
1m 1m 1 {replicaset-controller } Normal SuccessfulCreate Created pod: soaktestrs-it2hf
1m 1m 1 {replicaset-controller } Normal SuccessfulCreate Created pod: soaktestrs-kimmm
1m 1m 1 {replicaset-controller } Normal SuccessfulCreate Created pod: soaktestrs-8i4ra
# kubectl get pods
NAME READY STATUS RESTARTS AGE
soaktestrs-8i4ra 1/1 Running 0 1m
soaktestrs-it2hf 1/1 Running 0 1m
soaktestrs-kimmm 1/1 Running 0 1m
As you can see, the output is pretty much the same as for a Replication Controller (except for the selector), and for most intents and purposes, they are similar. The major difference is that the kubectl rolling-update command (since removed from kubectl) works with Replication Controllers but won’t work with a Replica Set. This is because Replica Sets are meant to be used as the backend for Deployments.
Let’s clean up before we move on.
# kubectl delete rs soaktestrs
replicaset "soaktestrs" deleted
# kubectl get pods
Again, the pods that were created are deleted when we delete the Replica Set.
Deployments
Deployments are intended to replace Replication Controllers. They provide the same replication functions (through Replica Sets) and also the ability to roll out changes and roll them back if necessary.
Let’s create a simple Deployment using the same image we’ve been using. First create a new file, deployment.yaml, and add the following:
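apiVersion: apps/v1          # extensions/v1beta1 Deployments are no longer served by current clusters
kind: Deployment
metadata:
  name: soaktest
spec:
  replicas: 5
  selector:                  # required in apps/v1; must match the template labels
    matchLabels:
      app: soaktest
  template:
    metadata:
      labels:
        app: soaktest
    spec:
      containers:
      - name: soaktest
        image: nickchase/soaktest
        ports:
        - containerPort: 80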
Now go ahead and create the Deployment:
# kubectl create -f deployment.yaml
deployment "soaktest" created
Now let’s go ahead and describe the Deployment:
# kubectl describe deployment soaktest
Name: soaktest
Namespace: default
CreationTimestamp: Sun, 05 Mar 2017 16:21:19 +0000
Labels: app=soaktest
Selector: app=soaktest
Replicas: 5 updated | 5 total | 5 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
OldReplicaSets: <none>
NewReplicaSet: soaktest-3914185155 (5/5 replicas created)
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------------- -------
38s 38s 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set soaktest-3914185155 to 3
36s 36s 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set soaktest-3914185155 to 5
As you can see, rather than listing the individual pods, Kubernetes shows us the Replica Set. Notice that the name of the Replica Set is the Deployment name and a hash value.
A complete discussion of updates is out of scope for this article (we’ll cover it in the future), but a couple of things here are worth noting:
- The StrategyType is RollingUpdate. This value can also be set to Recreate.
- By default we have a minReadySeconds value of 0; we can change that value if we want pods to be up and running for a certain amount of time (say, to load resources) before they’re truly considered “ready”.
- The RollingUpdateStrategy shows that we have a limit of 1 maxUnavailable, meaning that when we’re updating the Deployment, we can have up to 1 missing pod before it’s replaced, and 1 maxSurge, meaning we can have one pod above the desired count while the new pods come up.
As you can see, the Deployment is backed, in this case, by Replica Set soaktest-3914185155. If we go ahead and look at the list of actual pods…
# kubectl get pods
NAME READY STATUS RESTARTS AGE
soaktest-3914185155-7gyja 1/1 Running 0 2m
soaktest-3914185155-lrm20 1/1 Running 0 2m
soaktest-3914185155-o28px 1/1 Running 0 2m
soaktest-3914185155-ojzn8 1/1 Running 0 2m
soaktest-3914185155-r2pt7 1/1 Running 0 2m
… you can see that their names consist of the Replica Set name and an additional identifier.
Passing environment information: identifying a specific pod
Before we look at the different ways that we can affect replicas, let’s set up our deployment so that we can see which pod we’re actually hitting with a particular request. To make that possible, the image we’ve been using prints the pod name in its output:
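<?php
// Number of CPU-burning iterations; defaults to 250 when ?limit= is absent.
$limit = isset($_GET['limit']) ? (int) $_GET['limit'] : 250;
for ($i = 0; $i < $limit; $i++) {
    // Pointless trigonometry whose only purpose is to generate CPU load.
    $d = tan(atan(tan(atan(tan(atan(tan(atan(tan(atan(123456789.123456789))))))))));
}
// POD_NAME is injected into the container as an environment variable.
echo "Pod " . $_SERVER['POD_NAME'] . " has finished!\n";
?>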
As you can see, we’re displaying an environment variable, POD_NAME. Since each container is essentially its own server, this will display the name of the pod when we execute the PHP.
Now we just have to pass that information to the pod.
We do that through the use of the Kubernetes Downward API, which lets us pass environment variables into the containers:
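apiVersion: apps/v1          # extensions/v1beta1 Deployments are no longer served by current clusters
kind: Deployment
metadata:
  name: soaktest
spec:
  replicas: 3
  selector:                  # required in apps/v1; must match the template labels
    matchLabels:
      app: soaktest
  template:
    metadata:
      labels:
        app: soaktest
    spec:
      containers:
      - name: soaktest
        image: nickchase/soaktest
        ports:
        - containerPort: 80
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name   # Downward API: this pod's own name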
As you can see, we’re passing an environment variable, POD_NAME, and assigning it a value from the pod’s own metadata: the fieldRef to metadata.name resolves to the name of the pod the container runs in, not to the name of the Deployment.
So let’s go ahead and clean up the Deployment we created earlier…
# kubectl delete deployment soaktest
deployment "soaktest" deleted
# kubectl get pods
… and recreate it with the new definition:
# kubectl create -f deployment.yaml
deployment "soaktest" created
Next let’s go ahead and expose the pods to outside network requests so we can call the nginx server that is inside the containers:
# kubectl expose deployment soaktest --port=80 --target-port=80 --type=NodePort
service "soaktest" exposed
Now let’s describe the service we just created so we can find out what port the Deployment is listening on:
# kubectl describe services soaktest
Name: soaktest
Namespace: default
Labels: app=soaktest
Selector: app=soaktest
Type: NodePort
IP: 11.1.32.105
Port: <unset> 80/TCP
NodePort: <unset> 30800/TCP
Endpoints: 10.200.18.2:80,10.200.18.3:80,10.200.18.4:80 + 2 more...
Session Affinity: None
No events.
As you can see, the NodePort is 30800 in this case; in your case it will be different, so make sure to check. That means every node in the cluster is listening on port 30800, and requests are forwarded to port 80 of the containers. That means we can call the PHP script with:
http://[HOST_NAME OR HOST_IP]:[PROVIDED PORT]
In my case, I’ve mapped the IP addresses of my Kubernetes hosts to hostnames to make my life easier, and the PHP file is the default for nginx, so I can simply call:
# curl http://kube-2:30800
Pod soaktest-3869910569-xnfme has finished!
So as you can see, this time the request was served by pod soaktest-3869910569-xnfme.
Recovering from crashes: Creating a fixed number of replicas
Now that we know everything is running, let’s take a look at some replication use cases.
The first thing we think of when it comes to replication is recovering from crashes. If there are 5 (or 50, or 500) copies of an application running, and one or more crashes, it’s not a catastrophe. Kubernetes improves the situation further by ensuring that if a pod goes down, it’s replaced.
Let’s see this in action. Start by refreshing our memory about the pods we’ve got running:
# kubectl get pods
NAME READY STATUS RESTARTS AGE
soaktest-3869910569-qqwqc 1/1 Running 0 11m
soaktest-3869910569-qu8k7 1/1 Running 0 11m
soaktest-3869910569-uzjxu 1/1 Running 0 11m
soaktest-3869910569-x6vmp 1/1 Running 0 11m
soaktest-3869910569-xnfme 1/1 Running 0 11m
If we repeatedly call the Deployment, we can see that we get different pods on a random basis:
# curl http://kube-2:30800
Pod soaktest-3869910569-xnfme has finished!
# curl http://kube-2:30800
Pod soaktest-3869910569-x6vmp has finished!
# curl http://kube-2:30800
Pod soaktest-3869910569-uzjxu has finished!
# curl http://kube-2:30800
Pod soaktest-3869910569-x6vmp has finished!
# curl http://kube-2:30800
Pod soaktest-3869910569-uzjxu has finished!
# curl http://kube-2:30800
Pod soaktest-3869910569-qu8k7 has finished!
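Eyeballing a handful of responses works, but for more than a few requests it is easier to count them. Here is a small Python sketch, assuming you have already collected the response lines (for example by running curl in a shell loop and saving the output); `tally` is a hypothetical helper, and its regular expression assumes the exact `Pod <name> has finished!` format printed by the soaktest script.

```python
import re
from collections import Counter

# Matches the soaktest response line, e.g. "Pod soaktest-3869910569-xnfme has finished!"
RESPONSE = re.compile(r"^Pod (\S+) has finished!$")


def tally(responses):
    """Count how many requests each pod served, given raw response lines."""
    counts = Counter()
    for line in responses:
        match = RESPONSE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# The six responses from the curl calls above.
observed = [
    "Pod soaktest-3869910569-xnfme has finished!",
    "Pod soaktest-3869910569-x6vmp has finished!",
    "Pod soaktest-3869910569-uzjxu has finished!",
    "Pod soaktest-3869910569-x6vmp has finished!",
    "Pod soaktest-3869910569-uzjxu has finished!",
    "Pod soaktest-3869910569-qu8k7 has finished!",
]
print(tally(observed).most_common())
```

With enough requests, every running pod should eventually appear in the tally; a pod that never shows up is worth investigating.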
To simulate a pod crashing, let’s go ahead and delete one:
# kubectl delete pod soaktest-3869910569-x6vmp
pod "soaktest-3869910569-x6vmp" deleted
# kubectl get pods
NAME READY STATUS RESTARTS AGE
soaktest-3869910569-516kx 1/1 Running 0 18s
soaktest-3869910569-qqwqc 1/1 Running 0 27m
soaktest-3869910569-qu8k7 1/1 Running 0 27m
soaktest-3869910569-uzjxu 1/1 Running 0 27m
soaktest-3869910569-xnfme 1/1 Running 0 27m
As you can see, pod *x6vmp is gone, and it’s been replaced by *516kx. (You can easily find the new pod by looking at the AGE column.)
If we once again call the Deployment, we can (eventually) see the new pod:
# curl http://kube-2:30800
Pod soaktest-3869910569-516kx has finished!
Now let’s look at changing the number of pods.
Scaling up or down: Manually changing the number of replicas
One common task is to scale up a Deployment in response to additional load. Kubernetes has autoscaling, but we’ll talk about that in another article. For now, let’s look at how to do this task manually.
The most straightforward way is to simply use the scale command:
# kubectl scale --replicas=7 deployment/soaktest
deployment "soaktest" scaled
# kubectl get pods
NAME READY STATUS RESTARTS AGE
soaktest-3869910569-2w8i6 1/1 Running 0 6s
soaktest-3869910569-516kx 1/1 Running 0 11m
soaktest-3869910569-qqwqc 1/1 Running 0 39m
soaktest-3869910569-qu8k7 1/1 Running 0 39m
soaktest-3869910569-uzjxu 1/1 Running 0 39m
soaktest-3869910569-xnfme 1/1 Running 0 39m
soaktest-3869910569-z4rx9 1/1 Running 0 6s
In this case, we specify a new number of replicas, and Kubernetes adds enough to bring it to the desired level, as you can see.
You can scale down the same way. For example, let’s scale back down to 4, this time pointing kubectl at the Deployment’s file rather than naming the Deployment:
# kubectl scale --replicas=4 -f deployment.yaml
deployment "soaktest" scaled
# kubectl get pods
NAME READY STATUS RESTARTS AGE
soaktest-3869910569-l5wx8 1/1 Running 0 11s
soaktest-3869910569-qqwqc 1/1 Running 0 40m
soaktest-3869910569-qu8k7 1/1 Running 0 40m
soaktest-3869910569-uzjxu 1/1 Running 0 40m
soaktest-3869910569-xnfme 1/1 Running 0 40m
… this listing still shows five pods. That is not a floor set by the original manifest, though: Kubernetes will scale a Deployment below its initial replica count (all the way to 0 if you ask). kubectl scale simply sets the Deployment’s spec.replicas to the requested value and the controller removes the surplus pods, so if the count doesn’t match what you asked for, re-check with kubectl get deployment soaktest once the change has settled.
Deploying a new version: Replacing replicas by changing their label
Another way you can use deployments is to make use of the selector. In other words, if a Deployment controls all the pods with a tier value of dev, changing a pod’s tier label to prod will remove it from the Deployment’s sphere of influence.
This mechanism enables you to selectively replace individual pods. For example, you might move pods from a dev environment to a production environment, or you might do a manual rolling update, updating the image, then removing some fraction of pods from the Deployment; when they’re replaced, it will be with the new image. If you’re happy with the changes, you can then replace the rest of the pods.
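The mechanism boils down to this: a controller only counts pods that match its selector, so relabeling a pod makes it invisible to the controller, which then creates a replacement. The Python sketch below models that with plain dictionaries; `owned` and `reconcile` are hypothetical helpers, and the model ignores the ReplicaSet layer and owner references that a real Deployment uses.

```python
from __future__ import annotations

import itertools

# Hypothetical suffix generator for replacement pod names.
_suffix = itertools.count()


def owned(pods: dict[str, dict[str, str]], selector: dict[str, str]) -> list[str]:
    """Names of pods whose labels satisfy an equality-based selector."""
    return [name for name, labels in pods.items()
            if all(labels.get(k) == v for k, v in selector.items())]


def reconcile(pods: dict[str, dict[str, str]], selector: dict[str, str],
              replicas: int, prefix: str) -> None:
    """Create pods carrying the selector's labels until enough pods match."""
    while len(owned(pods, selector)) < replicas:
        pods[f"{prefix}-new{next(_suffix)}"] = dict(selector)


selector = {"app": "soaktest"}
pods = {f"soaktest-{i}": {"app": "soaktest"} for i in range(3)}

# Relabel one pod so that it no longer matches the selector.
pods["soaktest-1"]["app"] = "notsoaktest"
reconcile(pods, selector, replicas=3, prefix="soaktest")

print(len(pods))                   # 4: the relabeled pod keeps running...
print(len(owned(pods, selector)))  # 3: ...while a replacement restores the count
```

Note that nothing in the model ever deletes the relabeled pod, which mirrors the cleanup caveat discussed below.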
Let’s see this in action. As you recall, this is our Deployment:
# kubectl describe deployment soaktest
Name: soaktest
Namespace: default
CreationTimestamp: Sun, 05 Mar 2017 19:31:04 +0000
Labels: app=soaktest
Selector: app=soaktest
Replicas: 3 updated | 3 total | 3 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 1 max surge
OldReplicaSets: <none>
NewReplicaSet: soaktest-3869910569 (3/3 replicas created)
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
50s 50s 1 {deployment-controller } Normal ScalingReplicaSet Scaled up replica set soaktest-3869910569 to 3
And these are our pods:
# kubectl describe replicaset soaktest-3869910569
Name: soaktest-3869910569
Namespace: default
Image(s): nickchase/soaktest
Selector: app=soaktest,pod-template-hash=3869910569
Labels: app=soaktest
pod-template-hash=3869910569
Replicas: 5 current / 5 desired
Pods Status: 5 Running / 0 Waiting / 0 Succeeded / 0 Failed
No volumes.
Events:
FirstSeen LastSeen Count From SubobjectPath Type Reason Message
--------- -------- ----- ---- ------------- -------- ------ -------
2m 2m 1 {replicaset-controller } Normal SuccessfulCreate Created pod: soaktest-3869910569-0577c
2m 2m 1 {replicaset-controller } Normal SuccessfulCreate Created pod: soaktest-3869910569-wje85
2m 2m 1 {replicaset-controller } Normal SuccessfulCreate Created pod: soaktest-3869910569-xuhwl
1m 1m 1 {replicaset-controller } Normal SuccessfulCreate Created pod: soaktest-3869910569-8cbo2
1m 1m 1 {replicaset-controller } Normal SuccessfulCreate Created pod: soaktest-3869910569-pwlm4
We can also get a list of pods by label:
# kubectl get pods -l app=soaktest
NAME READY STATUS RESTARTS AGE
soaktest-3869910569-0577c 1/1 Running 0 7m
soaktest-3869910569-8cbo2 1/1 Running 0 6m
soaktest-3869910569-pwlm4 1/1 Running 0 6m
soaktest-3869910569-wje85 1/1 Running 0 7m
soaktest-3869910569-xuhwl 1/1 Running 0 7m
So those are our original soaktest pods; what if we wanted to add a new label? We can do that on the command line:
# kubectl label pods soaktest-3869910569-xuhwl experimental=true
pod "soaktest-3869910569-xuhwl" labeled
# kubectl get pods -l experimental=true
NAME READY STATUS RESTARTS AGE
soaktest-3869910569-xuhwl 1/1 Running 0 14m
So now we have one experimental pod. But since the experimental label has nothing to do with the selector for the Deployment, it doesn’t affect anything.
So what if we change the value of the app label, which the Deployment is looking at?
# kubectl label pods soaktest-3869910569-wje85 app=notsoaktest --overwrite
pod "soaktest-3869910569-wje85" labeled
In this case, we need to use the overwrite flag because the app label already exists. Now let’s look at the existing pods.
# kubectl get pods
NAME READY STATUS RESTARTS AGE
soaktest-3869910569-0577c 1/1 Running 0 17m
soaktest-3869910569-4cedq 1/1 Running 0 4s
soaktest-3869910569-8cbo2 1/1 Running 0 16m
soaktest-3869910569-pwlm4 1/1 Running 0 16m
soaktest-3869910569-wje85 1/1 Running 0 17m
soaktest-3869910569-xuhwl 1/1 Running 0 17m
As you can see, we now have six pods instead of five, with a new pod having been created to replace *wje85, which was removed from the deployment. We can see the changes by requesting pods by label:
# kubectl get pods -l app=soaktest
NAME READY STATUS RESTARTS AGE
soaktest-3869910569-0577c 1/1 Running 0 17m
soaktest-3869910569-4cedq 1/1 Running 0 20s
soaktest-3869910569-8cbo2 1/1 Running 0 16m
soaktest-3869910569-pwlm4 1/1 Running 0 16m
soaktest-3869910569-xuhwl 1/1 Running 0 17m
Now, there is one wrinkle that you have to take into account; because we’ve removed this pod from the Deployment, the Deployment no longer manages it. So if we were to delete the Deployment…
# kubectl delete deployment soaktest
deployment "soaktest" deleted
The pod remains:
# kubectl get pods
NAME READY STATUS RESTARTS AGE
soaktest-3869910569-wje85 1/1 Running 0 19m
You can also replace all of the pods in a Deployment at once using the --all flag (note that --all selects every pod in the current namespace, not just this Deployment’s), as in:
# kubectl label pods --all app=notsoaktesteither --overwrite
But remember that you’ll have to delete them all manually!
Service
RCs, RSs, and Deployments only guarantee the number of Pods backing a microservice; they do not solve the problem of how to reach those Pods. A Pod is just one running instance of a service: it may be stopped on one node at any time and replaced by a new Pod with a new IP on another node, so Pods alone cannot offer a service at a fixed IP address and port. Serving reliably requires service discovery and load balancing. Service discovery finds the back-end instances that correspond to the service a client wants to reach; in a Kubernetes cluster, that service is represented by a Service object. Each Service has a virtual IP address that is valid within the cluster, and clients inside the cluster reach the service through that virtual IP. Load balancing for microservices in a Kubernetes cluster is implemented by kube-proxy, a distributed proxy that acts as the cluster’s internal load balancer.
A kube-proxy instance runs on every node in the cluster, a design that scales naturally: the more nodes that need to access services, the more kube-proxy instances provide load-balancing capacity, and the more highly available the load balancing becomes. By contrast, the usual approach of placing a reverse-proxy load balancer on the server side leaves you with the additional problem of making that reverse proxy itself load-balanced and highly available.
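The essential idea, a stable virtual IP in front of a changing set of Pod endpoints, can be modeled in a few lines. kube-proxy actually implements this with kernel rules (iptables or IPVS) rather than an in-process proxy, so the `ServiceProxy` class below is a hypothetical toy model, not kube-proxy’s design; random choice stands in for its backend selection, and the replacement endpoint `10.200.18.9:80` is made up for illustration.

```python
from __future__ import annotations

import random


class ServiceProxy:
    """Toy per-node proxy: maps a Service's virtual IP and port to the Pod
    endpoints currently behind it and picks one endpoint per connection."""

    def __init__(self, rng: random.Random | None = None) -> None:
        self.rules: dict[tuple[str, int], list[str]] = {}
        self.rng = rng or random.Random()

    def update_endpoints(self, vip: str, port: int, endpoints: list[str]) -> None:
        # Called when Pods behind the Service come and go; clients keep using the VIP.
        self.rules[(vip, port)] = list(endpoints)

    def route(self, vip: str, port: int) -> str:
        endpoints = self.rules.get((vip, port))
        if not endpoints:
            raise ConnectionError(f"no endpoints for {vip}:{port}")
        return self.rng.choice(endpoints)


# Virtual IP and endpoints taken from the `kubectl describe services` output earlier.
proxy = ServiceProxy()
proxy.update_endpoints("11.1.32.105", 80, ["10.200.18.2:80", "10.200.18.3:80", "10.200.18.4:80"])
print(proxy.route("11.1.32.105", 80))

# A Pod is replaced on another node with a new (hypothetical) IP; the VIP is unchanged.
proxy.update_endpoints("11.1.32.105", 80, ["10.200.18.3:80", "10.200.18.4:80", "10.200.18.9:80"])
print(proxy.route("11.1.32.105", 80))
```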
Job
A Job is the API object Kubernetes uses to control batch tasks. The main difference between batch work and long-running services is that a batch task runs from start to finish and then ends, whereas a long-running service runs indefinitely until the user stops it. The Pods managed by a Job exit automatically once the task has completed successfully, as defined by the user’s settings.
What counts as successful completion depends on the spec.completions strategy: a single-Pod Job is complete when its one Pod succeeds; a Job with a fixed completion count is complete only when N Pods have all succeeded; and a work-queue Job is marked complete based on the application’s own confirmation that the work as a whole has succeeded.
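These three completion rules can be summarized as a small predicate. The Python sketch below is a simplified model (the `job_complete` function is hypothetical); for the work-queue case it assumes the behaviour described in the Kubernetes Job documentation, where completions is left unset and the Job succeeds once at least one Pod has succeeded and no Pods are still running.

```python
from __future__ import annotations


def job_complete(succeeded: int, completions: int | None, active: int = 0) -> bool:
    """Decide whether a Job has finished successfully (simplified model)."""
    if completions is not None:
        # Single-Pod Job (completions=1) or fixed completion count (completions=N).
        return succeeded >= completions
    # Work queue (completions unset): a Pod exits successfully only once the
    # application decides the whole queue is done, so one success plus no
    # remaining running Pods means the Job is complete.
    return succeeded >= 1 and active == 0


print(job_complete(succeeded=1, completions=1))                # True: single Pod
print(job_complete(succeeded=4, completions=5))                # False: 4 of 5
print(job_complete(succeeded=1, completions=None, active=2))   # False: workers still busy
print(job_complete(succeeded=1, completions=None, active=0))   # True
```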
DaemonSet
Long-running services and batch jobs are centered on business applications: some nodes may run several Pods of the same service while others run none. Background support services, by contrast, are centered on the nodes of the Kubernetes cluster (physical or virtual machines), and a DaemonSet ensures that exactly one such Pod runs on each node. The target nodes can be all nodes in the cluster or a subset selected with nodeSelector. Typical background support services include the storage, logging, and monitoring agents that support cluster operations on every node.
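A DaemonSet’s placement rule can be sketched as “one Pod per eligible node”. In the Python model below, nodes are plain label dictionaries, `daemon_pods` is a hypothetical helper, the node names and `disk` label are made up, and the `<name>-<node>` Pod names are an illustrative assumption (real DaemonSet Pods get generated names); taints, tolerations, and node affinity are ignored.

```python
from __future__ import annotations


def daemon_pods(nodes: dict[str, dict[str, str]],
                node_selector: dict[str, str],
                name: str) -> dict[str, str]:
    """Map every node matching node_selector to the one daemon Pod it should run."""
    return {
        node: f"{name}-{node}"  # hypothetical naming: exactly one Pod per node
        for node, labels in nodes.items()
        if all(labels.get(key) == value for key, value in node_selector.items())
    }


nodes = {
    "kube-1": {"disk": "ssd"},
    "kube-2": {"disk": "hdd"},
    "kube-3": {"disk": "ssd"},
}
print(daemon_pods(nodes, {}, "log-agent"))               # every node gets one Pod
print(daemon_pods(nodes, {"disk": "ssd"}, "log-agent"))  # only kube-1 and kube-3
```

Adding a node to the dictionary and recomputing yields one more Pod, which is how a DaemonSet follows the cluster as it grows.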
Stateful Service Set (PetSet)
Kubernetes released PetSet as an alpha feature in version 1.3 (it was renamed StatefulSet in version 1.5). In the cloud-native world there are two groups of near-synonyms: the first is stateless, cattle, nameless, and disposable; the second is stateful, pets, named, and non-disposable. RCs and RSs mainly provide stateless services. The Pods they control get randomly generated names; when a Pod fails, it is discarded and a new Pod is started somewhere else under a different name. Neither the name nor the location matters; only the total number of Pods does. PetSet is used to control stateful services. The name of each Pod in a PetSet is determined in advance and never changes. A Pod’s name in a PetSet matters not for sentimental reasons (as a name does for the heroine of Spirited Away) but because it ties the Pod to its state.
Pods in RCs and RSs generally mount no storage, or mount shared storage that holds state common to all the Pods, so the Pods are as interchangeable as livestock (which, admittedly, seems to strip them of individuality). Each Pod in a PetSet, by contrast, mounts its own independent storage. If a Pod fails, a Pod with the same name is started on another node and attaches the original Pod’s storage, continuing to serve with its state.
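The pairing of stable names with dedicated storage is easy to see in the naming convention used by StatefulSet, PetSet’s successor: Pods are named `<set>-<ordinal>` starting at 0, and each Pod’s claim created from a volumeClaimTemplate is named `<template>-<set>-<ordinal>`. The sketch below reproduces that convention in Python; the `stateful_identities` helper, the set name `mysql`, and the template name `data` are hypothetical.

```python
from __future__ import annotations


def stateful_identities(set_name: str, replicas: int, claim_template: str) -> dict[str, str]:
    """Map each Pod name to its dedicated claim, following StatefulSet naming."""
    return {
        f"{set_name}-{ordinal}": f"{claim_template}-{set_name}-{ordinal}"
        for ordinal in range(replicas)
    }


claims = stateful_identities("mysql", 3, "data")
print(claims)
# {'mysql-0': 'data-mysql-0', 'mysql-1': 'data-mysql-1', 'mysql-2': 'data-mysql-2'}

# If mysql-1 fails, its replacement is created with the same name on some
# other node, so it looks up (and reattaches) the same claim.
print(claims["mysql-1"])  # data-mysql-1
```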
Services suited to PetSet include databases such as MySQL and PostgreSQL, cluster-coordination services such as ZooKeeper and etcd, and other stateful services.
Another typical use of PetSet is to emulate a virtual machine in a way that is more stable and reliable than an ordinary container. A traditional virtual machine is a stateful pet that operators must maintain continuously. When containers first became popular, people used them like virtual machines and stored all state inside the container, which proved unsafe and unreliable. With PetSet, a Pod can still achieve high availability by moving to a different node, and its storage can be made highly reliable through external storage; PetSet simply binds a given Pod to a given storage volume to preserve continuity of state. At the time of writing PetSet was still in alpha and its design was expected to evolve; it has since become StatefulSet, which reached general availability in Kubernetes 1.9.
Cluster Federation
Kubernetes released a beta version of the Federation feature in version 1.3. In a cloud environment, the scope of a service ranges, from near to far, roughly as follows: a single host (node); across hosts (within an availability zone); across availability zones (within a region); across regions (within one cloud provider); and across cloud providers and platforms.
Kubernetes is designed on the assumption that a single cluster sits within one geographic region, because network performance within a region is sufficient for Kubernetes’ scheduling and for connecting compute with storage. Cluster federation is designed to provide Kubernetes services across regions and across cloud providers.
Each Kubernetes Federation has its own distributed storage, API server, and controller manager. Users register member Kubernetes clusters through the Federation’s API server. When a user creates or changes an API object through the Federation API server, it creates the corresponding API object in every registered member cluster. When serving requests, the Federation first load-balances across its member clusters; a request sent to a particular cluster is then handled exactly as if that cluster were serving on its own, with load balancing inside the cluster. Load balancing between clusters is achieved through DNS-based load balancing.
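Below is a toy Python model of the two behaviours just described: propagating objects to every member cluster, and picking a member cluster per request. The `Federation` class and its methods are hypothetical and are not the Federation API; the cluster names `us-east` and `eu-west` are made up, and random choice merely stands in for DNS-based load balancing.

```python
from __future__ import annotations

import random


class Federation:
    """Toy federation control plane (hypothetical; not the real Federation API)."""

    def __init__(self, rng: random.Random | None = None) -> None:
        self.members: dict[str, dict[str, dict]] = {}  # cluster name -> its API objects
        self.rng = rng or random.Random()

    def register(self, cluster: str) -> None:
        """Register a member cluster through the federation API server."""
        self.members[cluster] = {}

    def create(self, name: str, obj: dict) -> None:
        """Create an object once; the federation copies it into every member."""
        for objects in self.members.values():
            objects[name] = dict(obj)

    def resolve(self, name: str) -> str:
        """Stand-in for DNS load balancing: pick one member cluster serving `name`."""
        candidates = [c for c, objs in self.members.items() if name in objs]
        if not candidates:
            raise LookupError(f"no member cluster serves {name!r}")
        return self.rng.choice(candidates)


fed = Federation()
for cluster in ("us-east", "eu-west"):
    fed.register(cluster)
fed.create("soaktest", {"kind": "Service", "port": 80})

print(sorted(c for c, objs in fed.members.items() if "soaktest" in objs))  # ['eu-west', 'us-east']
print(fed.resolve("soaktest"))  # one of the two clusters; balancing inside it is ordinary Kubernetes
```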
The whole design tries not to affect how individual Kubernetes clusters already work: a member cluster does not need to know about the Federation at all, which means none of the existing Kubernetes code or mechanisms has to change to support federation.
Storage Volume
Storage volumes in Kubernetes are somewhat similar to Docker volumes, except that a Docker volume is scoped to a container, while a Kubernetes volume has the lifecycle and scope of a Pod: the volumes declared in a Pod are shared by all of its containers. Kubernetes supports a large number of volume types, including storage from public cloud platforms such as AWS, Google Cloud, and Azure; distributed storage such as GlusterFS and Ceph; and simpler options such as a host-local directory (hostPath) and NFS. Kubernetes also supports the PersistentVolumeClaim (PVC), a logical storage abstraction that lets storage consumers ignore the actual back-end technology (such as AWS, Google, GlusterFS, or Ceph) and leave its configuration to the storage administrator, who provisions it through PersistentVolumes.
Persistent Volume (PV) and Persistent Volume Claim (PVC)
PV and PVC give the Kubernetes cluster a logical abstraction over storage, so that the logic of configuring a Pod can ignore the configuration of the actual backend storage technology and leave that work to whoever configures PVs, that is, the cluster administrator. The relationship between PV and PVC for storage closely mirrors the relationship between Node and Pod for compute. PV and Node are resource providers: they vary with the cluster's infrastructure and are configured by the Kubernetes cluster administrator. PVC and Pod are resource consumers: they vary with the needs of the business services and are configured by the users of the Kubernetes cluster, that is, the administrators of those services.
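To make the provider/consumer split concrete, here is a simplified model of binding a PVC to a PV. It loosely follows the behavior of the real PV controller, which binds a claim to an available volume whose storage class, access modes and capacity satisfy the request, roughly preferring the smallest such volume. The classes and the `bind` function are hypothetical, capacities are plain integers in GiB, and label selectors, volume modes and dynamic provisioning are left out.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass
class PersistentVolume:
    """Provider side: configured by the cluster administrator."""
    name: str
    capacity_gi: int
    access_modes: FrozenSet[str]
    storage_class: str = ""
    claim: Optional[str] = None  # name of the bound PVC, if any


@dataclass
class PersistentVolumeClaim:
    """Consumer side: written by the user alongside the Pod."""
    name: str
    request_gi: int
    access_modes: FrozenSet[str]
    storage_class: str = ""


def bind(claim: PersistentVolumeClaim,
         volumes: List[PersistentVolume]) -> Optional[PersistentVolume]:
    """Bind the claim to the smallest unbound PV that satisfies it, or return None."""
    candidates = [
        pv for pv in volumes
        if pv.claim is None
        and pv.storage_class == claim.storage_class
        and pv.capacity_gi >= claim.request_gi
        and claim.access_modes <= pv.access_modes  # every requested mode is offered
    ]
    if not candidates:
        return None  # the claim stays pending until a matching PV appears
    best = min(candidates, key=lambda pv: pv.capacity_gi)
    best.claim = claim.name
    return best


RWO = frozenset({"ReadWriteOnce"})
volumes = [
    PersistentVolume("nfs-100", 100, frozenset({"ReadWriteMany"})),
    PersistentVolume("ebs-10", 10, RWO),
    PersistentVolume("ebs-50", 50, RWO),
]
db = bind(PersistentVolumeClaim("db-data", 20, RWO), volumes)      # -> ebs-50
logs = bind(PersistentVolumeClaim("logs", 5, RWO), volumes)        # -> ebs-10
big = bind(PersistentVolumeClaim("archive", 500, RWO), volumes)    # -> None
```

The claim never names a backend: whether "ebs-50" is cloud block storage or something else is the administrator's concern, which is the decoupling described above.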
Node
The computing power of a Kubernetes cluster is provided by Nodes. A Node was originally called a Minion and was later renamed Node. A Node in a Kubernetes cluster is equivalent to a Slave node in a Mesos cluster: it is the worker host on which Pods run, and it can be either a physical machine or a virtual machine. Either way, what every worker host has in common is that it runs a kubelet, which manages the containers running on that node.
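The kubelet's job of keeping a node's containers in line with what the cluster has assigned to it can be illustrated as a reconciliation pass. This is a hypothetical sketch rather than kubelet code: `sync_node` and its input shapes are assumptions, and the real kubelet talks to a container runtime through CRI and also handles restarts, probes and status reporting, none of which is modeled here.

```python
def sync_node(desired_pods, running):
    """One reconciliation pass for a node (hypothetical kubelet-like logic).

    desired_pods: {pod name: [container names]} assigned to this node
    running: set of (pod, container) pairs currently running on the node
    Returns (actions, new_running); each action is ("start"|"stop", pod, container).
    """
    want = {(pod, c) for pod, containers in desired_pods.items() for c in containers}
    actions = [("start", pod, c) for pod, c in sorted(want - running)]
    actions += [("stop", pod, c) for pod, c in sorted(running - want)]
    # After carrying out the actions, the node runs exactly the desired set.
    return actions, set(want)


desired = {"web": ["nginx", "sidecar"], "batch": ["worker"]}
running = {("web", "nginx"), ("old", "app")}
actions, running = sync_node(desired, running)
# A second pass with unchanged input is a no-op: the loop is idempotent.
again, running = sync_node(desired, running)
```

Comparing desired state with observed state, rather than replaying commands, is what lets such a loop recover from missed events.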
Secret
A Secret is an object used to store and pass sensitive information such as passwords, keys and authentication credentials. Using a Secret avoids writing sensitive values in plain text into configuration files. Configuring and using services in a Kubernetes cluster inevitably involves sensitive information for login, authentication and similar functions, such as the username and password used to access AWS storage.
To avoid writing such sensitive information into every configuration file that needs it, you can store it in a Secret object and have the configuration files reference the Secret instead. The benefits of this approach include clear intent, no duplication, and a lower chance of leaks.
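A minimal sketch of that pattern, with manifests built as Python dicts: the Secret holds the values (base64-encoded under data, as the v1 Secret API expects), and the container's environment only names the Secret and key through valueFrom.secretKeyRef. The Secret name, keys, values and image are illustrative assumptions. Note that base64 is an encoding, not encryption, so access to Secret objects still has to be restricted.

```python
import base64


def make_secret(name, namespace, values):
    """Build an Opaque Secret manifest; entries under data are base64-encoded."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {k: base64.b64encode(v.encode()).decode() for k, v in values.items()},
    }


def env_from_secret(var_name, secret_name, key):
    """A container env entry that references a Secret key instead of a literal value."""
    return {
        "name": var_name,
        "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}},
    }


secret = make_secret("aws-credentials", "default",
                     {"username": "admin", "password": "s3cr3t"})
container = {
    "name": "app",
    "image": "example/app:1.0",  # hypothetical image
    "env": [
        env_from_secret("AWS_USERNAME", "aws-credentials", "username"),
        env_from_secret("AWS_PASSWORD", "aws-credentials", "password"),
    ],
}
```

Any number of containers can reference "aws-credentials" this way, while the password itself appears in exactly one object.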
User Account and Service Account
As the name implies, a user account provides an identity for a person, while a service account provides an identity for computer processes and for Pods running in the Kubernetes cluster. One difference between the two is scope. A user account corresponds to a person's identity, which is independent of any service's namespace, so user accounts span namespaces. A service account corresponds to the identity of a running program and is tied to a specific namespace.
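The difference in scope shows up in how identities are named. Kubernetes authenticates service accounts under usernames of the form system:serviceaccount:<namespace>:<name>, so the namespace is part of the identity, whereas a user account name carries no namespace (Kubernetes has no API object for ordinary users; they come from an external authenticator). The two classes below are a hypothetical model of that distinction, not client-library types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserAccount:
    """A person's identity: cluster-wide, with no namespace component."""
    name: str

    def username(self) -> str:
        return self.name


@dataclass(frozen=True)
class ServiceAccount:
    """A running program's identity: always tied to one namespace."""
    namespace: str
    name: str

    def username(self) -> str:
        return f"system:serviceaccount:{self.namespace}:{self.name}"


alice = UserAccount("alice@example.com")
ci_dev = ServiceAccount("dev", "ci-bot")
ci_prod = ServiceAccount("prod", "ci-bot")
print(ci_dev.username())  # system:serviceaccount:dev:ci-bot
print(ci_dev == ci_prod)  # False: same name, different namespaces
```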
Namespace
A namespace provides virtual isolation within a Kubernetes cluster. A Kubernetes cluster initially has two namespaces: default, the default namespace, and kube-system, the system namespace (newer versions also create kube-public and kube-node-lease). Administrators can create additional namespaces as needed.
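One way to picture this virtual isolation: names of namespaced objects only have to be unique within a namespace, so the effective key of such an object is the pair (namespace, name). The toy store below is a hypothetical illustration of that rule, seeded with the two initial namespaces; it is not how the API server stores objects.

```python
class NamespacedStore:
    """Toy object store in which names are scoped by namespace."""

    def __init__(self, namespaces=("default", "kube-system")):
        self.namespaces = set(namespaces)
        self.objects = {}  # (namespace, name) -> object

    def create_namespace(self, namespace):
        self.namespaces.add(namespace)

    def create(self, namespace, name, obj):
        if namespace not in self.namespaces:
            raise KeyError(f"namespace {namespace!r} not found")
        key = (namespace, name)
        if key in self.objects:
            raise ValueError(f"{namespace}/{name} already exists")
        self.objects[key] = obj

    def names_in(self, namespace):
        return sorted(name for ns, name in self.objects if ns == namespace)


store = NamespacedStore()
store.create_namespace("team-a")
store.create("default", "web", {"kind": "Deployment"})
store.create("team-a", "web", {"kind": "Deployment"})  # same name, other namespace: OK
```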
RBAC access authorization
Kubernetes 1.3 released an alpha version of the Role-Based Access Control (RBAC) authorization model. Compared with Attribute-Based Access Control (ABAC), RBAC mainly introduces the abstractions of Role and RoleBinding. In ABAC, access policies in a Kubernetes cluster can only be associated directly with users; in RBAC, an access policy is associated with a role, and each user is associated with one or more roles. Like other new features, RBAC introduces new API objects and, with them, a new conceptual abstraction that makes cluster service management and usage easier to extend and reuse.
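A stripped-down model of the Role/RoleBinding indirection: a Role lists what may be done, a RoleBinding attaches subjects to a Role within one namespace, and a request is allowed only if some binding grants a matching rule (RBAC is deny-by-default and purely additive). The classes and `is_allowed` are hypothetical; real rules also carry apiGroups, and ClusterRole/ClusterRoleBinding are omitted.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Rule:
    verbs: Set[str]
    resources: Set[str]


@dataclass
class Role:
    namespace: str
    name: str
    rules: List[Rule]


@dataclass
class RoleBinding:
    namespace: str
    role_name: str
    subjects: Set[str]  # user or service account names bound to the role


def is_allowed(user, verb, resource, namespace, roles, bindings):
    """Deny by default; allow if any binding in the namespace grants a matching rule."""
    roles_by_key = {(r.namespace, r.name): r for r in roles}
    for binding in bindings:
        if binding.namespace != namespace or user not in binding.subjects:
            continue
        role = roles_by_key.get((namespace, binding.role_name))
        if role is None:
            continue
        if any(verb in rule.verbs and resource in rule.resources for rule in role.rules):
            return True
    return False


pod_reader = Role("dev", "pod-reader", [Rule({"get", "list", "watch"}, {"pods"})])
readers = RoleBinding("dev", "pod-reader", {"alice", "bob"})
roles, bindings = [pod_reader], [readers]
```

Granting another user read access means adding a name to `readers.subjects`, rather than writing a new per-user policy as in ABAC; the permission set itself is defined once and reused.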
Summary
From Kubernetes’ system architecture, technical concepts and design philosophy, we can see the two core design ideas of the Kubernetes system: fault tolerance and scalability. Fault tolerance is the basis for the stability and security of the Kubernetes system. Scalability is the basis for Kubernetes being friendly to change, so that it can iterate quickly and add new features.
According to Leslie Lamport, the computer scientist who invented the Paxos consensus algorithm, a distributed system has two types of properties: safety and liveness. Safety guarantees the stability of the system: the system must not crash, must not cause business errors, and must never do anything bad, so it is strictly constrained. Liveness lets the system provide functionality, improve performance and increase ease of use, so that it does something good within a time the user can observe; this is done on a best-effort basis. The design of the Kubernetes system coincides with Lamport's notions of safety and liveness. It is precisely because Kubernetes separates safety and liveness so well when introducing features and technologies that it can iterate so quickly and introduce new features such as RBAC, Federation and PetSet.