Docker source code analysis (7): Docker Container networking (part 1)


1 Preface (what is a Docker Container)

Docker is popular right now, and everyone is trying it out. One concept is inseparable from it: the "container", or "Docker Container". So let us first look, from the implementation angle, at what a "container" or "Docker Container" actually is.

As you become familiar with Docker, you come to appreciate how convenient it is to deploy and run applications inside Docker containers: with a Dockerfile, one-click setup of an application's deployment environment is no fantasy. An application running inside a Docker container is also subject to resource control and isolation, which fits the needs of the cloud computing era well. The traditional way of deploying applications offers none of this. But what is actually behind these eye-catching features, and what supports them? At this point, the powerful Linux kernel comes to mind.

In fact, a large share of the credit belongs to the Linux kernel. Let us look at Docker from the kernel's point of view, starting with the Docker Container. A developer's first impressions of a Docker Container are usually two: applications (processes) can run inside it, and it provides an isolated environment. The latter is certainly one reason the industry calls it a "container".

Since a Docker Container can run processes, let us first look at the relationship between the container and the process. Here is a question to think about: "Can a container exist without a process?" In other words, can you create a container that has no process inside it?

The answer is no. And since a container cannot exist first and acquire a process afterwards, a second question arises: "Are the container and the process born together, or does the process come first and the container second?" The answer is the latter. The reasons are explained below.

Before explaining why a container cannot exist without a process, here is a statement few will dispute: a Docker Container created by Docker is a container, and a container provides an isolated runtime environment for a group of processes. The question then becomes how the container achieves this "isolation" of the process runtime environment. This is where Linux kernel technology makes its entrance.

When it comes to "isolating" a runtime environment, the Linux kernel features namespace and cgroup will be familiar. Namespaces are mainly responsible for namespace isolation, while cgroups are mainly responsible for limiting resource usage. It is the combination of these two kernel features that gives a Docker Container its "isolation". So how do namespaces and cgroups relate to processes? The answer can be explained in the following order:

  1. When the parent process creates a child process with fork, it uses namespace technology to isolate the child from other processes (including the parent);
  2. After the child process is created, cgroup technology is applied to the child to limit its resource usage;
  3. Inside the child's namespaces, the system creates the isolated environment the process needs, such as an isolated network stack;
  4. With both namespace and cgroup applied, the process's "isolated" environment is truly established, and the "container" is born!

Analyzed from the Linux kernel's point of view, the birth of a container reduces to the four steps above. Those steps also describe the relationships between the two technologies (namespace and cgroup) and the process, and between the process and the container. The relationship between process and container is therefore: a container cannot exist apart from a process; the process comes first and the container second. Yet people often say "use Docker to create a Docker Container, then run a process inside the container". As a way of making things easy to understand this is acceptable, because the word "container" is itself rather abstract. A more accurate description would be: "Use Docker to create a process and build an isolated environment for that process. This environment can be called a Docker Container, and the user's application then runs inside it." The author's intent is not to reject the common understanding of Docker Containers, but to explore the principles of the underlying implementation together with readers.

With a more concrete understanding of the Docker Container, attention naturally turns to namespace and cgroup. That these two Linux kernel technologies play such a significant role is impressive. Below is a brief introduction to both from the perspective of how a Docker Container is implemented.

First, namespace usage during container creation, starting from the user creating and starting a container. When the user creates and starts a container, Docker Daemon forks the first process in the container, call it process A (a child process of Docker Daemon). When Docker Daemon performs the fork, it passes five flags to the clone system call: CLONE_NEWNS, CLONE_NEWUTS, CLONE_NEWIPC, CLONE_NEWPID and CLONE_NEWNET (as of Docker 1.2.0, user namespaces are not yet fully supported). Once these flags are passed to clone, the child process no longer shares namespaces with the parent; instead Linux creates new namespaces for it, which guarantees that child and parent use isolated environments. If process A in turn forks child processes B and C without passing the namespace flags, then B and C share A's namespaces. If Docker Daemon creates another Docker Container whose first process is D, and D forks child processes E and F, these three processes live in another new set of namespaces. The namespaces of both containers are different from those of Docker Daemon. A simple schematic of namespaces in Docker is as follows:

Fig. 1.1 Schematic diagram of namespaces in Docker
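
The flag combination described above can be sketched as a bitmask. The constant values are those defined in the Linux header linux/sched.h; the helper names are hypothetical and only illustrate how the five flags combine, not how Docker Daemon is implemented.

```python
# Namespace flags accepted by clone(2); values as defined in <linux/sched.h>.
CLONE_NEWNS = 0x00020000   # mount namespace
CLONE_NEWUTS = 0x04000000  # hostname and domain name
CLONE_NEWIPC = 0x08000000  # System V IPC, POSIX message queues
CLONE_NEWPID = 0x20000000  # process IDs
CLONE_NEWNET = 0x40000000  # network stack

CONTAINER_FLAGS = (CLONE_NEWNS, CLONE_NEWUTS, CLONE_NEWIPC,
                   CLONE_NEWPID, CLONE_NEWNET)


def clone_flags(flags=CONTAINER_FLAGS):
    """OR the given namespace flags into one clone() flags argument."""
    mask = 0
    for flag in flags:
        mask |= flag
    return mask


def creates_new_namespace(mask, flag):
    """True if a child cloned with `mask` gets a new namespace of this kind."""
    return (mask & flag) != 0


first_process = clone_flags()    # process A: all five flags
later_child = clone_flags(())    # processes B and C: no namespace flags

print(hex(first_process))
print(creates_new_namespace(later_child, CLONE_NEWNET))
```

A child created without any of the flags, like B and C in the text, simply stays in the namespaces of the process that forked it.
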
Now for cgroup. As is well known, cgroups can be used to apply resource control to a group of processes. Unlike namespaces, cgroups are not applied while the container process is being created, but after it has been created, which brings the container process under resource control. In other words, cgroups can only be applied once the first process in the container exists. When the container process has been created, Docker Daemon learns the PID of the process inside the container and writes that PID to the designated location in the cgroup file system, where the corresponding resource limits apply.
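
The last step, writing the PID into the cgroup file system, can be sketched as follows. The sketch assumes the cgroup v1 layout, in which each subsystem has its own directory tree, a `tasks` file lists member processes, and the memory subsystem exposes `memory.limit_in_bytes`. The group name, the PID and the helper function are hypothetical, and the root directory is a parameter so the example runs against a scratch directory instead of the real /sys/fs/cgroup.

```python
import os
import tempfile


def place_in_cgroup(cgroup_root, subsystem, group, pid, limits=None):
    """Create a cgroup directory, write limits, then add the PID to it.

    Assumes the cgroup v1 layout: <root>/<subsystem>/<group>/tasks.
    """
    group_dir = os.path.join(cgroup_root, subsystem, group)
    os.makedirs(group_dir, exist_ok=True)
    # Write the limits first so they already apply when the process joins.
    for name, value in (limits or {}).items():
        with open(os.path.join(group_dir, name), "w") as f:
            f.write(str(value))
    # Writing a PID to `tasks` moves that process into the group.
    with open(os.path.join(group_dir, "tasks"), "a") as f:
        f.write(f"{pid}\n")
    return group_dir


# Demo against a scratch directory standing in for /sys/fs/cgroup.
demo_root = tempfile.mkdtemp()
demo_dir = place_in_cgroup(demo_root, "memory", "docker/demo", 4242,
                           {"memory.limit_in_bytes": 512 * 1024 * 1024})
with open(os.path.join(demo_dir, "tasks")) as f:
    print(f.read().strip())
```

The order matters only in that the PID must already exist, which matches the text: the cgroup step comes after the container's first process has been created.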

So the Linux kernel's namespace and cgroup technologies implement resource isolation and limitation. Does this isolated and limited environment still need to be configured with other essential resources? The answer is yes, and this is the point at which a network stack is added to the container. Once the isolated runtime environment has been created for the container process, the container is already in an isolated network environment (a new network namespace), but the process has no independent network stack to use yet, for example no network interface device of its own. At this point Docker Daemon completes the resources the Docker Container needs one by one. For networking, it configures the container's network resources according to the network mode the user specified.

2 Docker Container network analysis: content arrangement

This analysis of Docker Container networking takes the source code's point of view and follows how a Docker Container's network is created from scratch. The creation process can be simplified as shown below:

Fig. 2.1 Flow chart of Docker Container network creation

The analysis of Docker Container networking consists of the following 5 parts:

  1. The Docker Container network modes;
  2. How Docker Client configures the container network;
  3. How Docker Daemon creates the container network;
  4. How execdriver executes the network setup;
  5. How libcontainer implements the kernel-level network configuration.

The networkdriver module is not central to the creation of a Docker Container's network, so this analysis does not cover it. Many readers may find this surprising. What needs emphasizing is networkdriver's role in Docker: first, it creates and initializes the network environment of Docker Daemon itself (for details, see part six of the "Docker source code analysis" series), for example creating the docker0 bridge; second, it allocates IP addresses to Docker Containers and sets up port mappings for them. Very little of it relates to creating the Docker Container network itself: only in bridge mode does it allocate an available IP address to the container's network interface device.

This article is part seven of the "Docker source code analysis" series: Docker Container networking (part 1).

3 Docker Container network modes

As mentioned above, Docker can create an isolated network environment for a Docker Container, in which the container uses a private network of its own. Many Docker developers will have experienced this aspect of Docker networking.

In fact, besides creating an isolated network environment for a Docker Container, Docker can also create a shared one. In other words, when developers need a Docker Container to be network-isolated from the host or from other containers, Docker can do that; and when they need a Docker Container to share a network with the host or with other containers, Docker can do that too. Docker can also create no network environment for a Docker Container at all.

In summary, Docker Container networking has 4 modes: bridge mode, host mode, other container mode and none mode. Each is introduced briefly below.
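
The four modes can be represented as a small parser for a mode string. The string forms "bridge", "host", "none" and "container:<name>" are assumed here to mirror the values accepted by the `--net` option of docker run; the text above names only the modes themselves, and the type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class NetworkMode(NamedTuple):
    name: str                      # "bridge", "host", "container" or "none"
    target: Optional[str] = None   # container whose network is shared, if any


def parse_network_mode(value: str) -> NetworkMode:
    """Parse a mode string such as "bridge" or "container:web"."""
    name, sep, target = value.partition(":")
    if name in ("bridge", "host", "none") and not sep:
        return NetworkMode(name)
    if name == "container" and target:
        return NetworkMode(name, target)
    raise ValueError(f"invalid network mode: {value!r}")


print(parse_network_mode("bridge"))
print(parse_network_mode("container:web"))
```

Only the other container mode carries a second piece of information, the container whose network environment is to be shared.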

3.1 Bridge mode

Bridge mode is arguably the network mode Docker developers use most. It creates an independent network stack for the Docker Container, so that the process group inside the container uses its own network environment, isolated from other containers and from the host. In addition, Docker uses a bridge on the host (docker0) to connect the network stack inside the container with the host's network stack, which lets the container communicate with the host and with the outside world.

Bridge mode can be illustrated as follows:

Fig. 3.1 Schematic diagram of Docker Container bridge mode

The main steps of bridge mode are as follows:

  1. Docker Daemon uses veth pair technology to create two virtual network interface devices on the host, say veth0 and veth1. A veth pair guarantees that whichever end receives a network packet passes it to the other end;
  2. Docker Daemon attaches veth0 to the docker0 bridge that Docker Daemon created. This ensures the host's network packets can be sent to veth0;
  3. Docker Daemon moves veth1 into the namespace of the Docker Container and renames it eth0. This ensures that a packet the host sends to veth0 is immediately received by eth0, giving network connectivity from the host to the Docker Container; at the same time the container has eth0 to itself, which isolates the container's network environment.
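
The three steps above can be approximated with iproute2 and nsenter commands. The sketch below only builds the commands as argument lists and prints them; nothing is executed. It is an illustration of equivalent manual steps, not of Docker Daemon's own code path, which according to this series goes through libcontainer. The interface names veth0 and veth1 come from the text; the function name and the container PID 4242 are hypothetical.

```python
def bridge_mode_commands(container_pid, bridge="docker0",
                         host_if="veth0", peer_if="veth1"):
    """Return the commands for the three bridge-mode steps, without running them."""
    pid = str(container_pid)
    return [
        # Step 1: create the veth pair on the host.
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", peer_if],
        # Step 2: attach the host end to the bridge.
        ["ip", "link", "set", host_if, "master", bridge],
        # Step 3: move the peer into the container's network namespace ...
        ["ip", "link", "set", peer_if, "netns", pid],
        # ... and rename it to eth0 from inside that namespace.
        ["nsenter", "-t", pid, "-n", "ip", "link", "set", peer_if, "name", "eth0"],
    ]


for cmd in bridge_mode_commands(4242):
    print(" ".join(cmd))
```

Running such commands for real would require root privileges and an existing docker0 bridge, which is why the sketch stops at printing them.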

In principle, bridge mode gives a Docker Container network connectivity with the host and with other machines. However, the host's IP address and the veth pair's IP address are not in the same network segment, so veth pairs and namespaces alone are not enough for networks outside the host to find a Docker Container on their own initiative. To expose services inside a container to the world outside the host, Docker uses NAT (network address translation), which lets the outside world actively send network packets into the container.
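
The "not in the same network segment" point can be checked with Python's ipaddress module. All addresses below are made up for illustration: 192.168.1.0/24 stands for the segment of the host's eth0 and 172.17.0.0/16 for the segment behind docker0.

```python
import ipaddress

# Hypothetical addresses, chosen only for illustration.
host_network = ipaddress.ip_network("192.168.1.0/24")       # segment of host eth0
container_network = ipaddress.ip_network("172.17.0.0/16")   # segment behind docker0
container_ip = ipaddress.ip_address("172.17.0.2")


def same_segment(ip, network):
    """True if `ip` belongs to `network`."""
    return ip in network


print(same_segment(container_ip, host_network))
print(same_segment(container_ip, container_network))
```

Because the container address falls outside the host's segment, machines on the host's network have no direct route to it, which is the gap NAT fills.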

Specifically, when a Docker Container needs to expose a service, the service inside must listen on the container IP and a port, port_0, so that the outside world can initiate requests to it. The world outside the host only knows the network address of the host's eth0 and not the Docker Container's IP address; and even if it did know the container's IP address, from a layer-2 perspective it could not reach the application inside the container directly through that address. Docker therefore uses NAT to bind the port the service listens on inside the container to a port on the host, port_1.

As a result, external access to a service inside a Docker Container proceeds as follows:

  1. The outside world accesses the host's IP and the host port port_1;
  2. When the host receives such a request, the DNAT rule translates the request's destination IP (the IP of the host's eth0) and destination port port_1 into the container's IP and the container port port_0;
  3. Because the host knows the container IP, the request can be sent to the veth pair;
  4. veth0 of the veth pair sends the request to eth0 inside the container, where the service finally handles it.
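
Step 2 of this flow, the destination rewrite, can be modelled in a few lines. This is a toy simulation of what a DNAT rule does to a packet, not the netfilter implementation; the addresses, the port numbers (port_1 = 8080, port_0 = 80) and all names are hypothetical.

```python
from typing import Dict, NamedTuple, Tuple


class Packet(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# DNAT table: (host IP, host port port_1) -> (container IP, container port port_0).
DnatTable = Dict[Tuple[str, int], Tuple[str, int]]


def apply_dnat(packet: Packet, rules: DnatTable) -> Packet:
    """Rewrite the destination if a rule matches; otherwise return it unchanged."""
    target = rules.get((packet.dst_ip, packet.dst_port))
    if target is None:
        return packet
    return packet._replace(dst_ip=target[0], dst_port=target[1])


# Hypothetical addresses: host eth0 is 192.168.1.10, the container is 172.17.0.2.
rules = {("192.168.1.10", 8080): ("172.17.0.2", 80)}
request = Packet("203.0.113.5", 51000, "192.168.1.10", 8080)
print(apply_dnat(request, rules))
```

Only the destination changes; the source address of the external client is preserved, so the service inside the container can still see who sent the request.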

With DNAT, the world outside the Docker host can actively access services inside a Docker Container. How, then, does a Docker Container access the world outside the host? The process is briefly as follows:

  1. A process inside the Docker Container learns the IP address and port port_2 of a service outside the host, and the Docker Container issues a request. The container's independent network environment guarantees that the request's source IP address is the container IP (that of the container's eth0), and the Linux kernel automatically allocates a source port for the process (say port_3);
  2. The request leaves through the container's eth0 and arrives at the other end of the veth pair, veth0, which is to say at the bridge (docker0);
  3. Packet forwarding is enabled for the docker0 bridge (/proc/sys/net/ipv4/ip_forward), so the request is forwarded to the host's eth0;
  4. The host handles the request and uses SNAT to translate its source IP address from the container's IP address to the IP address of the host's eth0;
  5. The host sends the SNAT-translated packet to the outside world, according to the request's destination IP address (the IP address in the world outside the host).
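
Steps 3 to 5 can be modelled as a single function: the packet is forwarded only if forwarding is enabled, and its source IP is then rewritten. Following the text, the sketch changes only the source IP; real NAT may also remap the source port. Addresses, ports (port_3 = 40000, port_2 = 443) and names are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


def forward_outbound(packet, host_ip, ip_forward_enabled=True):
    """Forward the packet, then rewrite its source IP (SNAT).

    Returns None when forwarding is disabled, because the packet never
    gets from docker0 to the host's eth0 in that case.
    """
    if not ip_forward_enabled:
        return None
    # SNAT changes only the source address; the destination is untouched.
    return replace(packet, src_ip=host_ip)


# Hypothetical values: container 172.17.0.2 uses port 40000 to reach
# an outside service at 198.51.100.7 on port 443.
request = Packet("172.17.0.2", 40000, "198.51.100.7", 443)
sent = forward_outbound(request, host_ip="192.168.1.10")
print(sent)
```

After this rewrite the outside service sees the host as the sender, which is why its reply is addressed to the host rather than to the container.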

Here many people will ask: when a request initiated inside a Docker Container reaches the host, goes through SNAT and is sent to the outside world, the outside world replies to the host's IP address, so the destination IP address in the response packet must be the host's. How does the host pass the response on to the Docker Container? Since no DNAT mapping exists on the host for port_3, in principle such a response would not be delivered into the container. Why is there no DNAT for such a response? The reason is simple: DNAT is set up only for the specific ports that services inside the container listen on. Those ports serve listening services, whereas the source port of a request initiated from inside the container is certainly not a port occupied by a listening service, so the response to a container-initiated request does not go through DNAT on the host.

In fact, this is handled by iptables rules. The specific rule is:

iptables -I FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

This rule means: for network packets the host forwards to the docker0 bridge, if the packet belongs to an already established connection it is accepted unconditionally, and the Linux kernel sends it back along the original connection, that is, back into the Docker Container.
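
The reason a reply finds its way back without a DNAT rule is that connection tracking remembers the translation made on the way out and reverses it for replies. The class below is a toy model of that idea with hypothetical names and addresses; it is not how netfilter is implemented.

```python
class ConnTrack:
    """Toy connection-tracking table for SNAT'ed outbound connections."""

    def __init__(self, host_ip):
        self.host_ip = host_ip
        # (remote IP, remote port, local port) -> original container IP
        self.table = {}

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """Record the connection and return the packet after SNAT."""
        self.table[(dst_ip, dst_port, src_port)] = src_ip
        return (self.host_ip, src_port, dst_ip, dst_port)

    def inbound(self, src_ip, src_port, dst_ip, dst_port):
        """Return the reply with its destination restored, or None if untracked."""
        container_ip = self.table.get((src_ip, src_port, dst_port))
        if dst_ip != self.host_ip or container_ip is None:
            return None  # not part of an established connection
        return (src_ip, src_port, container_ip, dst_port)


ct = ConnTrack(host_ip="192.168.1.10")
ct.outbound("172.17.0.2", 40000, "198.51.100.7", 443)
print(ct.inbound("198.51.100.7", 443, "192.168.1.10", 40000))
print(ct.inbound("198.51.100.7", 443, "192.168.1.10", 40001))
```

The first reply matches the recorded connection and is redirected to the container; the second arrives on a port with no recorded connection and is not delivered.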

That is a brief introduction to bridge mode. Functionally, bridge mode achieves two things: first, it gives the container an independent, isolated network stack; second, it lets the container communicate with the world outside the host through NAT.

However, bridge mode does not arrange everything for developers. The most obvious limitation is that the Docker Container has no public IP, that is, its IP is not in the same network segment as the host's eth0. As a result the world outside the host cannot communicate with the container directly. NAT solves this through an intermediate step, but NAT has its own problems and inconveniences: containers have to compete for ports on the host, and anyone accessing a service inside a container needs service discovery to learn the service's external port. In addition, NAT is implemented at layer 3, so it inevitably affects network transmission efficiency.

3.2 Host mode

Host mode is very different from bridge mode. The biggest difference is that host mode does not create an isolated network environment for the container. It is called host mode because the Docker Container shares the host's network namespace, so the container can use the host's eth0, just like the host, to communicate with the outside world. In other words, the Docker Container's IP address is the IP address of the host's eth0.

Host mode can be illustrated as follows:

Fig. 3.2 Schematic diagram of Docker Container host mode

The Docker Container on the left of the figure uses host mode, while the other two Docker Containers still use bridge mode. The two modes coexist on the same host without conflict.

Implementing host mode needs no additional bridge or virtual network interface, so docker0 and veth pairs are not involved. As mentioned in the namespace introduction above, if the parent process creates a child without the CLONE_NEWNET flag, the child shares the parent's network namespace. Docker uses exactly this simple principle: when creating the process that starts the container, it does not pass CLONE_NEWNET to clone, so the Docker Container shares the host's network environment. That is host mode.
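
Whether two processes share a network namespace can be observed on Linux through /proc/<pid>/ns/net, a symbolic link whose target identifies the namespace. The sketch below compares those link targets. So that it runs without containers or root privileges, the demo builds a fake /proc tree in a scratch directory; the PIDs and namespace identifiers in it are made up, and the function names are hypothetical.

```python
import os
import tempfile


def net_namespace(pid, proc_root="/proc"):
    """Return the network namespace identifier of a process.

    On Linux, /proc/<pid>/ns/net is a symbolic link whose target looks
    like "net:[4026531956]"; equal targets mean the same namespace.
    """
    return os.readlink(os.path.join(proc_root, str(pid), "ns", "net"))


def shares_network(pid_a, pid_b, proc_root="/proc"):
    """True if both processes live in the same network namespace."""
    return net_namespace(pid_a, proc_root) == net_namespace(pid_b, proc_root)


# Demo on a fake /proc tree with made-up PIDs and namespace identifiers.
fake_proc = tempfile.mkdtemp()
for pid, ns in ((1, "net:[4026531956]"),      # host process
                (100, "net:[4026531956]"),    # host-mode container process
                (200, "net:[4026532201]")):   # bridge-mode container process
    os.makedirs(os.path.join(fake_proc, str(pid), "ns"))
    os.symlink(ns, os.path.join(fake_proc, str(pid), "ns", "net"))

print(shares_network(1, 100, fake_proc), shares_network(1, 200, fake_proc))
```

On a real host, calling `shares_network` with the default `proc_root` on a host process and a container's first process would show the same distinction between host mode and bridge mode.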

Host mode is a good complement to bridge mode. A Docker Container in host mode can use the host's IP address directly to communicate with the outside world; if the host's eth0 has a public IP, the container has that public IP too. The service ports in the container can also use the host's ports directly, with no extra NAT translation. Such convenience comes at a cost. The most obvious is weaker network isolation: the container no longer has a separate, independent network stack. Furthermore, although host mode lets services inside the container run exactly as they would traditionally, without modification, the weakened isolation means the container shares, and competes for, the network stack with the host. The container also no longer has the full range of ports to itself, because some ports are already occupied by services on the host, and some are already used for port mappings of containers in bridge mode.

3.3 Other container mode

Other container mode is a rather special network mode in Docker. It is called "other container mode" because a Docker Container in this mode uses the network environment of another container. It is "special" because the network isolation of a container in this mode lies between bridge mode and host mode. When a Docker Container shares another container's network environment, there is no network isolation between at least these two containers, while the two remain network-isolated from the host and from all other containers.

Other container mode can be illustrated as follows:

Fig. 3.3 Schematic diagram of Docker Container other container mode

The Docker Container on the right of the figure uses other container mode; the network environment it uses is that of the bridge-mode Docker Container on the left.

Implementing other container mode involves no bridge and no veth pair. It takes only two steps:

  1. Find the network namespace of the other container (the container whose network environment is to be shared);
  2. Make the newly created Docker Container (the container that needs to share the other's network) use the other container's namespace.
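
The first of these two steps can be sketched as a lookup from a container name to a namespace path. The mapping from names to PIDs is a hypothetical stand-in for whatever state Docker Daemon keeps, and the use of /proc/<pid>/ns/net as the handle for the namespace is an assumption of this sketch; the second step is only described in a comment because joining a namespace needs privileges.

```python
def netns_path(pid, proc_root="/proc"):
    """Path that identifies the network namespace of a running process."""
    return f"{proc_root}/{pid}/ns/net"


def plan_shared_network(other_name, running):
    """Step 1: find the other container's network namespace.

    `running` maps container names to the PID of their first process
    (a hypothetical lookup table standing in for Docker Daemon's state).
    """
    if other_name not in running:
        raise LookupError(f"container {other_name!r} is not running")
    return netns_path(running[other_name])


# Step 2 would make the new container's process join this namespace,
# for example with the setns(2) system call on the opened path.
running = {"web": 4242}
print(plan_shared_network("web", running))
```

The lookup fails for a container that is not running, which reflects the dependency this mode introduces: the shared namespace exists only as long as a process of the other container keeps it alive.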

Other container mode is well suited to communication between containers.

In this mode, a Docker Container can reach the other containers in its namespace through localhost, so transmission is more efficient. Although several containers share one network environment, as a whole they are still isolated from the host and from other containers. This mode also saves some network resources. Note, however, that it does nothing to improve communication between the container and the world outside the host.

3.4 None mode

The fourth network mode is none mode. As the name suggests, the network environment is none: no network environment is created for the Docker Container. A Docker Container in none mode can only use the loopback network device and has no other network resources.

None mode does very little network setup for the Docker Container, but as the saying goes, "less is more": with no network configuration imposed, Docker developers can build any number of custom network setups on this foundation. This also reflects the openness of Docker's design philosophy.

4 About the author

Sun Hongliang, member of the DaoCloud team, software engineer, VLIS Laboratory of Zhejiang University. During graduate school he was active in the PaaS and Docker open source communities, studied Cloud Foundry in depth and gained extensive practical experience with it, and is good at analyzing low-level platform code. He has some experience in distributed platform architecture and has written many in-depth technical blog posts. He joined DaoCloud as a partner at the end of 2014 and is committed to spreading Docker-based container technology and promoting the containerization of Internet applications.

You are welcome to follow the "Docker source code analysis" public account.

