A Startup's Road to Containers
About the author: Zhang Yeming, CTO of Xingren Doctor (杏仁医生, literally "Almond Doctor"), a middle-aged programmer focused on all kinds of technology and on team management. This article was first published on the Xingren Doctor tech blog.
1. The technical challenges of a startup
Tolstoy wrote, "Happy families are all alike; every unhappy family is unhappy in its own way." Internet startups are much the same: most of them will run into the following technical challenges.
- How do you build infrastructure quickly and cheaply while keeping it secure and stable?
- How do you build and release applications quickly enough to keep up with business needs?
- How do you improve the team's development efficiency while maintaining development quality?
This list is certainly not exhaustive, but these three are problems nearly every startup's technology team will face. Of course, Xingren Doctor cannot claim to have solved them completely, but we have made some progress. Below is a brief introduction to how we responded to these challenges and what containers brought us.
This series is divided into three parts. The first introduces the evolution of Xingren Doctor's technical architecture. The second introduces containers and our containerization scheme. The third concludes with why we believe a startup should use containers, and how containers help us cope with the three challenges above.
2. Early days at Xingren Doctor
Before 2012, most Internet companies, startups included, deployed by buying their own servers and renting rack space in an IDC machine room. Applications ran directly on physical machines, and scaling up meant buying new servers. IDCs suffered all kinds of failures, and an IDC migration was even more painful: you carried machines through the middle of the night and had to be back online before dawn. In short, it was expensive in money, in service stability, and in engineering time.
But Xingren Doctor was lucky enough to catch the maturing of the public cloud, so we built on it from the start. Our earliest architecture looked like this:
The architecture was very simple. Load balancing and the database were provided by Tencent Cloud, which also supplied basic monitoring, alerting, and security services. On top of that ran two applications, a backend API for the mobile apps and an operations platform, both Scala applications built on the Play framework.
Many people may be curious why we chose Scala/Play, since Scala is not widely used in China. On the one hand, Xingren Doctor inherited the architecture of an earlier prescription product, which was built on Scala/Play from the beginning; the team was familiar with the stack, and since we needed to build Xingren Doctor quickly, we naturally chose the language and framework we knew best. On the other hand, for small and medium applications Scala/Play offers high development efficiency: Scala itself is very expressive and an interesting language, and many good engineers are drawn to newer languages.
3. Application splitting and CI/CD
After more than a year of rapid evolution, the application grew more and more complex, so we split it apart. As the business expanded, the number of applications kept growing: HIS, CRM, and so on. Our architecture then looked like this:
At this point Scala's biggest drawback, compilation speed, began to bite. Our deployment process was primitive: log in to a server and run a shell script that pulled the code, then compiled, packaged, and ran it. The whole process took 5 to 10 minutes. The nodes of our API application later grew to five or six, so even releasing two at a time, a full release took 20 minutes. If a problem surfaced after the release, you coughed up blood: the rollback was the same process and took another 20 minutes.
One time we ran a promotion giving away Apple Watches, starting at midnight. The campaign had a silly bug, and within the first minutes it was handing out Apple Watches one after another. A startup does not have much money; every one of them was hard-earned cash, and it hurt. The fix was trivial, but compiling and releasing the usual way would have been far too slow, so we simply killed the server hard and deployed the new code a few minutes later.
That is when we decided we had to have an automated release system. A few years earlier, releases went through operations: developers handed a build to ops, and ops deployed it by hand. Releases naturally could not be frequent, and they were a heavy burden on both development and operations. But as agile and DevOps culture became mainstream, continuous integration and continuous delivery (CI/CD) became basic infrastructure.
Our first version of CI/CD was very simple: Jenkins compiled and packaged the application, and a script copied it to the servers. Deploying a prebuilt package alone alleviated the slow-deployment problem, but several issues remained:

- First, there was no application repository. Each package was one-off; at deploy time the current application directory was backed up for rollback, so we could only roll back one version.
- Second, the health check was simplistic: it could only detect whether the application had started. We once had a case where the application started and passed the check, but the service still had a serious problem and was essentially unavailable.
- Finally, it did not support phased (grayscale) releases; when something went wrong, the only option was a full rollback.
During this period we had a major outage, and these shortcomings made recovery take a long time. Taking that as a warning, we built a release system called Frigate. Its architecture is roughly as follows:
Frigate has an application repository (App Repository) that stores every released version of an application, so a rollback can target any specified version.
Watcher provides much stronger application health detection. Besides ordinary HTTP probes, it can pull data from logs and monitoring, and evaluate health further based on exception counts, error rates, and so on.
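As an illustration, the kind of decision Watcher makes can be sketched in a few lines of Python. This is a hypothetical sketch, not Frigate's actual code; the field names and thresholds are made up:

```python
# Hedged sketch of a Watcher-style health decision (hypothetical, not Frigate's real code).
from dataclasses import dataclass

@dataclass
class AppStats:
    http_ok: bool        # did the plain HTTP probe succeed?
    error_rate: float    # errors / requests over the last window, from monitoring
    exceptions: int      # exception count parsed from the logs

def is_healthy(stats: AppStats,
               max_error_rate: float = 0.01,
               max_exceptions: int = 5) -> bool:
    """An app is healthy only if the HTTP probe passes AND the
    log/monitoring signals are within their thresholds."""
    return (stats.http_ok
            and stats.error_rate <= max_error_rate
            and stats.exceptions <= max_exceptions)

# A node can be up (the HTTP probe passes) yet effectively unavailable:
started_but_broken = AppStats(http_ok=True, error_rate=0.35, exceptions=120)
healthy_node = AppStats(http_ok=True, error_rate=0.001, exceptions=0)
```

The point is exactly the failure mode described earlier: a plain "did it start" check would pass `started_but_broken`, while a check that also consults error rates would not.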
Frigate supports grouped, phased releases: for example, release to two machines first, run health checks (or some manual verification in between), and then release to the remaining machines.
Looking back, Frigate did not use containers, yet it implemented many functions of container orchestration. Below is a screenshot of a Frigate release, which is built on Jenkins Pipeline.
4. Microservices
As the system grew more complex, with data not isolated and logic duplicated, the inevitable next step was microservices. We have covered this in two posts on our public WeChat account ("Lego microservice transformation", parts one and two).
Our service registration and discovery is based on Consul, and load balancing is implemented with Nginx. The following diagram shows the full service registration and discovery flow:
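As a hedged illustration of the registration side (the service name and port here are made up, not our actual configuration), a Consul service definition looks roughly like this; a tool such as consul-template can then render the registered instances into an Nginx upstream block:

```json
{
  "service": {
    "name": "order-service",
    "port": 9000,
    "check": {
      "http": "http://localhost:9000/health",
      "interval": "10s",
      "timeout": "1s"
    }
  }
}
```

Consul runs the HTTP check on each instance and removes failing instances from discovery, so the Nginx upstream list only ever contains healthy nodes.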
A few points are worth mentioning.
First, our microservices communicate over HTTP and JSON; we use no binary protocol such as Protobuf or Thrift. In practice the performance gap between HTTP/JSON and a binary protocol is not as large as many people think, generally around 2-3x (we have not benchmarked this ourselves), and for most companies it is not the bottleneck, especially now that HTTP/2 is available. If you really need one, you can run a binary protocol over HTTP/2 by adding a thin layer on the server and client sides.
Second, our microservice framework is non-invasive to applications; we use neither of the common frameworks Dubbo and Spring Cloud. For one thing, our service callers include both Java and Scala applications, and adopting those frameworks would take some effort. For another, we believe the future of microservice frameworks lies in non-invasive, standalone microservice infrastructure. This is consistent with the idea behind container orchestration, and the recently proposed concept of Service Mesh is a further extension of it. We think that is the future of microservices.
Finally, each of our microservices generates an SDK for its callers. The SDK integrates circuit breaking, asynchronous calls, distributed tracing (still under development), and other cross-cutting functions.
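As a rough illustration of one of those functions (this is not our actual SDK code; the class and parameter names are hypothetical), a minimal circuit breaker works like this:

```python
# Minimal circuit-breaker sketch (hypothetical, not the real SDK).
import time

class CircuitBreaker:
    """After `max_failures` consecutive failures the circuit opens and
    calls fail fast; after `reset_timeout` seconds it half-opens and
    lets one call through to probe the service."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None          # half-open: allow one probe call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            raise
        self.failures = 0                  # any success closes the circuit
        return result
```

Callers invoke remote endpoints through `breaker.call(...)`, so a struggling downstream service degrades into fast failures instead of piling up blocked requests.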
With the basic microservice framework in place, we built several microservices such as orders and appointments, plus infrastructure services such as push notifications and SMS. Admittedly, some of them are not all that "micro".
But we found the system as a whole still had plenty of problems:
- Being built on cloud services kept costs low and efficiency high, but operations were still resource-oriented, and resource utilization was low.
- We had continuous integration and deployment, but adding nodes or services still required a lot of manual operations work, and scaling out was inconvenient.
- The application architecture had improved through our microservice practice, but dependency management and monitoring were still imperfect, and stability was still lacking.
5. What is a container?
Above we briefly traced the evolution of our architecture before containerization. Now let's talk about containers.
Docker was born in 2013 and had gradually entered the mainstream by 2015. Of course, a container is not necessarily Docker, and container formats are now standardized; but mention containers and people immediately think of Docker, so the containers we discuss here are mainly Docker.
So what exactly is a container? As the name implies, a container is used to hold things; here, it holds an application. A container's characteristics boil down to four points:
- Containers are self-contained. A container packages the application together with all of its dependencies and can run directly. Managing application dependencies used to be a big problem; RPM, Maven, Ansible, and the like each solved part of it, but until containers appeared there was no standard mechanism that worked for all applications.
- Containers are portable. A container runs the same way almost anywhere, which guarantees the application an identical runtime environment in development, testing, production, and so on.
- Containers are isolated from one another. Multiple containers running on the same host do not interfere with each other: an application running in one container cannot access another container's resources (processes, network, files, users, etc.) unless those are explicitly configured as shared.
- Containers are lightweight. A container starts in seconds and consumes very few extra resources.
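For example, the self-contained packaging of a Play/Scala application could be sketched in a Dockerfile like this (the base image, paths, and binary name are illustrative, not our production setup):

```dockerfile
# Illustrative only: package a prebuilt Play application with its JRE dependency.
FROM openjdk:8-jre

# Copy the staged application (e.g. the output of `sbt stage`) into the image.
COPY target/universal/stage /app
WORKDIR /app

EXPOSE 9000
CMD ["bin/app", "-Dhttp.port=9000"]
```

The resulting image carries the JRE, the application, and its libraries together, so it runs identically on a developer laptop, a test server, or production.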
Much of what containers can do, virtual machines can also do, so what is the difference? The diagram below, a screenshot from Docker's official website, explains it well.
But the most fundamental difference is actually the last point: being lightweight. Many people dismiss this as a minor difference, but it is not, because it is what makes containers a standardized way to ship applications.
When shipping containers first appeared in the 1950s and 60s, they too looked like a simple idea with no technical depth. But the container provided a standardized mode of logistics, and an efficient logistics system of sea, land, and air transport and terminal loading grew up around it. In the end it transformed world trade and propelled globalization.
In the same way, the standardized application container will eventually reshape the entire application architecture above it: a complete application architecture built around containers will bring revolutionary change. We can already see the signs. Kubernetes has essentially become the standard, and not long ago Google released Istio, a Service Mesh tool that abstracts microservice infrastructure one level further.
6. What is container orchestration?
Containers alone are not enough to run applications well. Managing that many containers by hand would throw away the advantages of containers, so we need a container orchestration system. Orchestration provides functions such as:
- Application scheduling: deployment, seamless upgrades, elastic scaling, self-healing, etc.
- Resource management: memory, CPU, storage, network, and so on.
- Service management: namespaces, load balancing, health checks, and so on.
- And many other functions, such as logging, monitoring, authentication, and authorization.
The three most important container orchestration systems are Docker Swarm, Kubernetes, and Marathon/Mesos.
Swarm is Docker's official solution. Its advantage is that it is simple; its shortcoming is that it is too simple.
Then there is Google's Kubernetes, also known as K8s. Over the past year Kubernetes has come to dominate the container orchestration field; even Docker itself recently announced support for it. Its advantage is big-vendor backing, with Google treating it as a strategic bet. You can think of it as the Android of its day, and its community is extremely active.
Technically, Kubernetes is a comprehensive solution with an excellent design; it can be regarded as a design paradigm for distributed systems, an area where Google has deep experience. The drawback is that it is somewhat complex. Until this year it still had many problems, including performance issues, compatibility across major versions, and deployment complexity, though these are now mostly solved.
Mesos predates Docker. At its core it is resource management for distributed systems: Mesos is flexible and can support all kinds of workloads, including Spark. Marathon implements the orchestration functions on top of Mesos.
We started considering containers in the middle of last year, and our final choice was Marathon/Mesos. One reason was that we had already run Jenkins containers on Marathon/Mesos and had some experience with it. The other was that this scheme integrated easily with our existing architecture, whereas Kubernetes was too complex, and migrating to it would have changed the architecture too much.
7. Containers at Xingren Doctor
Our container architecture looks like this:
All applications run as containers on the Mesos slaves, and the Mesos master manages the slave servers centrally. Marathon schedules containers through Mesos to release, upgrade, and scale applications. Calico provides the Docker networking, giving each container its own IP and interconnecting the containers. In the upper-right part of the diagram we basically kept our previous microservice architecture, except that Registrator replaced the Consul Agent for service registration and discovery.
Our CI/CD was adjusted accordingly as well.
Jenkins itself also runs in containers, and builds are compiled and packaged inside containers on the Mesos slaves. The application is packaged into a Docker image and uploaded to Harbor, our image registry. For deployment, Jenkins calls Marathon's API, Marathon requests resources from Mesos, and Mesos pulls the corresponding application image from Harbor and runs it according to its configuration.
With this system, creating and scaling applications becomes easy. To create an application, you write a Dockerfile and have Jenkins build the image; then in the Marathon interface you only need to prepare a JSON configuration (a form is also available) specifying the resources, instance count, image, network, health checks, and environment variables, and a new application can be launched quickly.
Sometimes we need to prepare for a flash-sale event or a push to millions of users; to add application instances, we just adjust one number in the Marathon interface.
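As a hedged sketch (the app name, image, and values are illustrative, not our real configuration), a Marathon application definition looks roughly like this, and scaling is just a change to the `instances` field:

```json
{
  "id": "/order-service",
  "instances": 2,
  "cpus": 0.5,
  "mem": 1024,
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "harbor.example.internal/xingren/order-service:1.0.0",
      "network": "USER"
    }
  },
  "healthChecks": [
    {
      "protocol": "HTTP",
      "path": "/health",
      "gracePeriodSeconds": 30,
      "intervalSeconds": 10
    }
  ],
  "env": { "JAVA_OPTS": "-Xmx768m" }
}
```

Marathon exposes this same definition over its REST API, which is what lets Jenkins trigger deployments programmatically instead of through the UI.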
Besides container orchestration and CI/CD, two other pieces of infrastructure matter: a unified logging platform based on ELK, and a monitoring and alerting platform based on Open-Falcon, StatsD, Graphite, and Grafana. Their general structure is shown below; we will write dedicated articles about them when we get the chance.
Finally, our entire platform is composed as follows:
8. Containerization summary
This, so far, is the basic infrastructure of Xingren Doctor's platform. With this system we improved resource utilization: right after migrating to the container environment, we needed only 60-70% of the cloud servers we had used before. We also greatly strengthened our automated operations capability and improved our service monitoring.
But the system still has many problems:
- Adding new server nodes still requires some manual work.
- Configuration for containers, environments, and so on is scattered everywhere and lacks centralized management.
- Support for stateful applications is poor.
- There is some redundancy in the system; for example, ZooKeeper, Etcd, and Consul all coexist.
- Automatic scaling is not supported.
- Some of the infrastructure is not yet containerized.
We will keep evolving. When the time is ripe, we do not rule out migrating to a public cloud container service, or building our own Kubernetes cluster.