I had the honor of discussing with a senior engineer at Meituan the problem of a single node holding more than 15,000 connections

Hello, I'm Weige, author of RocketMQ Technology Insider. I have been named an outstanding evangelist by the official RocketMQ community and placed in the top 2 among CSDN's 2020 blogs. I am currently a senior architect in the technical platform department of Zhongtong Express, mainly responsible for the R&D and rollout of products such as full-link stress testing, message middleware, and data synchronization. I have operated clusters handling 100 billion messages, and alongside that practical experience I have studied the source code in depth and systematically. Feel free to follow me so we can grow together.

My conversation with a senior engineer at Meituan started with connection reuse in Netty and ended with the high number of connections held by a single node in a microservice architecture. I think this topic is worth discussing and sharing with you.

1. Posing the problem

For example, the microservice architecture of an e-commerce system is shown in the figure below:
[Figure: microservice architecture of the e-commerce system]
The system consists of the following parts:

  • Gateway domain
    The entry gateway of the whole microservice system, deployed as a cluster.
  • Coupon domain
    Provides coupon services for the e-commerce system.
  • Points service
    The member points service of the mall.
  • Payment service
    Provides order payment services.
  • Order domain
    Mainly provides order placement. It needs to call the coupon, points, and payment services, as well as inventory and other services not shown in the figure.

The architecture diagram above is simple and clear, but what happens if each domain has more than 5,000 deployed instances?

Deploying 5,000 instances per domain may sound unimaginable, something that hardly exists in reality. At the scale of today's top-tier Internet companies, however, even that may not be enough. Such companies also usually deploy across multiple data centers; the scenario here assumes this scale inside a single data center.

Suppose the architecture above uses Dubbo, the framework open-sourced by Alibaba, and the order service performs client-side load balancing. The number of connections held by a single machine in the order domain can then easily exceed 15,000. Real e-commerce business is far more complex, and the services to be called far outnumber those outlined in the figure above. This puts great pressure on the machine's memory and network scheduling and, in serious cases, affects the stability of the system, making timeouts and other exceptions likely.
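To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. It assumes, as in the scenario above, that one order node calls three downstream domains (coupon, points, payment) with 5,000 instances each, and that it keeps one connection per provider instance. The function and variable names are illustrative only.

```python
def connections_per_consumer(provider_counts):
    """Connections held by one consumer node, assuming it keeps one
    connection to every instance of every service it calls."""
    return sum(provider_counts.values())


# Hypothetical instance counts taken from the scenario in the text.
downstream = {"coupon": 5000, "points": 5000, "payment": 5000}

total = connections_per_consumer(downstream)
print(total)  # 15000
```

Every additional downstream service adds its full instance count to this sum, which is why the figure grows so quickly in a real system.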

2. Problem analysis and solutions

This article uses Dubbo as the microservice framework, but the idea is not limited to Dubbo.

The reason a single node can so easily exceed 20,000 connections lies in the load balancing mechanism.

The coupon service is deployed on 5,000 nodes. As its consumer, the order service obtains the full provider list from the registry. When it needs to call the coupon service, it performs load balancing on the client side, using a load-balancing algorithm to select one of the 5,000 providers in the list. That means 5,000 connections have to be created for this single service.
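The behaviour described above can be modelled with a small toy class. This is only a sketch of the idea, not Dubbo's implementation: the `Consumer` class, the simulated connection strings, and the random selection strategy are all assumptions made for illustration.

```python
import random


class Consumer:
    """Toy model of a consumer doing client-side load balancing.

    Assumption: one connection is kept per provider in the list
    obtained from the registry, as the text describes.
    """

    def __init__(self, providers, seed=0):
        self.providers = list(providers)
        # One simulated connection per known provider.
        self.connections = {p: f"conn->{p}" for p in self.providers}
        self._rng = random.Random(seed)

    def invoke(self):
        # Random load balancing: any provider in the full list may be chosen.
        target = self._rng.choice(self.providers)
        return self.connections[target]


coupon_providers = [f"coupon-{i}" for i in range(5000)]
order_node = Consumer(coupon_providers)
order_node.invoke()
print(len(order_node.connections))  # 5000
```

Because any of the 5,000 providers may be picked on the next call, the consumer in this model has to keep a connection to each of them.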

Note that the order service calls not only the coupon service but other services as well, so the number of connections a single order node creates grows with the number of services it calls.

How do we solve this?

First, we need to understand the underlying purpose of load balancing:

  • Avoid a single point of failure
    Because service providers are deployed as a cluster, the client can select one of them by some algorithm when it initiates a call. If one provider goes down, the client is not affected, which provides high availability.
  • Achieve high concurrency
    A single node is limited by memory and connections, and its capacity is not enough to carry a large number of requests. Multiple nodes therefore serve together as a cluster. It is very common at large companies for a single service to be deployed on 5,000+ nodes.

In general, load balancing is mainly a client-side concern. Its basic requirement is to distribute traffic as evenly as possible across the service nodes and make full use of the cluster's processing capacity.

But must a client hold connections to every service provider in order to achieve load balancing? I don't think so. See the following diagram:
[Figure: clients and servers divided into groups]
The core idea is to group the clients and the servers.

From the perspective of a single client, it is not necessary to hold the list of all service providers, and therefore not necessary to create TCP connections to all of them. It only needs to hold connections to a subset and leave the rest to other clients.

Taken together, however, the clients still call every service provider, so load is balanced across all of them.

With this grouping, the number of connections held by a single node (whether client or server) can be reduced several times over, and the effect is very significant.
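The grouping idea can be sketched as follows. The modulo-based group assignment, the group count of 10, and the node counts are assumptions chosen for illustration; the article does not prescribe how nodes are assigned to groups.

```python
def assign_group(index, group_count):
    """Place a node in a group by its index (simple modulo; an assumption)."""
    return index % group_count


def providers_for_client(client_index, provider_count, group_count):
    """Providers a client connects to: only those in its own group."""
    group = assign_group(client_index, group_count)
    return [p for p in range(provider_count)
            if assign_group(p, group_count) == group]


PROVIDERS, GROUPS = 5000, 10  # illustrative numbers

per_client = len(providers_for_client(0, PROVIDERS, GROUPS))

# Check that every provider is still reachable by some client.
covered = set()
for c in range(GROUPS):  # one client per group is enough to check coverage
    covered.update(providers_for_client(c, PROVIDERS, GROUPS))

print(per_client)                 # 500
print(len(covered) == PROVIDERS)  # True
```

With 10 groups, each client in this model holds 500 connections instead of 5,000, while the clients as a whole still reach every provider.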

Can we do this with an existing mechanism in Dubbo?

The answer is: of course. Although this feature was not designed for the purpose discussed in this article, it fits it well.

The specific approach is as follows:
[Figure: tag routing between the order service and the payment service]
Both the client and the server can be tagged. Suppose order Service-1 is tagged C1. Although order Service-1 can obtain the list of all payment service providers (4 instances) from the registry, routing is performed before load balancing, so under the tag routing mechanism order Service-1 can only access the providers tagged C1. As a result, connections are created only to the payment providers tagged C1, which reduces the number of connections.
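The "route first, then load balance" order described above can be sketched like this. It is a simplified model rather than Dubbo's router API: the `Provider` class, the function names, and the split of the four payment providers between tags C1 and C2 are assumptions for illustration, and the model assumes connections are opened only to providers that survive routing, as the text describes.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    address: str
    tag: str


def route_by_tag(providers, consumer_tag):
    """Routing step: keep only providers whose tag matches the consumer's."""
    return [p for p in providers if p.tag == consumer_tag]


def select(providers, rng):
    """Load-balancing step: random choice among the routed providers."""
    return rng.choice(providers)


# Four payment providers as in the figure; the tag split is hypothetical.
payment = [
    Provider("payment-1", "C1"), Provider("payment-2", "C1"),
    Provider("payment-3", "C2"), Provider("payment-4", "C2"),
]

routed = route_by_tag(payment, "C1")  # order Service-1 carries tag C1
target = select(routed, random.Random(0))

print([p.address for p in routed])  # ['payment-1', 'payment-2']
```

Load balancing only ever sees the routed subset, so in this model order Service-1 needs two connections rather than four.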

Problem solved. To close, a little more on the Dubbo routing mechanism; see the official schematic diagram:
[Figure: official schematic of the Dubbo routing mechanism]
For more on the Dubbo routing mechanism, see another blog post by the author: Gray release scheme in Dubbo service governance
