TiDB: A Distributed Database Solution Supporting the MySQL Protocol

[Editor's note] TiDB is a distributed SQL database developed by the PingCAP team in China. Inspired by Google's F1, TiDB supports the features of both traditional RDBMS and NoSQL. At a technical open class organized by OneAPM, a Chinese ITOM management platform, TiDB senior engineer Liu Qi gave a detailed talk covering the characteristics of HBase, the advantages of TiDB, and its system architecture. The following is an edited transcript of the talk:

HBase profile

It is well known that two companies sit at the top of the SQL field. One is Oracle, which has accumulated a great deal of experience. The other is Google: Google released the F1 paper in 2012, and I personally think F1 is the world's most outstanding SQL OLTP database.

Around 1978, when databases were just beginning to develop, they were SQL RDBMSs. Around 2000 the Internet became popular in China, and it had a large impact on the Oracle database. Today most traditional databases are concentrated in traditional industries, while Internet companies mostly use MySQL, followed by NoSQL systems; HBase, among others, has also attracted a large number of users.

Why NoSQL? At first everyone used SQL databases: Oracle at the high end, and MySQL and PostgreSQL as open-source options. As businesses grew rapidly, however, the database became the bottleneck, and that prompted the birth of NoSQL, whose first priority is scale. When a business grows quickly, scaling out becomes the main problem to solve, and at that point most people choose to give up transactional consistency. What is consistency? Take WeChat as an example. If I add you as a friend, that is a two-way relationship, which corresponds to at least two database operations: the first adds you to my friend list, and the second adds me to your friend list. If the two lists live in databases on different machines, consistency has to be ensured; otherwise you end up in a state where one of us lists the other as a friend but not the other way around. Many things can still go wrong, for example a crash after I have added you as a friend but before the second update is made. The traditional remedy is to introduce a message queue, and some cases also need compensation logic; handling these problems on NoSQL is relatively troublesome.
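The two-way friend relationship above can be made concrete with a small sketch. It uses Python's built-in sqlite3 module on a single in-memory database purely as a local stand-in for "a database with transactions"; the `friends` table and the `add_friends` helper are hypothetical names chosen for illustration, and a real distributed setup would need a distributed transaction rather than a local one.

```python
import sqlite3

def add_friends(conn, a, b, fail_midway=False):
    """Insert both directions of a friendship as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO friends(owner, friend) VALUES (?, ?)", (a, b))
            if fail_midway:
                # Stand-in for a crash between the two operations.
                raise RuntimeError("simulated crash between the two writes")
            conn.execute(
                "INSERT INTO friends(owner, friend) VALUES (?, ?)", (b, a))
        return True
    except RuntimeError:
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (owner TEXT, friend TEXT)")
conn.commit()
```

With a transaction, the simulated crash leaves no half-finished relationship behind; without one, the first row would remain and some compensation step would have to clean it up.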

The largest HBase user in China is Xiaomi, which has several HBase committers; after some modifications their HBase can support distributed transactions and so solve the problem described above. Why, faced with so many options, did Xiaomi choose HBase? As things stand, the main considerations were technology selection and the talent pool. Everyone is familiar with MongoDB, but once usage reaches a certain scale all kinds of problems tend to appear, and there have even been articles calling on people to give up MongoDB. No database is "perfect", though; what matters is not which one is best but which one is the most suitable.

A product often has its own characteristics, and it is very convenient to use as long as you stay within them. Otherwise, nine times out of ten you run into all kinds of trouble. For example, Xiaomi uses HBase with ease, but other companies do not necessarily. The reason is simple: if you are not familiar with the usage scenarios and do not know which parameters suit which scenario, all kinds of problems arise.

Picture description

In fact HBase has very good characteristics. At Xiaomi it currently handles a million operations per second, and Pinterest recently announced that their HBase handles three million operations per second, a scale well beyond what many Internet companies need. HBase offers excellent read-write consistency and good automatic scaling. Block cache and Bloom filters address query performance well; a Bloom filter can also be used to determine whether data is on disk.
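To illustrate how a Bloom filter can answer "is this key possibly on disk?" without touching the disk, here is a minimal sketch. It is not HBase's implementation; the class name, the bit-array size, and the use of SHA-256 as the hash are all assumptions made for the example.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent", so the disk read can be skipped.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))
```

A store would keep one such filter per on-disk file and consult it before reading: a negative answer is always correct, while a positive answer only means the file is worth checking.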

In addition, just as Oracle pushes part of its logic down to the CPU/hardware, HBase pushes part of its logic down to the corresponding RegionServer. In a distributed system, if you need to query by a condition, you can push that simple condition down and evaluate it directly on the corresponding RegionServer. Summation is another example. Suppose there are anywhere from a billion to a hundred billion rows spread over 10 nodes. The fastest way to sum them is to have every node sum in parallel: the condition is pushed down, each node sums its own matching data, and finally only the 10 partial sums are collected and added. Pushdown can go further than this; it is a fairly complex database optimization technique, and real cases are more complicated. In HBase this relies on Coprocessors.
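The summation example can be sketched in a few lines. Each inner list below plays the role of one node's local data, and the function names are hypothetical; a real system would run the per-node step remotely and in parallel (in HBase, via a Coprocessor), which this single-process sketch does not attempt.

```python
def node_partial_sum(rows, predicate):
    """Runs 'on the node': filter locally, return one number."""
    return sum(v for v in rows if predicate(v))

def pushed_down_sum(nodes, predicate):
    """Coordinator: collect one partial sum per node, then add them up."""
    partials = [node_partial_sum(rows, predicate) for rows in nodes]
    return partials, sum(partials)

# Ten "nodes", each holding 100 consecutive integers.
nodes = [list(range(i * 100, (i + 1) * 100)) for i in range(10)]
partials, total = pushed_down_sum(nodes, lambda v: v % 2 == 0)
```

Only ten numbers cross the network instead of every matching row, which is the point of pushing the condition and the aggregation down.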

Everyone should be fairly familiar with MVCC, that is, multi-version concurrency control. Its advantage is that concurrent reads do not block. It has another nice property: if you use MVCC, the database's data can be rolled back to any earlier point in time, as long as compaction has not yet been run. Cloud services today may take a snapshot every half hour; with MVCC you can go back to any given second, so you do not need the snapshot.
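The "go back to any point in time" property follows from keeping every version of a value. A minimal sketch, assuming integer timestamps and a hypothetical `MVCCStore` class (compaction, which would discard old versions, is deliberately left out):

```python
import bisect

class MVCCStore:
    """Keeps every version of each key, ordered by timestamp."""

    def __init__(self):
        self._versions = {}  # key -> (sorted timestamps, matching values)

    def put(self, key, value, ts):
        timestamps, values = self._versions.setdefault(key, ([], []))
        idx = bisect.bisect_right(timestamps, ts)
        timestamps.insert(idx, ts)
        values.insert(idx, value)

    def get(self, key, ts):
        """Return the newest value written at or before ts, else None."""
        if key not in self._versions:
            return None
        timestamps, values = self._versions[key]
        idx = bisect.bisect_right(timestamps, ts)
        return values[idx - 1] if idx > 0 else None

store = MVCCStore()
store.put("balance", 10, ts=5)
store.put("balance", 6, ts=8)
```

A reader at timestamp 7 still sees the value written at 5 and is not blocked by the write at 8; reading with an older timestamp is the "time travel" that makes periodic snapshots unnecessary.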

Advantages of TiDB

Now let me introduce our product, TiDB. Ti is an element in the periodic table. If you know the programmers on our team, you know they are fairly geeky: project names usually come from a god in Greek mythology or a Greek letter used in mathematics, but when we looked around, those names had all been taken. So we picked a metal from the periodic table as the project name. A database must be strong and stable, and titanium happens to have strong corrosion resistance, so we chose titanium (Ti).

Picture description

Because TiDB's target is Google F1, it naturally has the characteristics described above. The first is distributed consistency: the application does not need to care how many machines sit behind it, and transactional consistency is guaranteed. The cases mentioned earlier, such as A following B, two people adding each other as friends, or a money transfer, can be done directly in SQL without worrying about the intermediate steps. Another feature is compatibility with the MySQL protocol. About 70% of Internet companies in China use MySQL, so to keep everyone's migration cost down we are compatible with the MySQL protocol. And because so many applications already run on MySQL, there is an ample supply of test cases. TiDB has about five million tests; every time a line of code is submitted, about 6 machines run the tests in parallel, and the five million tests take about ten minutes. To accommodate fans of different storage engines, we also support LevelDB, RocksDB, LMDB, BoltDB, and others. TiDB is written mainly in Go; the code is simple and easy to understand, and performance is very high.

System architecture

Picture description

Any program written against the MySQL protocol can use TiDB directly. The top of the stack is the MySQL protocol layer, and below it is the SQL layer. Next comes the transactional KV layer, which is the most essential part of the F1 and Spanner design. At the bottom is the KV store itself: a distributed KV layer that supports transactions is built on top of KV, and SQL statements are then mapped directly onto that KV layer.

Picture description

Next, let me explain how the distributed transactions used by TiDB are implemented on HBase. In the early version we followed Google's Percolator model. Start with a client: it is first allocated a timestamp; in the Google paper the component that assigns timestamps is called the timestamp oracle. Once the timestamp has been allocated, reads and writes can proceed, and reads are served from the snapshot at that timestamp. Before the final commit there is a Prepare phase, which detects conflicts; if the whole process runs without any conflict, the final step is Commit.

Picture description

The figure above shows an example in which Bob's account initially holds $10 and Joe's holds $2. The number in front of each value is its version: the current version is 6, and it points to version 5, where Bob has $10 and Joe has $2.

Picture description

Suppose Bob transfers $4 to Joe. In the first step, $4 is deducted from Bob's $10, leaving $6, and Bob's row marks itself as the primary lock.

Picture description

Joe's row is now at version 7: he receives $4, so his balance becomes $6, and his row carries a lock that points to the primary lock on Bob.

Picture description

At version 8, a record pointing to the data at version 7 is written for the primary row, and the primary lock can then be deleted. If another transaction finds on access that the primary lock has already been deleted, there is no conflict on the primary lock. The other row then deletes its own lock as well, and there is some further cleanup.
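The Bob and Joe walkthrough above can be replayed with a heavily simplified, single-process sketch of the Percolator model. The `Store` class, its three dictionaries (data, lock, write), and the `transfer` helper are hypothetical names for illustration; crash recovery and cleanup of abandoned locks, which the real protocol needs, are left out.

```python
class Conflict(Exception):
    pass

class Store:
    def __init__(self):
        self.data = {}   # (key, start_ts) -> value
        self.lock = {}   # key -> (primary_key, start_ts)
        self.write = {}  # (key, commit_ts) -> start_ts of committed data

    def read(self, key, ts):
        """Snapshot read: newest version committed at or before ts."""
        if key in self.lock and self.lock[key][1] <= ts:
            raise Conflict(f"{key} is locked")
        commits = [c for (k, c) in self.write if k == key and c <= ts]
        if not commits:
            return None
        return self.data[(key, self.write[(key, max(commits))])]

    def prewrite(self, key, value, primary, start_ts):
        """Prepare: fail on an existing lock or a newer committed write."""
        newer = any(k == key and c >= start_ts for (k, c) in self.write)
        if key in self.lock or newer:
            raise Conflict(f"write conflict on {key}")
        self.data[(key, start_ts)] = value
        self.lock[key] = (primary, start_ts)

    def commit(self, key, start_ts, commit_ts):
        """Commit: publish a write record, then release the lock."""
        if self.lock.get(key, (None, None))[1] != start_ts:
            raise Conflict(f"lock on {key} is gone")
        self.write[(key, commit_ts)] = start_ts
        del self.lock[key]

def transfer(store, src, dst, amount, start_ts, commit_ts):
    src_balance = store.read(src, start_ts)
    dst_balance = store.read(dst, start_ts)
    store.prewrite(src, src_balance - amount, primary=src, start_ts=start_ts)
    store.prewrite(dst, dst_balance + amount, primary=src, start_ts=start_ts)
    store.commit(src, start_ts, commit_ts)  # primary row first
    store.commit(dst, start_ts, commit_ts)  # then the secondary row

store = Store()
for name, balance in (("Bob", 10), ("Joe", 2)):
    store.data[(name, 5)] = balance  # data at version 5
    store.write[(name, 6)] = 5       # version 6 points to version 5
transfer(store, "Bob", "Joe", 4, start_ts=7, commit_ts=8)
```

After the transfer, a read at version 8 sees $6 for both accounts, while a read at version 7 still sees the old snapshot of $10 and $2, and no locks remain.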

Picture description

There is a single point in the whole transaction model: the timestamp oracle that assigns timestamps, and it determines the performance of the whole system. The Google paper describes this; it can reach two million timestamps per second. Because a transaction needs a timestamp both when it begins and when it ends, the maximum rate of read-write transactions is one million per second, and this has been achieved in the paper. In fact there are better ways to improve speed, such as HLC (hybrid logical clocks) and some improved timestamp oracle algorithms.
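A minimal sketch of such a timestamp service, together with the throughput arithmetic from the paragraph above. The `TimestampOracle` class and `max_txn_rate` function are hypothetical; handing out several timestamps per call is shown as one common way to reduce per-request overhead, not as a description of any particular implementation.

```python
import threading

class TimestampOracle:
    """Hands out strictly increasing integer timestamps."""

    def __init__(self, start=0):
        self._last = start
        self._mutex = threading.Lock()

    def get_timestamps(self, count=1):
        """Return `count` consecutive timestamps as a range."""
        with self._mutex:
            first = self._last + 1
            self._last += count
        return range(first, first + count)

def max_txn_rate(timestamps_per_second, timestamps_per_txn=2):
    """Each transaction needs one timestamp at start and one at commit."""
    return timestamps_per_second // timestamps_per_txn

tso = TimestampOracle()
first = list(tso.get_timestamps())
batch = list(tso.get_timestamps(3))
```

With two timestamps per transaction, two million timestamps per second bounds the system at one million transactions per second, which is why this single component sets the ceiling for everything above it.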

Picture description

As for Spanner: our main reference designs are Google Spanner and F1. Spanner depends heavily on clocks, so Google has a set of atomic clocks and GPS clocks; a GPS signal provides both position and time. Why are atomic clocks needed as well? Because GPS clocks are particularly vulnerable to interference: in bad weather, for example, a GPS clock may stop working, while the atomic clocks still work.

Picture description

The figure above shows Google F1. The F1 paper is singled out here, and anyone interested may wish to read it, since TiDB as a whole is implemented following this paper. Suppose there are one hundred billion rows and you now want to add an index: how would you do that in a traditional database? In a distributed environment, for example, adding an index with MySQL is very hard to do, and the index also has to be kept consistent. For more details, please refer to the paper.

Picture description

How does TiDB get from KV to SQL? From basic database knowledge we know that a traditional RDBMS is generally built on a B-Tree. For a distributed relational database, looking from a somewhat higher level, as in Google's F1, the bottom layer is a KV layer, and logical operations are carried out on that KV layer. Suppose there is a table User whose structure in TiDB consists of uid, name, and email. TiDB has a hidden column called RowID, and all operations, including row locks, lock on RowID. Suppose RowID is 1, uid is XX, name is Bob, and email is bob@Email.com. The column names all belong to meta information: even if a column name is very long, what is finally stored in the database is only the original data. In TiDB, each column has a unique ID.

Suppose the table's ID is 1 and the IDs of uid, name, and email are 2, 3, and 4. The data is stored in the database as a KV structure: TableID, RowID, and ColumnID are encoded into keys, and the row is turned directly into 4 KV pairs. At that point, a SELECT whose condition is that email equals some value can fetch the corresponding value directly, which is very fast.
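The mapping from a table row to KV pairs can be sketched as follows. The fixed-width big-endian key layout and the helper names are assumptions made for illustration, not TiDB's actual encoding, and the sketch simply emits one pair per user column rather than trying to reproduce the exact set of pairs mentioned above.

```python
import struct

def encode_key(table_id, row_id, column_id):
    # Big-endian fixed-width integers keep byte order equal to numeric
    # order, so all columns of one row sort next to each other.
    return struct.pack(">QQQ", table_id, row_id, column_id)

def row_to_kv(table_id, row_id, column_ids, row):
    """Turn one row (a dict of column name -> value) into KV pairs."""
    return {encode_key(table_id, row_id, column_ids[name]): value
            for name, value in row.items()}

USER_TABLE_ID = 1
USER_COLUMNS = {"uid": 2, "name": 3, "email": 4}

kv = row_to_kv(USER_TABLE_ID, 1, USER_COLUMNS,
               {"uid": "XX", "name": "Bob", "email": "bob@Email.com"})
email = kv[encode_key(USER_TABLE_ID, 1, USER_COLUMNS["email"])]
```

Because keys contain only numeric IDs, a long column name costs nothing per row, and reading a single column of a known row is one key lookup.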

Compatible with MySQL

TiDB is highly compatible with the MySQL protocol. Several well-known MySQL applications and management tools, such as WordPress, phpMyAdmin, and MySQL Workbench, can run directly on TiDB. And the data can scale out without limit instead of being confined to a single-machine database. TiDB is also compatible with various ORMs, such as XORM and Beego ORM, so it can support a great many MySQL applications. Every time the code is updated, the ORM tests run automatically to ensure compatibility with MySQL, although a number of fairly subtle features are not yet supported. Schema changes are now asynchronous, so DDL operations do not block online business.

About community

TiDB is currently fully open source on GitHub. Open source and openness are not the same concept. For many large companies, "open" means only uploading the code, and there are quite a few well-known cases in China where, as you know, maintenance of the project was later abandoned. We intend to do the whole thing with a fully open mind: all of the code, all of the discussion, code review, bug tracking, and the roadmap are open. After all, the distributed OLTP relational database is a very general and extremely important frontier, and it will be an important part of cloud DBaaS in the future. Yet in today's technical community, even worldwide, there is no very mature open-source solution. TiDB is still at an early stage. Architecturally we have separated the SQL and KV layers very thoroughly, in the hope that more developers can customize it conveniently to their own needs. Clearly, the strength of one company or a few people is not enough. PingCAP will simply light this fire and set up a good framework and good, transparent, and fair rules for development, to attract more partners and independent developers, so that together we can make TiDB China's first world-class open-source project and achieve a win-win outcome.

Good projects can be driven by their community. HBase, for example, does not belong to any one company; the community keeps pushing it forward. We currently have 3200+ stars and 32 contributors on GitHub, which is a good start. Thank you very much, and I hope everyone will take part.


(Author: OneAPM engineer / Commissioning editor: Zhong Hao)