- Jason___Bourne 21:05 2015.07.25 (question)
- Given a file containing 100 million QQ numbers, how do you quickly find a given QQ number in it? 50C
As the title says: I'm looking for a more efficient method and welcome everyone to discuss. If the answer really is that a brute-force scan is fastest, I'll slap myself three hundred times!
- Caozhy 22:23 2015.07.25
The easiest way is a dictionary tree (trie). Search efficiency is O(log N).
- Hadues 23:58 2015.07.25
A file with 100 million QQ numbers is a large data-processing problem, and we also need the fastest possible lookup. If it were me, I would consider the following angles:
One, hardware configuration
Two, programming language
Three, algorithm
Four, filtering before the query
Five, storage area
Let me go through them one by one ---------------------------
One, hardware configuration - run from the command line on the fastest computer
To search quickly, the weapons must be good, so:
1. Operating system: give up the graphical interface for a command-line OS, no explanation needed. Between Linux and Windows, it seemingly has to be Linux.
2. Use the fastest computer in the world: the Tianhe-2 (Milky Way-2) supercomputer, a Cray supercomputer, a quantum computer, a biological computer - whichever is fastest, use that one.
3. Form the fastest computers in the world into a server cluster and let them cooperate.
Two, programming language - as close to the hardware as possible
High-level languages never seem to match the processing speed of low-level ones, so use a language as close to the underlying hardware as possible.
Three, algorithm - OS paging principle + low time complexity
The question only asks that the query be fast, which means that as long as the time complexity is low, sacrificing some space does not matter. The paging lookup scheme used by operating systems has always struck me as very good; build a low-time-complexity algorithm on that basis and the speed should improve a lot.
Four, filtering before the query
1. Judge the digit count first: before querying, determine how many digits the QQ number has; that alone narrows the range of pages to search.
2. Page digit by digit: judge each digit from the most significant one down, deciding which page and which table it belongs to.
Five, storage area - build a high-speed cache
Store the data in a structure that is fast to access, put it in the fastest storage area available, and once the digit count is matched, load the relevant page into memory.
Just a layman daydreaming and showing off a slight skill in front of the experts - forgive the rambling.
- Sun_xiaofan 20:01 2015.08.18
The crux is not the algorithm but how the data is stored. A file holding 100 million QQ numbers will be several GB. If your memory is large enough you could read it all in, but normally you would not, because reading it from the file is already O(n).
Just to find one QQ number you would have to read several GB into memory, and that overhead is too large. So the real question is not the lookup itself but how the data should be laid out on disk.
Databases come with their own indexes, which makes this relatively easy, but maintaining an index structure on disk yourself is very complex, so here is a simple, easy-to-understand alternative.
Use the filesystem's directory structure directly as a simple index; since the filesystem itself is managed with B-tree indexes, this is reasonably efficient. For example, take the first six digits of the QQ number as the index: all QQ numbers sharing the same first six digits go into one file, named after those six digits. If putting all the files in one folder becomes hard to manage, apply the same idea recursively: use the first five digits as a first-level directory index, and so on; in the end the whole directory tree is a 10-way tree.
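A minimal Python sketch of this directory index (the six-digit prefix, file layout, and function names are my own illustration, not the poster's code):

```python
import os

def bucket_path(base, qq):
    # All QQ numbers sharing the same first six digits live in one
    # small file, named after those six digits.
    return os.path.join(base, qq[:6])

def build_index(base, src_path):
    # One pass over the big file, scattering lines into prefix buckets.
    # (Opening a file per line is slow; a real build would batch writes.)
    os.makedirs(base, exist_ok=True)
    with open(src_path) as src:
        for line in src:
            qq = line.strip()
            if qq:
                with open(bucket_path(base, qq), "a") as out:
                    out.write(qq + "\n")

def bucket_lookup(base, qq):
    # One "seek" (open the right bucket) plus a tiny linear scan.
    path = bucket_path(base, qq)
    if not os.path.exists(path):
        return False
    with open(path) as f:
        return any(line.strip() == qq for line in f)
```

A second directory level keyed on the first five digits would follow the same pattern.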
The reason a single flat file is inefficient is that reading the whole thing wastes time; once the data is scattered into many small files, each file reads quickly and the search range is tiny.
Overall cost = one random disk seek to find the right file + roughly 1 MB of disk read; after that, a direct linear search inside the file is very fast.
In fact, the complexity of this kind of lookup is not about in-memory work but about avoiding reading large amounts of data from disk. A database does this faster than files for two reasons: first, it caches - recently queried result sets stay cached for a while; second, what sits on disk is a proper primary-index structure, more efficient than our hand-built file tree.
- U011767611 11:25 2015.07.26
It depends on the type of your data:
If the QQ numbers are stored as integers in a database, you can look one up directly;
If they are stored as strings, you can do a double traversal (algorithm implemented in Python):
import time

index = 0
current_src = []
# outer loop: iterate over the digits of the target QQ number
for dst_qq in dst_qqs:
    print('match -->', dst_qq, 'index', index)
    # inner loop: keep only source numbers whose digit at this index matches
    for src_qq in src_qqs:
        if src_qq[index] == dst_qq:
            current_src.append(src_qq)
            print('match:', src_qq)
        else:
            print('not match:', src_qq)
    time.sleep(1)
    index += 1
    src_qqs = current_src
    current_src = []
Principle: traverse the digits of the target QQ number one by one; for each digit, traverse the current source set and append to a new list every source number whose digit at that position matches.
Now, among the 100 million QQ numbers, assume the digits 0-9 appear with equal probability at each position; the first pass then selects about 10 million of them.
Continue traversing (assume the target QQ number has 9 digits): the outer loop runs 9 times, and the inner traversal sizes are:
100 million, 10 million, 1 million, 100 thousand, and so on, shrinking by a factor of ten each pass.
The time complexity is left for the reader to work out.
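For concreteness, those pass sizes form a geometric series that can be summed directly (a small sketch of my own, assuming 9-digit numbers with uniform digits):

```python
# Each pass keeps about one tenth of the candidates, so the inner
# traversal sizes form a geometric series: 10^8, 10^7, ..., 1.
sizes = [10**8 // 10**k for k in range(9)]
total = sum(sizes)
print(sizes[:3])  # [100000000, 10000000, 1000000]
print(total)      # 111111111 comparisons overall -- still O(n)
```

So the scheme does about 1.11x the work of a single linear scan, i.e. it stays O(n).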
- Gtitanlq 21:47 2015.07.25
struct Node *next;
struct Node *father;
Read the file character by character, one node per character. Check whether a node for the character already exists; if not, dynamically create one under the preceding character's node, and set the flag of the node for the final character to 1.
Use ftell to get the current file position, copy it into the node's pos field, and then keep reading.
In the end, you have one big tree.
When the read loop finishes, a lookup is just a walk down the tree.
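A minimal Python sketch of the tree described above (the fields mirror the C fragments: a flag marking a stored number and a pos holding the ftell-style file offset; the function names are my own):

```python
class Node:
    def __init__(self):
        self.children = {}  # one child node per digit character
        self.flag = 0       # 1 marks the last character of a stored number
        self.pos = -1       # file offset of the record (from ftell / f.tell())

def insert(root, qq, pos):
    node = root
    for ch in qq:
        # Create the child dynamically if it does not exist yet.
        node = node.children.setdefault(ch, Node())
    node.flag = 1
    node.pos = pos

def lookup(root, qq):
    node = root
    for ch in qq:
        node = node.children.get(ch)
        if node is None:
            return None  # some character has no node: number absent
    return node.pos if node.flag else None
```

Lookup cost depends only on the number of digits, not on how many numbers are stored.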
- Tabe123 22:12 2015.07.25
A 10-way tree (one branch per digit) would work well.
- Zuishikonghuan 08:19 2015.07.26
Use a low-level programming language.
If conditions permit, use distributed computing.
- Zuishikonghuan 08:23 2015.07.26
If the platform supports it, you can also use interfaces the platform provides; for example, on Windows you can use memory-mapped files. Better still, write a driver and control the disk device from inside it, avoiding the things that slow down reads at the application layer: API calls, kernel interrupts, system service routines, I/O management, buffer copies, filter devices, and so on.
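Memory-mapped files are also reachable from user space without writing a driver; here is a sketch using Python's mmap module (the function name is my own, and I assume one QQ number per line with a trailing newline):

```python
import mmap

def find_qq_mmap(path, qq):
    # Returns the byte offset of the line "qq" in a newline-separated
    # file, or -1. The whole file is mapped, so the OS pages data in
    # on demand instead of copying it through read() buffers.
    target = (qq + "\n").encode()
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            idx = mm.find(target)
            while idx != -1:
                # Accept only matches at offset 0 or right after a
                # newline, so "222" does not match inside "1222".
                if idx == 0 or mm[idx - 1:idx] == b"\n":
                    return idx
                idx = mm.find(target, idx + 1)
    return -1
```

On Windows this maps to CreateFileMapping/MapViewOfFile under the hood; on Linux, to mmap(2).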
- Qq_27220973 12:29 2015.07.26
I recommend trying the data-mining approach: preprocess the data and process it in blocks. That should handle the big-data, not-enough-memory problem.
- Wingfiring 18:45 2015.08.18
A question for the OP: how many times do you want to query? If it's only once, a straight scan from the beginning is fine.
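For a one-off query, that single pass might look like this (a sketch of my own; one number per line is assumed):

```python
def find_once(path, qq):
    # A single sequential pass. For a one-off query this O(n) scan is
    # hard to beat: building any index would itself require reading
    # the whole file at least once anyway.
    with open(path) as f:
        for line in f:
            if line.strip() == qq:
                return True
    return False
```

Only when the same file is queried many times does an index or trie pay for its build cost.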