## Distributed computing / Hadoop

Kissmelove01 11:57 02-26

### A MapReduce question from a Ctrip interview

Given the coordinates of 1 million hotels and 2 billion landmarks (each landmark record holds the landmark's latitude and longitude), design a MapReduce computation that finds all landmarks within 1 km of every hotel.

Tntzbzc 12:43 02-26
Floor 1

[b]Great question![/b] I'll work on it when I get home tonight. [img=http://prog3.com/sbdm/forum/PointForum/ui/scripts/csdn/Plugin/001/face/13.gif][/img]

Kissmelove01 13:26 02-26
Floor 2

The interviewer said at the time that the hotel coordinates and the landmark coordinates are in the same file. Assume the fields are: type (hotel/landmark), ID (hotel ID/landmark ID), latitude, longitude. The problem to solve is how to design the map so that, from this one file, the landmarks within 1 km of a hotel are found and passed to reduce with the hotel ID as the key and the landmark ID as the value.

Tntzbzc 16:17 02-26
Floor 3

[b]This board really has too few algorithm discussions. From now on, every algorithm thread gets 100 extra points.[/b]

Tntzbzc 11:27 02-27
Floor 4

I set latitude and longitude aside and treat this directly as a problem on a two-dimensional Euclidean plane. Convert the coordinates to X and Y, with the metre as the smallest unit, so there are no decimals. There are two approaches, and both need 2 MR jobs.

1. Brute-force algorithm:

First MR: in the map phase, exhaustively enumerate every coordinate (in metres) covered by each of the 1 million hotels. The hotel data is therefore replicated to 1000000 * (999*4+1000+2) = 4998000000, nearly 5 billion hotel records.
Format: Key = X, Y; Value = hotel/landmark ID, type (hotel/landmark).
Keys whose aggregated count is > 1 are written to HDFS -> tmpfile.
Output format: hotel ID, count.

Second MR: read tmpfile and aggregate (sum) by hotel ID to get the final result.

The biggest flaw of this algorithm is that the hotels are enumerated far too heavily, so the map-to-reduce output is too large. So I thought of a second approach, which applies dimensionality reduction on top of the first one.

2. Dimensionality-reduction algorithm:

First MR: for each hotel, find 4 new coordinates by adding and subtracting the radius along X and Y (the blue "+" lines in the figure). Rotate the blue lines by 45 degrees to get another 4 coordinates (the red lines). Divide the X/Y of every hotel and landmark by the radius (1000) to get Xa/Ya. The landmarks whose Xa/Ya fall inside the grey outline connecting the 8 points are the data we want to count.
[img=http://prog3.com/sbdm/img.bbs/upload/201402/27/1393471636_621693.jpg][/img]

[b]But there is a problem: the octagon formed by the 8 points does not cover all the landmarks inside the circle.[/b]

The solution is to treat a hotel's coverage as a square (not a circle) with side length = radius * 2. Cut the square into 4 small squares, each representing part of the area the hotel can cover. This gives 4*4 = 16 corner coordinates; after removing the repeated ones, 9 remain, which guarantees that every landmark the hotel can cover is included. It also includes some landmarks outside the radius; ignore those for now and leave them for the reduce to deal with.
[img=http://prog3.com/sbdm/img.bbs/upload/201402/27/1393475240_984482.jpg][/img]

So each hotel only needs 8+1 redundant records (don't forget the centre point), and the map output for hotels = 1000000*9 = 9 million.
Format: Key = Xa, Ya; Value = hotel/landmark ID, type (hotel/landmark), X, Y.
The reduce is the same as before, with one extra step: check whether the distance between the hotel and the landmark is within the radius.
Keys whose aggregated count is > 1 are written to HDFS -> tmpfile.

Second MR: the same as in the first method; read tmpfile and aggregate (sum) by hotel ID to get the final result.

As for the code, I will write it when I have time. OP, if you are free, you could try writing it yourself.
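The 8+1 replication described on floor 4 can be sketched as follows. This is only one reading of the scheme, assuming planar coordinates in whole metres and square cells whose side equals the radius; the names `GridCells`, `hotelKeys`, `landmarkKey` and `within` are illustrative and do not come from the thread. In a real job the keys would be emitted from a Hadoop mapper and `within` would run in the reducer.

```java
import java.util.ArrayList;
import java.util.List;

public class GridCells {
    static final long RADIUS = 1000; // metres; also used as the cell side (assumption)

    // Cell index of a coordinate; floorDiv keeps negative coordinates consistent.
    static long cell(long metres) {
        return Math.floorDiv(metres, RADIUS);
    }

    // A landmark is emitted once, under the key of its own cell (Xa, Ya).
    static String landmarkKey(long x, long y) {
        return cell(x) + "," + cell(y);
    }

    // A hotel is emitted 9 times: its own cell plus the 8 neighbouring cells.
    static List<String> hotelKeys(long x, long y) {
        List<String> keys = new ArrayList<>();
        for (long dx = -1; dx <= 1; dx++) {
            for (long dy = -1; dy <= 1; dy++) {
                keys.add((cell(x) + dx) + "," + (cell(y) + dy));
            }
        }
        return keys;
    }

    // Reduce-side filter: drop landmarks that share a cell but lie outside the circle.
    static boolean within(long hx, long hy, long lx, long ly) {
        long dx = hx - lx;
        long dy = hy - ly;
        return dx * dx + dy * dy <= RADIUS * RADIUS;
    }

    public static void main(String[] args) {
        System.out.println(hotelKeys(1500, 2500));
        System.out.println(landmarkKey(2400, 2600));
        System.out.println(within(1500, 2500, 2400, 2600));
    }
}
```

Because a landmark within 1000 m of a hotel differs from it by at most 1000 m on each axis, its cell index differs by at most 1 on each axis, so it always appears under one of the hotel's 9 keys.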

Sanguomi 21:39 02-27
Floor 5

You could run a sampling algorithm first and then partition the data into blocks. Otherwise the reduce will be far too inefficient.

Tntzbzc 22:28 02-27
Floor 6

[quote=Quoting sanguomi's reply on floor 5:] You could run a sampling algorithm first and then partition the data into blocks. Otherwise the reduce will be far too inefficient. [/quote] Could you describe your idea in detail?

Sanguomi 09:56 02-28
Floor 7

My initial thinking: with 2 billion records on one side and 2 million on the other, if nothing is split by region the complexity is at least O(2000000 * 2000000000), and there can be only one reducer, which is certainly slow.

If I were doing it, for the 2 billion records:

Step 1, MR: sample on the X or Y coordinate to get 100 or 1000 split points, that is, a set of relatively evenly distributed X or Y values.

Step 2, MR: take these split points and divide both the 2 billion coordinates and the 2 million (200W) coordinates into the corresponding 100 or 1000 partitions. There is a boundary issue to consider here: some hotels sit right at the X/Y value of a split point. So when the landmark data is partitioned, the first block actually extends one kilometre past its split point, and likewise the second block extends one kilometre back (this is computed in the map; a landmark in the overlap is sent to both regions). Hotels are partitioned directly by the split points. The output key is the corresponding split point (X or Y) and the value is the coordinate data.

Step 3, MR: tag the two files, hotels with tag 0 and landmark data with tag 1, and use the split point combined with the tag as the key. The reduce side then aggregates: the hotel coordinates arrive first and are stored in a container, then each landmark coordinate is compared by distance against the hotels held in the container, and matches are output. With 1000 partitions this is roughly O(2000000 * 2000 * 1000).

Step 4, MR: merge.

If I have got something wrong, I hope the master will point it out.
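The boundary handling in step 2 of floor 7 can be sketched like this. It is a hedged illustration only, assuming partitioning along X in metres and a sorted array of distinct split points produced by the sampling step; `StripPartition`, `stripOf` and `landmarkStrips` are made-up names, not code from the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StripPartition {
    static final long RADIUS = 1000; // metres

    // Strip i covers [splits[i-1], splits[i]); strip 0 is open to the left,
    // the last strip is open to the right. splits must be sorted and distinct.
    static int stripOf(long[] splits, long x) {
        int pos = Arrays.binarySearch(splits, x);
        return pos >= 0 ? pos + 1 : -pos - 1;
    }

    // A hotel goes to exactly one strip: stripOf(splits, hotelX).
    // A landmark goes to every strip that intersects [x - RADIUS, x + RADIUS],
    // so a landmark near a boundary is sent to both neighbouring strips.
    static List<Integer> landmarkStrips(long[] splits, long x) {
        int lo = stripOf(splits, x - RADIUS);
        int hi = stripOf(splits, x + RADIUS);
        List<Integer> strips = new ArrayList<>();
        for (int s = lo; s <= hi; s++) {
            strips.add(s);
        }
        return strips;
    }

    public static void main(String[] args) {
        long[] splits = {10000, 20000};
        System.out.println(stripOf(splits, 10200));        // hotel just right of the first split
        System.out.println(landmarkStrips(splits, 9500));  // landmark just left of it
    }
}
```

With this replication a hotel at X = 10200 and a landmark at X = 9500 meet in the hotel's strip even though they fall on opposite sides of the split point.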

Java8964 10:31 02-28
Floor 8

This is an interesting question. Here are my thoughts.

1) We only need one MR job.
2) The focus is how to design a custom key class that sorts the data the way we want. Remember that in a MapReduce job almost every key is compared with the others for grouping/partitioning and sorting. This is a key feature provided by Hadoop, and we should use it.

The input of the mapper will be (LongWritable, Text), as with the default TextInputFormat, but the output of the mapper will be <KeyClass, NullWritable>. The most important thing is to design this custom KeyClass.

Here is some important information about this key class; I will use pseudo code:

Class Key { enum type; double longitude; double latitude; }

For this mapper we need a custom SortComparator, GroupComparator and PartitionComparator. Here are the tricky parts of each:

1) Sort comparator: for the same type, compare longitude and latitude. For different types, make sure every hotel is less than every landmark, so hotels sort first, then landmarks.
2) Grouping and partition comparator: this is the very important part.
a) Make sure each hotel is a different key, meaning hotels compare as different. So the same hotel goes to the same reducer.
b) For landmark data: if the compared object is another landmark, just compare their longitude and latitude, so different landmarks compare as different. But if the compared object is a hotel and the two (longitude, latitude) pairs are within 1 km, mark the two objects as the same. The landmark is then treated as being in the same group as the hotel; both are treated as the same key and sent to the same reducer.

All this needs just one MR job. This interview question really checks whether you can design a custom key and override the grouping and partition comparators (making sure that landmark data within 1 km of a hotel is grouped and kept together with it, so they go to the same reducer). So after the mapper stage you will have 1 million reducer group keys, as you have 1 million hotels, but all the landmarks within 1 km of a hotel will be treated as the same key as that hotel and sent to the same reducer.

I hope I have explained this clearly in English, as typing Chinese is too slow for me.

Tntzbzc 11:02 02-28
Floor 9

[quote=Quoting sanguomi's reply on floor 7:] My initial thinking: with 2 billion records on one side and 2 million on the other, if nothing is split by region the complexity is at least O(2000000 * 2000000000) ... With 1000 partitions this is roughly O(2000000 * 2000 * 1000) ... [/quote]

Thank you very much for sharing.

The plans on floors 7 and 8 have one thing in common with mine: the X and Y of the hotels/landmarks are [b]divided by a coefficient[/b] (coefficient = radius = 1000), which is what achieves the dimensionality reduction.

But I don't quite understand how, after the dimensionality reduction, you arrive at a complexity of O(2000000 * 2000 * 1000). With the second method I described earlier:

Landmarks input to map = 2000000000
Landmarks after dimensionality reduction in map, input to reduce = 2000000000/1000 = 2000000
Hotels input to map = 1000000 * 9
Hotels input to reduce = (1000000/1000) * 9 = 1000 * 9

So shouldn't the final result more reasonably be 2000000 * 1000 * 9?

Sanguomi 11:30 02-28
Floor 10

Replying to the master: done your way, the aggregation is hard to do, isn't it? Take, for example, landmark data whose coordinates do not coincide with your hotel's centre coordinates but are offset a little. Handling the regions for the 8 directions is not simple either.

Tntzbzc 11:31 02-28
Floor 11

[quote=Quoting java8964's reply on floor 8:] This is an interesting question. Here are my thoughts. 1) We only need one MR job. 2) The focus is how to design a custom key class that sorts the data the way we want ... I hope I have explained this clearly in English, as typing Chinese is too slow for me. [/quote]

I think this would be a perfect solution, but I don't know how to do it with only one MR job.

When the final reduce output looks like this:
hotel 99 landmark 31
hotel 2 landmark 1034
hotel 1 landmark 41
hotel 322 landmark 9
hotel 12 landmark 199
......
......
hotel 98 landmark 199122
hotel 998 landmark 122
hotel 3 landmark 223
the final output is not sorted by hotel, and I think one MR job is enough.

But if the final output needs to be sorted by hotel, I think it needs 2 jobs:
hotel 1 landmark 1
hotel 1 landmark 4
hotel 1 landmark 12
hotel 1 landmark 31
hotel 2 landmark 455
hotel 2 landmark 1525
.....
.....
hotel N-1 landmark 1211
hotel N landmark 242323
hotel N landmark 4563432

[b]b) For landmark data: if the compared object is another landmark, just compare their longitude and latitude, so different landmarks compare as different. But if the compared object is a hotel and the two (longitude, latitude) pairs are within 1 km, mark the two objects as the same. The landmark is then treated as being in the same group as the hotel; both are treated as the same key and sent to the same reducer.[/b]

If my understanding is wrong, this explanation is the key part. Would you be willing to show the code for it?
Thanks

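tntzbzc 02-28 11:36

[quote=Quoting sanguomi's reply on floor 10:] Replying to the master: done your way, the aggregation is hard to do, isn't it? Take, for example, landmark data whose coordinates do not coincide with your hotel's centre coordinates but are offset a little. Handling the regions for the 8 directions is not simple either. [/quote]

It can be done. Those 8 added points are guaranteed to cover every landmark inside the circle; they can only over-cover, never under-cover. The over-covered landmarks have to be removed in the reduce phase. Look at this figure:
[img=http://prog3.com/sbdm/img.bbs/upload/201402/28/1393558537_656758.jpg][/img]
The square completely covers the circle, and the four corners add some extra area.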

sbwwkmyd 02-28 12:48

nettman 02-28 13:22

[img=http://prog3.com/sbdm/forum/PointForum/ui/scripts/csdn/Plugin/003/monkey/1.gif][/img]

u013819169 02-28 13:31

[img=http://prog3.com/sbdm/forum/PointForum/ui/scripts/csdn/Plugin/001/face/79.gif][/img]

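u013819696 02-28 13:35

[quote=Quoting the original post by kissmelove01:] Given the coordinates of 1 million hotels and 2 billion landmarks (each landmark record holds the landmark's latitude and longitude), design a MapReduce computation that finds all landmarks within 1 km of every hotel. [/quote]

This feels vague. No file format is given, so am I supposed to design a format as well? High efficiency has to be built on regularly structured data. For a typical format, you could borrow from the way grep is implemented. [img=http://prog3.com/sbdm/forum/PointForum/ui/scripts/csdn/Plugin/003/onion/2.gif][/img]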

vbplayboy 02-28 15:07

Functional style? [padding to reach the 6-character minimum]

lkf181 02-28 23:08

u013843287 03-01 17:33

03-01 18:41

lwwenlong 03-02 20:29

[quote=Quoting tntzbzc's reply on floor 11:] [quote=Quoting java8964's reply on floor 8:] This is an interesting question. Here are my thoughts. 1) We only need one MR job. 2) The focus is how to design a custom key class that sorts the data the way we want ... [/quote] I think this would be a perfect solution, but I don't know how to do it with only one MR job ... Would you be willing to show the code for it? Thanks [/quote]

How should the partitioner class be designed?

u011702993 03-03 10:51

[img=http://prog3.com/sbdm/forum/PointForum/ui/scripts/csdn/Plugin/003/monkey/25.gif][/img]

u013061236 03-03 11:43

[img=http://prog3.com/sbdm/forum/PointForum/ui/scripts/csdn/Plugin/003/monkey/34.gif][/img] I'm here to learn.

chouy 03-03 14:24

ftjavayp 03-03 15:11

[img=http://prog3.com/sbdm/forum/PointForum/ui/scripts/csdn/Plugin/003/monkey/9.gif][/img]

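tntzbzc 03-03 20:46

sanguomi 03-03 23:03

sanguomi 03-03 23:10

tntzbzc 03-04 10:26

[quote=Quoting sanguomi's reply on floor 29:] The master's code should be basically fine. It presumably makes each hotel record produce one extra record for each of the 8 directions, and uses those 8 records to partition the hotel into the corresponding landmark data, so that every hotel can be matched against the landmarks inside a square. I am not sure whether I have described it correctly.. [/quote]

[quote=Quoting sanguomi's reply on floor 30:] In effect, the original 1 copy of each hotel record becomes 9 copies, which are then joined with all the landmark data. [/quote]

Correct.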

sanguomi 03-04 10:30

tntzbzc 03-04 10:44

[quote=Quoting sanguomi's reply on floor 32:]
I just tested it and found a problem.
[code=java]
for (Text x : value) {
    if (count == 0 && key.getType() != 0) {
        System.out.println("NO");
        tag = true;
        System.out.println(String.valueOf(key.getX()) + " " + String.valueOf(key.getY()));
        System.out.println(context);
        //break;
    }
}
[/code]
I took the coordinates printed here and computed their distance to every hotel directly in the map, and found that some of them do meet the 1 km requirement.
[code=java]
dis = Math.sqrt(Math.pow(hotel.getX() - 17704, 2) + Math.pow(hotel.getY() - 25787, 2));
if (dis <= 1000) {
    System.out.println("*");
}
[/code]
So it seems something goes wrong when iterating over the groups.
[/quote]

Check your private messages.

Sanguomi 12:31 03-04
Floor 34

[quote=Quoting tntzbzc's reply on floor 33:] [quote=Quoting sanguomi's reply on floor 32:] I just tested it and found a problem ... So it seems something goes wrong when iterating over the groups. [/quote] Check your private messages. [/quote]

Sorry, I copied the test data incorrectly.

U013489254 06:34 03-05
Floor 35

Putting the latitude and longitude into the output key can effectively reduce the load on the reducers.

If two points on the map are less than 1 km apart, then their latitude and longitude values are the same within a certain precision (for example, one decimal place). In effect this fuzzily takes a square in order to reduce the amount of computation.

For example, for an ID at latitude/longitude 37.7833 degrees N, 122.4167 degrees W, the mapper outputs the key "N0037.7,W0122.4", or the two keys "N0037.7" and "W0122.4".

That way each reducer receives very little data and can directly check the distance between hotel and landmark, then output the qualifying pairs or a map.

U013884523 08:59 03-05
Floor 36

[quote=Quoting u013489254's reply on floor 35:] Putting the latitude and longitude into the output key can effectively reduce the load on the reducers. If two points on the map are less than 1 km apart, then their latitude and longitude values are the same within a certain precision (for example, one decimal place) ... [/quote]

Too idealistic! Suppose there is a hotel at 38.0001 W, 37.9999 N and a landmark at 37.9999 W, 38.0001 N. They are within one kilometre of each other. With your method the hotel becomes 38, 38 and the landmark becomes 37.99, 37.99, so there is no way to put them into the same group.

I have also read all the answers. Apart from the solutions from floor 8 and from the moderator, the others are not very reliable.
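The objection on floor 36 can be checked numerically. The sketch below assumes a spherical Earth with a mean radius of 6,371 km, signed longitudes (west as negative) and a key obtained by truncating to one decimal place as in the floor-35 example; `TruncationKeyDemo`, `haversineMetres` and `key` are illustrative names, not code from the thread.

```java
public class TruncationKeyDemo {
    static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius (assumption)

    // Great-circle distance in metres using the haversine formula.
    static double haversineMetres(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
    }

    // Key built from the coordinates truncated to one decimal place,
    // stored as integer tenths of a degree to avoid float formatting issues.
    static String key(double lat, double lon) {
        return (long) Math.floor(lat * 10) + ":" + (long) Math.floor(lon * 10);
    }

    public static void main(String[] args) {
        double hotelLat = 37.9999, hotelLon = -38.0001;
        double markLat = 38.0001, markLon = -37.9999;
        System.out.println(haversineMetres(hotelLat, hotelLon, markLat, markLon));
        System.out.println(key(hotelLat, hotelLon));
        System.out.println(key(markLat, markLon));
    }
}
```

The two points are only a few tens of metres apart, yet their truncated keys differ, so a single key per point cannot group them. Any fixed precision has cell boundaries where the same thing happens, which is why the grid schemes in this thread replicate one side into the neighbouring cells.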

U013489254 09:37 03-05
Floor 37

[quote=Quoting u013884523's reply on floor 36:] Too idealistic! Suppose there is a hotel at 38.0001 W, 37.9999 N and a landmark at 37.9999 W, 38.0001 N. They are within one kilometre of each other. With your method the hotel becomes 38, 38 and the landmark becomes 37.99, 37.99, so there is no way to put them into the same group. [/quote]

Perhaps I did not describe it clearly. A distance of N can only affect the latitude and longitude from the K-th digit onward, so after rounding, the digits before the K-th do not change.

U010611517 11:38 03-05
Floor 38

U013436198 12:40 03-05
Floor 39

I tested with the moderator's code and found no problems.

[quote=Quoting u013489254's reply on floor 37:] [quote=Quoting u013884523's reply on floor 36:] Too idealistic! ... so there is no way to put them into the same group. [/quote] Perhaps I did not describe it clearly. A distance of N can only affect the latitude and longitude from the K-th digit onward, so after rounding, the digits before the K-th do not change. [/quote]

The plan on floor 37 still cannot solve the problem raised on floor 36.

Zhangkai08111 15:59 03-05
Floor 40

Holding 1 million (100W) records in memory is no problem, and the question does not require any output format, so no reduce is needed: load the 1 million hotels into memory before the map, then on the map side check each landmark one by one and output the ones that pass the filter. If you want formatted output, add a reduce that groups by hotel and writes the desired format.

U013489254 16:21 03-05
Floor 41

[quote=Quoting u013436198's reply on floor 39:] The plan on floor 37 still cannot solve the problem raised on floor 36. [/quote]

Why can't rounding to a given precision solve the problem? Let the precision be K. Is there really no K, mathematically, such that whenever the latitude/longitude values rounded to K digits differ by a certain gap, the actual distance is more than 1 km?

U012677351 15:27 03-06
Floor 42

Hello, everyone.

Aby913 12:17 03-07
Floor 43

You are all masters. I have come in to learn.

Kissmelove01 13:24 03-08
Floor 44

Thank you, moderator. I have been busy with interviews these past few days and have not checked in much.

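Java8964 06:08 03-09
Floor 45

Sorry, I have been kind of busy for the last 2 weeks.

My algorithm on floor 8 is wrong; I did not think it through carefully at the time. If a landmark is within 1 km of more than one hotel, the original method cannot handle it. As with the question on floor 23, the partitioner cannot do it: one key can only be sent to one partition, so in the example above it is impossible to send that landmark to 2 reducers.

This is really a kNN join problem (k-nearest-neighbour join). If both sides are large, finding a good algorithm for it is hard. But remember that this is an interview question. If I were asked it in an interview, this would be my answer:

The hotel data set has only one million coordinates. Represented as 2 doubles each, it needs only 2 * 8 bytes * 1000000, about 16 MB of memory, since a coordinate is just 2 values of type double. I would send all the hotel coordinates to every mapper using the Hadoop distributed cache.

The job then takes only the landmark data as input, together with the hotel data from the distributed cache. For each landmark that is within 1 km of a hotel, you emit the hotel ID as the output key and the landmark as the value. In the reducer you get all the landmarks for that hotel.

I believe this is a simple problem, because the key point is that we only have 1 million hotels. Of course, if that number grows to billions, we have to go for a real kNN join solution. :-)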
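The map-side join from floor 45 can be sketched without any Hadoop classes. The version below is a sketch under stated assumptions: planar coordinates in whole metres, hotels given as rows of {id, x, y}, and a grid index over the cached hotels so that each landmark is compared with nearby hotels only. The grid index is an addition for illustration (the post only says the hotels are held in memory), and `MapSideJoinSketch`, `indexHotels` and `hotelsNear` are hypothetical names. In a real job `indexHotels` would run once in the mapper's setup from the distributed-cache file and `hotelsNear` once per landmark in `map`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MapSideJoinSketch {
    static final long RADIUS = 1000; // metres; also the grid cell side

    // Build an in-memory grid index: cell key -> hotels {id, x, y} in that cell.
    static Map<String, List<long[]>> indexHotels(long[][] hotels) {
        Map<String, List<long[]>> index = new HashMap<>();
        for (long[] h : hotels) {
            String cell = Math.floorDiv(h[1], RADIUS) + "," + Math.floorDiv(h[2], RADIUS);
            index.computeIfAbsent(cell, k -> new ArrayList<>()).add(h);
        }
        return index;
    }

    // For one landmark, return the ids of all hotels within RADIUS.
    // Only the landmark's cell and its 8 neighbours need to be checked.
    static List<Long> hotelsNear(Map<String, List<long[]>> index, long lx, long ly) {
        List<Long> ids = new ArrayList<>();
        long cx = Math.floorDiv(lx, RADIUS);
        long cy = Math.floorDiv(ly, RADIUS);
        for (long dx = -1; dx <= 1; dx++) {
            for (long dy = -1; dy <= 1; dy++) {
                List<long[]> bucket = index.get((cx + dx) + "," + (cy + dy));
                if (bucket == null) {
                    continue;
                }
                for (long[] h : bucket) {
                    long ex = h[1] - lx;
                    long ey = h[2] - ly;
                    if (ex * ex + ey * ey <= RADIUS * RADIUS) {
                        ids.add(h[0]); // would be emitted as (hotelId, landmarkId)
                    }
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        long[][] hotels = {{1, 500, 500}, {2, 5000, 5000}, {3, 1400, 600}};
        System.out.println(hotelsNear(indexHotels(hotels), 900, 700));
    }
}
```

Because a landmark can return several hotel ids, this handles the case that broke the floor-8 design, where one landmark lies within 1 km of more than one hotel.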

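java8964 03-09 06:25

coolbamboo2008 03-09 17:10

tntzbzc 03-09 22:34

[quote=Quoting kissmelove01's reply on floor 44:] Thank you, moderator. I have been busy with interviews these past few days and have not checked in much. [/quote]

My code had a small bug, which is now fixed:
[code=java]
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Partitioner;

// Hotel is the custom key class from the posted solution.
public class HotelPartitioner extends Partitioner<Hotel, NullWritable> {
    @Override
    public int getPartition(Hotel key, NullWritable value, int numPartitions) {
        // Partition by the reduced cell coordinates so hotels and landmarks of one cell meet.
        return Math.abs((int) ((key.getXa() + key.getYa()) % numPartitions));
    }
}
[/code]
Your question was great. Good luck with your interviews, and if you come across more good questions, be sure to post them. 3Q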
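As a side note on the partition index above: Java's `%` can return a negative remainder, which is why the posted code wraps it in `Math.abs`. A possible alternative, assuming the reduced coordinates Xa and Ya are integers, is `Math.floorMod`, which always returns a value in [0, numPartitions). `PartitionIndex` and `partitionFor` are illustrative names and not part of the posted solution.

```java
public class PartitionIndex {
    // Map the cell coordinates to a partition number in [0, numPartitions).
    static int partitionFor(long xa, long ya, int numPartitions) {
        return (int) Math.floorMod(xa + ya, (long) numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor(5, 6, 4));
        System.out.println(partitionFor(-3, 1, 4));
    }
}
```

Either form gives every record of the same (Xa, Ya) cell the same partition; they only differ in which partition negative sums land in.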

sunxing007 03-10 15:22

u011828076 03-11 17:45

[img=http://prog3.com/sbdm/forum/PointForum/ui/scripts/csdn/Plugin/003/monkey/31.gif][/img]

xiaoxiangqing 03-12 09:15

u014798162 04-29 14:19

sandj_13_14 05-07 15:54

xlk23 05-09 14:57

liyang417800 06-03 16:16

he36363636 08-22 16:00

xxzzpp123456 11-05 18:07

lessonnair 11-12 11:54

u012523116 12-11 10:49

You are all at god level. Is it even appropriate for someone like me, fresh out of school, to be reading this? I am very interested, but I have not even got started with Hadoop yet.

Zccaogong 12:45 03-12
Floor 60

Analysing and discussing is a good way to learn! Studying this now!

Xq_yf 14:51 03-18
Floor 61

Digging up an old thread. My approach is much the same as the ones above: use X and Y to rule out the impossible candidates, with one addition: the candidates in the uncertain range get an exact calculation at the end. [img=http://prog3.com/sbdm/img.bbs/upload/201503/18/1426661225_45557.jpg][/img]

U012243846 11:30 03-31
Floor 62

[img=http://prog3.com/sbdm/forum/PointForum/ui/scripts/csdn/Plugin/003/monkey/2.gif][/img] You are all gods. This newbie will study hard..

Baolibin528 21:09 04-05
Floor 63

[img=http://prog3.com/sbdm/forum/PointForum/ui/scripts/csdn/Plugin/003/monkey/20.gif][/img]

Liyang0410 10:20 06-01
Floor 64

[img=http://prog3.com/sbdm/forum/PointForum/ui/scripts/csdn/Plugin/003/onion/71.gif][/img] I'm here to learn. I have been working with MapReduce recently.

U013457065 18:52 07-02
Floor 65

I am not on the same level as this at all..

U011252761 23:54 07-14
Floor 66

Here is what I think: first compute the latitude/longitude range that lies within one kilometre of each hotel, then find all the landmarks inside that range. That greatly reduces the amount of computation.
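The range on floor 66 can be approximated with a bounding box. The sketch assumes a spherical Earth with a mean radius of 6,371 km and positions away from the poles and the 180-degree meridian; `BoundingBox` and `around` are illustrative names. The box over-covers the circle at its corners, so an exact distance check on the candidates is still needed afterwards.

```java
public class BoundingBox {
    static final double EARTH_RADIUS_M = 6_371_000.0; // mean Earth radius (assumption)

    // Returns {minLat, maxLat, minLon, maxLon} in degrees for a box that
    // encloses the circle of radiusM metres around (lat, lon).
    static double[] around(double lat, double lon, double radiusM) {
        // One radian of latitude spans EARTH_RADIUS_M metres everywhere.
        double dLat = Math.toDegrees(radiusM / EARTH_RADIUS_M);
        // A degree of longitude shrinks by cos(latitude), so the box widens.
        double dLon = Math.toDegrees(radiusM / (EARTH_RADIUS_M * Math.cos(Math.toRadians(lat))));
        return new double[] {lat - dLat, lat + dLat, lon - dLon, lon + dLon};
    }

    public static void main(String[] args) {
        double[] box = around(37.7833, -122.4167, 1000);
        System.out.println(java.util.Arrays.toString(box));
    }
}
```

At 1 km the latitude half-width is roughly 0.009 degrees, which gives a feel for how fine a grid keyed on latitude/longitude would have to be.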

Jsrlz 11:37 07-17
Floor 67

You are all masters. I have come in to learn.

U013047426 11:56 07-28
Floor 68

If the data has no regions, divide it into regions first; that also makes practical sense. Store the hotel IDs in a region database: one array of IDs ordered by longitude and one ordered by latitude. Divide the IDs into 100 levels (by the size of the latitude and longitude). For each source record, take its latitude and longitude plus one kilometre, look up the corresponding longitude level and the corresponding latitude level, and take the intersection. That is all.

U013047426 12:00 07-28
Floor 69

You could also use a HashMap with 2 keys, one keyed by latitude and one by longitude: find the corresponding intervals and take the intersection.
