
Programmer, February 2016: a technical analysis of open source big data architectures

In 2015 the open source big data field was exceptionally lively. While the major established open source frameworks spared no effort to enrich their features and improve stability and performance, many rising stars were working hard to catch up.

HDFS & YARN, Hadoop still going strong: any discussion of open source big data has to mention Hadoop, although nowadays attention tends to focus on HDFS and YARN. Over the past year, HDFS and YARN lived up to expectations and released several highly anticipated features.

Spark, the upstart with sweeping momentum: Spark swept through the big data community with overwhelming force. Its most important decision in 2015 was to introduce the DataFrame API. With less code and faster execution, DataFrame quickly won engineers over. It is fair to say that in open source big data, 2015 belonged to Spark.

Docker & Kubernetes, a breath of fresh air in the big data ecosystem: in 2015 the container ecosystem grew alongside big data. The rapid development of Docker and Kubernetes brought fresh air to the big data field and greatly advanced SDN and scheduling technology.

We have every reason and confidence to believe that big data will deliver even more value in 2016, and we look forward to engineers in China taking part in building the big data ecosystem.

This issue's cover story brings the following practical articles:

  • The 2015 big data contenders vie for the throne: a look at the iteration of open source frameworks (Chen Chao, technical director at Qiniu)
  • Jaguar: an automatic scaling architecture for long-running services on YARN (Masson, R&D manager at AsiaInfo; Ma Weina, Apache Hadoop resource expert)
  • HDFS EC: bringing erasure coding to HDFS (Li Bo, technical expert at Intel)
  • Data warehouse technology based on SQL on Hadoop (Sun Yuanhao, founder and CTO of Transwarp)
  • Spark data source computing and its practice at GrowingIO (Tian Yi, data platform engineer at GrowingIO)
  • Impala information warehouse: interpreting the TQueryExecRequest structure (check the data Rui, senior engineer)
  • Spark Streaming practice and optimization (Xu Xin, big data engineer at Hulu; Dong Xicheng, distributed computing expert)
  • Challenges and analysis of distributed databases (Lu Yilei, vice president of technology at AdMaster)
  • Apache Eagle: a distributed real-time platform for monitoring data performance and security (Zhang Yong, senior architect at eBay)
  • Big data driven social recommendation for microblogging (Jiang Guibin, technical director of algorithms at Sina)


  • CSDN top ten news items
  • Foreign periodicals express: CACM, better memory
  • Foreign periodicals express: IEEE Spectrum, how to apply external force to finger movement in a virtual reality glove
  • The near future: CES 2016 highlights (Zhan Rong, Wei Rongjie, Qi Wen)


Apache Flink (Flink) is a recently rising star in big data processing, and its many features that differ from other big data projects have attracted growing attention. This article analyzes some of Flink's key technologies and features in depth, hoping to give readers a deeper understanding of Flink and to be useful to developers of other big data systems as well. It assumes the reader has some understanding of big data processing architectures such as MapReduce, Spark, and Storm, and is familiar with the basic concepts of stream processing and batch processing.

  • Using reinforcement learning algorithms to improve recommendation results (Shan Yi, chief data officer at Liepin)

When it comes to recommendation systems, the industry pays most attention to the classic recommendation algorithms (such as collaborative filtering, matrix factorization, and learning to rank) and to architecture. In practical applications, however, users, items, and the environment keep changing, which causes the original model to stop working. Reinforcement learning, a branch of machine learning, emphasizes how to act based on the environment so as to maximize the expected reward. It is good at online planning and at finding a balance between exploration (of unknown territory) and exploitation (of existing knowledge), so it offers a practical and effective approach to these challenges. Drawing on Liepin's needs and practice, the article introduces how multi-armed bandits help with important problems such as cold start in recommendation systems and the selection and tuning of online experiment strategies, and looks ahead to wider use and future development of reinforcement learning.
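To make the balance between exploration and exploitation concrete, here is a minimal sketch of an epsilon-greedy multi-armed bandit. It is not the method from the article itself: the class name, the `simulate` helper, and the click-rate numbers are all hypothetical and chosen only for illustration. Each "arm" can be read as one candidate recommendation strategy, and the reward as a click (1) or no click (0).

```python
import random


class EpsilonGreedyBandit:
    """Epsilon-greedy multi-armed bandit (illustrative sketch only)."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms      # number of pulls per arm
        self.values = [0.0] * n_arms    # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore: with probability epsilon, try a random arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        # Exploit: otherwise use the arm with the best estimate so far.
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        self.counts[arm] += 1
        n = self.counts[arm]
        # Incremental update of the mean reward for this arm.
        self.values[arm] += (reward - self.values[arm]) / n


def simulate(true_rates, steps=5000, epsilon=0.1, seed=0):
    """Run the bandit against hypothetical per-arm click rates."""
    bandit = EpsilonGreedyBandit(len(true_rates), epsilon, seed)
    env = random.Random(seed + 1)  # separate stream for simulated clicks
    for _ in range(steps):
        arm = bandit.select_arm()
        reward = 1.0 if env.random() < true_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


if __name__ == "__main__":
    result = simulate([0.02, 0.05, 0.20])
    print("pulls per arm:", result.counts)
    print("estimated rates:", [round(v, 3) for v in result.values])
```

A new item or strategy with no history (the cold start case) still gets traffic through the exploration branch, while most traffic goes to whatever currently looks best. A production system would more likely use a variant such as UCB or Thompson sampling, but the trade-off is the same.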

  • Analysis and practice of Trove, the OpenStack database service (Huang Mingsheng, CTO and co-founder of green Nebula Technology)

As big data analysis becomes more and more popular, reliable and convenient database management is becoming increasingly important. This article gives an in-depth introduction to the principles, architecture, and functions of OpenStack Trove, and demonstrates how Trove is used in practice.

  • A preview of Docker 1.10: the new face of images (Sun Hongliang, partner at DaoCloud)

In the upcoming Docker 1.10 release, the Docker image is one of the biggest surprises: thanks to a special way of organizing images, both storage efficiency and image security will reach new heights.

  • Data mining classification problems, using gender prediction as an example (Wang Qi, senior data mining engineer at Umeng)

The rapid development of the Internet has given rise to explosive growth in data. Faced with massive data, how to extract its value has become an increasingly important question. This article first introduces the basics of data mining and then, following the data mining process, uses gender prediction as an example to explain how a concrete data mining task is carried out.

  • Exploring and optimizing Swift performance (Wang Wei, initiator of the ObjC China project)

This article analyzes performance considerations and practices when using Swift for iOS/OS X development. Drawing on a year of Swift development experience, the author also proposes corresponding measures.

  • Applications of LBS display technology (Jia Shuangcheng, senior engineer at Alibaba)

Almost every smartphone comes with LBS applications, but because the technology covers such a wide range, in-depth material on LBS technology is scarce. This article introduces the application of LBS display technology.

This article describes a problem Booking.com ran into as its business grew, a MySQL master with dozens or even hundreds of slaves attached, and the solution it explored: using a Binlog Server. A Binlog Server removes the master's network bandwidth limit when more than fifty slaves are attached, and avoids the drawbacks of the traditional cascading replication scheme. The article also describes how a Binlog Server can be used for replication across data centers and for reorganizing the topology after a master failure. The author's step-by-step way of exploring the problem is worth learning from.

Splitting data across databases and tables (sharding) has been a hot topic ever since the Internet era began. Even today, with NoSQL in full swing, relational databases remain the first choice of most companies thanks to their stability, flexible queries, and compatibility. Making sound use of database and table sharding to cope with massive data and highly concurrent access is therefore unavoidable for the major Internet companies.
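As a rough illustration of what database and table sharding means in practice, the sketch below routes a record key to one of several databases and tables by hashing. The function names, the naming scheme (`orders_db2.orders_5` and the like), and the counts of 4 databases with 8 tables each are assumptions made up for this example, not taken from the article.

```python
import hashlib


def shard_for(key, n_databases, tables_per_db):
    """Map a key to a (database index, table index) pair by hashing."""
    # A stable hash is used so every application server agrees on the route;
    # Python's built-in hash() is randomized per process for strings.
    digest = hashlib.sha256(str(key).encode("utf-8")).hexdigest()
    slot = int(digest, 16) % (n_databases * tables_per_db)
    return slot // tables_per_db, slot % tables_per_db


def table_name(base, key, n_databases=4, tables_per_db=8):
    """Build a hypothetical physical table name for the given key."""
    db_index, table_index = shard_for(key, n_databases, tables_per_db)
    return f"{base}_db{db_index}.{base}_{table_index}"


if __name__ == "__main__":
    for user_id in (1, 42, 1001):
        print(user_id, "->", table_name("orders", user_id))
```

Simple modulo routing like this spreads load evenly, but changing the number of shards later moves most keys, which is one reason real deployments often add a lookup layer or use consistent hashing.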

  • Practice and experience with high availability systems at Dianping (Chen Yifang, technical lead of the trading platform team at Dianping)

This article reviews the evolution of the trading system, focusing on how high availability was achieved, and shares lessons from the author's own experience. High availability is only a result; more attention should be paid to the iterative process and to business development.

To subscribe to Programmer for 2016 (including the iPad, Android, and print editions), please visit Http://

Subscription inquiries:

- Online inquiries (QQ): 2251809102
- Telephone: 010-64351436
- Email inquiries: reader@PROG3.COM
- For more information, please follow "Programmer Editorial Department"