
Guo Anqi, Vipshop US R&D Center: Notes from Hadoop Summit 2015

Published 2015-06-17 16:28 | Source: Teacher Dong's Zhihu column | Author: Guo Anqi

Abstract: More than 25 data technology providers, including Hortonworks, Cloudera, SAP, IBM, Hewlett-Packard, and Yahoo, were joined by industry leaders such as Schlumberger, Verizon, Disney, Airbnb, Symantec, and Aetna. Hadoop's prestige is undiminished.

[Editor's note] At Hadoop Summit 2015, more than 25 well-known big data players showed their designs, and leaders from all walks of life, including Schlumberger, Verizon, Disney, Airbnb, Symantec, and Aetna, shared their hands-on experience. Clearly, even with so many big data computing frameworks available, Hadoop remains the first choice for large-scale production environments. The original title of the article is "The World Is Drunk on Data".

The original article follows:

From June 9 to 11, 2015, I attended the eighth global Hadoop Summit (Hadoop Summit 2015) in California, USA. In just three days I got an inside look at the products that more than 25 data technology providers, including Hortonworks, Cloudera, SAP, IBM, Hewlett-Packard, and Yahoo, have designed and built around big data. I also heard real cases of data products creating value from leading companies in all walks of life: Schlumberger (energy giant), Verizon (telecom giant), Disney (entertainment giant), Airbnb (a representative of the sharing economy), Symantec (information security giant), and Aetna (health insurance giant). My strongest impression is that so many companies believe in the value of data and maintain and use it as an important company asset. To borrow the summary of one summit guest, Ranga, the Microsoft vice president in charge of the data platform: "The world is drunk on data."

Fig 1: The venue of the eighth global Hadoop Summit (Hadoop Summit 2015).

What is Hadoop?

Ever since I posted to my Moments feed that I was attending the Hadoop Summit, friends have flooded it with comments like "so cool" and "so impressive". But here is a hard question: how do I explain "what is Hadoop" to my mother? It is probably about as hard as explaining to a programmer what exactly is so good about CL red-soled shoes. As a newcomer to Hadoop I am still feeling my way, but luckily we have dear Wikipedia, which defines Hadoop as a software framework, written in Java, for distributed storage and processing of large data sets. In simple terms, it is open-source software: any developer can read its source code and compile it. Its arrival has made storing and processing big data much faster, and also much cheaper.

Fig 2: Hadoop Summit 2015 keynote: Hortonworks CEO Rob introduces the market share of Hadoop technology in enterprise applications.

How does Hadoop make big data storage and processing fast and cheap?

That could take three days and nights to explain, so here is a simple example. Suppose we need to count how many books a library holds. One person counting alone would be very slow, so we need many people, and ideally each book is counted by two or three of them so that the tally is more reliable. We therefore need a mechanism that divides the library into areas and specifies which areas each person is responsible for, so that even if someone falls ill the overall count still gets finished. The people here are the individual computers that Hadoop controls, and the mechanism is MapReduce, Hadoop's core method. In my view, Hadoop's distributed computing works like a work-assignment system designed by a shrewd capitalist: finishing the job never depends on any one person, and if the workload grows, hiring more workers is enough to solve the problem.
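The library analogy can be sketched in a few lines of Python. This is only a toy, single-process illustration of the map, shuffle, and reduce phases, not Hadoop's actual API. The library layout, the worker names, and every function name here (`map_count`, `shuffle`, `reduce_count`, `count_books`) are assumptions made up for the example, and the double counting of each book (replication) is left out to keep it short.

```python
from collections import defaultdict

# Hypothetical library: each area maps to the books shelved there.
LIBRARY = {
    "area_a": ["b1", "b2", "b3"],
    "area_b": ["b4", "b5"],
    "area_c": ["b6", "b7", "b8", "b9"],
}

def map_count(books):
    """Map: a worker walks one area and emits ("books", 1) for each book."""
    return [("books", 1) for _ in books]

def shuffle(pairs):
    """Shuffle: group the emitted values by key between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_count(key, values):
    """Reduce: add up all the partial counts for one key."""
    return key, sum(values)

def count_books(library, workers, sick=()):
    """Split areas among healthy workers, then map, shuffle, and reduce."""
    healthy = [w for w in workers if w not in sick]
    if not healthy:
        raise RuntimeError("no healthy workers left")
    assignment = {}  # area -> worker responsible for counting it
    emitted = []
    for i, area in enumerate(sorted(library)):
        assignment[area] = healthy[i % len(healthy)]  # round-robin
        emitted.extend(map_count(library[area]))
    totals = dict(reduce_count(k, v) for k, v in shuffle(emitted).items())
    return totals, assignment

# One worker is "sick": the areas are simply shared among the others.
totals, assignment = count_books(LIBRARY, ["w1", "w2", "w3"], sick={"w2"})
print(totals, assignment)
```

The total stays the same whether or not a worker drops out; only the assignment of areas changes, which is the point of the analogy.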

Fig 3: Hadoop Summit 2015 keynote by Ranga, Microsoft vice president for data platform, presenting the "data dividend" that Hadoop technology brings to retail, health care, payments, education, machine maintenance, and transportation.

What is the Hadoop Summit (Hadoop Summit 2015)?

To answer that, we first have to explain why Hadoop matters to enterprises. Mike Gualtieri, principal analyst at Forrester Research (a well-known consulting firm), predicted at the summit that 100% of large companies have adopted or will adopt Hadoop within the next two to three years. Whether your company is in energy, communications, health care, entertainment, manufacturing, or the Internet industry, your data will keep growing, and if you want to dig value out of that mass of data and improve the company's overall competitiveness, you need powerful storage and data processing capacity. Hadoop and its wider ecosystem can help you get there! (This really is not an advertisement.)

The Hadoop Summit is where Hadoop developers and users come to exchange ideas. Over the three days there were more than 160 talks, in which technical experts from companies such as Aetna, Facebook, Google, Microsoft, Disney, and Airbnb shared their stories of building with Hadoop. Through talks, open discussions, dinners, parties, and other formats, attendees could talk with the 4,000+ participants from 39 countries. In a sense, the Hadoop Summit is like a religious gathering: devout data lovers come together to see what you are doing and what I am doing, and to discuss their beliefs about data.

Fig 4: Screenshot of the Hadoop Summit 2015 lecture schedule for the afternoon of the first day.

How can a technical newcomer get the most out of a technology summit?

At an industry summit, the most important thing is of course to understand industry trends and what new concepts are emerging, so that you gradually come to understand the jargon people use. That depends on everyday preparation. Take me as an example: last year I attended InfoQ's QCon (the global architects' conference) and understood roughly 30% of the content. After a year of study and of looking things up in the dictionary (wiki), I could follow roughly 50% of this Hadoop Summit, and when talking with others I could just about keep up, sentence by sentence, with a deeper discussion.

Of course, asking good questions is also an effective way to get the most out of a summit. My interest is mainly on the product side, so the talks I attended were mostly use cases of Hadoop technology at different companies. Along the way I also summed up how a technical newcomer can enjoy this kind of technology summit, and I share it here:

  1. New concept
  2. Ecosystem around Hadoop
  3. People

These are the three places where I think a technical newcomer, whose knowledge is not yet comprehensive, should focus at a summit. First, look at what new concepts the industry has and add them to your own professional dictionary. Second, get to know the upstream and downstream providers that have grown up around a technology: who is developing what kind of software, and who is paying for what kind of software. This helps a newcomer keep the big picture in mind when designing products. Finally, the most important and the easiest to achieve: meet the people who attend. Everyone at the summit paid for an expensive ticket (even a student ticket costs $900+), so each person is like a gold mine. They all have their own areas of expertise and run into similar problems, and an open conversation might give you a new solution to a problem that has been troubling you for a long time. In addition, talking with peers keeps you from feeling alone: for many of the approaches you want to try, you may find confidence in the data and feedback from other companies that have already implemented them.

Fig 5: Hadoop Summit 2015 attendees in front of the job posting board, looking at Hadoop-related openings: Yahoo is hiring! Apple is hiring! Uber is hiring! ... Our Vipshop US R&D Center is also hiring senior data scientists!

Here are some of my gains in these three areas:

New concept

Concept one: "Big Iron Meets Big Data"

This phrase was put forward at the summit by Vince, the chief information officer responsible for software at General Electric. It sums up how the era of big data and the Internet of Things (whose first phase is mainly the industrial Internet of Things) will complement each other. You can get a glimpse of this from the list of participating companies: traditional industries such as health care, energy, machinery, and communications all made an appearance to present their experiments with big data and the Internet of Things. From my conversations with other attendees, however, big data at these traditional companies is currently still limited to collecting data through sensors and then analyzing it. There is still a long way to go.

Fig 6: In his Hadoop Summit 2015 talk, GE chief information officer Vince cited striking figures about the Internet of Things, arguing that Hadoop technology combined with the Internet of Things will unlock unlimited value: by 2020, 240 million devices worldwide will be part of the Internet of Things; 96% of business leaders say they will test the waters of the Internet of Things within the next 3 years; and by 2022 the Internet of Things will be a $14.4 trillion market. The 7 main use cases are smart factories, marketing, smart grids, gaming and entertainment, smart buildings, commercial ground transportation, and health care.

Concept two: "The world is drunk on data"

This concept is closely related to the "data lake". The data lake is a relatively young concept. Before it, the accepted idea was the "data mart", in which the enterprise's data (the water) is filtered, disinfected, and packaged like bottled water for each department. A data lake, by contrast, is an aggregation of raw data: unprocessed data is thrown into a single container, and only when it is needed is it drawn from the lake to be used and processed. Software that manages the flows upstream and downstream of the lake is currently a focus of investment and development. A related concept is the "data swamp".
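The contrast between a data mart and a data lake can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not how any particular product works: the event format, the `dept` and `amount` fields, and the function names (`load_into_mart`, `load_into_lake`, `query_lake`) are all hypothetical.

```python
import json

def parse(line):
    """Return the event as a dict, or None if the line is not a JSON object."""
    try:
        event = json.loads(line)
    except json.JSONDecodeError:
        return None
    return event if isinstance(event, dict) else None

def load_into_mart(raw_lines, department):
    """Data mart: filter and shape records for one department before storing."""
    mart = []
    for line in raw_lines:
        event = parse(line)
        if event and event.get("dept") == department and "amount" in event:
            mart.append({"dept": department, "amount": float(event["amount"])})
    return mart

def load_into_lake(raw_lines):
    """Data lake: keep every raw line untouched, including malformed ones."""
    return list(raw_lines)

def query_lake(lake, department):
    """Apply structure only at read time, when a question is actually asked."""
    total = 0.0
    for line in lake:
        event = parse(line)
        if event and event.get("dept") == department and "amount" in event:
            total += float(event["amount"])
    return total

# Hypothetical raw input: two sales events, one ops event, one broken line.
RAW = [
    '{"dept": "sales", "amount": 10}',
    "not json",
    '{"dept": "ops", "amount": 5}',
    '{"dept": "sales", "amount": "2.5"}',
]

sales_mart = load_into_mart(RAW, "sales")  # only cleaned sales rows survive
lake = load_into_lake(RAW)                 # everything is retained
print(len(sales_mart), len(lake), query_lake(lake, "ops"))
```

The sales mart can no longer answer a question about the `ops` department, while the lake still can, at the cost of doing the parsing and cleaning on every read.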

Other frequently mentioned terms, which you are welcome to look up yourself:

Data Governance, Data Lineage, Data Dividend, Data Wrangling

Ecosystem around Hadoop

Dong Fei describes the ecosystem around Hadoop in detail in his article "Big Data Architecture in the Post-Hadoop Era". My overall impression is that for a great many companies (at least 30), data and Hadoop are the production resources and tools they depend on. If data is like water (see the data lake concept above), then I saw at least companies that prospect for water, drill wells, fetch water, teach people to fetch water, teach people to save water, disinfect water, and report on water. And the key point is that the companies that "fetch water" and "teach people to fetch water" (Hortonworks) are already publicly listed!

As for specific technologies, Spark was hotly discussed, and the packed rooms showed how interested everyone was. Apache Drill is a new open-source technology based on Hadoop, released in May 2015, whose earliest origins lie in Google's Dremel system. Its main advantage is that it lets people run interactive, real-time analysis over distributed data. Airbnb has also developed its own open-source workflow management platform, Airflow, which attracted a lot of attention from the industry.

Fig 7: At Hadoop Summit 2015, Caleb, a senior engineer in data platform development at Disney, introduces the Hadoop framework behind the famous magic bracelet, the "MagicBand".


People

At the party during this summit I got to know people from Hortonworks and Cloudera, and finally understood that the two are competitors. While listening to a talk I met a lovely guy from India: a company that wanted to hire him had sent him a ticket to the Hadoop Summit. At lunch I saw a white-haired older gentleman next to me eating alone, so I chatted with him for a while. He mainly helps optimize General Electric's engine systems, and we had run into the same troubles in handling data. When I got home and looked him up, he turned out to be the founder and chief technology officer of a big data startup. There were many encounters like this.

Fig 8: On the evening of the second day of Hadoop Summit 2015 there was a grand party at San Pedro Market, open to every attendee with a badge. Free Hadoop Summit pedicabs shuttled the crowd between the venue and the party, there was a live band, and best of all, there was food and wine from around the world to taste. With the breeze blowing, I was drunk.

In addition, I took part in the "Women in Hadoop" event. Most of the people there were women who develop with Hadoop, and they are indeed a "minority group" among Hadoop users. A classic illustration: after each talk there was always a long queue outside the men's room, while the women's room had no such trouble. Another finding we discussed was that none of the 12 keynote speakers was a woman. In the technical sessions, however, many women gave talks, the audiences were large, the talks were well organized, and they were well received. This shows that it is not that women are unsuited to giving talks, but that we lack channels for learning about speaking opportunities and lack encouragement to speak. So the purpose of the Women in Hadoop event, under the theme of women strengthening themselves, was to discuss what we can do to help more talented women join the ranks of Hadoop technology. Some men took part as well, such as Caleb, who does data development for Disney's MagicBand. He said he has a daughter who, though still young, is very interested in technology, and he asked whether I, as a young woman, had any good suggestions. Finally, through a round of 30-second self-introductions, we got to know each other and connected on LinkedIn, hoping to help one another along our future career paths.

A thought-provoking slide from the Women in Hadoop BoF session at Hadoop Summit 2015.

Original article: The World Is Drunk on Data.

Author: Guo Anqi graduated from the Department of Information Engineering at Cornell University in August, joined the Vipshop US R&D Center as an intern in May 2014, and is now working on the development of data products.
