  • Hadoop Essentials, Explained
  • A reading of Hadoop: The Definitive Guide, combined with Java's low-level characteristics and data statistics, to introduce the core ideas of Hadoop

Updated articles

Cloud computing: Mahout in Action 2.2 - Introduction to clustering - the K-means clustering algorithm

Introduction to clustering. This chapter covers: 1. a hands-on exercise to get a feel for clustering; 2. the concept of similarity; 3. running a simple clustering example with Mahout; 4. the distance measures used by various clustering algorithms. As human beings we tend to group together with like-minded people - "birds of a feather flock together". We discover repeating patterns in the things we see, hear, smell, and taste, and file them away in memory. For example, comparing salt with sugar...
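As a quick taste before the Mahout examples, here is a minimal plain-Java sketch (not the Mahout API; the class and method names are made up for illustration) of the assignment step every K-means implementation repeats: each point goes to the centroid it is closest to under squared Euclidean distance.

```java
public class NearestCentroid {

    // Squared Euclidean distance between two points of equal dimension.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the centroid closest to the given point.
    static int assign(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dist = squaredDistance(point, centroids[c]);
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = { {1.0, 1.0}, {5.0, 5.0} };
        double[] point = { 1.5, 2.0 };
        System.out.println("point belongs to cluster " + assign(point, centroids));
    }
}
```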

Hadoop

Cloud computing: Mahout in Action 2.2 - Recommending books to users (3) - Evaluating the recommender system

A recommender engine is a tool, a means of answering the question "what are the best recommendations for a user?". Before answering the question, study it: what exactly does a good recommendation mean, and how do I know when a recommender is producing them? The following section explores the evaluation of a recommender system, which will be a useful tool when searching for a specific recommender. The best possible recommender would be a kind of psychic, knowing, before you do, exactly how much you would like every item you have not yet seen or expressed any preference for, and what you would do with those items...
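To make "evaluating a recommender" concrete, here is a minimal sketch assuming the Mahout Taste API in the style of Mahout in Action; the ratings.csv file name and the neighbourhood size of 10 are hypothetical choices. It trains on 90% of each user's preferences, holds out the rest, and reports the average absolute difference between estimated and actual values (lower is better).

```java
import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class EvaluateRecommender {
    public static void main(String[] args) throws Exception {
        // "ratings.csv" is a hypothetical userID,itemID,preference file.
        DataModel model = new FileDataModel(new File("ratings.csv"));

        // Builder that recreates the recommender under test for each evaluation run.
        RecommenderBuilder builder = new RecommenderBuilder() {
            public Recommender buildRecommender(DataModel dataModel) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(dataModel);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, dataModel);
                return new GenericUserBasedRecommender(dataModel, neighborhood, similarity);
            }
        };

        // Hold out 10% of each user's preferences and compare estimates against them.
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        double score = evaluator.evaluate(builder, null, model, 0.9, 1.0);
        System.out.println("average absolute difference: " + score);
    }
}
```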

Cloud computing: RPC in Hadoop

RPC (Remote Procedure Call) is remote method invocation: without needing to know the details of the network, a program can use this protocol to request a service from a program on another machine in the network. It follows a client/server structure: the party providing the service is called the server, and the party consuming the service is called the client. Hadoop's low-level interactions are carried out over RPC, for example between the datanode and the namenode, and between the tasktracker and the jobtracker...

Hadoop MapReduce

Cloud computing: An in-depth look at MapReduce input and output formats (2) - a complete summary of input and output

The barrier to entry for MapReduce is rather high and its performance is also worth considering; if you are interested, Spark may be worth a look. The FileInputFormat class: FileInputFormat is the base class for all InputFormat implementations that use files as their data source. It provides two things: a definition of which files are included in the input of a job, and an implementation that generates splits from those input files; cutting the splits into records is left to its subclasses. The following figure shows the InputFormat class hierarchy...
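A minimal driver sketch of how FileInputFormat's two roles show up in user code, assuming the new (org.apache.hadoop.mapreduce) API; the input and output paths are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class InputFormatSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "input format demo");
        job.setJarByClass(InputFormatSetup.class);

        // FileInputFormat's first role: define which files make up the job's input.
        FileInputFormat.addInputPath(job, new Path("/data/logs"));      // hypothetical path
        FileInputFormat.addInputPath(job, new Path("/data/archive"));   // paths can be added repeatedly

        // Its second role (split generation) is inherited by concrete subclasses
        // such as TextInputFormat, which also turns each split into line records.
        job.setInputFormatClass(TextInputFormat.class);

        FileOutputFormat.setOutputPath(job, new Path("/data/out"));     // hypothetical path
        // mapper/reducer classes would be set here before submitting the job
    }
}
```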

MapReduce Big data Hadoop

Cloud computing: An in-depth look at MapReduce input and output formats (1) - input splits and records

Input: an input split is a chunk of the input that can be processed by a single map operation. Each map operation processes only one input split, dealing with its records one by one, each as a key/value pair. Input splits and records are logical; they need not correspond to files (although that is usually the case). In a database, an input split can be a range of rows from a table, and a record a row within that range (in fact that is how DBInputFormat works). It is a...
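For reference, here is a sketch of the new-API InputFormat contract that this split/record discussion maps onto (reproduced from memory of org.apache.hadoop.mapreduce.InputFormat, so treat the exact signatures as an approximation): getSplits() defines the logical splits, and createRecordReader() turns one split into key/value records for a single map task.

```java
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Shape of the contract every input format fulfils.
public abstract class InputFormatContract<K, V> {

    // Logically partition the job's input into splits, one per map task.
    public abstract List<InputSplit> getSplits(JobContext context)
            throws IOException, InterruptedException;

    // Produce the reader that turns one split into (key, value) records.
    public abstract RecordReader<K, V> createRecordReader(InputSplit split,
            TaskAttemptContext context) throws IOException, InterruptedException;
}
```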

Cloud computing: Overview of the Hadoop architecture

Hadoop is an open-source software framework for processing and storing big data on commodity hardware. It has five main components, from the bottom up. The cluster is a set of hosts (nodes); nodes can be grouped into racks. This is the hardware-level architecture. YARN (which also acts as the resource manager) is a framework responsible for providing the computational resources (CPU, memory, and so on) that applications need at execution time. Its two important parts are: a ResourceManager (one per cluster)...

Hadoop cluster Hadoop

Cloud computing: MapReduce process illustrated and explained in detail (2) - [map phase]

Continuing from the previous article: http://prog3.com/sbdm/blog/mrcharles/article/details/50465626 How many reduce tasks does a job have? The number of reduce tasks of a job is set through the mapreduce.job.reduces configuration parameter. What is the partition index of an output tuple? The partition index of an output tuple is the index of the partition it is assigned to. In Map...
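A minimal sketch of both knobs mentioned above, assuming the new MapReduce API: setNumReduceTasks() is the programmatic equivalent of mapreduce.job.reduces, and the partitioner shows the hash-modulo formula that the default HashPartitioner uses to compute a tuple's partition index.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Partitioner;

public class ReduceCountDemo {

    // The default HashPartitioner computes the partition index essentially like this:
    // a non-negative hash of the key, modulo the number of reduce tasks.
    public static class HashLikePartitioner<K, V> extends Partitioner<K, V> {
        @Override
        public int getPartition(K key, V value, int numReduceTasks) {
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "reduce count demo");

        // Equivalent to setting mapreduce.job.reduces in the configuration.
        job.setNumReduceTasks(4);
        job.setPartitionerClass(HashLikePartitioner.class);
    }
}
```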

MapReduce Hadoop cluster Big data Hadoop

MapReduce process illustrated and explained in detail (1) - [map phase]

In MapReduce, a YARN application is called a job. The implementation of the application master provided by the MapReduce framework is called MRAppMaster. Timeline of a MapReduce job: Map phase - several map tasks are executed; Reduce phase - several reduce tasks are executed. Reduce...

Hadoop MapReduce

Cloud computing: MapReduce flow diagram

Anatomy of a MapReduce job: in MapReduce, a YARN application is called a job. The implementation of the application master provided by the MapReduce framework is called MRAppMaster. Timeline...

Big data MapReduce

Cloud computing: What are the functions of combine, partition, and shuffle in MapReduce?

http://www.aboutyun.com/thread-8927-1-1.html MapReduce is one of the harder concepts in Hadoop. Read the following first, and then it can be summed up. In general: combine and partition are functions; the only intermediate step is the shuffle! 1. combine. Combining happens on both the map side and the reduce side; it merges values that share the same key in...
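A minimal sketch of where combine plugs in, assuming the new MapReduce API and reusing the stock IntSumReducer as the combiner; note that the shuffle is never set explicitly, because it is the framework's own copy/sort step between the map and reduce sides.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class CombinerDemo {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "combiner demo");

        // combine: the same summing logic applied locally to map output,
        // before the shuffle copies and sorts it onto the reduce side.
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // The shuffle itself is not configured here: it is the framework's
        // built-in copy/merge/sort step between map and reduce.
    }
}
```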

MapReduce Hadoop

Cloud computing: MapReduce types and formats [writing the simplest MapReduce] (1)

The map and reduce functions in Hadoop MapReduce have the following general form: map: (K1, V1) → list(K2, V2); reduce: (K2, list(V2)) → list(K3, V3). From the source code we can see why the types are like this: map: (K1, V1) → list(K2, V2); reduce: (K2, list(V2)) → list(K3),...
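A minimal word-count-style sketch, assuming the new (org.apache.hadoop.mapreduce) API, that maps the abstract types onto concrete Writables: K1/V1 = LongWritable/Text, K2/V2 = Text/IntWritable, K3/V3 = Text/IntWritable.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountTypes {

    // map: (K1, V1) -> list(K2, V2)  ==  (LongWritable, Text) -> list(Text, IntWritable)
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);   // emit (K2, V2)
                }
            }
        }
    }

    // reduce: (K2, list(V2)) -> list(K3, V3)  ==  (Text, list(IntWritable)) -> list(Text, IntWritable)
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));  // emit (K3, V3)
        }
    }
}
```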

Hadoop Big data MapReduce

Cloud computing: CCA Spark and Hadoop developer certification points for 2016 - reaching the peak of Hadoop

Required skills: Data ingest - the skills to transfer data between external systems and your cluster. This includes the following: import data from a M...

The new MapReduce Java API

http://book.51cto.com/art/201106/269647.htm Hadoop release 0.20.0 included a new Java MapReduce API, sometimes referred to as the "context object" API, designed to make the API easier to extend in the future. The new API is not type-compatible with the old one, so existing applications need to be rewritten for the new API to take effect. The new API and the old API...
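A side-by-side sketch of the two mapper styles (both classes and their contents are illustrative only): the old API pushes results through an OutputCollector and progress through a Reporter, while the new API routes everything through a single context object, which can gain new methods later without breaking user code.

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

// Old API (org.apache.hadoop.mapred): interface-based, results go through an
// OutputCollector, progress and counters through a Reporter.
class OldApiMapper extends org.apache.hadoop.mapred.MapReduceBase
        implements org.apache.hadoop.mapred.Mapper<LongWritable, Text, Text, IntWritable> {
    public void map(LongWritable key, Text value,
                    org.apache.hadoop.mapred.OutputCollector<Text, IntWritable> output,
                    org.apache.hadoop.mapred.Reporter reporter) throws IOException {
        output.collect(value, new IntWritable(1));
    }
}

// New API (org.apache.hadoop.mapreduce): abstract-class-based, everything flows
// through the "context object".
class NewApiMapper extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(value, new IntWritable(1));
    }
}
```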

Cloud computing: Hadoop I/O file-based data structures in detail [storage strategies for column-oriented and row-oriented data structures]

For some applications a special data structure is needed to hold the data. For MapReduce-based processing, putting each blob of binary data into its own file does not scale, so Hadoop developed a series of higher-level containers for these situations. Consider the files MapReduce encounters - log files, text files, and so on: after splitting, MapReduce turns each section of data into records, into serializable objects such as IntWritable and Text, which are then serialized and written out to the network or to disk. Giving each kind of output its own file would be very uneconomical, because what we really want is for all the data to live in a single container; then Ha...
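A minimal sketch of one such higher-level container, modelled loosely on the SequenceFile writer example in Hadoop: The Definitive Guide; the file name and record contents are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SequenceFileWriteDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("numbers.seq");   // hypothetical output file

        IntWritable key = new IntWritable();
        Text value = new Text();

        // A SequenceFile is one of Hadoop's higher-level containers: many
        // key/value records packed into a single splittable, compressible file.
        SequenceFile.Writer writer = null;
        try {
            writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), value.getClass());
            for (int i = 0; i < 100; i++) {
                key.set(i);
                value.set("record-" + i);
                writer.append(key, value);
            }
        } finally {
            if (writer != null) {
                writer.close();
            }
        }
    }
}
```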

Big data Hadoop File-based data structures Column-oriented and row-oriented data structures

Cloud computing: Hadoop serialization explained in detail (3) [ObjectWritable, Writable collections] and a custom Writable

Preview: this article covers ObjectWritable, the Writable collection classes, and a custom Writable, TextPair. Review: earlier we saw that Hadoop supports serialization of the basic Java types and provides corresponding wrapper implementation classes. These do not cover all Java data types: what if the object we want to serialize is of type Object, or one of the commonly used collection types such as map or list? Don't worry, Hadoop also provides the corresponding...
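A minimal sketch of such a custom Writable, in the spirit of the book's TextPair but simplified to the parts needed for serialization and sorting.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

// A custom Writable holding two Text fields, usable as a MapReduce key.
public class TextPair implements WritableComparable<TextPair> {

    private Text first = new Text();
    private Text second = new Text();

    public TextPair() { }

    public TextPair(String first, String second) {
        this.first.set(first);
        this.second.set(second);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Delegate to the contained Writables; readFields() must read
        // the fields back in exactly the same order.
        first.write(out);
        second.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        first.readFields(in);
        second.readFields(in);
    }

    @Override
    public int compareTo(TextPair other) {
        int cmp = first.compareTo(other.first);
        return cmp != 0 ? cmp : second.compareTo(other.second);
    }

    @Override
    public int hashCode() {
        return first.hashCode() * 163 + second.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TextPair)) {
            return false;
        }
        TextPair tp = (TextPair) o;
        return first.equals(tp.first) && second.equals(tp.second);
    }

    @Override
    public String toString() {
        return first + "\t" + second;
    }
}
```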

Big data Hadoop serialize Writable HadoopIO

Cloud computing: Hadoop serialization explained in detail (2) [Text, BytesWritable, NullWritable]

Review: in fact the original book lays out the structure of serialization very clearly, so I follow the chapter structure of the book. Serialization is built, at the bottom, on implementations of the Writable interface. Writable lays down the rules of the game for reading and writing: void write(DataOutput out) throws IOException; void readFields(DataInput in) throws IOException. In order to fit Hadoop's MapR...
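A minimal round-trip sketch of those two Writable methods in action (the helper method names are made up for illustration): write() pushes the object's fields into a DataOutput, and readFields() reads them back in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Writable;

public class WritableRoundTrip {

    // Serialize any Writable into a byte array using its write() method.
    static byte[] serialize(Writable writable) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream dataOut = new DataOutputStream(out);
        writable.write(dataOut);
        dataOut.close();
        return out.toByteArray();
    }

    // Populate a Writable from a byte array using its readFields() method.
    static void deserialize(Writable writable, byte[] bytes) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        DataInputStream dataIn = new DataInputStream(in);
        writable.readFields(dataIn);
        dataIn.close();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serialize(new IntWritable(163));
        System.out.println("serialized length: " + bytes.length);  // an int is 4 bytes

        IntWritable copy = new IntWritable();
        deserialize(copy, bytes);
        System.out.println("round-tripped value: " + copy.get());
    }
}
```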

Big data Hadoop HadoopIO serialize Writable

Cloud computing: Hadoop serialization explained in detail (latest version) (1) [Java versus Hadoop serialization] and the Writable interface

Anyone who has started out in Java will surely remember Java serialization. Most people do not understand the point of serialization at first; this is because many people are not particularly familiar with Java's low-level characteristics. Once you come to understand Java more deeply, you will see the essence of serialization. Before talking about Hadoop serialization, let's look at Java serialization, which also underlies it: the class is a very important concept in object-oriented programming. The so-called "class" can be imagined as a building...

Hadoop Java Object-oriented object

Cloud computing: Hadoop encoding and decoding [compression and decompression] mechanisms (1)

Think about it: when you need to process 500 TB of data, the first thing you have to do is store it. Do you store the raw source files, or compress them before storing? Clearly some form of encoding is a must. A freshly shot 60-minute raw video may reach 2 GB but can be compressed down to about 500 MB; an SLR photograph may be 5 MB and only 400 KB after compression, with no obvious loss of quality. Hadoop faces the same situation: huge amounts of data need to be stored on disk or in memory, and compression is a...
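A minimal sketch of Hadoop's pluggable codec API, modelled on the stream-compressor example in Hadoop: The Definitive Guide: it gzips whatever arrives on standard input and writes the compressed bytes to standard output.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class StreamCompressor {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Any CompressionCodec implementation could be plugged in here.
        CompressionCodec codec = ReflectionUtils.newInstance(GzipCodec.class, conf);

        CompressionOutputStream out = codec.createOutputStream(System.out);
        IOUtils.copyBytes(System.in, out, 4096, false);
        out.finish();   // flush the codec's buffers without closing System.out
    }
}
```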

Hadoop Java Code compress data compression

Cloud computing: Hadoop I/O features in detail (2) [file checksums]

This article draws on some material from microheart and ggjucheng; my thanks to them. Charles believes that knowledge is priceless and that open-source sharing is priceless. This time we go on to analyse the file I/O checksum code and see how verification of such large data sets is implemented underneath. It has to be said that the programmers who designed this system are among the wisest people in the world: faced with complex and difficult problems, they always find a good solution. In fact, on the subject of file checksums, this article will touch on several important aspects of how Hadoop handles them, down to the bit...
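A minimal sketch of asking a Hadoop filesystem for the checksum it maintains for a file, assuming FileSystem.getFileChecksum(); the path is hypothetical, and some filesystems return null here because they do not expose whole-file checksums.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Ask the filesystem for the whole-file checksum it derives from the
        // per-block CRCs it verifies on every read and write.
        Path file = new Path("/data/sample.txt");   // hypothetical path
        FileChecksum checksum = fs.getFileChecksum(file);
        System.out.println(checksum == null
                ? "this filesystem does not expose file checksums"
                : checksum.toString());
    }
}
```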

Big data Hadoop Java Code HadoopIO

  • Columnist: MrCharles
  • Created: 2015-12-23
  • Articles: 29
  • Views: 9043
