YARN & Mesos: The Challenges of Cluster Resource Management
Published 2015-07-02 23:17
In China, most Spark users have migrated from Hadoop, so YARN has become the underlying resource scheduler for most Spark applications. As Spark adoption has deepened, a variety of problems have been exposed, such as the granularity of resource scheduling. To that end, on the evening of July 2nd a discussion on YARN and Mesos was held in the CSDN Spark high-end WeChat group. The invited speakers included Yan Zhitao, VP of R&D at TalkingData; Tian Yi of GrowingIO; Lu Yilei, VP of technology at AdMaster; and Spark contributor Xia Junluan, who has worked on the Mesos and Hadoop integrations. A review follows.
Yan Zhitao - YARN's coupling with Hadoop and the granularity of resource allocation
There are many old friends here, so I will mainly talk about the challenges we hit running Spark on YARN in practice. TalkingData first introduced Spark in 2013, mainly to improve the efficiency of machine learning; the version we introduced was Spark 0.8.1, and the environment at the time was CDH 4.3 Hadoop.
Initially we used Spark to run some basic data-mining tasks, while other jobs were still done with MR + Hive. We later found that, compared to Hadoop, Spark has obvious advantages in both development efficiency and performance, so we began to move the entire data center's computation to the Spark platform. With more tasks, and the need to run them concurrently, we needed a resource scheduling system. CDH 4.3 ships with YARN, and Spark supports YARN, so choosing YARN for resource scheduling was the natural move.
The specific approach was to divide resources into different queues and assign different types of tasks to different queues, so that different tasks could run side by side. The first problem we encountered was how to divide the resources: partitioning across multiple queues is expressed as resource percentages. The granularity of this allocation is not fine enough, but it is usable. Then, around Spark 1.2, Spark announced that support for the alpha YARN API would be dropped in later releases, and the YARN in CDH 4.3 is exactly that alpha version.
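Queue division by resource percentage, as described above, is what YARN's CapacityScheduler expresses in capacity-scheduler.xml. A minimal sketch follows; the queue names and the 70/30 split are made up for illustration:

```xml
<!-- capacity-scheduler.xml: two hypothetical queues splitting cluster capacity 70/30 -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>etl,ml</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.etl.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.ml.capacity</name>
    <value>30</value>
  </property>
</configuration>
```

A Spark job is then directed at one of these queues with spark-submit's `--queue` flag.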
So after Spark 1.3 we faced a choice: either modify every subsequent Spark release ourselves to keep supporting CDH 4.3, upgrade Hadoop to a newer version (CDH 5.x), or switch to another resource scheduler. But our Hadoop cluster already holds petabytes of data, and upgrading with that much data in place is very risky. So we began considering Mesos for resource scheduling and management. Our plan is to leave CDH 4.3 as it is, run a newer version of Hadoop on new machines, and schedule everything uniformly through Mesos. In addition, we introduced Tachyon as a cache layer, with SSDs as the spill storage for shuffle. Scheduling with Mesos reduces our dependence on the Hadoop version, and a Hadoop upgrade is simply too risky. This is the biggest pit we have run into. That is enough ranting about YARN from me; the rest of our Spark pits will have to wait for another occasion.
Tian Yi - Spark 1.4.0 classpath problems on YARN
Recently I ran into a pit that is, let's say, not exactly small: submitting Spark jobs hit all kinds of classpath issues, including classes that could not be found and conflicts between different versions of the same class. These problems became especially frequent after upgrading to Spark 1.4.0 on YARN, so today I mainly want to share how the classpath works for Spark on YARN, and summarize Spark's class-loading rules on YARN for your reference (everything below applies to Spark 1.4.0 in yarn-client mode).
Spark jobs are submitted to the YARN cluster through spark-submit. Assuming the Spark startup scripts are not modified, the following factors determine the classpath of a submitted task (the list may be incomplete; additions are welcome):
- --driver-class-path
- --jars
This is a rather troublesome area: Spark offers many configuration options, and the loading mechanism differs between versions, which makes it a headache to use. Let's look at how the spark-submit command is actually implemented:
- Execute bin/spark-class
- Execute bin/load-spark-env.sh
- Execute conf/spark-env.sh
- Execute java -cp ... org.apache.spark.launcher.Main
- Generate the launch command for the driver side
The fifth step was changed only recently; before the change it was implemented in shell. Now, the only way to understand the logic is to read the Scala source, which turns it into a black box for some developers... Students who want to trace the process in detail can add `set -x` to the spark-class script. From the code of org.apache.spark.launcher.Main you can derive the driver classpath order:
- $SPARK_CLASSPATH (deprecated, not recommended)
- The spark.driver.extraClassPath configuration, or --driver-class-path
Executor class loading is far more complex than the driver's, and I won't go into the details here; interested students can read the code of the spark-yarn module. The executor-side classpath load order:
- `hadoop classpath` (the output of the Hadoop classpath command)
Pay particular attention to the load order here: a wrong order will often lead to the same class being loaded from different versions in different jar packages, resulting in call errors. With the load order understood, we recommend configuring the classpath as follows:
On the driver side, use --driver-class-path to fully control the driver's classpath; that is enough to meet most needs. On the executor side, if you use --jars, watch out for class conflicts with Hadoop and spark-assembly; if something needs to be loaded with priority, configure it via spark.executor.extraClassPath. A small digression here: in the last couple of days we tried Phoenix 4.4.0, which makes it very convenient to load Spark DataFrame data into HBase through Phoenix. It takes just one statement:
hc.sql(sql)
  .write
  .format("org.apache.phoenix.spark")
  .mode(SaveMode.Overwrite)
  .options(Map("table" -> "XXX_TABLE", "zkUrl" -> zookeeper))
  .save()
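The classpath recommendations above can also be captured once in spark-defaults.conf instead of being passed per submit. A sketch follows; the jar paths are placeholders, to be replaced with your own:

```properties
# spark-defaults.conf -- classpath settings per the recommendation above.

# Driver side: equivalent to passing --driver-class-path to spark-submit.
spark.driver.extraClassPath    /opt/libs/hbase-client.jar:/opt/libs/phoenix-core.jar

# Executor side: entries here are prepended to the executor classpath, so they
# take priority over classes coming from `hadoop classpath` and spark-assembly.
spark.executor.extraClassPath  /opt/libs/hbase-client.jar:/opt/libs/phoenix-core.jar
```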
Lu Yilei - The resource management mechanism of YARN
First, look at two diagrams of YARN resource management: one of the ResourceManager, the other of the NodeManager:
1. Scheduling over multiple resource types, mainly via the DRF (Dominant Resource Fairness) algorithm
2. Multiple pluggable resource schedulers:
- Fair Scheduler
- Capacity Scheduler
3. Multi-tenant resource scheduling: resource allocation by proportional division, hierarchical queues, and resource preemption
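The DRF idea mentioned in point 1 can be illustrated with a tiny greedy allocator. This is a sketch of the algorithm from the original DRF paper's CPU/memory example, not YARN's actual scheduler code, and the function name is made up for illustration:

```python
def drf_allocate(total, demands):
    """Greedy Dominant Resource Fairness: repeatedly grant one task's worth
    of resources to the user with the lowest dominant share.

    total:   list of cluster capacities, one per resource type
    demands: dict mapping user -> per-task demand vector
    """
    n = len(total)
    used = [0.0] * n
    alloc = {u: 0 for u in demands}      # tasks granted per user
    shares = {u: 0.0 for u in demands}   # current dominant share per user
    while True:
        # users whose next task still fits in the remaining resources
        feasible = [u for u, d in demands.items()
                    if all(used[i] + d[i] <= total[i] for i in range(n))]
        if not feasible:
            break
        u = min(feasible, key=lambda user: shares[user])
        d = demands[u]
        for i in range(n):
            used[i] += d[i]
        alloc[u] += 1
        # dominant share = max over resources of the user's usage fraction
        shares[u] = max(alloc[u] * d[i] / total[i] for i in range(n))
    return alloc
```

With a cluster of 9 CPUs and 18 GB, user A's tasks needing <1 CPU, 4 GB> and user B's needing <3 CPU, 1 GB> (the DRF paper's example), the allocator grants A three tasks and B two, equalizing both users' dominant shares at 2/3.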
Here is a pit we hit:
We once found that the RM could not allocate resources even though the cluster state looked normal - CPU, memory, disk, and bandwidth usage were all relatively low - yet task allocation was very slow. After digging through the RM logs for a long time we found a clue:
2015-04-22 07:29:15,438 WARN org.apache.hadoop.ipc.Server: Large response size 3552406 for call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications from 10.10.10.239:48535 Call#146914 Retry#0
It turned out to be the org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplications API: it is called periodically to fetch cluster state, and because too much historical application state had accumulated, each call returned an oversized response, which ended up blocking the RM. So we recommend paying special attention to whether the responses from cluster-state polling grow too large. More generally, whenever the cluster behaves abnormally, look at the logs first; the logs can tell us almost everything. Next, let me briefly introduce our Hadoop application scenario:
We have grown from a few dozen machines at the beginning to a cluster of more than 1,500 servers today. We handle more than 100 billion collection requests every day, plus hundreds of billions of records of offline, streaming, and real-time analysis and computation daily, which poses a great challenge to our cluster.
From this architecture you can see that we actually use a great many technologies and systems from across the Hadoop ecosystem. People ask why we use Flink and Spark together; I gave a brief introduction at yesterday's Hadoop Summit 2015. Here, let me first disclose a K-means benchmark we ran; the numbers hold some surprises.
Apart from compatibility, Flink's performance should be higher than Spark's. I will share the detailed analysis report with you next week.
Xia Junluan - Mesos: characteristics and current status
Mesos is in fact used more abroad; domestic adoption is limited. Judging from the current Mesos website, the bigger users are Airbnb and Yelp. Mesos support in Spark dates to roughly the 0.8 era, born almost together with standalone mode, while YARN support only became usable around 1.0. When Spark first came out, Mesos was actually much more stable than YARN; it is Berkeley's own project, after all, and received strong support.
Currently Spark supports two submission modes on both Mesos and YARN: client and cluster. Mesos additionally offers coarse-grained and fine-grained modes. The fine-grained mode is dynamic: each task communicates directly with the Mesos master when it is submitted, so Spark jobs share resources with other frameworks, including other Spark jobs, instead of holding them exclusively. The drawback of this approach is comparatively high scheduling overhead, which makes it unsuitable for interactive jobs. The coarse-grained scheduling mode is essentially the same as today's YARN approach, and favors low job latency. I have test data for both modes and can share it next time. Since it sits outside the Hadoop ecosystem, Mesos adoption remains rather bleak.
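The two Mesos granularity modes described above are toggled with a single Spark setting. A sketch for the Spark 1.4 era follows; the comments reflect the trade-offs named above, and the master URL is a placeholder:

```properties
# spark-defaults.conf -- choosing the Mesos granularity mode (Spark 1.4 era).

# Fine-grained (the default at the time): tasks negotiate with the Mesos
# master individually, sharing resources dynamically across frameworks but
# paying per-task scheduling overhead.
spark.mesos.coarse   false

# Coarse-grained: acquire executors up front, YARN-like, for low job latency.
# spark.mesos.coarse   true

# spark.master would point at the Mesos master, e.g. mesos://host:5050 (placeholder).
```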
Q (CSDN user): For the Parquet files Spark generates, how big is each Parquet file generally recommended to be?
Tian Yi: My advice is not to make them too big; the data (before compression) had better not exceed 128 MB. This is not absolute, though; it depends on your column count and compression ratio.
Yan Zhitao: Ours are all a few hundred MB. With Parquet it mainly depends on how much of the data you read out; if you read out a lot, performance is not necessarily good.
Q (CSDN user): When joining or reducing data in the millions of rows, there is always a node reporting task lost. Why?
Tian Yi: This is a frequent problem. The most common cause is an executor stuck in a long GC pause, leading to a heartbeat timeout. For practice you can refer to what Intel recently shared at the Summit on GC tuning. GC behavior has also improved in the 1.4 release.
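For the heartbeat-timeout symptom above, two commonly tried knobs are the executor GC flags and the network timeout. This is a hedged sketch; the specific values are illustrative starting points, not recommendations from the speakers:

```properties
# spark-defaults.conf -- mitigating "task lost" caused by long GC pauses.

# Log GC activity and switch executors to CMS to shorten stop-the-world pauses.
spark.executor.extraJavaOptions  -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

# Give heartbeats more slack while tuning (the default is 120s).
spark.network.timeout            300s
```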