Application startup and resource monitoring in the Cloud Foundry DEA component


All applications in Cloud Foundry run inside a component called the DEA, short for Droplet Execution Agent.

The DEA's main functions fall into two parts: running all applications and monitoring all applications. This article explains how the DEA in Cloud Foundry v1 starts an application and how it monitors the resources that applications use. The DEA does far more than these two things, but I believe that starting and monitoring applications is its essence; much of the remaining functionality wraps or strengthens these two points.

Starting an application in the DEA

In general, starting an application requires three things: an environment, the source code, and an application entry point.

As for the environment, the node hosting the Cloud Foundry DEA already has the runtime environments needed by the various supported frameworks deployed on it. As for the source code, Cloud Foundry has the concept of a droplet: a runnable source package that contains the user's uploaded source code plus some content added by Cloud Foundry. For a Spring application, for example, Cloud Foundry packages Tomcat together with the application source and adds scripts for starting and stopping Tomcat. As for the entry point, it was already handled during packaging, since the startup script has been added to the droplet. The DEA therefore only needs to provide a basic environment and execute the startup script to start the application. Reading the source shows that the DEA's main file is agent.rb. True to that name, the DEA acts as an agent when starting an application: it carries out the work through underlying Linux shell commands.

Developers who understand Cloud Foundry's messaging will be familiar with the NATS publish/subscribe mechanism. The DEA also receives "start application" requests through it. The subscription code is as follows:

NATS.subscribe("dea.#{uuid}.start") { |msg| process_dea_start(msg) }

As the code shows, the DEA subscribes to the subject dea.#{uuid}.start. Whenever NATS receives a message published on that subject, it forwards the message to the DEA (the msg in the code above), and the DEA handles it with the process_dea_start method. The implementation of process_dea_start is described in detail below.
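
The subscribe/publish flow can be tried out without a NATS server. The following is a minimal in-memory stand-in, not the NATS client itself: the TinyBus class, the uuid value and the message fields are all assumptions made for illustration.

```ruby
require 'json'

# Hypothetical in-memory stand-in for a publish/subscribe bus.
# It only supports exact subject matches; real NATS does much more.
class TinyBus
  def initialize
    @subscribers = Hash.new { |hash, subject| hash[subject] = [] }
  end

  # Register a block to be called for every message on +subject+.
  def subscribe(subject, &handler)
    @subscribers[subject] << handler
  end

  # Deliver +msg+ to every handler subscribed to +subject+.
  # Returns the number of handlers that received it.
  def publish(subject, msg)
    return 0 unless @subscribers.key?(subject)
    handlers = @subscribers[subject]
    handlers.each { |handler| handler.call(msg) }
    handlers.length
  end
end

bus = TinyBus.new
uuid = 'dea-1234'   # assumed identifier of this DEA
received = []

# Stand-in for process_dea_start: just record the parsed request.
process_dea_start = lambda { |msg| received << JSON.parse(msg) }

bus.subscribe("dea.#{uuid}.start") { |msg| process_dea_start.call(msg) }

# A message on the DEA's own subject is delivered...
bus.publish("dea.#{uuid}.start", JSON.generate('name' => 'demo'))
# ...while a message addressed to another DEA is not.
bus.publish('dea.other.start', JSON.generate('name' => 'ignored'))
```

Because the uuid is part of the subject, a start request can be addressed to one specific DEA.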

process_dea_start first parses the message with the JSON class, obtains a large number of attribute values, and assigns them to local variables. Because the resource monitoring section below deals with an application's memory, number of open files and disk usage, those attributes are used here to illustrate how the message is parsed.

message_json = JSON.parse(message)

if limits = message_json['limits']
  mem = limits['mem'] if limits['mem']
  num_fds = limits['fds'] if limits['fds']
  disk = limits['disk'] if limits['disk']
end

After the message is parsed, the value under the limits key is looked up. If it exists, its entries are used to initialize the variables mem, num_fds and disk; any entry that is missing keeps the DEA's default value.
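
The fallback to defaults can be sketched as a small self-contained function. The resolve_limits name and the three default values are assumptions chosen for this example, not values confirmed by the DEA source quoted here.

```ruby
require 'json'

# Assumed default values, chosen for this example only.
DEFAULT_MEM_MB  = 512
DEFAULT_DISK_MB = 256
DEFAULT_NUM_FDS = 1024

# Return [mem, num_fds, disk] for a start request, falling back to the
# defaults for every limit the message does not specify.
def resolve_limits(message)
  message_json = JSON.parse(message)

  mem     = DEFAULT_MEM_MB
  num_fds = DEFAULT_NUM_FDS
  disk    = DEFAULT_DISK_MB

  if (limits = message_json['limits'])
    mem     = limits['mem']  if limits['mem']
    num_fds = limits['fds']  if limits['fds']
    disk    = limits['disk'] if limits['disk']
  end

  [mem, num_fds, disk]
end

partial = resolve_limits('{"limits": {"mem": 128}}')   # only mem is given
missing = resolve_limits('{"name": "demo"}')           # no limits at all
```

In the first call only mem is overridden; in the second call all three values come from the defaults.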

After a series of resource checks, the DEA creates an instance record for the application being started; these records are kept in the DEA's memory. The DEA also creates a directory in which the application will run.

The directory path on the file system of the DEA's node is built as follows:

instance_dir = File.join(@apps_dir, "#{name}-#{instance_index}-#{instance_id}")

Using the variables inside process_dea_start, the DEA creates the in-memory instance. Simplified, the code is as follows:

instance = {
  :droplet_id => droplet_id,
  :instance_id => instance_id,
  :mem_quota => mem * (1024*1024),
  :disk_quota => disk * (1024*1024),
  :fds_quota => num_fds,
  :state => :STARTING,
  :cc_partition => cc_partition
}

The instance is then stored in @droplets, a Hash that the DEA maintains in memory. The code is as follows:

instances = @droplets[droplet_id] || {}
instances[instance_id] = instance
@droplets[droplet_id] = instances
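
The resulting structure is a two-level Hash: droplet id, then instance id. A minimal sketch, assuming made-up ids and quotas and using a local variable in place of @droplets:

```ruby
# droplet_id => { instance_id => instance }
droplets = {}

# Build an instance record and file it under its droplet.
# All argument values used below are made up for the example.
register_instance = lambda do |droplet_id, instance_id, mem_mb, disk_mb, num_fds|
  instance = {
    :droplet_id  => droplet_id,
    :instance_id => instance_id,
    :mem_quota   => mem_mb * (1024 * 1024),   # megabytes -> bytes
    :disk_quota  => disk_mb * (1024 * 1024),  # megabytes -> bytes
    :fds_quota   => num_fds,
    :state       => :STARTING
  }
  instances = droplets[droplet_id] || {}
  instances[instance_id] = instance
  droplets[droplet_id] = instances
  instance
end

register_instance.call(42, 'a1', 128, 256, 256)
register_instance.call(42, 'b2', 128, 256, 256)
register_instance.call(7,  'c3', 64,  128, 256)

# Two instances of droplet 42 share one inner hash; droplet 7 has its own.
instance_count = droplets.values.map(&:length).reduce(0, :+)
```

Keeping instances grouped by droplet means several instances of the same application can be tracked side by side.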

Next, process_dea_start defines a lambda named start_operation.

Inside start_operation, the DEA first does the following:

  • Allocates a regular port for the application, which end users will use to access it
  • Allocates a debug port for the application, if one needs to be configured
  • Allocates a console port for the application, if one needs to be configured

Once the ports have been assigned, the DEA makes the necessary environment settings.

First, it creates a manifest variable that holds the state value stored in the droplet.yaml file; this state value is used to detect whether the application is ready.

Next, it creates a variable prepare_script and copies a file from a path in the source tree to it, so that it can be used when the start command is executed.

Then it selects the shell command to use according to the operating system of the platform the DEA runs on.

Then it sets permissions on the application directory: chown changes the owning user, chgrp changes the group, and chmod changes the read and write permissions.

Then it sets the application's environment variables by calling the setup_instance_env method. This method adds all the environment variables to an array, and the contents of that array are finally exported into the application's environment.
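
The array-then-export idea can be sketched as follows. The build_env and export_commands helpers, the variable names and the values are assumptions for illustration; the shell escaping is an addition of this sketch and is not claimed to be what setup_instance_env does.

```ruby
require 'shellwords'

# Collect "NAME=value" strings in an array. Shellwords.escape guards
# values that contain spaces or quotes (an addition made by this sketch).
def build_env(vars)
  vars.map { |name, value| "#{name}=#{Shellwords.escape(value.to_s)}" }
end

# Turn every entry of the array into an export command for the shell.
def export_commands(env)
  env.map { |entry| "export #{entry}\n" }
end

# Illustrative variables only.
app_env = build_env('HOME' => '/tmp/demo-app', 'VCAP_APP_PORT' => 8080,
                    'GREETING' => 'hello world')
commands = export_commands(app_env)
```

Each element of commands is one line that could be written to the shell before the startup script is run.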

Next comes a very important part: the DEA defines a proc that applies the resource limits when the application is started. Note that this only defines the block; it is not called at this point in process_dea_start. The code is as follows:

exec_operation = proc do |process|
  process.send_data("cd #{instance_dir}\n")
  if @secure || @enforce_ulimit
    process.send_data("renice 0 $$\n")
    process.send_data("ulimit -m #{mem_kbytes} 2> /dev/null\n") # ulimit -m takes KB, soft enforce
    process.send_data("ulimit -v 3000000 2> /dev/null\n") # virtual memory at 3G, this will be enforced
    process.send_data("ulimit -n #{num_fds} 2> /dev/null\n")
    process.send_data("ulimit -u 512 2> /dev/null\n") # processes/threads
    process.send_data("ulimit -f #{disk_limit} 2> /dev/null\n") # File size to complete disk usage
    process.send_data("umask 077\n")
  end
  app_env.each { |env| process.send_data("export #{env}\n") }
  if instance[:port]
    process.send_data("./startup -p #{instance[:port]}\n")
  else
    process.send_data("./startup\n")
  end
  process.send_data("exit\n")
end

When the block runs, it first changes into the application's directory and resets the scheduling priority with renice. It then sets the resource limits for the process using the Linux ulimit command, which limits the resources available to processes started from the shell. The limits set here cover physical memory, virtual memory, the number of open files, the number of processes/threads, and file size on disk. Note that these limits are temporary: they apply only to that shell session and lapse once the session ends. This is understandable, and it also shows that ulimit alone gives the DEA only limited isolation and control over application resources. After the limits are set, the environment variables are imported with export. The second-to-last step runs the startup script, which is what actually starts the application. Finally, the shell session is closed with exit.
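
The session-scoped nature of ulimit can be checked with a short experiment. This is a sketch that assumes a Unix-like system with /bin/sh; it feeds commands to a shell on stdin (in the same spirit as send_data, but using Ruby's Open3 rather than EventMachine) and the value 64 is arbitrary.

```ruby
require 'open3'

# Soft and hard open-file limits of this Ruby process, before the test.
before = Process.getrlimit(Process::RLIMIT_NOFILE)

# Commands fed to the shell on stdin.
script = [
  "ulimit -n 64 2> /dev/null\n",  # limit open files for this shell only
  "ulimit -n\n",                  # print the limit the shell now sees
  "exit\n"
].join

child_limit, status = Open3.capture2('/bin/sh', :stdin_data => script)

# The parent's limits are untouched once the shell session has ended.
after = Process.getrlimit(Process::RLIMIT_NOFILE)

puts "limit inside the shell: #{child_limit.strip}"
puts "parent limit unchanged: #{before == after}"
```

On a typical Linux system the shell should report the lowered limit while the parent process keeps its original one, which is why the limits have to be set inside the same shell that later runs the startup script.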

The block that runs when the shell exits is as follows:

exit_operation = proc do |_, status|
  @logger.info("#{name} completed running with status = #{status}.")
  @logger.info("#{name} uptime was #{Time.now - instance[:start]}.")
  stop_droplet(instance)
end

This part of the code is relatively simple: apart from logging, it just calls the stop_droplet method.

Then comes a very important piece, which is responsible for actually starting the application:

Bundler.with_clean_env { EM.system("#{@dea_ruby} -- #{prepare_script} true #{sh_command}", exec_operation, exit_operation) }

This means: run the contents of the braces with a clean set of environment variables. The EM.system call inside is what finally launches the startup, with exec_operation feeding commands to the spawned process and exit_operation acting as the callback when it exits.

After that come some necessary auxiliary operations, such as detecting whether the application is ready and obtaining its PID.

Finally, the DEA defines a Fiber that first calls stage_app_dir to prepare all of the droplet's content and, once that succeeds, calls start_operation.

The fiber is then started with f.resume.
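
The stage-then-start sequence can be sketched with a plain Ruby Fiber. The names stage_app_dir and start_operation are reused from the text, but their bodies here are invented, and the pause inside staging is an assumption (for example, waiting for a download to finish).

```ruby
events = []

# Stand-in for staging: pauses once, then reports success.
stage_app_dir = lambda do
  events << :staging_started
  Fiber.yield :waiting          # hand control back to the caller
  events << :staging_finished
  true
end

# Stand-in for the real start_operation lambda.
start_operation = lambda { events << :started }

f = Fiber.new do
  start_operation.call if stage_app_dir.call
  :done
end

first  = f.resume   # runs until Fiber.yield inside stage_app_dir
second = f.resume   # continues after the yield and starts the app
```

The first resume runs the fiber up to the pause; a later resume lets it finish staging and call start_operation, so the start only happens after staging has succeeded.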

Monitoring application resources in the DEA

The DEA monitors the applications on its node mainly by polling periodically. The code is as follows:

EM.add_timer(MONITOR_INTERVAL) { monitor_apps }

The default monitoring interval is 2s. All of the work happens in the monitor_apps method; simplified, the code is as follows:

def monitor_apps(startup_check = false)
  # (initialization of pid_info, user_info, start and du_start omitted)
  process_statuses = `ps axo pid=,ppid=,pcpu=,rss=,user=`.split("\n")
  process_statuses.each do |process_status|
    parts = process_status.lstrip.split(/\s+/)
    pid = parts[PID_INDEX].to_i
    pid_info[pid] = parts
    (user_info[parts[USER_INDEX]] ||= []) << parts if (@secure && parts[USER_INDEX] =~ SECURE_USER)
  end

  if startup_check
    du_all_out = `cd #{@apps_dir}; du -sk * 2> /dev/null`
    monitor_apps_helper(startup_check, start, du_start, du_all_out, pid_info, user_info)
  else
    du_proc = proc do |p|
      p.send_data("cd #{@apps_dir}\n")
      p.send_data("du -sk * 2> /dev/null\n")
      p.send_data("exit\n")
    end
    cont_proc = proc do |output, status|
      monitor_apps_helper(startup_check, start, du_start, output, pid_info, user_info)
    end
    EM.system('/bin/sh', du_proc, cont_proc)
  end
end

Look first at the line process_statuses = `ps axo pid=,ppid=,pcpu=,rss=,user=`.split("\n"). It asks the operating system for information about every process and stores it in the array process_statuses. The entries are then indexed by PID in the Hash pid_info and, for secure users, grouped by user in the Hash user_info, in preparation for the resource monitoring that follows.
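
The parsing step can be tried on canned input. In this sketch the index constants are assumed from the column order requested from ps, the SECURE_USER pattern and the user names are hypothetical, and the @secure check is left out.

```ruby
# Column positions follow the order requested from ps:
# pid, ppid, pcpu, rss, user (values assumed from that order).
PID_INDEX  = 0
CPU_INDEX  = 2
MEM_INDEX  = 3
USER_INDEX = 4
SECURE_USER = /\Avcap-user-\d+\z/   # hypothetical naming pattern

# Canned output in the shape of `ps axo pid=,ppid=,pcpu=,rss=,user=`.
sample = <<EOS
    1     0  0.0  1200 root
 2301     1  1.5 52000 vcap-user-7
 2302  2301  0.5  8000 vcap-user-7
 2400     1  2.0 30000 vcap-user-9
EOS

pid_info  = {}
user_info = {}

sample.split("\n").each do |process_status|
  parts = process_status.lstrip.split(/\s+/)
  pid = parts[PID_INDEX].to_i
  pid_info[pid] = parts
  # Group rows by user so forked children can be counted together.
  (user_info[parts[USER_INDEX]] ||= []) << parts if parts[USER_INDEX] =~ SECURE_USER
end

# Total resident memory (KB) and CPU share of everything owned by one user.
rss_for_user_7 = user_info['vcap-user-7'].map { |p| p[MEM_INDEX].to_f }.reduce(0, :+)
cpu_for_user_7 = user_info['vcap-user-7'].map { |p| p[CPU_INDEX].to_f }.reduce(0, :+)
```

Grouping by user is what lets the monitor charge a forked child process (PID 2302 here) to the same application as its parent.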

Then, depending on the startup_check parameter, one of two disk-check paths is taken. When startup_check is true, the code is:

du_all_out = `cd #{@apps_dir}; du -sk * 2> /dev/null`

This command changes into the applications' root directory and runs du to collect the disk space used by each application directory. The result is then passed to the monitor_apps_helper method, which combines it with the other collected data to produce the final monitoring data. The other branch of the if statement does the same job through the proc blocks, which EM.system runs asynchronously.

Now for the monitor_apps_helper method. It first works out the disk space used by each directory:

du_entries = du_all_out.split("\n")
du_entries.each do |du_entry|
  size, dir = du_entry.split("\t")
  size = size.to_i * 1024 # Convert to bytes
  du_hash[dir] = size
end

du_all_out holds the disk usage of all application directories. The split method separates it into one entry per application, and each entry is stored in the Hash du_hash.
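
A small self-contained run on canned du output shows the shape of du_hash. The directory names and sizes are made up for the example.

```ruby
# Canned output in the shape of `du -sk *`: size in KB, a tab, the name.
du_all_out = "120\tdemo-0-a1\n2048\tshop-0-b2\n"

du_hash = {}
du_all_out.split("\n").each do |du_entry|
  size, dir = du_entry.split("\t")
  du_hash[dir] = size.to_i * 1024   # kilobytes -> bytes
end

# Directories missing from the du output count as 0 bytes.
disk = du_hash['unknown-dir'] || 0
```

Because du -sk reports kilobytes, every size is multiplied by 1024 so that disk usage is tracked in bytes.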

Next comes the most important part, where the information is aggregated:

@droplets.each_value do |instances|
  instances.each_value do |instance|
    if instance[:pid] && pid_info[instance[:pid]]
      pid = instance[:pid]
      mem = cpu = 0
      disk = du_hash[File.basename(instance[:dir])] || 0
      # For secure mode, gather all stats for secure_user so we can process forks, etc.
      if @secure && user_info[instance[:secure_user]]
        user_info[instance[:secure_user]].each do |part|
          mem += part[MEM_INDEX].to_f
          cpu += part[CPU_INDEX].to_f
          # disabled for now, LSOF is too slow to run per app/user
          # deleted_disk = grab_deleted_file_usage(instance[:secure_user])
          # disk += deleted_disk
        end
      else
        mem = pid_info[pid][MEM_INDEX].to_f
        cpu = pid_info[pid][CPU_INDEX].to_f
      end
      usage = @usage[pid] ||= []
      cur_usage = { :time => Time.now, :cpu => cpu, :mem => mem, :disk => disk }
      usage << cur_usage
      usage.shift if usage.length > MAX_USAGE_SAMPLES
      check_usage(instance, cur_usage, usage) if @secure

      #@logger.debug("Droplet Stats are = #{JSON.pretty_generate(usage)}")
      @mem_usage += mem

      metrics.each do |key, value|
        metric = value[instance[key]] ||= {:used_memory => 0, :reserved_memory => 0,
                                           :used_disk => 0, :used_cpu => 0}
        metric[:used_memory] += mem
        metric[:reserved_memory] += instance[:mem_quota] / 1024
        metric[:used_disk] += disk
        metric[:used_cpu] += cpu
      end

      # Track running apps for varz tracking
      i2 = instance.dup
      i2[:usage] = cur_usage # Snapshot

      running_apps << i2

      # Re-register with router on startup since these are orphaned and may have been dropped.
      register_instance_with_router(instance) if startup_check
    end
  end
end

The code iterates over @droplets and then over every instance of each droplet. For each instance it takes the PID and initializes mem, cpu and disk. In secure mode it then looks up instance[:secure_user] in user_info to find all processes belonging to that user, and accumulates their values into mem and cpu.

Once the resource usage of an instance has been obtained, it needs to be stored in @usage, which the DEA maintains in memory. The code is as follows:

usage = @usage[pid] ||= []
cur_usage = { :time => Time.now, :cpu => cpu, :mem => mem, :disk => disk }
usage << cur_usage

First, note the ||= operator. It means that if @usage[pid] is nil, an empty array [] is assigned to @usage[pid]; either way, the resulting array is then assigned to usage. The second line initializes a Hash. The third line appends cur_usage to the array usage. Because usage refers to the same array object as @usage[pid], this is equivalent to appending cur_usage to @usage[pid].
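
The behaviour of ||= and the shared array reference can be verified with a short sketch. The usage_by_pid Hash plays the role of @usage, and the window size of 3 and the sample values are assumptions for the example.

```ruby
MAX_USAGE_SAMPLES = 3   # assumed window size for the example

usage_by_pid = {}       # plays the role of @usage

record_usage = lambda do |pid, cpu, mem, disk|
  # Create the array on first use, then keep appending to the same object.
  usage = usage_by_pid[pid] ||= []
  cur_usage = { :time => Time.now, :cpu => cpu, :mem => mem, :disk => disk }
  usage << cur_usage
  # Drop the oldest sample once the window is exceeded.
  usage.shift if usage.length > MAX_USAGE_SAMPLES
  usage
end

returned = nil
5.times { |i| returned = record_usage.call(2301, i.to_f, 1000.0 * i, 4096) }

# Only the three most recent samples remain.
recent_cpu = usage_by_pid[2301].map { |sample| sample[:cpu] }
```

Since usage and usage_by_pid[pid] are the same object, no explicit write-back is needed after appending or shifting.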

The metrics variable records the sum of the resources used by all applications running on the DEA, so that the varz information can be updated afterwards.
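
The accumulation into metrics can be sketched as follows. The two grouping keys (:framework and :runtime), the instances and their usage figures are assumptions made for this example; only memory is tracked to keep it short.

```ruby
# Two grouping keys, assumed for this example.
metrics = { :framework => {}, :runtime => {} }

# Made-up instances with made-up usage figures (mem in KB, quota in bytes).
instances = [
  { :framework => 'sinatra', :runtime => 'ruby19', :mem_quota => 128 * 1024 * 1024, :mem => 40_000.0 },
  { :framework => 'sinatra', :runtime => 'ruby18', :mem_quota => 256 * 1024 * 1024, :mem => 60_000.0 },
  { :framework => 'spring',  :runtime => 'java',   :mem_quota => 512 * 1024 * 1024, :mem => 300_000.0 }
]

instances.each do |instance|
  metrics.each do |key, value|
    # One bucket per distinct value of the grouping key, created on demand.
    metric = value[instance[key]] ||= { :used_memory => 0, :reserved_memory => 0 }
    metric[:used_memory]     += instance[:mem]
    metric[:reserved_memory] += instance[:mem_quota] / 1024   # bytes -> KB
  end
end
```

Each instance contributes to one bucket per grouping key, so the totals can be reported from more than one angle.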

That covers how the Cloud Foundry DEA monitors application resources.


The analysis above shows that the Cloud Foundry v1 DEA monitors resources through the underlying Linux ps command, and that its control of resources is not very good: it relies on limits set at startup and on stopping applications that exceed them. This is clearly not the best approach, and Cloud Foundry has since introduced the Warden container to control and isolate application resources in the DEA. If I manage to read and understand that part, I look forward to writing about how Warden is implemented, as well as the cgroup mechanism.

About the author:

Sun Hongliang, software engineer at DaoCloud. For the past two years his main research area has been PaaS-related knowledge and technology in cloud computing. He firmly believes that lightweight virtualization with containers will have a deep impact on the PaaS field, and may even determine the future direction of PaaS technology.

Please credit the source when reposting.

This document is mostly my own understanding, so it certainly has shortcomings and errors in places. I hope it helps people who are getting to know how the Cloud Foundry DEA runs applications and monitors resources. If you are interested in this area and have better ideas or suggestions, please contact me.

My email address:
Sina weibo:Fu Ruqing lotus seed
