
Analysis of File Directory Changes Across the Application Instance Life Cycle in Cloud Foundry

In Cloud Foundry, applications run on the DEA (Droplet Execution Agent). As an application moves through its life cycle, its file directory on the DEA changes accordingly.

This article analyzes how the application directory changes across the following events: creating and starting an application (start an app), stopping an application (stop an app), deleting an application (delete an app), restarting an application (restart an app), an application crash, shutting down the DEA, starting the DEA, and restarting the DEA after an abnormal exit.

The Cloud Foundry described in this article is limited to the v1 version; v2 will be covered in a follow-up.

Start an app

Start an app mainly refers to a user request that causes Cloud Foundry to create or start an application. Note that before an app is started, no DEA in Cloud Foundry holds any files for that app. When a DEA accepts a start request, it must download the droplet from wherever droplets are stored, decompress it under a file path on the DEA node, and finally run the application's startup script from the extracted droplet. At that point, a corresponding file directory exists in the DEA file system.

The code that implements this is in the process_dea_start method in /dea/lib/dea/agent.rb:

[ruby]
    tgz_file = File.join(@staged_dir, "#{sha1}.tgz")
    instance_dir = File.join(@apps_dir, "#{name}-#{instance_index}-#{instance_id}")

This code builds the path of the compressed package on the DEA (tgz_file) and the path of the instance's working directory (instance_dir). The subsequent call success = stage_app_dir(bits_file, bits_uri, sha1, tgz_file, instance_dir, runtime) downloads the application source into instance_dir. Once the start completes, instance_dir is the application's file path.
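
As a rough illustration of how these two paths are laid out, the sketch below rebuilds them with File.join. The base directories, SHA1, application name, index, and instance ID are made-up placeholders, not values taken from a real DEA.

```ruby
# Hypothetical base directories; a real DEA reads these from its configuration.
staged_dir = '/var/vcap.local/dea/staged'
apps_dir   = '/var/vcap.local/dea/apps'

# Placeholder identifiers for one application instance.
sha1           = 'da39a3ee5e6b4b0d3255bfef95601890afd80709'
name           = 'myapp'
instance_index = 0
instance_id    = 'a1b2c3'

# Path of the downloaded droplet archive, keyed by its SHA1.
tgz_file = File.join(staged_dir, "#{sha1}.tgz")

# Path of the directory the droplet is extracted into for this instance.
instance_dir = File.join(apps_dir, "#{name}-#{instance_index}-#{instance_id}")

puts tgz_file
puts instance_dir
```

Because the instance ID is part of the directory name, two instances of the same application on one DEA get separate directories.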

Summary: starting an app creates a file directory on a DEA and starts the application.

Stop an app

Stop an app mainly refers to a user request that causes Cloud Foundry to stop a running application. Note that before the stop, the application must be running, so its file directory and source code already exist in the file system of some DEA. After the Cloud Controller receives the user's stop request, it first finds the DEA nodes running the application and sends them a stop request for that application. When a DEA receives the request, it executes the process_dea_stop method, as follows:

[ruby]
    NATS.subscribe('dea.stop') { |msg| process_dea_stop(msg) }

process_dea_stop mainly stops the application, including all of its instances. The code is as follows:

[ruby]
    return unless instances = @droplets[droplet_id]
    instances.each_value do |instance|
      version_matched  = version.nil? || instance[:version] == version
      instance_matched = instance_ids.nil? || instance_ids.include?(instance[:instance_id])
      index_matched    = indices.nil? || indices.include?(instance[:instance_index])
      state_matched    = states.nil? || states.include?(instance[:state].to_s)
      if (version_matched && instance_matched && index_matched && state_matched)
        instance[:exit_reason] = :STOPPED if [:STARTING, :RUNNING].include?(instance[:state])
        if instance[:state] == :CRASHED
          instance[:state] = :DELETED
          instance[:stop_processed] = false
        end
        stop_droplet(instance)
      end
    end

The method first looks up the application's ID in the @droplets hash, then iterates over all of the application's instances, updates the state of each matching instance, and calls stop_droplet. In other words, the actual stopping of an instance happens in stop_droplet, whose implementation is as follows:
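
To make the matching rule easier to see, here is a minimal standalone sketch. It assumes that a filter left as nil means "no restriction"; the helper name stop_filter_match? and the sample instances are hypothetical and do not appear in the DEA source.

```ruby
require 'set'

# Returns true when an instance matches every filter that was supplied.
# A nil filter means "no restriction" and therefore always matches.
def stop_filter_match?(instance, version: nil, instance_ids: nil, indices: nil, states: nil)
  version_matched  = version.nil?      || instance[:version] == version
  instance_matched = instance_ids.nil? || instance_ids.include?(instance[:instance_id])
  index_matched    = indices.nil?      || indices.include?(instance[:instance_index])
  state_matched    = states.nil?       || states.include?(instance[:state].to_s)
  version_matched && instance_matched && index_matched && state_matched
end

# Hypothetical instances of one application.
instances = [
  { instance_id: 'a', instance_index: 0, version: 'v1', state: :RUNNING },
  { instance_id: 'b', instance_index: 1, version: 'v1', state: :CRASHED },
  { instance_id: 'c', instance_index: 2, version: 'v2', state: :RUNNING }
]

# No filters: every instance is selected.
all_ids = instances.select { |i| stop_filter_match?(i) }.map { |i| i[:instance_id] }

# Only version v1 instances at index 1.
filtered = instances.select do |i|
  stop_filter_match?(i, version: 'v1', indices: Set.new([1]))
end
some_ids = filtered.map { |i| i[:instance_id] }

p all_ids
p some_ids
```

A stop request that carries no filters therefore stops every instance of the application, which is the case this article is concerned with.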

[ruby]
    def stop_droplet(instance)
      return if (instance[:stop_processed])
      send_exited_message(instance)
      username = instance[:secure_user]

      # if system thinks this process is running, make sure to execute stop script
      if instance[:pid] || [:STARTING, :RUNNING].include?(instance[:state])
        instance[:state] = :STOPPED unless instance[:state] == :CRASHED
        instance[:state_timestamp] = Time.now.to_i
        stop_script = File.join(instance[:dir], 'stop')
        insecure_stop_cmd = "#{stop_script} #{instance[:pid]} 2> /dev/null"
        stop_cmd =
          if @secure
            "su -c \"#{insecure_stop_cmd}\" #{username}"
          else
            insecure_stop_cmd
          end
        unless (RUBY_PLATFORM =~ /darwin/ and @secure)
          Bundler.with_clean_env { system(stop_cmd) }
        end
      end
      ...
      cleanup_droplet(instance)
    end

As can be seen, the method fulfills the stop request mainly by running the application's stop script. stop_script = File.join(instance[:dir], 'stop') locates the stop script, insecure_stop_cmd = "#{stop_script} #{instance[:pid]} 2> /dev/null" builds the script command, and the @secure variable determines the final stop_cmd. Finally, Bundler.with_clean_env { system(stop_cmd) } has the operating system run stop_cmd in a clean environment.
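
The way the command string is assembled can be tried in isolation. The following is only a sketch: build_stop_cmd is a hypothetical helper, the instance values are placeholders, and the command is printed rather than executed.

```ruby
# Builds the shell command that runs an instance's stop script.
# When secure is true, the command is wrapped in `su` so that it runs as
# the user recorded for the instance.
def build_stop_cmd(instance, secure)
  stop_script = File.join(instance[:dir], 'stop')
  insecure_stop_cmd = "#{stop_script} #{instance[:pid]} 2> /dev/null"
  if secure
    "su -c \"#{insecure_stop_cmd}\" #{instance[:secure_user]}"
  else
    insecure_stop_cmd
  end
end

# Placeholder instance record.
instance = { dir: '/tmp/apps/myapp-0-a1b2c3', pid: 4242, secure_user: 'vcap-user-1' }

plain_cmd  = build_stop_cmd(instance, false)
secure_cmd = build_stop_cmd(instance, true)

puts plain_cmd
puts secure_cmd
```

In both modes the stop script lives inside the instance directory, which is why the script has to run before that directory is cleaned up.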

What matters most here is the subsequent cleanup_droplet call, because that is where the DEA actually touches the application's directory in its file system. The cleanup_droplet method is as follows:

[ruby]
    def cleanup_droplet(instance)
      remove_instance_resources(instance)
      @usage.delete(instance[:pid]) if instance[:pid]
      if instance[:state] != :CRASHED || instance[:flapping]
        if droplet = @droplets[instance[:droplet_id].to_s]
          droplet.delete(instance[:instance_id])
          @droplets.delete(instance[:droplet_id].to_s) if droplet.empty?
          schedule_snapshot
        end
        unless @disable_dir_cleanup
          @logger.debug("#{instance[:name]}: Cleaning up dir #{instance[:dir]}#{instance[:flapping] ? ' (flapping)' : ''}")
          EM.system("rm -rf #{instance[:dir]}")
        end
      else
        @logger.debug("#{instance[:name]}: Chowning crashed dir #{instance[:dir]}")
        EM.system("chown -R #{Process.euid}:#{Process.egid} #{instance[:dir]}")
      end
    end

After checking the instance's state, if the state is not :CRASHED or instance[:flapping] is true, the method deletes the stopped instance's ID from the @droplets hash and then calls schedule_snapshot, which is analyzed later. The instance's file directory is then deleted by the following code:

[ruby]
    unless @disable_dir_cleanup
      @logger.debug("#{instance[:name]}: Cleaning up dir #{instance[:dir]}#{instance[:flapping] ? ' (flapping)' : ''}")
      EM.system("rm -rf #{instance[:dir]}")
    end

That is, when @disable_dir_cleanup is true, the command rm -rf #{instance[:dir]} is not executed; when it is false, the command is executed and the instance's entire directory is deleted. By default, Cloud Foundry initializes @disable_dir_cleanup in the initialize() method of the Agent class from config['disable_dir_cleanup'], which is empty by default and is therefore treated as false.
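
The branch structure can be exercised on throwaway directories. This is a sketch under stated assumptions: cleanup_instance_dir and its return symbols are hypothetical, FileUtils.rm_rf is used as a synchronous stand-in for EM.system("rm -rf ..."), and the chown step is left out.

```ruby
require 'fileutils'
require 'tmpdir'

# Decides what to do with an instance directory, following the branch
# structure described above. Returns a symbol naming the action taken.
def cleanup_instance_dir(instance, disable_dir_cleanup)
  if instance[:state] != :CRASHED || instance[:flapping]
    return :kept_by_config if disable_dir_cleanup
    FileUtils.rm_rf(instance[:dir]) # stand-in for EM.system("rm -rf ...")
    :removed
  else
    :kept_for_crash # the DEA only re-chowns the directory in this branch
  end
end

# Create three temporary instance directories to act on.
base = Dir.mktmpdir('dea-sketch')
dirs = %w[stopped crashed kept].map { |n| File.join(base, n) }
dirs.each { |d| FileUtils.mkdir_p(d) }

r1 = cleanup_instance_dir({ state: :STOPPED, dir: dirs[0] }, false)
r2 = cleanup_instance_dir({ state: :CRASHED, dir: dirs[1] }, false)
r3 = cleanup_instance_dir({ state: :STOPPED, dir: dirs[2] }, true)

# Record which directories survived before removing the scratch area.
exists = dirs.map { |d| File.exist?(d) }
p [r1, r2, r3]
p exists

FileUtils.rm_rf(base)
```

Only the first directory is removed: the crashed instance keeps its directory, and so does the stopped instance when directory cleanup is disabled.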

Now consider the schedule_snapshot method mentioned above. In cleanup_droplet, which is called from stop_droplet, schedule_snapshot is called after the instance's information has been removed from @droplets. Its implementation is as follows:

[ruby]
    def schedule_snapshot
      return if @snapshot_scheduled
      @snapshot_scheduled = true
      EM.next_tick { snapshot_app_state }
    end

As can be seen, it mainly schedules the snapshot_app_state method, shown here:

[ruby]
    def snapshot_app_state
      start = Time.now
      tmp = File.new("#{@db_dir}/snap_#{Time.now.to_i}", 'w')
      tmp.puts(JSON.pretty_generate(@droplets))
      tmp.close
      FileUtils.mv(tmp.path, @app_state_file)
      @logger.debug("Took #{Time.now - start} to snapshot application state.")
      @snapshot_scheduled = false
    end

This method first records the current time and creates a file with tmp = File.new("#{@db_dir}/snap_#{Time.now.to_i}", 'w'). It serializes @droplets to JSON and writes the JSON to the tmp file. After closing the file, FileUtils.mv(tmp.path, @app_state_file) renames the tmp file to @app_state_file, where @app_state_file = File.join(@db_dir, APP_STATE_FILE) and APP_STATE_FILE = 'applications.json'.
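
The write-then-rename pattern can be reproduced with nothing but the standard library. In this sketch, snapshot_state is a hypothetical helper, the temporary directory stands in for @db_dir, and the droplet data is invented; it is meant to show the file handling only, not the DEA's scheduling.

```ruby
require 'json'
require 'fileutils'
require 'tmpdir'

# Writes the droplets hash to a temporary snapshot file, then moves that
# file over the state file, so the state file is replaced in one step.
def snapshot_state(droplets, db_dir, app_state_file)
  tmp = File.new(File.join(db_dir, "snap_#{Time.now.to_i}"), 'w')
  tmp.puts(JSON.pretty_generate(droplets))
  tmp.close
  FileUtils.mv(tmp.path, app_state_file)
end

db_dir = Dir.mktmpdir('dea-db') # hypothetical stand-in for @db_dir
app_state_file = File.join(db_dir, 'applications.json')
droplets = {
  '42' => { 'a1b2c3' => { 'state' => 'RUNNING', 'dir' => '/tmp/apps/myapp-0-a1b2c3' } }
}

snapshot_state(droplets, db_dir, app_state_file)

# Read the snapshot back and check that no temporary file is left behind.
restored  = JSON.parse(File.read(app_state_file))
leftovers = Dir.glob(File.join(db_dir, 'snap_*'))
p restored
p leftovers

FileUtils.rm_rf(db_dir)
```

A likely reason for writing to a temporary file first is that a reader of applications.json then sees either the old snapshot or the new one, but that motivation is my inference rather than something stated in the source.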

To summarize, when an app is stopped, the DEA proceeds as follows:

  1. Execute the stop script for every instance of the app;
  2. Delete the information of every instance of the app from @droplets;
  3. Write all records that remain in the @droplets object after the deletion to @app_state_file;
  4. Delete the file directories of all instances of the app.

Delete an app

Delete an app mainly refers to a user request to delete an application. The request is caught by the Cloud Controller, which first stops all instances of the application and then deletes the application's droplet. All information related to the application is therefore deleted, including its file directories on the DEA.


Restart an app

Restart an app mainly refers to a user request to restart an application. VMC implements this as two requests: a stop request followed by a start request. The stop request stops the application on its DEA and deletes its file directory; the start request has a DEA download the application source, create the file directory, and finally start the application. Note that the DEA that executes the stop request and the DEA that executes the start request are not necessarily the same. The stop is executed by the DEA currently running the application, while the DEA that executes the start is decided by the Cloud Controller.


App crash

App crash refers to an application crashing while it is running. In other words, the DEA does not know about the crash in advance, which makes it very different from stop an app. On a real cluster, a crash can be simulated by force-killing the application process.

First, because the crash does not go through the DEA, the DEA does not execute stop_droplet and cleanup_droplet at that moment, so in theory the application's file directory remains in the DEA file system and continues to occupy disk space. If this went on indefinitely, the wasted disk space would be significant. For this reason, the Cloud Foundry DEA runs a periodic task that removes crashed applications and deletes their file directories.

Specifically, once the application has crashed, its PID no longer exists (in theory). The DEA's periodically executed monitor_apps method collects information on all processes and then calls monitor_apps_helper. For every instance of every application in @droplets, the helper compares the instance's PID with the PIDs of the processes actually running on the DEA node. If no match is found, the instance recorded in @droplets is no longer running, and it can be concluded that it exited abnormally. The code is as follows:

[ruby]
    def monitor_apps_helper(startup_check, ma_start, du_start, du_all_out, pid_info, user_info)
      ...

      @droplets.each_value do |instances|
        instances.each_value do |instance|
          if instance[:pid] && pid_info[instance[:pid]]
            ...
          else
            # App *should* no longer be running if we are here
            instance.delete(:pid)
            # Check to see if this is an orphan that is no longer running, clean up here if needed
            # since there will not be a cleanup proc or stop call associated with the instance.
            stop_droplet(instance) if (instance[:orphaned] && !instance[:stop_processed])
          end
        end
      end
      ...
    end
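
The PID comparison can be sketched without a running DEA. Below, pid_info is a plain hash standing in for the process table collected by monitor_apps, and the instance records are invented. Instead of calling stop_droplet, the sketch only collects the IDs of the instances that would be stopped.

```ruby
# pid_info maps the PIDs of processes currently running on the node to
# per-process data; the values here are placeholders.
pid_info = { 101 => { rss: 1024 }, 103 => { rss: 2048 } }

# Two recorded instances: 'a' is still running, 'b' has a stale PID.
droplets = {
  '42' => {
    'a' => { instance_id: 'a', pid: 101, state: :RUNNING },
    'b' => { instance_id: 'b', pid: 102, state: :RUNNING, orphaned: true }
  }
}

needs_stop = []
droplets.each_value do |instances|
  instances.each_value do |instance|
    if instance[:pid] && pid_info[instance[:pid]]
      next # the process still exists; nothing to do in this sketch
    else
      instance.delete(:pid) # the recorded PID is stale
      if instance[:orphaned] && !instance[:stop_processed]
        needs_stop << instance[:instance_id] # the DEA would call stop_droplet here
      end
    end
  end
end

p needs_stop
```

Instance 'b' is picked up because PID 102 is missing from pid_info, while instance 'a' keeps its PID and is left alone.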

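When the DEA finds that the instance is in fact no longer running, it executes instance.delete(:pid) and stop_droplet(instance) if (instance[:orphaned] && !instance[:stop_processed]). If instance[:orphaned] && !instance[:stop_processed] is true, stop_droplet is executed, and stop_droplet begins by calling send_exited_message, as follows: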

[ruby]
    def stop_droplet(instance)
      # On stop from cloud controller, this can get called twice. Just make sure we are re-entrant.
      return if (instance[:stop_processed])

      # Unplug us from the system immediately, both the routers and health managers.
      send_exited_message(instance)

      ...
      cleanup_droplet(instance)
    end

The code of send_exited_message is as follows:

[ruby]
    def send_exited_message(instance)
      return if instance[:notified]

      unregister_instance_from_router(instance)

      unless instance[:exit_reason]
        instance[:exit_reason] = :CRASHED
        instance[:state] = :CRASHED
        instance[:state_timestamp] = Time.now.to_i
        instance.delete(:pid) unless instance_running? instance
      end

      send_exited_notification(instance)

      instance[:notified] = true
    end

The method first unregisters the instance's URL from the router. An instance that terminated abnormally has no instance[:exit_reason] value, so the method sets both :exit_reason and :state of that instance to :CRASHED.
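
The state transition on its own is small enough to sketch. mark_exited is a hypothetical helper that keeps only the bookkeeping on the instance hash; router unregistration and the exit notification are deliberately omitted, and the timestamp is passed in so the result is predictable.

```ruby
# Marks an instance as crashed when nothing has recorded why it exited.
# Router unregistration and notifications are left out of this sketch.
def mark_exited(instance, now = Time.now.to_i)
  return instance if instance[:notified]
  unless instance[:exit_reason]
    instance[:exit_reason]     = :CRASHED
    instance[:state]           = :CRASHED
    instance[:state_timestamp] = now
  end
  instance[:notified] = true
  instance
end

# An instance whose process was killed: no exit reason was ever recorded.
killed = mark_exited({ state: :RUNNING }, 1_000)

# An instance stopped on request: the exit reason is already :STOPPED.
stopped = mark_exited({ state: :RUNNING, exit_reason: :STOPPED }, 1_000)

p killed
p stopped
```

The presence or absence of :exit_reason is what separates a crash from a requested stop: only the first instance ends up in the :CRASHED state.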

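After send_exited_message returns, stop_droplet finally calls cleanup_droplet. Because the instance's :state has already been set to :CRASHED, cleanup_droplet does not run the rm -rf command that deletes the directory and runs the chown command instead. The code is as follows: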

[ruby]
    def cleanup_droplet(instance)
      ...
      if instance[:state] != :CRASHED || instance[:flapping]
        ...
      else
        @logger.debug("#{instance[:name]}: Chowning crashed dir #{instance[:dir]}")
        EM.system("chown -R #{Process.euid}:#{Process.egid} #{instance[:dir]}")
      end
    end

At this point the crashed instance is only marked :CRASHED; its file directory still exists in the DEA file system and has not been deleted.

It would be unreasonable never to delete a crashed instance, and the designers of Cloud Foundry took this into account. When the DEA runs, it registers a periodic task, crashes_reaper, as follows:

[ruby]
    EM.add_periodic_timer(CRASHES_REAPER_INTERVAL) { crashes_reaper }

CRASHES_REAPER_INTERVAL is set to 3600, so crashes_reaper runs once every hour. Its implementation is as follows:

[ruby]
    def crashes_reaper
      @droplets.each_value do |instances|
        # delete all crashed instances that are older than an hour
        instances.delete_if do |_, instance|
          delete_instance = instance[:state] == :CRASHED && Time.now.to_i - instance[:state_timestamp] > CRASHES_REAPER_TIMEOUT
          if delete_instance
            @logger.debug("Crashes reaper deleted: #{instance[:instance_id]}")
            EM.system("rm -rf #{instance[:dir]}") unless @disable_dir_cleanup
          end
          delete_instance
        end
      end

      @droplets.delete_if do |_, droplet|
        droplet.empty?
      end
    end

The implementation is simple: if an instance's state is :CRASHED and it has been in that state for longer than CRASHES_REAPER_TIMEOUT, its file directory is deleted.
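
The reaping rule can be tried on temporary directories. This is a sketch, not the DEA's code: reap_crashes is a hypothetical helper, the one-hour timeout is passed as a parameter, the current time is injected so the outcome is deterministic, and FileUtils.rm_rf replaces EM.system("rm -rf ...").

```ruby
require 'fileutils'
require 'tmpdir'

# Removes crashed instances older than the timeout, deleting their
# directories, then drops applications that have no instances left.
def reap_crashes(droplets, now, timeout: 3600, disable_dir_cleanup: false)
  droplets.each_value do |instances|
    instances.delete_if do |_, instance|
      expired = instance[:state] == :CRASHED &&
                now - instance[:state_timestamp] > timeout
      FileUtils.rm_rf(instance[:dir]) if expired && !disable_dir_cleanup
      expired
    end
  end
  droplets.delete_if { |_, instances| instances.empty? }
end

base = Dir.mktmpdir('dea-reaper')
old_dir = File.join(base, 'old')
new_dir = File.join(base, 'new')
[old_dir, new_dir].each { |d| FileUtils.mkdir_p(d) }

now = 10_000
droplets = {
  '1' => { 'old' => { state: :CRASHED, state_timestamp: now - 4000, dir: old_dir } },
  '2' => { 'new' => { state: :CRASHED, state_timestamp: now - 60,   dir: new_dir } }
}

reap_crashes(droplets, now)

# Capture the outcome before removing the scratch area.
remaining  = droplets.keys
old_exists = File.exist?(old_dir)
new_exists = File.exist?(new_dir)
p remaining

FileUtils.rm_rf(base)
```

The instance that crashed 4000 seconds ago is reaped together with its directory, while the one that crashed a minute ago stays until a later run.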

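To summarize, when an application instance crashes, it can no longer be accessed, but its file directory still exists in the file system of the DEA node. The DEA marks the instance's state as :CRASHED, and the crashes_reaper task, which runs every hour, later deletes its file directory.

Stop DEA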

Stop DEA mainly refers to a Cloud Foundry developer using the script command provided by Cloud Foundry to stop a running DEA component. When the developer issues the request, the DEA component traps it:

[ruby]
    ['TERM', 'INT', 'QUIT'].each { |s| trap(s) { shutdown() } }

When the request is trapped, the DEA executes the shutdown method, whose implementation is as follows:

[ruby]
    def shutdown()
      @shutting_down = true
      @logger.info('Shutting down..')
      @droplets.each_pair do |id, instances|
        @logger.debug("Stopping app #{id}")
        instances.each_value do |instance|
          # skip any crashed instances
          instance[:exit_reason] = :DEA_SHUTDOWN unless instance[:state] == :CRASHED
          stop_droplet(instance)
        end
      end

      # Allows messages to get out.
      EM.add_timer(0.25) do
        snapshot_app_state
        @file_viewer_server.stop!
        NATS.stop { EM.stop }
        @logger.info('Bye..')
        @pid_file.unlink()
      end
    end

As the code shows, shutdown sets :exit_reason to :DEA_SHUTDOWN for every instance of every application in @droplets that is not in the CRASHED state, and then runs stop_droplet and cleanup_droplet, which deletes the directories of all those instances. After the deletion, the DEA ends its process. The information about the normally running instances is also removed from the applications.json file.

Summary: when the DEA is stopped, it first stops all normally running application instances and then deletes their file directories.
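
The tagging step of the shutdown loop can be sketched as follows. prepare_shutdown is a hypothetical helper; it only sets the exit reason and records the order in which instances would be handed to stop_droplet, without stopping anything.

```ruby
# Tags every non-crashed instance with the shutdown reason and returns the
# instance IDs in the order stop_droplet would be called for them.
def prepare_shutdown(droplets)
  visited = []
  droplets.each_pair do |_id, instances|
    instances.each_value do |instance|
      instance[:exit_reason] = :DEA_SHUTDOWN unless instance[:state] == :CRASHED
      visited << instance[:instance_id] # the DEA calls stop_droplet(instance) here
    end
  end
  visited
end

# Invented state: one running instance and one that has already crashed.
droplets = {
  '42' => {
    'a' => { instance_id: 'a', state: :RUNNING },
    'b' => { instance_id: 'b', state: :CRASHED }
  }
}

visited = prepare_shutdown(droplets)
p visited
p droplets['42']['a'][:exit_reason]
p droplets['42']['b'][:exit_reason]
```

The crashed instance is visited but not retagged, so it stays in the :CRASHED state and its directory is left for crashes_reaper.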

Start DEA

Start DEA mainly refers to a Cloud Foundry developer using the script command provided by Cloud Foundry to start the DEA component. When the developer issues the request, the DEA component starts, and its central piece, the agent object, is created and run. The following code, taken from where the agent instance runs, is the part that concerns application instance file directories:

[ruby]
    # Recover existing application state.
    recover_existing_droplets
    delete_untracked_instance_dirs

As can be seen, recover_existing_droplets is executed first. Its code is as follows:

[ruby]
    def recover_existing_droplets
      ...
      File.open(@app_state_file, 'r') { |f| recovered = Yajl::Parser.parse(f) }
      # Whip through and reconstruct droplet_ids and instance symbols correctly for droplets, state, etc..
      recovered.each_pair do |app_id, instances|
        @droplets[app_id.to_s] = instances
        instances.each_pair do |instance_id, instance|
          ...
        end
      end
      @recovered_droplets = true
      # Go ahead and do a monitoring pass here to detect app state
      monitor_apps(true)
      send_heartbeat
      schedule_snapshot
    end

The method restores the @droplets information from the @app_state_file file and then runs the monitor_apps, send_heartbeat, and schedule_snapshot methods.

delete_untracked_instance_dirs is executed next; it removes the file directories of application instances that have no matching entry in @droplets.

In summary, there are three cases. If the DEA previously exited normally and all crashed applications had already been cleared, the applications.json file contains no entries and the path that stores application directories contains no applications, so this method deletes nothing. If the DEA previously exited normally but some crashed instances had not yet been deleted, those instances still exist after startup and wait for crashes_reaper to remove them. If the DEA previously crashed, and the directories under the application path have changed since the crash so that applications.json no longer matches the actual application instances, the directories without a matching entry are deleted.

The implementation is as follows:

[ruby]
    # Removes any instance dirs without a corresponding instance entry in @droplets
    # NB: This is run once at startup, so not using EM.system to perform the rm is fine.
    def delete_untracked_instance_dirs
      tracked_instance_dirs = Set.new
      for droplet_id, instances in @droplets
        for instance_id, instance in instances
          tracked_instance_dirs << instance[:dir]
        end
      end

      all_instance_dirs = Set.new(Dir.glob(File.join(@apps_dir, '*')))
      to_remove = all_instance_dirs - tracked_instance_dirs
      for dir in to_remove
        @logger.warn("Removing instance dir '#{dir}', doesn't correspond to any instance entry.")
        FileUtils.rm_rf(dir)
      end
    end
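
The set difference at the heart of this method is easy to try on a scratch directory. In the sketch below, delete_untracked_dirs is a hypothetical standalone helper, the temporary directory stands in for @apps_dir, and the directory names are invented.

```ruby
require 'set'
require 'fileutils'
require 'tmpdir'

# Deletes every directory under apps_dir that no tracked instance refers to.
# Returns the removed paths, sorted, so the result is easy to inspect.
def delete_untracked_dirs(droplets, apps_dir)
  tracked = Set.new
  droplets.each_value do |instances|
    instances.each_value { |instance| tracked << instance[:dir] }
  end
  all_dirs  = Set.new(Dir.glob(File.join(apps_dir, '*')))
  to_remove = all_dirs - tracked
  to_remove.each { |dir| FileUtils.rm_rf(dir) }
  to_remove.to_a.sort
end

apps_dir = Dir.mktmpdir('dea-apps') # hypothetical stand-in for @apps_dir
tracked_dir   = File.join(apps_dir, 'myapp-0-a1b2c3')
untracked_dir = File.join(apps_dir, 'ghost-0-ffffff')
[tracked_dir, untracked_dir].each { |d| FileUtils.mkdir_p(d) }

# Only one of the two directories has an entry in the recovered state.
droplets = { '42' => { 'a1b2c3' => { dir: tracked_dir } } }

removed   = delete_untracked_dirs(droplets, apps_dir)
survivors = Dir.glob(File.join(apps_dir, '*'))
p removed
p survivors

FileUtils.rm_rf(apps_dir)
```

Only the directory with no entry in droplets is removed, which matches the third case above, where the directories on disk and applications.json have drifted apart.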

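DEA crash

DEA crash means that the DEA crashes while running and terminates abnormally. It can be simulated by force-killing the DEA process.

After the DEA process exits, the running application instances are not directly affected, so the application file directories still exist and the applications remain accessible. When the DEA process is started normally again, the procedure is exactly the same as Start DEA. Note that if the applications that were running before the restart are all still running normally, recover_existing_droplets can resume monitoring all application instances through monitor_apps, and then communicate with external components through send_heartbeat and schedule_snapshot. If some of the previously running instances have crashed by the time the DEA restarts, their file directories are deleted during the subsequent monitor_apps runs.

That concludes my analysis of how file directories change over the life cycle of application instances in Cloud Foundry.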



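About the author:

Sun Hongliang, software engineer at DaoCloud. For the past two years he has mainly studied PaaS-related knowledge and technology in cloud computing. He firmly believes that lightweight virtualization container technology will deeply affect the PaaS field and may even determine the future direction of PaaS technology.

Please credit the source when reposting.

This document largely reflects my own understanding, and it certainly has shortcomings and errors in places. I hope it is helpful to anyone looking into file directory changes over the life cycle of application instances in Cloud Foundry. If you are interested in this topic and have better ideas or suggestions, please contact me.

My email: allen.sun@daocloud.io
Sina Weibo: @莲子弗如清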