Path Planning for the Chang'E-5 Lunar Surface Sampling Manipulator

Beijing Aerospace Control Center, Beijing 100094

Funding: Young Scientists Fund of the National Natural Science Foundation of China (62003025); National Natural Science Foundation of China (61972020)

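About the authors:
HU Xiaodong (1993– ), male, engineer. Main research interest: overall design of spacecraft tracking, telemetry and command (TT&C). Address: P.O. Box 5130, Courtyard 26, Beiqing Road, Haidian District, Beijing (100094). Tel: (010) 66363119. E-mail: huxiaodong2037@163.com
XIE Jianfeng (1972– ), male, research fellow. Main research interests: overall aerospace TT&C and orbit control. Corresponding author. Address: No. 104, P.O. Box 5130, Beijing (100094). Tel: (010) 66363008. E-mail: jianfengxie@126.com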

CLC number: V434

Abstract: To address the precise control of the sampling manipulator in the Chang'E-5 lunar surface sampling mission, a path planning method based on deep reinforcement learning is proposed. By designing a multi-constraint reward function for the deep reinforcement learning algorithm, a motion path is planned that satisfies three constraints (safety, speed, and reachability), achieving precise control of the sampling manipulator. On the premise of mission safety, the space-ground interaction time is shortened and the manipulator motion is smooth. On-orbit experimental results show that the method has high accuracy and robustness and can serve as a reference for subsequent on-orbit teleoperated sampling missions in deep space exploration.

Highlights

●　A path planning method for a lunar surface sampling manipulator based on deep reinforcement learning is proposed.

●　The control problem of a slender, flexible manipulator is solved.

●　The deep reinforcement learning control method has high accuracy and robustness.

●　The method improves the efficiency of on-orbit mission implementation.

Received: 2021-09-30

Revised: 2021-11-12

Published online: 2021-07-20

Issue date: 2021-12-31

Introduction

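At 23:11 on 1 December 2020, the Chang'E-5 lander-ascender combination landed safely at 51.8°W, 43.1°N in the northeastern Oceanus Procellarum on the lunar near side. About 19 h after lunar surface initialization, it completed the sampling and packaging task, collecting a total of 1 731 g of lunar samples through drilling and surface sampling^[1].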

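Space manipulators are controlled by combining large-range motion with fine adjustment. Large-range motion must jointly account for safety, reachability, and smoothness so that the arm can travel long distances; fine adjustment requires the ground to plan and uplink a motion strategy that nudges the arm to the exact target position. At present, fine adjustment relies on "visual positioning + expert decision": visual positioning computes the manipulator's current position and its deviation from the target, and experts decide the motion path on the spot based on prior control experience. This approach faces three main difficulties: ① Flexible deformation of the slender arm and joint clearance errors produce large control deviations, so meeting the millimetre-level accuracy required for lunar surface sampling^[2-3] takes several rounds of fine adjustment. ② Space-ground coordination is frequent and complex: after the large-range motion completes, the ground first performs visual positioning, and experts then give the adjustment direction and distance from the positioning result, a workflow that is hard to adapt to future deep space sampling missions with long communication delays. ③ Because the adjustment amount is not known in advance, motion commands with different step lengths along each axis of the manipulator's motion reference frame must be prepared beforehand; during the mission, once the required adjustment is computed, control is executed by combining these fine-adjustment commands, but the resulting path is not optimal.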

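Artificial intelligence research focuses on relatively complex control environments and on cases where the control model is uncertain^[4-6]. Reinforcement learning is an important branch of machine learning within artificial intelligence: the robot interacts with its environment through sensors, takes reward maximization as the optimization objective, and obtains the optimal policy through a policy function. It can solve path planning for cooperative targets as well as for non-cooperative targets.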

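Chang'E-5 lunar surface sampling consists of five steps: touching the lunar surface, sampling, depositing the sample, grasping the container, and placing the container. A four-degree-of-freedom slender flexible manipulator collects lunar soil samples and transfers them to the primary sealed container. After large-range motion brings the manipulator to the initial fine-adjustment point, fine adjustment drives it to the target position. Because of the flexible deformation of the slender arm and clearance errors between its joints, the actual position reached by the end of the arm deviates somewhat from the expected position^[2]. Determining whether the sampler at the end of the arm has reached the intended sampling position is the key to precise control and to ensuring that a sample is actually collected.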

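Because the slenderness and flexibility of the manipulator make precise sampling hard to achieve with open-loop control, this paper proposes a path planning method for the lunar surface sampling manipulator based on deep reinforcement learning. To handle the many task stages, complex constraints, and harsh working environment of the sampling manipulator, the method is built on the Deep Q Network (DQN). Its effectiveness is then verified through simulation experiments and the on-orbit fine-adjustment process for sample deposition during the Chang'E-5 mission.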
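The method scores candidate motions against safety, speed, and reachability and learns with a DQN. As a rough illustration only, the sketch below shows one way a multi-constraint reward and the one-step Q-learning target that a DQN regresses toward could be written. The weights, thresholds, and the names `reward` and `td_target` are hypothetical; the paper's actual reward terms and network are not given in this excerpt.

```python
import math

# Hypothetical weights and thresholds; none of these values come from the paper.
W_SAFETY, W_SPEED, W_REACH = 10.0, 0.1, 1.0
SAFE_DISTANCE_MM = 20.0   # assumed minimum clearance to any obstacle
GOAL_TOLERANCE_MM = 1.0   # assumed millimetre-level arrival tolerance
ARRIVAL_BONUS = 10.0      # assumed bonus for reaching the target


def reward(pos, next_pos, goal, obstacles):
    """Multi-constraint reward for one fine-adjustment step (positions in mm)."""
    # Safety: penalise a step that ends closer to an obstacle than the clearance.
    clearance = min((math.dist(next_pos, o) for o in obstacles), default=math.inf)
    safety = -W_SAFETY if clearance < SAFE_DISTANCE_MM else 0.0
    # Speed: a constant per-step cost, so shorter paths accumulate more reward.
    speed = -W_SPEED
    # Reachability: reward progress toward the goal, plus a bonus on arrival.
    reach = W_REACH * (math.dist(pos, goal) - math.dist(next_pos, goal))
    if math.dist(next_pos, goal) <= GOAL_TOLERANCE_MM:
        reach += ARRIVAL_BONUS
    return safety + speed + reach


def td_target(r, next_q_values, done, gamma=0.95):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a'), or r at episode end."""
    return r if done else r + gamma * max(next_q_values)
```

In a full DQN, `next_q_values` would come from a target network evaluated at the next state, and the online network would be trained to reduce the squared difference between its prediction and `td_target`.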

i	θ_i/(°)	α_{i-1}/(°)	a_{i-1}/mm	d_i/mm
1	θ₁	90	0	101.0
2	θ₂	0	0	85.5
3	θ₃	0	1 970	96.0
4	θ₄	0	1 770	93.0
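The column headers of the table above (θ, α_{i-1}, a_{i-1}, d) match the modified Denavit-Hartenberg (Craig) convention. Assuming that convention, the sketch below chains the four link transforms to obtain the origin of frame 4 in the base frame. The function names are illustrative, and because the table has no tool offset row, the result is the frame 4 origin rather than the sampler tip.

```python
import math

# Rows copied from the table: (alpha_{i-1} in degrees, a_{i-1} in mm, d_i in mm).
LINK_PARAMS = [
    (90.0, 0.0, 101.0),
    (0.0, 0.0, 85.5),
    (0.0, 1970.0, 96.0),
    (0.0, 1770.0, 93.0),
]


def link_transform(theta_deg, alpha_deg, a, d):
    """Homogeneous transform from frame i-1 to frame i (modified D-H, assumed)."""
    theta, alpha = math.radians(theta_deg), math.radians(alpha_deg)
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st, 0.0, a],
        [st * ca, ct * ca, -sa, -sa * d],
        [st * sa, ct * sa, ca, ca * d],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul4(m, n):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(m[r][k] * n[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def frame4_origin(joint_angles_deg):
    """Position (mm) of the frame 4 origin in the base frame for four joint angles."""
    t = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]
    for theta, (alpha, a, d) in zip(joint_angles_deg, LINK_PARAMS):
        t = matmul4(t, link_transform(theta, alpha, a, d))
    return (t[0][3], t[1][3], t[2][3])
```

Under these assumptions, `frame4_origin([0, 0, 0, 0])` evaluates to approximately (3740.0, -375.5, 0.0): the two link lengths add along x, and the four d offsets add along the negative y axis of the base frame. Flexible deformation and joint clearance, which the paper identifies as the main error sources, are not modelled here.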

4. Conclusion

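To address the difficulty of precise manipulator control, the limited sampling time, and the complexity of space-ground coordination in unmanned lunar surface sampling, this paper proposes a manipulator path planning method based on deep reinforcement learning. Following the criteria of safety, speed, and reachability during mission execution, the manipulator plans its motion path autonomously and determines whether the control target meets the mission requirements. Comparing the path planned by the algorithm with the actual on-orbit path shows that, on the premise of mission safety, the algorithm yields smoother manipulator control. It overcomes the systematic control deviation caused by the difficulty of accurately modelling a slender flexible sampling manipulator, achieves precise control of the manipulator during sampling, and can serve as a reference for teleoperated manipulator sampling in subsequent lunar and planetary exploration missions.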

References (22)

[1]	WANG Q,HOU J,LIU R,et al. Overview of China's first lunar sample return mission[J]. Aerospace China,2021(3):34-39 (in Chinese).doi:10.3969/j.issn.1002-7742.2021.03.007
[2]	MA R Q,JIANG Q S,LIU B,et al. Design and verification of a lunar sampling manipulator system[J]. Journal of Astronautics,2018,39(12):5-12 (in Chinese).
[3]	TANG L,LIANG C C,WANG Y B,et al. Research on flexible compensation control strategy for planetary surface sampling manipulator[J]. Journal of Mechanical Engineering,2017,53(11):97-103 (in Chinese).doi:10.3901/JME.2017.11.097
[4]	NAKANISHI H, YOSHIDA K. Impedance control for free-flying space robots -basic equations and applications[C]//International Conference on Intelligent Robots and Systems. [S. l]: IEEE, 2006.
[5]	SCHIELE A, HIRZINGER G. A new generation of ergonomic exoskeletons-the high-performance X-Arm-2 for space robotics telepresence[C]//International Conference on Intelligent Robots and Systems. [S. l]: IEEE, 2011.
[6]	NANOS K,PAPADOPOULOS E. On the use of free-floating space robots in the presence of angular momentum[J]. Intelligent Service Robotics,2011,4(1):3-15.doi:10.1007/s11370-010-0083-2
[7]	SUTTON R S, BARTO A G. Introduction to reinforcement learning[M]. Cambridge: MIT Press, 1998.
[8]	MAEDA Y, WATANABE T, MORIYAMA Y. View-based programming with reinforcement learning for robotic manipulation[C]//IEEE International Symposium on Assembly and Manufacturing. [S. l]: IEEE, 2011.
[9]	PARK J J,KIM J H,SONG J B. Path planning for a robot manipulator based on probabilistic roadmap and reinforcement learning[J]. International Journal of Control Automation & Systems,2007,5(6):674-680.
[10]	LANGE S, RIEDMILLER M, VOIGTLANDER A. Autonomous reinforcement learning on raw visual input data in a real world application[C]//International Joint Conference on Neural Networks. [S. l]: IEEE, 2012.
[11]	LECUN Y,BENGIO Y,HINTON G. Deep learning[J]. Nature,2015,521(7553):436.doi:10.1038/nature14539
[12]	KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[C]//International Conference on Neural Information Processing Systems. [S. l]: Curran Associates Inc., 2012.
[13]	REN S,HE K,GIRSHICK R,et al. Faster R-CNN:towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence,2015,39(6):1137-1149.
[14]	MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing atari with deep reinforcement learning[J/OL]. （2021-10-9）.https://arxiv.org/abs/1312.5602.
[15]	OSTAFEW C J, SCHOELLIG A P, BARFOOT T D. Learning-based nonlinear model predictive control to improve vision-based mobile robot path-tracking in challenging outdoor environments[C]//IEEE International Conference on Robotics and Automation. [S. l]: IEEE, 2016.
[16]	LEI T, MING L. A robot exploration strategy based on Q-learning network[C]//IEEE International Conference on Real-Time Computing and Robotics. [S. l]: IEEE, 2016.
[17]	ZHANG F Y, LEITNER J, MILFORD M, et al. Towards vision-based deep reinforcement learning for robotic motion control[C]//Proceedings of Australasian Conference on Robotics and Automation（ACRA）. Australasian: IEEE, 2015.
[18]	HASSELT H V, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[C]//Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence Computer Science. Phoenix, Arizona, USA: AAAI Press, 2016.
[19]	SCHAUL T, QUAN J, ANTONOGLOU I, et al. Prioritized experience replay[EB/OL]. （2015-11-18）.https://www.semanticscholar.org/paper/Prioritized-Experience-Replay-Schaul-Quan/c6170fa90d3b2efede5a2e1660cb23e1c824f2ca?p2df.
[20]	WANG Z, SCHAUL T, HESSEL M, et al. Dueling network architectures for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning. New York, USA: JMLR, 2015.
[21]	PEI Z Y,REN J J,PENG J,et al. Overall scheme trade-off design of Chang’E-5 mission[J]. Journal of Deep Space Exploration,2021,8(3):215-226 (in Chinese).
[22]	GOMES E R, KOWALCZYK R. Dynamic analysis of multiagent Q -learning with ε-greedy exploration[C]//International Conference on Machine Learning. [S. l]: ACM, 2009.
