Journal of Beijing Institute of Technology
Volume 26, Issue 4
Yi Xie, Zhongyi Liu, Zhao Liu, Yijun Gu. Cooperative Multi-Agent Reinforcement Learning with Constraint-Reduced DCOP[J]. JOURNAL OF BEIJING INSTITUTE OF TECHNOLOGY, 2017, 26(4): 525-533. doi: 10.15918/j.jbit1004-0579.201726.0412

Cooperative Multi-Agent Reinforcement Learning with Constraint-Reduced DCOP

doi: 10.15918/j.jbit1004-0579.201726.0412
  • Received Date: 2016-08-26
  • Abstract: Cooperative multi-agent reinforcement learning (MARL) is an important topic in the field of artificial intelligence, in which distributed constraint optimization (DCOP) algorithms have been widely used to coordinate the actions of multiple agents. However, dense communication among agents limits the practicability of DCOP algorithms. In this paper, we propose a novel DCOP algorithm that addresses the communication problem of previous DCOP algorithms by reducing constraints. The contributions of this paper are threefold: ① It is proved that removing constraints can effectively reduce the communication burden of DCOP algorithms. ② A criterion is provided to identify insignificant constraints, whose elimination does not greatly affect the performance of the whole system. ③ A constraint-reduced DCOP algorithm is proposed that adopts a variant of the spectral clustering algorithm to detect and eliminate the insignificant constraints. Our algorithm reduces the communication burden of the benchmark DCOP algorithm while leaving its overall performance unaffected. The performance of the constraint-reduced DCOP algorithm is evaluated on four configurations of cooperative sensor networks. The effectiveness of the communication reduction is also verified by comparisons between the constraint-reduced DCOP and the benchmark DCOP.
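The abstract does not reproduce the algorithm itself. As a rough illustration of the spectral-clustering idea it describes, the following is a minimal sketch, not the authors' code: it clusters the nodes of a weighted constraint graph via the normalized graph Laplacian and then flags weak edges that cross cluster boundaries as candidate "insignificant" constraints. The function names, the edge-weight encoding, and the removal criterion (drop inter-cluster edges) are all assumptions made for illustration.

```python
# Hedged sketch of constraint reduction via spectral clustering.
# Assumptions: constraints are edges of a weighted adjacency matrix W;
# "insignificant" constraints are edges whose endpoints fall in
# different spectral clusters.
import numpy as np

def spectral_clusters(W, k):
    """Cluster the nodes of symmetric adjacency matrix W into k groups
    using the normalized graph Laplacian (Shi-Malik style)."""
    d = W.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(len(W)) - d_inv_sqrt @ W @ d_inv_sqrt
    # eigh returns eigenvalues in ascending order; the k smallest
    # eigenvectors embed the nodes.
    _, vecs = np.linalg.eigh(L)
    X = vecs[:, :k]
    X = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-12)
    # Tiny k-means on the embedding, with deterministic
    # farthest-point initialization for reproducibility.
    centers = [X[0]]
    for _ in range(1, k):
        dists = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(dists))])
    centers = np.array(centers)
    for _ in range(50):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == c].mean(axis=0) if (labels == c).any()
                            else centers[c] for c in range(k)])
    return labels

def insignificant_constraints(W, labels):
    """Edges whose endpoints lie in different clusters are returned as
    candidates for removal (assumed criterion)."""
    n = len(W)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if W[i, j] > 0 and labels[i] != labels[j]]
```

For example, on a graph of two tightly coupled triangles of agents joined by one weak constraint, the clustering recovers the two groups and the weak bridging edge is the only constraint flagged for elimination.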
