Collaborative Computing in Multi UAV MEC Network Optimization

Multi-UAV collaborative path planning.

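Why can't reinforcement learning be used? Because of the following three risks:
GIoT devices dynamically join or leave the network.
Malicious users may upload low-quality or even harmful data.
They may even launch model inversion attacks, i.e., extract training data from the trained model.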

MADIFF: Offline Multi-agent Learning with Diffusion Models

MADIFF is the first diffusion-based multi-agent learning framework, which behaves as both a decentralized policy and a centralized controller.

The following sentence explains why applying diffusion models to multi-agent unsupervised learning is novel.
Despite its effectiveness in single-agent learning, applying the generative framework to multi-agent (MA) decision tasks remains uncertain. This is due to the need for modeling interactions and coordination among agents, while each agent makes their decisions in a decentralized manner.

The authors formulate the multi-agent learning (MAL) problem as follows:
In particular, we formulate MAL as conditional generative modeling problems, where the target is to generate each agent’s trajectory, conditioned on the information of all agents (centralized case) or each agent (decentralized case).
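
One way to write this formulation down, as a sketch only. The notation is assumed here and is not taken from the paper: $N$ is the number of agents, $\tau^i$ is the trajectory of agent $i$, $y$ is the joint conditioning information of all agents, $y^i$ is the information available to agent $i$ alone, and $p_\theta$ is the learned generative model.

$$
\text{centralized:}\quad (\tau^{1},\dots,\tau^{N}) \sim p_\theta\!\left(\tau^{1},\dots,\tau^{N} \mid y\right),
\qquad
\text{decentralized:}\quad \tau^{i} \sim p_\theta\!\left(\tau^{i} \mid y^{i}\right)
$$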

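The overall model works as follows: first, an offline-trained diffusion model (DM) predicts and plans the agents' trajectories; then the states at times $t$ and $t+1$ are passed through an inverse dynamics model to obtain the action at time $t$ (or a matrix of control-signal variables).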
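
The plan-then-invert loop described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the paper's implementation: `toy_planner` is a hypothetical straight-line planner standing in for the offline-trained DM, and `toy_inverse_dynamics` assumes single-integrator dynamics $s_{t+1} = s_t + a_t$, so the action is simply the state difference. All function names are made up for this example.

```python
from typing import List, Tuple

State = List[float]  # one agent's state, e.g. a 2-D position


def toy_planner(states: List[State], goals: List[State], horizon: int) -> List[List[State]]:
    """Hypothetical stand-in for the offline-trained DM (NOT a diffusion model).

    Returns `horizon` future joint states; each agent moves on a straight
    line from its current state to its goal.
    """
    return [
        [
            [s + (g - s) * k / horizon for s, g in zip(state, goal)]
            for state, goal in zip(states, goals)
        ]
        for k in range(1, horizon + 1)
    ]


def toy_inverse_dynamics(state: State, next_state: State) -> State:
    """Recover the action at time t from the states at t and t+1.

    Assumes s_{t+1} = s_t + a_t, hence a_t = s_{t+1} - s_t.
    """
    return [n - s for s, n in zip(state, next_state)]


def rollout(states: List[State], goals: List[State], horizon: int) -> Tuple[List[State], List[List[State]]]:
    """Replan at every step, act on the first planned transition only."""
    states = [list(s) for s in states]
    actions_log = []
    for remaining in range(horizon, 0, -1):
        plan = toy_planner(states, goals, remaining)  # planned joint states for t+1, t+2, ...
        next_states = plan[0]                         # planned joint state at t+1
        actions = [toy_inverse_dynamics(s, n) for s, n in zip(states, next_states)]
        # Apply the assumed dynamics to move every agent one step.
        states = [[x + a for x, a in zip(s, act)] for s, act in zip(states, actions)]
        actions_log.append(actions)
    return states, actions_log


if __name__ == "__main__":
    final_states, actions = rollout([[0.0, 0.0], [4.0, 0.0]], [[4.0, 2.0], [0.0, 2.0]], horizon=4)
    print(final_states)
    print(actions[0])
```

Only the first planned transition is executed before replanning, which mirrors the idea that the planner proposes future states while the inverse dynamics model turns each pair of consecutive states into an action.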

The diffusion process is conditioned on the system's currently observed state and on additional information. Here, this information comprises observations, rewards, and constraints.

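The proposed scheme also differs from conventional ones. In conventional decentralized multi-agent control, each agent makes its decision on its own, without communicating with the other agents. The scheme proposed in this paper is called decentralized execution with teammate modeling: agent $i$ can predict the other agents' next actions from its current observation.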

The experiments in the paper are summarized as follows:

Goal: modeling the complex interactions among cooperative agents.
This comprises two sub-goals:

Whether the model can generate high-quality multi-agent trajectories