UAV Mission Planning for Post-Disaster Victim Localisation via Federated Multi-Agent Reinforcement Learning


Alparslan Güzey, Mehmet Akif Çifçi, Fazlı Yıldırım et al.

18 May 2026

Abstract

Rapid localisation of trapped victims after urban disasters is essential but challenging: Bluetooth Low Energy (BLE) beacons are intermittent, radio propagation is obstructed by rubble, UAVs are energy-constrained, and real-world multi-UAV training is impractical in high-risk search-and-rescue (SAR) environments. This study formulates post-disaster victim localisation as a cooperative decentralised partially observable Markov decision process (Dec-POMDP) and adapts a model-aided federated multi-agent reinforcement learning framework based on FedQMIX. The proposed pipeline combines a lightweight line-of-sight/non-line-of-sight (LoS/NLoS) surrogate channel model, particle swarm optimisation (PSO)-based victim-position estimation, return-to-base and map-feasibility safety checks, an SAR-aligned shaped reward, and a leakage-free centralised training state built from estimated rather than ground-truth victim locations. Each UAV trains locally inside a learned digital-twin simulator and periodically shares only its QMIX network parameters, avoiding any exchange of raw trajectories or received signal strength indicator (RSSI) logs. The framework is evaluated on two synthetic post-earthquake urban maps: a compact return-to-base scenario and a larger reach-to-destination scenario. Across five independent seeds per method and map, Model-Aided FedQMIX achieves the highest and most stable victim-localisation performance, with the clearest advantage in the larger, long-horizon scenario. Additional diagnostic tests examine reward-weight sensitivity, robustness to RF channel shift, BLE/smartphone hardware heterogeneity, non-IID client-data variation, and partial-client FedAvg under missing client updates. The results indicate that combining model-aided localisation cues, decentralised value factorisation, SAR-aligned objective design, and federated parameter sharing can improve the robustness of UAV-based victim-localisation policies.
The framework also clarifies deployment considerations for federated SAR coordination, including communication payload, privacy boundaries, heterogeneous client experience, device variability, and intermittent connectivity. This study remains simulation-based, and future validation with real UAVs, BLE devices, and rubble-inspired testbeds is required before operational deployment.
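The abstract states that each UAV shares only its QMIX network parameters and that partial-client FedAvg is tested under missing client updates. The following is a minimal sketch of such an aggregation round, not the authors' implementation, and it rests on several assumptions. Parameters are flattened into named float lists. The names `fedavg_partial`, `agent.w` and `mixer.w` are hypothetical. Each client is weighted by its local sample count, which is the standard FedAvg weighting; the paper's exact weighting is not given here. A round in which no update arrives keeps the previous global model.

```python
from typing import Dict, List, Optional

# Hypothetical layout: a client's QMIX parameters (agent networks plus the
# mixing network) flattened into named lists of floats.
Params = Dict[str, List[float]]


def fedavg_partial(
    global_params: Params,
    client_params: List[Optional[Params]],
    client_samples: List[int],
) -> Params:
    """Sample-weighted FedAvg over the clients that reported this round.

    A client whose update is missing (None), e.g. because of an intermittent
    link, is skipped. If no client reports, the global model is kept as-is.
    """
    received = [
        (p, n) for p, n in zip(client_params, client_samples)
        if p is not None and n > 0
    ]
    if not received:
        # No updates this round: carry the previous global parameters forward.
        return {k: list(v) for k, v in global_params.items()}

    total = sum(n for _, n in received)
    new_params: Params = {}
    for name, values in global_params.items():
        acc = [0.0] * len(values)
        for params, n in received:
            weight = n / total  # assumption: weight by local experience size
            for i, x in enumerate(params[name]):
                acc[i] += weight * x
        new_params[name] = acc
    return new_params


if __name__ == "__main__":
    g = {"agent.w": [0.0, 0.0], "mixer.w": [0.0]}
    c1 = {"agent.w": [1.0, 2.0], "mixer.w": [3.0]}
    c3 = {"agent.w": [3.0, 4.0], "mixer.w": [1.0]}
    # Client 2 dropped out, so only clients 1 and 3 (weights 0.25 / 0.75) count.
    print(fedavg_partial(g, [c1, None, c3], [100, 50, 300]))
```

Only parameter vectors cross the network in this sketch, so the per-round payload scales with model size rather than with the volume of trajectories or RSSI logs. Skipping absent clients, instead of blocking on them, is one simple way to tolerate intermittent connectivity.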

Metadata

DOI: 10.3390/drones10050385
License: CC BY 4.0

IPC Classification

G06H04A61B60

Keywords

mission planning, post-disaster, victim localisation, federated, multi-agent, reinforcement learning, drones, rapid, trapped, victims, urban, disasters, essential, challenging, because, Bluetooth, energy, beacons, intermittent, radio, propagation
