Abstract
Rapid localisation of trapped victims after urban disasters is essential but challenging because Bluetooth Low Energy (BLE) beacons are intermittent, radio propagation is obstructed by rubble, UAVs are energy-constrained, and real-world multi-UAV training is impractical in high-risk search-and-rescue (SAR) environments. This study formulates post-disaster victim localisation as a cooperative Dec-POMDP and adapts a model-aided federated multi-agent reinforcement learning framework based on FedQMIX. The proposed pipeline combines a lightweight LoS/NLoS surrogate channel model, PSO-based victim-position estimation, return-to-base and map-feasibility safety checks, an SAR-aligned shaped reward, and a leakage-free centralised training state based on estimated rather than ground-truth victim locations. Each UAV trains locally inside a learned digital-twin simulator and periodically shares only QMIX network parameters, avoiding the exchange of raw trajectories or RSSI logs. The framework is evaluated on two synthetic post-earthquake urban maps representing a compact return-to-base scenario and a larger reach-to-destination scenario. Across five independent seeds per method and map, Model-Aided FedQMIX achieves the highest and most stable victim-localisation performance, with the clearest advantage observed in the larger long-horizon scenario. Additional diagnostic tests examine reward-weight sensitivity, RF channel-shift robustness, BLE/smartphone hardware heterogeneity, non-IID client-data variation, and partial-client FedAvg under missing client updates. The results indicate that combining model-aided localisation cues, decentralised value factorisation, SAR-aligned objective design, and federated parameter sharing can improve the robustness of UAV-based victim-localisation policies. 
The framework also clarifies deployment considerations for federated SAR coordination, including communication payload, privacy boundaries, heterogeneous client experience, device variability, and intermittent connectivity. This study remains simulation-based, and future validation with real UAVs, BLE devices, and rubble-inspired testbeds is required before operational deployment.
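The partial-client FedAvg step mentioned above can be illustrated with a minimal sketch. It is not the study's implementation: the flat parameter layout, the function name `partial_fedavg`, the parameter key `mixer.w`, and the use of per-client weights (for example, local sample counts) are all assumptions made for illustration. The sketch shows only the aggregation rule, in which clients with missing updates are skipped and the remaining weights are renormalised.

```python
from typing import Dict, List, Optional

# Hypothetical layout: each named parameter tensor is flattened to a list of floats.
Params = Dict[str, List[float]]


def partial_fedavg(global_params: Params,
                   client_updates: List[Optional[Params]],
                   client_weights: List[float]) -> Params:
    """Weighted average over the clients that reported; None marks a missing update."""
    reported = [(p, w) for p, w in zip(client_updates, client_weights)
                if p is not None and w > 0]
    if not reported:
        # No client reached the server this round: keep the previous global model.
        return {name: list(values) for name, values in global_params.items()}

    total = sum(w for _, w in reported)  # renormalise over reporting clients only
    averaged: Params = {}
    for name, old_values in global_params.items():
        acc = [0.0] * len(old_values)
        for params, w in reported:
            for i, value in enumerate(params[name]):
                acc[i] += (w / total) * value
        averaged[name] = acc
    return averaged


# Illustrative round: three UAV clients, the second one fails to upload.
global_params = {"mixer.w": [0.0, 0.0]}
updates = [{"mixer.w": [1.0, 2.0]}, None, {"mixer.w": [3.0, 4.0]}]
weights = [1.0, 1.0, 3.0]  # assumed to be local sample counts
new_global = partial_fedavg(global_params, updates, weights)
```

Only parameter values cross the client boundary in this sketch, which matches the stated design in which raw trajectories and RSSI logs stay on each UAV. Falling back to the previous global model when no client reports is one possible choice; other policies, such as reusing stale updates, are equally plausible.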