s use a Markov model to handle various advanced jamming attacks. To deal with attacks such as swept jamming and dynamic jamming, the authors develop a multi-agent reinforcement learning (MARL) algorithm for effective defense. The simulation results show that the algorithm can efficiently avoid these advanced jamming attacks, thanks to collaborative spectrum sharing among its agents. In [104], a novel DRL-based algorithm is proposed to ensure a secure beamforming strategy against eavesdroppers in dynamic IRS-aided environments. The model uses post-decision state (PDS) and prioritized experience replay (PER) techniques to improve the learning efficiency and secrecy performance of the system. The proposed approach can significantly improve the system secrecy rate and QoS (hence optimal beamforming is required) in IRS-aided secure communication systems.

4.3.9. Visible Light Communication

In [124], the authors propose a DQN-based multi-agent multi-user algorithm for power allocation in hybrid RF/VLC networks. These networks are composed of radio frequency (RF) and visible light communication (VLC) access points (APs). The users are capable of multi-homing, i.e., they can combine RF and VLC links to meet their bandwidth requirements. In the proposed DQN algorithm, each AP is regarded as an agent, and the transmit power required for the users is optimized through an online power allocation strategy. Simulation results demonstrate a shorter median convergence time in training (90% shorter than the standard Q-learning-based algorithm) and a convergence rate of 96.1% (whereas the standard QL-based algorithm's convergence rate is 72.3%). In [125], a multi-agent Q-learning algorithm is proposed as a power allocation method in RF/VLC systems. In these systems, in order to ensure QoS satisfaction, the transmit power of the APs needs to be optimized. Simulation results demonstrate the effectiveness of the proposed Q-learning-based approach in terms of accuracy and performance (an illustrative sketch of this kind of multi-agent power allocation is given after Table 9).

4.3.10. Fault/Anomaly Management

In [126], a deep Q-learning method is proposed for fault detection and diagnosis in 6G networks. Simulation results show that the algorithm can use fewer features and achieve high accuracy, up to 96.7%.
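As a rough illustration of how fault diagnosis can be cast as a deep Q-learning problem, consider the minimal sketch below: each state is a vector of monitored KPIs, each action is a candidate fault label, and the agent is rewarded only when its diagnosis matches the ground truth. The feature dimensionality, number of fault classes, reward design, and network size are assumptions made for this sketch, not details taken from [126].

import random
import torch
import torch.nn as nn

N_FEATURES = 8          # monitored KPIs per sample (assumed)
N_FAULT_CLASSES = 4     # candidate fault labels (assumed)

# Q-network: maps a KPI feature vector to one Q-value per fault label.
q_net = nn.Sequential(
    nn.Linear(N_FEATURES, 64), nn.ReLU(),
    nn.Linear(64, N_FAULT_CLASSES),
)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def synthetic_sample():
    """Stand-in for labelled network monitoring data (purely illustrative)."""
    label = random.randrange(N_FAULT_CLASSES)
    features = torch.randn(N_FEATURES) + label      # class-dependent shift
    return features, label

epsilon = 0.1
for step in range(2000):
    features, true_fault = synthetic_sample()
    q_values = q_net(features)

    # Epsilon-greedy diagnosis (action = predicted fault class).
    if random.random() < epsilon:
        action = random.randrange(N_FAULT_CLASSES)
    else:
        action = int(q_values.argmax())

    reward = 1.0 if action == true_fault else -1.0   # assumed reward design

    # One-step episode: the TD target reduces to the observed reward.
    target = q_values.detach().clone()
    target[action] = reward
    loss = loss_fn(q_values, target)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Because each diagnosis is treated here as a one-step episode, the temporal-difference target collapses to the immediate reward; a sequential troubleshooting formulation would add a discounted next-state term.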
Table 9 provides a brief summary of the RL models used in various 6G problems.

Table 9. RL models in various 6G problems.

Paper | ML Approach | Application Challenge | Description
[110] | RL based on auction model | Channel allocation | Based on a carrier sensing multiple access (CSMA) implementation; performs well for LTE scenarios
[111] | MDP | Channel allocation | Allocates channels in densely deployed WLANs, leading to throughput enhancement
[112] | Q-learning, Deep Q-learning | Energy consumption | Used in cooperative networks on user devices and SBS, respectively, achieving good energy-saving results
[113] | DRL | Energy consumption, security | Accelerates block verification, where the reward function considers energy for transmission and caching, while providing privacy protection
[114] | Hybrid-AC, MD-Hybrid-AC | Dynamic computation offloading | The actor outputs the offloading ratio and local computation capacity, and the critic evaluates these continuous outputs together with the discrete server selection
[65] | DQN, two-layered RL algorithm | Energy consumption, joint access control | Minimizes prediction error and predicts a battery's energy consumption, while making the access policy
[115] | Multi-agent RL, DNN | Power control | Maximizes throughput in energy-harvesting super IoT systems, while learning PC policies
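To make the multi-agent power-allocation formulation discussed in Section 4.3.9 more concrete, the following is a minimal, illustrative sketch in which each AP is an independent Q-learning agent selecting a discrete transmit power level, rewarded for meeting a per-user rate requirement while penalized for power consumption. The channel model, power levels, state discretization, and reward shaping are assumptions made for this example, not details from [124] or [125].

import math
import random

N_APS = 3                         # RF/VLC access points, one agent each (assumed)
POWER_LEVELS = [0.1, 0.5, 1.0]    # discrete transmit powers in watts (assumed)
N_STATES = 2                      # 0 = QoS unmet in last slot, 1 = QoS met (assumed)
RATE_REQ = 1.0                    # per-AP rate requirement in bit/s/Hz (assumed)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

# One independent Q-table per AP agent: Q[state][power_index].
q_tables = [[[0.0] * len(POWER_LEVELS) for _ in range(N_STATES)] for _ in range(N_APS)]
states = [0] * N_APS

def achieved_rate(ap, power, all_powers):
    """Toy rate model: log2(1 + SINR) with the other APs acting as interference."""
    gain = 1.0                     # flat channel gain, purely illustrative
    interference = sum(p for j, p in enumerate(all_powers) if j != ap) * 0.2
    return math.log2(1.0 + gain * power / (0.1 + interference))

for slot in range(5000):
    # Each agent picks a power level epsilon-greedily from its own table.
    actions = []
    for ap in range(N_APS):
        if random.random() < EPSILON:
            actions.append(random.randrange(len(POWER_LEVELS)))
        else:
            row = q_tables[ap][states[ap]]
            actions.append(row.index(max(row)))

    powers = [POWER_LEVELS[a] for a in actions]

    # Independent Q-learning update per agent.
    for ap in range(N_APS):
        rate = achieved_rate(ap, powers[ap], powers)
        qos_met = rate >= RATE_REQ
        reward = (1.0 if qos_met else -1.0) - 0.5 * powers[ap]   # assumed reward
        next_state = 1 if qos_met else 0

        q = q_tables[ap][states[ap]][actions[ap]]
        best_next = max(q_tables[ap][next_state])
        q_tables[ap][states[ap]][actions[ap]] = q + ALPHA * (reward + GAMMA * best_next - q)
        states[ap] = next_state

Treating each AP as an independent learner keeps the per-agent state-action space small, at the cost of the non-stationarity that cooperative or centralized-training MARL schemes are designed to mitigate.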