Efficient and Optimal Policy Gradient Algorithm for Corrupted Multi-armed Bandits
Published in the twenty-fourth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2025
Recommended citation: Jiayuan Liu, Siwei Wang, Zhixuan Fang, “Efficient and Optimal Policy Gradient Algorithm for Corrupted Multi-armed Bandits,” the twenty-fourth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2025.