News
- May 2026: Our paper “WS-GRPO: Weakly-Supervised Group-Relative Policy Optimization for Rollout-Efficient Reasoning” has been accepted to ICML 2026.
- May 2026: We released our survey “Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning” on arXiv.
- April 2026: Three papers have been accepted to ACL 2026.
- April 2026: Our symbolic music benchmark “CSyMR: Benchmarking Compositional Music Information Retrieval in Symbolic Music Reasoning” has been accepted to SIGIR 2026.
- March 2026: Our paper “Ctrls: Chain-of-Thought Reasoning via Latent State-Transition” has been accepted to AISTATS 2026.
- March 2026: Our paper “Importance Sampling for Multi-Negative Multimodal Direct Preference Optimization” has been accepted to ICLR 2026.