AcroRL: Learning Aggressive Quadrotor Inversion using Bidirectional Thrust

Abstract

Bidirectional thrust grants quadrotors a second equilibrium condition and increased control authority, expanding the envelope of possible aggressive maneuvers and enabling inverted flight, perching, and sensing. Prior geometric control approaches extend differential flatness through Hopf fibration-based attitude representations to support bidirectional thrust, but struggle with actuator saturation and motor reversal delay during inversions, requiring heuristic thrust posture scheduling and waypoint tuning. We propose a learning-based framework that modulates a constant reference trajectory to perform compact, position-constrained quadrotor inversions while remaining compatible with traditional trajectory generation and tracking across flight regimes. Separate policies are trained via reinforcement learning for nominal-to-inverted and inverted-to-nominal transitions. In JAX-based simulation, the proposed method achieves the lowest position deviation and settling time among all evaluated methods, reducing position root mean square error (RMSE) by 32% and settling time by 57% relative to the strongest optimization-based baseline. Hardware experiments demonstrate successful inversion across multiple yaw configurations with position RMSE below 0.35 m, and compatibility with downstream trajectory generation and control through circular flight in both regimes. Additionally, we provide an open-source implementation of the proposed framework.

Method

Key Insight: Our method learns feedforward trajectory modulation over the full system dynamics by explicitly modeling thrust asymmetry, reversal delay, and stochasticity, guided by an almost globally stabilizing geometric controller.

Overview of the proposed method. A reference modulation policy \(\boldsymbol{\pi}\), activated by the inversion flag \(\mu\), observes the robot state \(\boldsymbol{x}\) and a finite action history \(\boldsymbol{a}_\text{hist}\) to produce a position reference modulation and a thrust posture, \(\boldsymbol{a} = [\boldsymbol{r}_{\delta\kappa}, \eta]\). These outputs and the differentially flat reference \(\boldsymbol{\sigma}_{\kappa}\) are mapped through a Hopf fibration-based control algorithm to generate the control input \(\boldsymbol{u} = [f_c, \boldsymbol{\tau}]\), which is passed to a box-constrained optimal control allocation that computes optimal thrust commands \(\boldsymbol{T}^*\). Finally, these are converted to motor rates \(\boldsymbol{\Omega}\) via an asymmetric thrust model and executed on the quadrotor dynamics.
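The last two stages of this pipeline, allocation of \(\boldsymbol{u} = [f_c, \boldsymbol{\tau}]\) to per-rotor thrusts \(\boldsymbol{T}^*\) under box constraints and conversion to motor rates \(\boldsymbol{\Omega}\), can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the mixer geometry, the thrust limits, the row weights, the quadratic thrust law with separate forward and reverse coefficients, and the choice of projected gradient descent as the solver.

```python
import math

# All numeric values are hypothetical placeholders, not parameters from the paper.
ARM = 0.1                       # moment arm [m]
YAW_COEFF = 0.02                # yaw torque per newton of rotor thrust [m]
T_MIN, T_MAX = -3.0, 6.0        # per-rotor thrust box [N]; reverse thrust assumed weaker
K_FWD, K_REV = 1.0e-5, 0.5e-5   # assumed thrust coefficients [N/(rad/s)^2] per spin direction

# Assumed mixer: rows map rotor thrusts to [f_c, tau_x, tau_y, tau_z].
MIXER = [
    [1.0, 1.0, 1.0, 1.0],
    [ARM, -ARM, -ARM, ARM],
    [-ARM, -ARM, ARM, ARM],
    [YAW_COEFF, -YAW_COEFF, YAW_COEFF, -YAW_COEFF],
]
# Assumed residual weights; they rescale the rows so torque errors cost more than thrust error.
ROW_WEIGHTS = [1.0, 20.0, 20.0, 50.0]


def allocate(u, iters=300):
    """Minimize 0.5 * ||W (M T - u)||^2 subject to T_MIN <= T_i <= T_MAX
    using projected gradient descent (a stand-in for the paper's allocator)."""
    A = [[w * m for m in row] for w, row in zip(ROW_WEIGHTS, MIXER)]
    b = [w * ui for w, ui in zip(ROW_WEIGHTS, u)]
    # 1 / ||A||_F^2 never exceeds 1 / lambda_max(A^T A), so the step is stable.
    step = 1.0 / sum(a * a for row in A for a in row)
    T = [0.0] * 4
    for _ in range(iters):
        resid = [sum(a * t for a, t in zip(row, T)) - bi for row, bi in zip(A, b)]
        grad = [sum(A[i][j] * resid[i] for i in range(4)) for j in range(4)]
        # Gradient step followed by projection onto the thrust box.
        T = [min(T_MAX, max(T_MIN, t - step * g)) for t, g in zip(T, grad)]
    return T


def thrust_to_rate(thrust):
    """Invert an assumed asymmetric quadratic thrust law T = +/- k * Omega^2.
    A negative rate denotes reversed spin."""
    k = K_FWD if thrust >= 0.0 else K_REV
    return math.copysign(math.sqrt(abs(thrust) / k), thrust)


# Usage: upright hover versus inverted hover at the same thrust magnitude.
upright_rates = [thrust_to_rate(t) for t in allocate([8.0, 0.0, 0.0, 0.0])]
inverted_rates = [thrust_to_rate(t) for t in allocate([-8.0, 0.0, 0.0, 0.0])]
```

With these placeholder coefficients, the inverted case needs a higher rotor speed than the upright case for the same thrust magnitude, which is the kind of asymmetry the thrust model has to capture. A command that exceeds the reverse-thrust limit is clipped at the box boundary instead of being tracked exactly.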

Simulation Results

Strongest Baseline

Ours

BibTeX

@article{rodriguez2026acrorl,
  author    = {Rodriguez, Gabriel and Sayag, Henri and Rathod, Abhishek and Stecklein, John and Saha, Siddharth and Barngrover, Christopher and Tabib, Wennie},
  title     = {AcroRL: Learning Aggressive Quadrotor Inversion using Bidirectional Thrust},
  journal   = {arXiv preprint arXiv:2605.24301},
  doi       = {10.48550/arXiv.2605.24301},
  year      = {2026},
}