In this paper, we study the problem of nonepisodic reinforcement learning (RL) for nonlinear
dynamical systems, where the system dynamics are unknown and the RL agent
has to learn from a single trajectory, i.e., without resets. We propose Nonepisodic
Optimistic RL (NEORL), an approach based on the principle of optimism in
the face of uncertainty. NEORL uses well-calibrated probabilistic models and
plans optimistically w.r.t. the epistemic uncertainty about the unknown dynamics.
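To make the planning step concrete, here is a minimal Python sketch of optimism in the face of uncertainty. All names (`mu`, `sigma`, `optimistic_return`) and the toy 1-D system are hypothetical illustrations, not the paper's actual models or planner; the key idea it shows is that the planner optimizes not only over controls but also over a "hallucinated" variable `eta` that selects the most favorable plausible dynamics inside the calibrated model's confidence set.

```python
import numpy as np

# Toy 1-D system for illustration; NEORL itself uses general calibrated
# probabilistic models (e.g., GPs or ensembles) and continuous optimization.

def mu(x, u):     # model's mean prediction of the next state
    return 0.9 * x + 0.1 * u

def sigma(x, u):  # epistemic std: true dynamics lie in mu +/- beta*sigma w.h.p.
    return 0.05 * (1.0 + np.abs(x))

def reward(x, u):  # negative quadratic cost: drive the state to zero
    return -(x**2 + 0.1 * u**2)

def optimistic_return(policy, eta, x0, beta, horizon):
    """Roll out 'hallucinated' dynamics: besides the control u, the planner
    also picks eta in [-1, 1], i.e., the most favorable model in the
    confidence set, and evaluates the resulting return."""
    x, total = x0, 0.0
    for t in range(horizon):
        u = policy(x, t)
        total += reward(x, u)                           # reward for (x, u)
        x = mu(x, u) + beta * sigma(x, u) * eta(x, t)   # optimistic transition
    return total

# Planner sketch: search candidate (policy, eta) pairs and act according to
# the pair with the best optimistic return.
candidates = [
    (lambda x, t, k=k: -k * x, lambda x, t, e=e: e)
    for k in (0.5, 1.0, 2.0) for e in (-1.0, 0.0, 1.0)
]
best_policy, best_eta = max(
    candidates,
    key=lambda pe: optimistic_return(*pe, x0=1.0, beta=2.0, horizon=20),
)
```

In the nonepisodic setting, this optimistic plan is recomputed as new data from the single ongoing trajectory shrinks the epistemic uncertainty `sigma`.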
Under continuity and bounded energy assumptions on the system, we provide a
first-of-its-kind regret bound of $\mathcal{O}(\beta_T \sqrt{T \Gamma_T})$ for general nonlinear systems with Gaussian process dynamics, where $T$ is the number of interaction steps, $\Gamma_T$ the maximum information gain, and $\beta_T$ the confidence-interval scaling of the calibrated model.
The paper is available here.