To assess balance during a physical rehabilitation program, clinicians typically rely on rubric-oriented battery tests to score a patient’s physical capabilities, which introduces subjectivity into the scores. Approaches have been proposed to make physical assessments more objective and to quantify them in more detail, but many are limited to tracking the center of pressure and can therefore miss information that is critical to whole-body movement. To this end, the center of mass (COM) state space presents a promising avenue for balance characterization. Previous work has analyzed balance in the COM state space through analytical, computational, and experimental methods. Here, we investigate balance recovery by developing a balance controller for a musculoskeletal model through reinforcement learning (RL). The RL framework is built on two neural networks, one for joint trajectory mimicking or targeting and one for muscle activation coordination, which in combination control the musculoskeletal model. The neural network models, i.e., the controller, are trained with Proximal Policy Optimization (PPO). The controller is then tested repeatedly by imposing random initial states (positions and velocities) on the model and recording the trials that yield successful balance recovery trajectories. This allows for the construction of a balance region (BR), which describes the initial COM states that lead to balanced trajectories. The resulting BR is then compared to the analytical margin of stability determined through a linear inverted pendulum approach, and the two show a similar trend in the successful COM states. Overall, the presented balance controller offers a promising new approach to assessing balance in bipedal systems, particularly in humans.
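The analytical baseline mentioned above can be illustrated with a short sketch. It assumes the commonly used extrapolated-COM formulation of the linear inverted pendulum, in which the margin of stability is the distance from the extrapolated COM, x + v/ω₀ with ω₀ = √(g/l), to the nearest edge of the base of support; the abstract does not state which formulation was actually used. The pendulum length, base-of-support limits, and sampling ranges are hypothetical values chosen only for illustration, and the function names are not from the original work. The sketch labels randomly sampled initial COM states with the analytical criterion and does not reproduce the RL controller or the musculoskeletal simulation.

```python
import math
import random

G = 9.81  # gravitational acceleration, m/s^2


def extrapolated_com(x, v, pendulum_length):
    """Extrapolated COM position (m) of a linear inverted pendulum."""
    omega0 = math.sqrt(G / pendulum_length)  # pendulum eigenfrequency, rad/s
    return x + v / omega0


def margin_of_stability(x, v, pendulum_length, bos_min, bos_max):
    """Signed distance (m) from the extrapolated COM to the nearest
    base-of-support edge; positive means it lies inside the support."""
    xcom = extrapolated_com(x, v, pendulum_length)
    return min(xcom - bos_min, bos_max - xcom)


def sample_analytical_region(n_trials, pendulum_length, bos_min, bos_max,
                             x_range, v_range, seed=0):
    """Draw random initial COM states and label each one as balanced
    (True) or not (False) using the analytical criterion only."""
    rng = random.Random(seed)
    labelled = []
    for _ in range(n_trials):
        x = rng.uniform(*x_range)  # initial COM position, m
        v = rng.uniform(*v_range)  # initial COM velocity, m/s
        mos = margin_of_stability(x, v, pendulum_length, bos_min, bos_max)
        labelled.append((x, v, mos >= 0.0))
    return labelled


if __name__ == "__main__":
    # Hypothetical sagittal-plane values: 1 m pendulum, foot spanning
    # 5 cm behind to 20 cm ahead of the ankle.
    states = sample_analytical_region(
        n_trials=1000, pendulum_length=1.0,
        bos_min=-0.05, bos_max=0.20,
        x_range=(-0.10, 0.30), v_range=(-1.0, 1.0),
    )
    balanced = sum(1 for _, _, ok in states if ok)
    print(f"{balanced} of {len(states)} sampled states are analytically balanced")
```

In a setup like the one described in the text, the boolean label would instead come from running the trained controller from each sampled initial state and checking whether the trial recovers balance; the analytical labels then serve as the comparison.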