The authors propose a new framework for evaluating artificial intelligence in clinical practice using simulated digital hospital environments.[1] Such simulators, already used to train and test current and future healthcare providers, capture the evolving constraints and cascading effects that arise in real-time clinical decisions.[1] The study tested a multimodal language model (Gemini 2.5 Pro) on four acute care scenarios in a Body Interact simulation and compared its performance against more than 14,000 medical student simulation runs and an experienced emergency physician benchmark.[1] The findings show that a modern multimodal language model can act as an autonomous virtual doctor and successfully stabilize patients in a simulated environment.[1] This approach makes it possible to evaluate AI not on isolated diagnoses alone, but as a continuous decision process in which the model alternately gathers information and applies treatment.[1]
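The closing point, that performance is judged over an alternating observe/treat loop rather than on a single diagnosis, can be sketched as a minimal evaluation harness. Everything here (the `SimulatedPatient` toy vitals model, `run_episode`, the rule-based placeholder agent) is a hypothetical illustration, not the paper's actual simulator or model:

```python
from dataclasses import dataclass


@dataclass
class SimulatedPatient:
    """Toy stand-in for a digital patient simulator (hypothetical)."""
    blood_pressure: int = 70      # hypotensive at the start of the scenario
    stabilized: bool = False

    def observe(self):
        # Information-gathering step: return current vitals.
        return {"blood_pressure": self.blood_pressure}

    def apply(self, action):
        # Treatment step: actions change state, with cascading effects.
        if action == "give_fluids":
            self.blood_pressure += 15
        elif action == "wait":
            self.blood_pressure -= 5   # an untreated patient deteriorates
        self.stabilized = self.blood_pressure >= 100


def rule_based_agent(vitals):
    """Placeholder decision-maker; a language model would sit here."""
    return "give_fluids" if vitals["blood_pressure"] < 100 else "wait"


def run_episode(patient, agent, max_steps=10):
    """Score the whole trajectory, not a single isolated decision."""
    for step in range(1, max_steps + 1):
        vitals = patient.observe()     # collect information
        action = agent(vitals)         # decide
        patient.apply(action)          # treat
        if patient.stabilized:
            return {"stabilized": True, "steps": step}
    return {"stabilized": False, "steps": max_steps}


result = run_episode(SimulatedPatient(), rule_based_agent)
```

The point of the sketch is that the unit of evaluation is the whole episode: the same metric (did the patient stabilize, and in how many steps) applies equally to a student, a physician, or a language model acting as the agent.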