The study proposes an integrated framework for medical teaching that improves interactive behavior recognition and feedback optimization using attention mechanisms. The behavior recognition model combines a multimodal encoder with an attention-based architecture to selectively prioritize salient video, audio, and textual instructional inputs. By focusing on the most informative features and temporal patterns, the model improves the accuracy of recognizing student engagement and learning behavior in complex environments. Experimental evaluations on multiple medical education datasets demonstrate substantial improvements in recognition accuracy and feedback efficiency over state-of-the-art methods. The feedback optimization strategy dynamically adjusts instructional responses through an iterative refinement process that integrates behavioral assessment with domain-specific pedagogical knowledge. Through weighted behavioral assessment and parameter updates, it generates contextually accurate, adaptive feedback aligned with student needs. The integrated system improves the real-time interpretability of teaching interactions, increases student engagement, and provides a scalable solution for intelligent support of medical education.
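The abstract does not specify implementation details, but the core idea of attention-weighted multimodal fusion can be illustrated with a minimal sketch. The function and variable names below (`attention_fuse`, the query vector, the toy feature vectors) are hypothetical assumptions for illustration only, not the authors' actual model: each modality contributes a feature vector, a scaled dot-product attention score weights the modalities, and the fused representation emphasizes the most informative channel.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_fuse(query, modality_feats):
    """Fuse per-modality feature vectors via attention weights.

    query: a (hypothetical) learned task embedding
    modality_feats: dict mapping modality name -> feature vector
    Returns the fused vector and the per-modality attention weights.
    """
    names = list(modality_feats)
    d = len(query)
    # Scaled dot-product score between the query and each modality.
    scores = [sum(q * f for q, f in zip(query, modality_feats[n])) / math.sqrt(d)
              for n in names]
    weights = softmax(scores)
    # Weighted sum of modality features gives the fused representation.
    fused = [sum(w * modality_feats[n][i] for w, n in zip(weights, names))
             for i in range(d)]
    return fused, dict(zip(names, weights))

# Toy example: the text features align best with the query,
# so the attention mechanism assigns text the largest weight.
q = [1.0, 0.0, 0.0, 0.0]
feats = {"video": [0.2, 0.1, 0.0, 0.0],
         "audio": [0.1, 0.3, 0.0, 0.0],
         "text":  [0.9, 0.0, 0.1, 0.0]}
fused, w = attention_fuse(q, feats)
```

In a full model these feature vectors would come from modality-specific encoders, and the attention would typically operate over time steps as well as modalities; the sketch shows only the selective-weighting step the abstract describes.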