Context
This work is submitted to RSS 2026, “Planning from Observation and Interaction.”
The goal is to learn task plans from multimodal data instead of relying on manually engineered planners.
My Contribution
- Designed robotic environments and expert data collection pipelines
- Built automated real-task evaluation infrastructure
- Implemented and trained multimodal reinforcement learning (RL), inverse RL (IRL), and behavior cloning (BC) agents
- Integrated learned components with execution policies on real robots
Technical Details
The system learns task-level structure from:
- Observation trajectories
- Interaction signals
- Demonstration rollouts
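The three data sources above differ mainly in whether actions are available. A minimal sketch of how such heterogeneous trajectories might be stored (the class and field names here are illustrative assumptions, not the project's actual data schema):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Step:
    """One timestep of a trajectory (field names are hypothetical)."""
    image: List[float]               # camera observation (e.g. features)
    proprio: List[float]             # joint positions / interaction signals
    action: Optional[List[float]]    # None for passive, observation-only data

@dataclass
class Trajectory:
    steps: List[Step] = field(default_factory=list)
    source: str = "demonstration"    # "observation", "interaction", or "demonstration"

# Passive observation data carries no actions; demonstrations include expert actions.
obs_traj = Trajectory(source="observation")
obs_traj.steps.append(Step(image=[0.0] * 4, proprio=[0.1, 0.2], action=None))

demo_traj = Trajectory(source="demonstration")
demo_traj.steps.append(Step(image=[0.0] * 4, proprio=[0.1, 0.2], action=[0.05, -0.05]))
```

Tagging each trajectory with its source lets the learner treat action-free observation data differently from fully labeled demonstrations.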
We combine:
- Representation learning for latent task structure
- Policy learning for low-level execution
- Planning over learned abstractions
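The three components compose into one loop: an encoder maps raw observations to a latent task state, a planner searches over those latent states, and a low-level policy executes toward each latent subgoal. A toy sketch of that composition, with every function body an assumption standing in for a learned model:

```python
def encode(observation):
    """Representation-learning stand-in: raw observation -> discrete latent task state."""
    return round(sum(observation)) % 4      # assume 4 abstract task states

def low_level_policy(observation, subgoal):
    """Policy-learning stand-in: action nudging the current latent state toward the subgoal."""
    return [0.1 if encode(observation) < subgoal else -0.1]

def plan(start_latent, goal_latent):
    """Planning over learned abstractions: walk an (assumed) learned transition model.
    In the real system transitions would be estimated from data, not hard-coded."""
    path, state = [start_latent], start_latent
    while state != goal_latent:
        state = (state + 1) % 4             # assumed transition: state i reaches i+1
        path.append(state)
    return path

obs = [0.4, 0.7]                 # encodes to latent state 1
subgoals = plan(encode(obs), 3)  # latent subgoal sequence toward goal state 3
action = low_level_policy(obs, subgoals[1])
```

The point of the factoring is that planning happens entirely in the learned latent space, while the policy only ever has to reach the next subgoal.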
The key challenge was stabilizing training across simulation and real hardware while preserving structured, task-level reasoning.
Results
- Evaluated on simulated and real manipulation tasks
- Demonstrated planning from passive observation plus limited interaction
- Part of a submission to RSS 2026