Interactive Perception

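Difference from active perception: active perception is a special case of interactive perception. In active perception the camera can move, adjusting its pose to obtain a better viewpoint. Interactive perception can additionally interact with the environment through a robot arm or another device, either to improve the viewpoint or to generate other signals for processing.
Motion planning for active perception and interactive perception: predefined action primitives, heuristics, or other methods (e.g., end-to-end learning)
The problem is usually modeled as an MDP
- How should the state be represented? Which representation should be used, and how is it obtained from the sensor input?
- How should the action be selected?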
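
A minimal sketch of the MDP view above, contrasting an active action (moving the camera) with an interactive action (pushing an occluder). The toy state, the visibility rule, and every name here (`State`, `step`, `rollout`) are assumptions made for illustration and do not come from any of the papers below.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class State:
    camera_view: int        # index of the current viewpoint, 0..2
    occluder_present: bool  # whether an object blocks the target


ACTIONS = ("move_camera", "push_occluder")


def target_visible(s: State) -> bool:
    # Hypothetical rule: the target is visible from view 2,
    # or from any view once the occluder has been pushed away.
    return s.camera_view == 2 or not s.occluder_present


def step(s: State, a: str) -> Tuple[State, float]:
    """Deterministic toy transition; reward 1.0 when the target is visible."""
    if a == "move_camera":          # active perception: change the viewpoint
        nxt = State(min(s.camera_view + 1, 2), s.occluder_present)
    elif a == "push_occluder":      # interactive perception: change the scene
        nxt = State(s.camera_view, False)
    else:
        raise ValueError(f"unknown action: {a}")
    return nxt, (1.0 if target_visible(nxt) else 0.0)


def rollout(policy: Callable[[State], str], s: State, horizon: int = 5) -> int:
    """Return the number of steps until the target is visible, or -1."""
    for t in range(1, horizon + 1):
        s, r = step(s, policy(s))
        if r > 0:
            return t
    return -1


start = State(camera_view=0, occluder_present=True)
active_steps = rollout(lambda s: "move_camera", start)
interactive_steps = rollout(lambda s: "push_occluder", start)
```

In this toy setting the interactive policy reaches a visible target in fewer steps than the camera-only policy, which is the kind of trade-off the action-selection question is about.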

2025

Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop

Justin Kerr, et al., AUTOLAB@UCB, arXiv 2025

Active perception: the camera is mounted on a gimbal

Vision in Action: Learning Active Perception from Human Demonstrations

Haoyu Xiong et al., MIT & Stanford, CoRL 2025

2024

MANIP: Integrating Interactive Perception into Long-Horizon Robot Manipulation Systems

Justin Yu et al., AUTOLAB@UCB, IROS 2024

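Implements both an interactive perception policy and a task policy
Core idea: two kinds of sub-policies
- The system confidence estimator outputs a vector that characterizes the state estimate at a given time step; the estimate consists of confidence and state viability. Even when confidence is high, the meta-policy must still determine whether the current estimated state is unsafe, unfamiliar (out of distribution), or likely to lead to task failure
- The meta-policy is either procedural or learned, and its sub-policies fall into two kinds:
  - 1) task policies, which continue the task
  - 2) interactive perception policies, which raise the system perception confidence
  - Requirements on a sub-policy: it must be modular and have well-defined trigger conditions. If trigger conditions are not disjoint, a meta-policy may select a sub-policy non-deterministically.
  - The meta-policy is Markovian and includes a cycle counter to check the timeout condition
- The paper does not give a general method for generating sub-policies; instead, three case studies illustrate how to apply the framework to different problems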
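
A minimal sketch of a procedural meta-policy with disjoint trigger conditions and a cycle counter, under stated assumptions. The threshold `CONF_THRESHOLD`, the timeout `MAX_CYCLES`, the `"timeout"` outcome, and the choice to route non-viable states to interactive perception are all hypothetical; the notes above do not specify them.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Estimate:
    confidence: float  # perception confidence in [0, 1]
    viable: bool       # state viability (safe, familiar, unlikely to fail)


@dataclass
class SubPolicy:
    name: str
    trigger: Callable[[Estimate], bool]


CONF_THRESHOLD = 0.8  # hypothetical value
MAX_CYCLES = 3        # hypothetical timeout

SUB_POLICIES: List[SubPolicy] = [
    # Task policy: only when the estimate is both confident and viable.
    SubPolicy("task", lambda e: e.viable and e.confidence >= CONF_THRESHOLD),
    # Interactive perception: the exact complement, so triggers are disjoint.
    SubPolicy(
        "interactive_perception",
        lambda e: not e.viable or e.confidence < CONF_THRESHOLD,
    ),
]


def meta_policy(est: Estimate, cycles: int) -> str:
    """Pick a sub-policy from the current estimate and the cycle counter only."""
    if cycles >= MAX_CYCLES:
        return "timeout"
    matches = [p.name for p in SUB_POLICIES if p.trigger(est)]
    # Overlapping triggers would make the selection ambiguous.
    assert len(matches) == 1, "trigger conditions must be disjoint and exhaustive"
    return matches[0]
```

Because the decision depends only on the current estimate and the counter, the selection stays Markovian in the sense described above.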
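Three case studies: surgical needle handover, cable untangling, and cable clip routing
- Surgical needle handover: visual servoing actively aligns the needle until the system confidence estimator observes enough aligned points; servoing to increase visibility addresses incorrect grasp positioning due to uncertainty in needle pose
- Cable untangling: MANIP addresses the occasional cable grasp failures of SGTM 2.0 by adding an interactive perception primitive that picks up and slightly moves the grasped cable, then compares the local pose of the cable before and after the motion to confirm whether the cable was actually grasped
- Cable clip routing: when there is not enough space around the cable for the gripper to operate, the perception primitive added by MANIP pushes the cable away from the clip, which resolves failures during manipulation
- MANIP does not appear to have been implemented on these three cases; the paper only discusses feasibility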
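
The before/after comparison in the cable untangling case can be sketched as a simple displacement check. This is an illustrative assumption, not the test used in MANIP: the function name, the 2-D position representation, and the `min_ratio` threshold are all hypothetical.

```python
import math
from typing import Tuple

Point2D = Tuple[float, float]  # local cable position in metres (hypothetical)


def grasp_succeeded(
    before: Point2D,
    after: Point2D,
    commanded_shift: float,
    min_ratio: float = 0.5,
) -> bool:
    """Judge a grasp by how far the cable moved relative to the gripper.

    If the cable segment followed at least `min_ratio` of the commanded
    gripper displacement, treat the cable as grasped.
    """
    moved = math.hypot(after[0] - before[0], after[1] - before[1])
    return moved >= min_ratio * commanded_shift


# The gripper was commanded to shift 2 cm in both cases.
held = grasp_succeeded((0.10, 0.20), (0.12, 0.20), commanded_shift=0.02)
missed = grasp_succeeded((0.10, 0.20), (0.101, 0.20), commanded_shift=0.02)
```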
Implemented case study: cable tracing, which addresses the incorrect endpoint detection of HANDLOOM 1.5 (still inherits the many endpoint termination failure modes)
- Four sub-policies are added on top of HANDLOOM 1.5; the first three are interactive perception
  - Trace uncertainty disambiguation (TUG)
    - Executed when the detected state is undetermined: one gripper then perturbs the cable there while the other gripper holds down the starting endpoint
  - Retrace uncertainty disambiguation (RUG)
    - One gripper perturbs the area where the retrace occurred, while the other gripper pins down the starting endpoint for the trace
  - Move endpoint into field-of-view (MEF)
    - When the final endpoint lies at the edge of the field of view, one gripper picks that ending trace point and moves that portion of the cable back into the camera FOV
  - Cable endpoint verification (CEV), used to verify whether the task is complete

MOSAIC: Learning Unified Multi-Sensory Object Property Representation for Robot Learning via Interactive Perception

Gyan Tatiya et al., Tufts University, ICRA 2024

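MOSAIC: Multimodal Object property learning with Self-Attention and Interactive Comprehension
Performs two kinds of tasks: object category recognition and fetch object
Uses a pre-trained CLIP text model
Ten exploratory behaviors gather information about the object: Look (non-interactive; all the others are interactive), Press, Grasp, Hold, Lift, Drop, Poke, Push, Shake, and Tap. These behaviors are used to record visual, tactile, audio, and text-description data
No other manipulation of the objects is performed (e.g., arranging them into a specific shape)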

Interactive Perception for Deformable Object Manipulation

Zehang Weng et al., KTH & HKPolyU, RAL 2024

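The camera is mounted on one robot arm, and a second robot arm interacts with a plastic bag
The problem is modeled as a POMDP
A subspace is built from certain object features and called the structure of interest (SOI); the geometry of this structure is coupled with the end-effector action
Reinforcement learning maps the camera action into the SOI
A subspace, the Dynamic Active Vision Space (DAVS), is constructed from object features to make the search more efficient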

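Model: POMDP
- State: comprises the configuration of the robotic end-effector, camera, and object
- Observation: the observation corresponding to the state
- Action: the desired camera and end-effector poses; the key question is how to reduce the dimensionality of the action space

Method: PPO + a straightforward Visual Servoing, with the goal of maximizing the visibility of the SOI in the camera view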

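If camera motion and arm motion are treated as independent and naively combined (as a product), the search dimensionality becomes too high. Instead, camera motion is assumed to follow a certain topological regularity, which allows its dimensionality to be reduced (by sampling from a dynamical subspace).
- A naive solution is to factorize the action space under the assumption of conditional independence, which however results in high dimensionality for action exploration. In this work, we consider that the camera action space should depend on a reduced action space generated based on topological regularity.
- Dimensionality reduction: a manifold with boundary M is used to describe and construct the subspace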
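
To illustrate what sampling camera actions from a low-dimensional manifold with boundary can look like, here is a minimal sketch that replaces a full 6-DoF camera pose search with two parameters on a spherical cap around a point of interest. This is not the paper's DAVS construction: the spherical cap, the fixed `radius`, the elevation bounds, and the idea of using a single SOI centre point are assumptions chosen for illustration.

```python
import math
import random
from typing import Tuple

Vec3 = Tuple[float, float, float]


def camera_position_on_cap(
    center: Vec3, radius: float, azimuth: float, elevation: float
) -> Vec3:
    """Map 2-D coordinates (azimuth, elevation) to a 3-D camera position."""
    x = center[0] + radius * math.cos(elevation) * math.cos(azimuth)
    y = center[1] + radius * math.cos(elevation) * math.sin(azimuth)
    z = center[2] + radius * math.sin(elevation)
    return (x, y, z)


def sample_camera_action(
    center: Vec3,
    radius: float,
    rng: random.Random,
    min_elev: float = math.radians(20),  # boundary of the cap (hypothetical)
    max_elev: float = math.radians(90),
) -> Vec3:
    """Sample a camera position from the cap instead of all of 3-D space.

    `center` stands in for the structure of interest; in a real system it
    would be recomputed after every end-effector action, which is what
    makes the subspace dynamic.
    """
    az = rng.uniform(-math.pi, math.pi)
    el = rng.uniform(min_elev, max_elev)
    return camera_position_on_cap(center, radius, az, el)


rng = random.Random(0)
soi_center = (0.5, 0.0, 0.2)
cam = sample_camera_action(soi_center, radius=0.4, rng=rng)
```

The policy then only has to explore two bounded parameters for the camera, with the orientation fixed by looking at the centre, rather than a free 6-DoF pose multiplied with the end-effector action space.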

Grasp, See, and Place: Efficient Unknown Object Rearrangement With Policy Structure Prior

Kechun Xu et al., Zhejiang University, TRO 2024

2023

Self-Supervised Unseen Object Instance Segmentation via Long-Term Robot Interaction

Yangxiao Lu et al., UT Dallas & Rice, RSS 2023

2020

Object Finding in Cluttered Scenes Using Interactive Perception

Tonci Novkovic, et al., Autonomous Systems Lab@ETH, ICRA 2020

Implements interactive perception only
Task: find an object in clutter using a robot arm with an RGB-D camera mounted on it. Our goal is to find a mapping from sensor inputs to end-effector displacements
The agent's next action is determined by an RL algorithm based on the current encoded state and the knowledge obtained from past experiences
Model: a discrete, finite-horizon MDP
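
To make "discrete, finite-horizon MDP" concrete, here is a minimal sketch that solves a toy one with backward induction. The paper learns its policy with RL; backward induction is swapped in here only because it is the simplest exact solver for a small known model. The 4-cell corridor, the displacement actions, and the reward are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def backward_induction(
    n_states: int,
    actions: Sequence[int],
    transition: Callable[[int, int], int],
    reward: Callable[[int, int], float],
    horizon: int,
) -> Tuple[List[float], List[List[int]]]:
    """Return the optimal values and a time-indexed policy (action indices)."""
    values = [0.0] * n_states          # value with zero steps remaining
    policy: List[List[int]] = []
    for _ in range(horizon):
        q = [
            [reward(s, a) + values[transition(s, a)] for a in actions]
            for s in range(n_states)
        ]
        # Each pass adds one more remaining step, so prepend its policy.
        policy.insert(0, [row.index(max(row)) for row in q])
        values = [max(row) for row in q]
    return values, policy


GOAL = 3            # cell where the object becomes visible (hypothetical)
ACTIONS = (-1, +1)  # discrete end-effector displacements


def transition(s: int, a: int) -> int:
    if s == GOAL:   # the goal is absorbing
        return GOAL
    return min(max(s + a, 0), GOAL)


def reward(s: int, a: int) -> float:
    return 1.0 if s != GOAL and transition(s, a) == GOAL else 0.0


values, policy = backward_induction(4, ACTIONS, transition, reward, horizon=3)
```

With a horizon of 3 the goal is reachable from every cell, and `policy[0]` gives the first-step action index for each starting cell.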

2017

(Survey) Interactive Perception: Leveraging Action in Perception and Perception in Action

Jeannette Bohg, et al., Stanford, TRO 2017

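An important survey paper
Benefits of interactive perception
- Create Novel Signals (CNS): interaction produces tactile signals, sound, or image sequences that change over time
- Action Perception Regularity (APR)
  - Interaction reveals regularities among the sensor output (the robot's input signal / system state S), the robot's action, and time t; this breaks down into the two parts below
  - Using the Regularity
    - (i) Predict the outcome before the action is executed: predict the sensory signal given knowledge about the action and environment properties
    - (ii) Update the understanding of the environment by comparing the prediction with the observation: update the knowledge about some latent properties of the environment by comparing the prediction to the observation
    - (iii) Infer the action that was actually (?) applied from the environment properties and the observed sensory signal (e.g., a grasp may fail to grasp the object and knock it over instead?): infer the action that has been applied to generate the observed sensory signal given some environment properties.
  - Learning the Regularity: discover the causal relationship between actions and sensory feedback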
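
Steps (i) and (ii) under "Using the Regularity" can be sketched as a discrete Bayesian update over a latent object property. The forward model `LIKELIHOOD`, the heavy/light property, and the push action are hypothetical numbers and names chosen for illustration; the survey does not prescribe this model.

```python
from typing import Dict, Tuple

# Hypothetical forward model: P(signal | action, latent property).
LIKELIHOOD: Dict[Tuple[str, str], Dict[str, float]] = {
    ("push", "heavy"): {"moved": 0.2, "static": 0.8},
    ("push", "light"): {"moved": 0.9, "static": 0.1},
}


def predict(belief: Dict[str, float], action: str) -> Dict[str, float]:
    """(i) Predict the distribution over sensory signals for an action."""
    out: Dict[str, float] = {}
    for prop, p in belief.items():
        for signal, lik in LIKELIHOOD[(action, prop)].items():
            out[signal] = out.get(signal, 0.0) + p * lik
    return out


def update(belief: Dict[str, float], action: str, observed: str) -> Dict[str, float]:
    """(ii) Update the belief over the latent property with Bayes' rule."""
    post = {
        prop: p * LIKELIHOOD[(action, prop)][observed]
        for prop, p in belief.items()
    }
    norm = sum(post.values())
    return {prop: v / norm for prop, v in post.items()}


prior = {"heavy": 0.5, "light": 0.5}
predicted = predict(prior, "push")
posterior = update(prior, "push", observed="moved")
```

Observing that the object moved after a push shifts the belief toward "light". Step (iii) would invert the same model over actions instead of properties.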

2014

Online Interactive Perception of Articulated Objects with Multi-Level Recursive Estimation Based on Task-Specific Priors

Roberto Martín-Martín et al., Robotics and Biology Laboratory, Technische Universität Berlin, IROS 2014

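Probably a purely vision-based paper? It studies how to estimate an object's kinematic model from a continuous RGB-D stream. The "interactive" part is that an external agent (e.g., a human) changes the state of the target object (rigid object). It does not involve actively selecting actions to produce a specific effect on the object.
Proposes a new interactive perception algorithm: an online interactive perception algorithm to estimate parameterized kinematic models of unknown objects from streaming RGB-D data
Three interconnected levels of recursive state estimation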
- estimate feature motion: the estimation of 3D feature motion based on the 2D motion of tracked RGB features
- estimate rigid body motion: the estimation of rigid body motion based on the estimated feature motion
- estimate the overall kinematic model: the estimation of the kinematic model based on the rigid body motion
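
As a rough illustration of the second level (rigid body motion from feature motion), the sketch below fits a rigid transform to matched feature positions with a closed-form least-squares solution. It is a simplified planar stand-in, not the paper's recursive 3-D estimator: the 2-D setting, the batch fit, and the function name are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def estimate_rigid_motion_2d(
    prev: List[Point], curr: List[Point]
) -> Tuple[float, Point]:
    """Least-squares rotation angle and translation with curr ~ R @ prev + t."""
    n = len(prev)
    cpx = sum(p[0] for p in prev) / n
    cpy = sum(p[1] for p in prev) / n
    ccx = sum(q[0] for q in curr) / n
    ccy = sum(q[1] for q in curr) / n
    s_dot = 0.0
    s_cross = 0.0
    for (px, py), (qx, qy) in zip(prev, curr):
        ax, ay = px - cpx, py - cpy   # centred feature before the motion
        bx, by = qx - ccx, qy - ccy   # centred feature after the motion
        s_dot += ax * bx + ay * by
        s_cross += ax * by - ay * bx
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated previous centroid onto the current one.
    tx = ccx - (c * cpx - s * cpy)
    ty = ccy - (s * cpx + c * cpy)
    return theta, (tx, ty)


# Synthetic check: rotate four features by 30 degrees and shift them.
true_theta, true_t = math.pi / 6, (0.5, -0.2)
features_prev = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
features_curr = [
    (
        math.cos(true_theta) * x - math.sin(true_theta) * y + true_t[0],
        math.sin(true_theta) * x + math.cos(true_theta) * y + true_t[1],
    )
    for x, y in features_prev
]
theta_est, t_est = estimate_rigid_motion_2d(features_prev, features_curr)
```

The third level would then compare the estimated motions of pairs of bodies to decide how they are connected.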
