Interactive Perception
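- Difference from active perception: active perception is a subset of interactive perception. In active perception the camera can move, adjusting its pose to obtain a better viewpoint; interactive perception can additionally interact with the environment through a robot arm or other device, either to improve the viewpoint or to generate other signals for processing
- Motion planning in active and interactive perception: predefined action primitives, heuristics, or other methods (e.g., end-to-end learning)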
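- The problem is generally modeled as an MDP
- How should the state be represented, and in what form? How is it obtained from the sensor inputs?
- How should actions be selected?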
2025
Eye, Robot: Learning to Look to Act with a BC-RL Perception-Action Loop
Justin Kerr, et al., AUTOLAB@UCB, arXiv 2025
- Active perception: the camera is mounted on a gimbal
Vision in Action: Learning Active Perception from Human Demonstrations
Haoyu Xiong et al., MIT & Stanford, CoRL 2025
2024
MANIP: Integrating Interactive Perception into Long-Horizon Robot Manipulation Systems
Justin Yu et al., AUTOLAB@UCB, IROS 2024

- Implements both interactive perception policies and task policies
- Core idea: two kinds of sub-policies
- system confidence estimator
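The estimator outputs a vector that characterizes the state estimate at a given time step; the estimate consists of a confidence and a state viability. Even when confidence is high, the meta-policy also uses viability to judge whether the current estimated state is unsafe, unfamiliar (out of distribution), or likely to lead to task failure
- The meta-policy can be procedural or learned, and it chooses between two kinds of sub-policies: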
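- 1) Task policies, which continue executing the task
- 2) Interactive perception policies, which raise the system's perception confidence
- Requirements for sub-policies: they must be modular and have well-defined trigger conditions.
If trigger conditions are not disjoint, a meta-policy may select a sub-policy non-deterministically.
- The meta-policy is Markovian and includes a cycle counter to check a timeout condition
- The paper does not give a general method for generating sub-policies; instead, three case studies illustrate how to apply the framework to different problems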
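The meta-policy logic above can be sketched as a small dispatcher. This is a minimal sketch, not MANIP's implementation: the `ConfidenceEstimate` fields, the 0.8 threshold, the sub-policy names, and checking triggers in list order (rather than choosing non-deterministically among overlapping triggers) are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ConfidenceEstimate:
    confidence: float  # how certain the system is about its current state estimate
    viable: bool       # False if the state looks unsafe, out of distribution, or likely to fail


@dataclass
class SubPolicy:
    name: str
    kind: str  # "task" or "interactive_perception"
    trigger: Callable[[ConfidenceEstimate], bool]  # well-defined trigger condition


def meta_policy_step(estimate: ConfidenceEstimate,
                     sub_policies: List[SubPolicy],
                     cycle_count: int,
                     max_cycles: int) -> Optional[str]:
    """Return the name of the sub-policy to run next, or None on timeout / no trigger.

    Markovian: the choice depends only on the current estimate and the cycle counter.
    """
    if cycle_count >= max_cycles:
        return None  # timeout condition reached
    for policy in sub_policies:
        # Triggers are checked in list order, so overlapping triggers resolve deterministically here.
        if policy.trigger(estimate):
            return policy.name
    return None


THRESHOLD = 0.8  # hypothetical confidence threshold

# Two disjoint triggers: keep working on the task, or first improve perception.
POLICIES = [
    SubPolicy("continue_task", "task",
              lambda e: e.viable and e.confidence >= THRESHOLD),
    SubPolicy("improve_perception", "interactive_perception",
              lambda e: not e.viable or e.confidence < THRESHOLD),
]
```

Keeping the cycle counter outside the sub-policies lets the meta-policy stay Markovian while still guaranteeing termination.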
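- Three case studies: surgical needle handover, cable untangling, and cable clip routing
- Surgical needle handover: visual servoing is used to actively align the needle until the system confidence estimator observes enough aligned points; servoing to increase visibility addresses incorrect grasp positioning due to uncertainty in the needle pose
- Cable untangling: MANIP addresses the occasional cable grasp failures of SGTM 2.0 by adding an interactive perception primitive that lifts and slightly moves the grasped cable, then compares the local pose of the cable before and after the motion to confirm whether the grasp succeeded
- Cable clip routing: when there is not enough space around the cable for the gripper to operate, the perception primitive added by MANIP pushes the cable away from the clip, recovering from manipulation failures
- MANIP does not appear to have actually been implemented on these three case studies; the paper only discusses feasibility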
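A minimal sketch of the grasp check in the cable untangling case, assuming the cable near the grasp is tracked as corresponding 3D points (a hypothetical stand-in for the paper's "local pose of the cable") and that a successful grasp makes those points follow the gripper's small motion. The function names and the 1 cm tolerance are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def mean_displacement(before: List[Point], after: List[Point]) -> Point:
    """Average displacement of tracked cable points; lists must be in corresponding order."""
    n = len(before)
    dx = sum(a[0] - b[0] for b, a in zip(before, after)) / n
    dy = sum(a[1] - b[1] for b, a in zip(before, after)) / n
    dz = sum(a[2] - b[2] for b, a in zip(before, after)) / n
    return (dx, dy, dz)


def grasp_succeeded(before: List[Point], after: List[Point],
                    gripper_motion: Point, tol: float = 0.01) -> bool:
    """The grasp counts as successful if the local cable segment moved with the gripper."""
    error = math.dist(mean_displacement(before, after), gripper_motion)
    return error <= tol
```

If the gripper lifted by 3 cm but the local cable points did not move, the error equals the full 3 cm and the grasp is flagged as failed.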
- Implemented case study: cable tracing, addressing HANDLOOM 1.5's errors in identifying the endpoint (it "still inherits the many endpoint termination failure modes")
- Four sub-policies are added on top of HANDLOOM 1.5; the first three are interactive perception sub-policies
- Trace uncertainty disambiguation (TUG)
- Triggered when the detected state is undetermined
One gripper then perturbs the cable there while the other gripper holds down the starting endpoint
- Retrace uncertainty disambiguation (RUG)
one gripper perturbs the area where the retrace occurred, while the other gripper pins down the starting endpoint for the trace
- Move endpoint into field-of-view (MEF)
- Triggered when the final endpoint lies at the edge of the field of view
one gripper picks that ending trace point and moves that portion of the cable back into the camera FOV
- Cable endpoint verification (CEV), used to verify whether the task is complete
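The trigger conditions of the four cable-tracing sub-policies can be sketched as a dispatcher. The `TraceResult` fields, the check order, the image size, and the 20-pixel edge margin are hypothetical; in particular, treating "a retrace occurred" as RUG's trigger is inferred from its description rather than taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TraceResult:
    undetermined: bool               # tracer could not decide how to continue
    retraced: bool                   # tracer doubled back over an already-traced segment
    final_endpoint: Tuple[int, int]  # (u, v) pixel of the last traced point


def near_fov_edge(point: Tuple[int, int], image_size: Tuple[int, int], margin: int = 20) -> bool:
    """True if the pixel lies within `margin` pixels of the image border."""
    u, v = point
    width, height = image_size
    return u < margin or v < margin or u >= width - margin or v >= height - margin


def select_sub_policy(trace: TraceResult, image_size: Tuple[int, int] = (1280, 720)) -> str:
    """Map the tracer's outcome to one of the four sub-policies (hypothetical priority order)."""
    if trace.undetermined:
        return "TUG"  # perturb the cable where the trace became ambiguous
    if trace.retraced:
        return "RUG"  # perturb the area where the retrace occurred
    if near_fov_edge(trace.final_endpoint, image_size):
        return "MEF"  # pull the cable end back into the camera view
    return "CEV"      # nothing suspicious left: verify the endpoint and finish
```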
MOSAIC: Learning Unified Multi-Sensory Object Property Representation for Robot Learning via Interactive Perception
Gyan Tatiya et al., Tufts University, ICRA 2024
- multimodal object property learning with self-attention and interactive comprehension
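- Evaluated on two tasks: object category recognition and fetch object
- Uses a pre-trained CLIP text model
- Ten exploratory behaviors are used to gather object information: Look (non-interactive; all others are interactive), Press, Grasp, Hold, Lift, Drop, Poke, Push, Shake, and Tap. Through these behaviors, visual, tactile, audio, and textual description data are collected and recorded
- No other manipulation is performed on the objects (e.g., arranging them into specific shapes)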
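A sketch of how a unified object embedding could serve the fetch-object task, assuming each object already has one embedding per exploratory behavior and the text query is embedded in the same space (in MOSAIC this alignment involves the CLIP text model). Mean pooling here is a simple stand-in for MOSAIC's self-attention fusion, and all vectors and names are made up for illustration.

```python
import math
from typing import Dict, List

# The ten exploratory behaviors; only "look" does not touch the object.
BEHAVIORS = ["look", "press", "grasp", "hold", "lift",
             "drop", "poke", "push", "shake", "tap"]
INTERACTIVE = {b for b in BEHAVIORS if b != "look"}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def fuse(per_behavior: Dict[str, List[float]]) -> List[float]:
    """Mean-pool one object's per-behavior embeddings (stand-in for self-attention fusion)."""
    vectors = list(per_behavior.values())
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(len(vectors[0]))]


def fetch(objects: Dict[str, Dict[str, List[float]]], text_embedding: List[float]) -> str:
    """Return the object whose fused embedding best matches the text query."""
    return max(objects, key=lambda name: cosine(fuse(objects[name]), text_embedding))
```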
Interactive Perception for Deformable Object Manipulation
Zehang Weng et al., KTH & HKPolyU, RAL 2024
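- The camera is mounted on one robot arm, while a second arm interacts with a plastic bag
- The problem is modeled as a POMDP
- A subspace is built from certain object features and is called the structure of interest (SOI);
the geometry of this structure is coupled with the end-effector action
- Reinforcement learning maps camera actions to the SOI
- A subspace called the Dynamic Active Vision Space (DAVS) is constructed from object features to make the search more efficient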
| Model | POMDP |
|---|---|
| State | |
| Observation corresponding to the state | |
| Expected camera and end-effector poses | |
| Method | PPO + a straightforward visual servoing, aiming to maximize the visibility of the SOI in the camera view |
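A minimal sketch of a visibility score that such a policy could try to maximize: the fraction of SOI points that project inside the image under a pinhole camera model. The intrinsics, representing the SOI as sampled 3D points in the camera frame, and ignoring occlusion are all assumptions, not details from the paper.

```python
from typing import Iterable, Tuple


def soi_visibility(points_cam: Iterable[Tuple[float, float, float]],
                   fx: float, fy: float, cx: float, cy: float,
                   width: int, height: int) -> float:
    """Fraction of SOI points (camera frame, z pointing forward) that land inside the image."""
    points = list(points_cam)
    if not points:
        return 0.0
    visible = 0
    for x, y, z in points:
        if z <= 0:
            continue  # point is behind the camera
        u = fx * x / z + cx  # pinhole projection to pixel coordinates
        v = fy * y / z + cy
        if 0 <= u < width and 0 <= v < height:
            visible += 1
    return visible / len(points)
```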
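- Treating the camera and arm action spaces as independent and simply combining them (as a product) makes the search space too high-dimensional. Instead, camera motion is assumed to follow a topological regularity, which allows its dimensionality to be reduced (by sampling from a dynamical subspace).
A naive solution is to factorize the action space as … with the assumption of conditional independence, which however results in high dimensionality for action exploration. In this work, we consider that the camera action space should depend on a reduced action space generated based on topological regularity, resulting in …
- Dimensionality reduction method: a manifold with boundary M is used to describe and construct the subspace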
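To make the dimensionality argument concrete: if camera and end-effector poses are each 6-DoF, a factorized product space is 12-dimensional, whereas restricting the camera to a spherical cap around the SOI (a 2-D manifold with boundary) while always looking at the SOI leaves only two camera parameters. The sketch below samples such viewpoints; the cap shape, radius, polar limit, and SOI offset are hypothetical stand-ins for the paper's manifold M, and the SOI center moves with the end-effector.

```python
import math
import random
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sample_camera_on_cap(soi_center: Vec3, radius: float, max_polar: float,
                         rng: random.Random) -> Tuple[Vec3, Vec3]:
    """Sample a camera position on a spherical cap above the SOI and its viewing direction."""
    # Uniform over the cap's area: cos(theta) is uniform in [cos(max_polar), 1].
    cos_t = rng.uniform(math.cos(max_polar), 1.0)
    sin_t = math.sqrt(1.0 - cos_t * cos_t)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    cx, cy, cz = soi_center
    position = (cx + radius * sin_t * math.cos(phi),
                cy + radius * sin_t * math.sin(phi),
                cz + radius * cos_t)
    # The camera looks at the SOI center: unit vector from the camera toward it.
    direction = tuple((c - p) / radius for c, p in zip(soi_center, position))
    return position, direction


if __name__ == "__main__":
    ee_position = (0.5, 0.0, 0.1)  # hypothetical end-effector position (m)
    soi_offset = (0.0, 0.0, 0.1)   # hypothetical SOI offset from the gripper (m)
    center = tuple(e + o for e, o in zip(ee_position, soi_offset))
    pos, view_dir = sample_camera_on_cap(center, 0.4, math.radians(60), random.Random(0))
    print(pos, view_dir)
```

Because the cap is re-centered whenever the end-effector moves, the camera action space depends on the arm action, which is the coupling the quoted passage describes.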
Grasp, See, and Place: Efficient Unknown Object Rearrangement With Policy Structure Prior
Kechun Xu et al., Zhejiang University, TRO 2024
2023
Self-Supervised Unseen Object Instance Segmentation via Long-Term Robot Interaction
Yangxiao Lu et al., UT Dallas & Rice, RSS 2023
2020
Object Finding in Cluttered Scenes Using Interactive Perception
Tonci Novkovic, et al., Autonomous Systems Lab@ETH, ICRA 2020
- Implements interactive perception only
- Task: use a robot arm with an RGB-D camera mounted on it to find an object in clutter.
Our goal is to find a mapping from sensor inputs to end-effector displacements.
The agent's next action is determined by an RL algorithm based on the current encoded state and the knowledge obtained from past experiences.
- Modeling: a discrete, finite-horizon MDP
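A sketch of the episode structure, assuming a discrete action set of fixed end-effector displacements along the three axes and a fixed horizon. `observe`, `object_found`, the 5 cm step, and the policy signature are hypothetical placeholders; in the paper an RL agent provides the policy.

```python
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]

STEP = 0.05  # hypothetical displacement per action, in metres
# Discrete action set: move the end-effector by +/- STEP along x, y, or z.
ACTIONS: List[Vec3] = [
    (STEP, 0.0, 0.0), (-STEP, 0.0, 0.0),
    (0.0, STEP, 0.0), (0.0, -STEP, 0.0),
    (0.0, 0.0, STEP), (0.0, 0.0, -STEP),
]


def run_episode(policy: Callable[[object, int], int],
                observe: Callable[[Vec3], object],
                object_found: Callable[[Vec3], bool],
                start: Vec3,
                horizon: int) -> List[Vec3]:
    """Roll out one finite-horizon episode and return the visited end-effector positions."""
    position = start
    trajectory = [position]
    for t in range(horizon):
        state = observe(position)           # encoded sensor input, e.g. RGB-D features
        action = ACTIONS[policy(state, t)]  # policy picks an index into ACTIONS
        position = (position[0] + action[0],
                    position[1] + action[1],
                    position[2] + action[2])
        trajectory.append(position)
        if object_found(position):
            break                           # episode ends early once the object is found
    return trajectory
```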
2017
(Survey) Interactive Perception: Leveraging Action in Perception and Perception in Action
Jeannette Bohg, et al., Stanford, TRO 2017
- An important survey
- Benefits of interactive perception
- Create Novel Signals (CNS): interaction produces tactile or audio signals, or image sequences that change over time
- Action Perception Regularity (APR)
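- Interaction reveals regularities linking the sensor output (the robot's input signal / system state S), the robot's action, and time t. This splits into the following two parts: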
- Using the Regularity
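- (i) Predict the outcome before acting: predict the sensory signal given knowledge about the action and environment properties
- (ii) Update knowledge of the environment from the prediction error: update the knowledge about some latent properties of the environment by comparing the prediction to the observation
- (iii) Infer the action actually (?) applied from the environment properties and the observed sensory signal (e.g., when executing a grasp, the robot may miss and knock the object over instead?): infer the action that has been applied to generate the observed sensory signal given some environment properties.
- Learning the Regularity: find the causal relationship between actions and sensory feedback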
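The three uses of a regularity can be illustrated with a toy scalar model y = k * a, where a is the action (e.g., push effort), y the sensed signal (e.g., object displacement), and k a latent environment property. This model is an illustrative assumption, not one from the survey; step (ii) uses a scalar recursive least-squares update.

```python
class ScalarRegularity:
    """Toy action-perception regularity y = k * a with a latent environment property k."""

    def __init__(self, k_guess: float, variance: float = 1.0):
        self.k = k_guess   # current belief about the latent property
        self.p = variance  # uncertainty (variance) of that belief

    def predict(self, a: float) -> float:
        """(i) Predict the sensory signal from the action and the believed property."""
        return self.k * a

    def update(self, a: float, y: float, noise: float = 0.01) -> None:
        """(ii) Correct the belief about k from the prediction error (scalar RLS step)."""
        gain = self.p * a / (a * a * self.p + noise)
        self.k += gain * (y - self.predict(a))
        self.p *= 1.0 - gain * a

    def infer_action(self, y: float) -> float:
        """(iii) Infer which action produced the observed signal, given the belief about k."""
        return y / self.k
```

Usage: after a handful of pushes whose outcomes follow k = 2, `update` drives the belief close to 2, after which `predict` and `infer_action` become consistent with the true regularity.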
2014
Online Interactive Perception of Articulated Objects with Multi-Level Recursive Estimation Based on Task-Specific Priors
Roberto Martín-Martín et al., the Robotics and Biology Laboratory, Technische Universität Berlin, IROS 2014
- Possibly a purely vision-focused paper? It studies how to estimate an object's kinematic model from a continuous RGB-D stream. The "interactive" aspect lies in an external agent (e.g., a human) changing the state of the target object (a rigid object); the robot does not actively select actions to produce a specific effect on the object.
- Proposes a new interactive perception algorithm:
online interactive perception algorithm to estimate parameterized kinematic models of unknown objects from streaming RGB-D data
- three interconnected levels of recursive state estimation:
- estimate feature motion: the estimation of 3D feature motion based on the 2D motion of tracked RGB features
- estimate rigid body motion: the estimation of rigid body motion based on the estimated feature motion
- estimate the overall kinematic model: the estimation of the kinematic model based on the rigid body motion
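A heavily simplified sketch of the three-level cascade, assuming a pinhole camera, a single moving body, translation-only rigid motion, and a prismatic-joint hypothesis. The paper uses recursive filters at each level and supports more joint types; this only shows how each level consumes the output of the level below. All names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Obs = Tuple[float, float, float]   # (u, v, depth) of one tracked RGB feature
Vec3 = Tuple[float, float, float]


def backproject(obs: Obs, fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Pinhole back-projection of a 2-D feature with depth to a 3-D point."""
    u, v, z = obs
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)


class ThreeLevelEstimator:
    """Simplified cascade: feature motion -> rigid-body motion -> kinematic model."""

    def __init__(self, intrinsics: Tuple[float, float, float, float]):
        self.intrinsics = intrinsics  # (fx, fy, cx, cy)
        self.total: Vec3 = (0.0, 0.0, 0.0)  # level-3 state: accumulated body translation

    def update(self, prev: List[Obs], curr: List[Obs]) -> Vec3:
        # Level 1: 3-D feature motion from 2-D tracks plus depth.
        motions = []
        for a, b in zip(prev, curr):
            p = backproject(a, *self.intrinsics)
            q = backproject(b, *self.intrinsics)
            motions.append(tuple(qi - pi for pi, qi in zip(p, q)))
        # Level 2: rigid-body motion, here translation only: the mean feature motion.
        n = len(motions)
        translation = tuple(sum(m[i] for m in motions) / n for i in range(3))
        # Level 3: recursively accumulate body motion for the joint model.
        self.total = tuple(t + d for t, d in zip(self.total, translation))
        return translation

    def prismatic_axis(self) -> Optional[Vec3]:
        """Unit joint axis under a prismatic-joint hypothesis, or None if nothing moved."""
        norm = math.sqrt(sum(t * t for t in self.total))
        if norm < 1e-9:
            return None
        return tuple(t / norm for t in self.total)
```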