Segment Anything
Alexander Kirillov et al., Meta, 2023
Code
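- Initialization:
    from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

    sam_checkpoint = "segment-anything/model/sam_vit_h_4b8939.pth"  # ViT-H checkpoint file
    model_type = "vit_h"
    device = "cuda"
    sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)  # build the model and load weights
    sam.to(device=device)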
- Usage (only model is required; the other arguments are optional and have defaults):
    mask_generator_ = SamAutomaticMaskGenerator(
        model=sam,
        points_per_side=32,
        pred_iou_thresh=0.9,
        stability_score_thresh=0.96,
        crop_n_layers=1,
        crop_n_points_downscale_factor=2,
        min_mask_region_area=100,  # Requires open-cv to run post-processing
    )
- Returned masks (from mask_generator_.generate(image)): a list in which each item is a dict describing one segmented mask.
  - segmentation [np.ndarray]: a 2D numpy.ndarray with the original image's size (h, w); True/False marks the region covered by this mask
  - area [int]: the area of the mask in pixels
  - bbox [List[int]]: the bounding box of the segmented object, in [x, y, w, h] format
  - predicted_iou [float]: the model's own prediction for the quality of the mask
  - point_coords [List[List[float]]]: the sampled input point that generated this mask
  - stability_score [float]: an additional measure of mask quality
  - crop_box [List[int]]: the crop of the image used to generate this mask, in xywh format
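The fields above are plain Python values, so the returned list can be post-processed without any SAM-specific code. The following is a minimal sketch, assuming hand-written stand-in dicts rather than real generator output; the helper names `xywh_to_xyxy` and `select_masks` and the thresholds are hypothetical, chosen only for illustration.

```python
from typing import Any, Dict, List


def xywh_to_xyxy(bbox: List[float]) -> List[float]:
    """Convert an [x, y, w, h] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def select_masks(
    masks: List[Dict[str, Any]],
    min_area: int = 100,
    min_iou: float = 0.9,
) -> List[Dict[str, Any]]:
    """Keep masks that pass both thresholds, largest area first."""
    kept = [
        m for m in masks
        if m["area"] >= min_area and m["predicted_iou"] >= min_iou
    ]
    return sorted(kept, key=lambda m: m["area"], reverse=True)


# Hand-written stand-ins for generator output.
# The "segmentation" array is omitted to keep the example dependency-free.
example_masks = [
    {"area": 50, "bbox": [0, 0, 10, 5], "predicted_iou": 0.95},
    {"area": 400, "bbox": [10, 20, 20, 20], "predicted_iou": 0.93},
    {"area": 900, "bbox": [5, 5, 30, 30], "predicted_iou": 0.80},
    {"area": 1200, "bbox": [40, 10, 40, 30], "predicted_iou": 0.99},
]

selected = select_masks(example_masks)
boxes_xyxy = [xywh_to_xyxy(m["bbox"]) for m in selected]
```

Converting to corner format is often convenient because many drawing and cropping routines expect two corner points rather than a width and height.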
SAM variants
MobileSAM
- Replaces the image encoder with TinyViT (see the paper Fast Pretraining Distillation for Small Vision Transformers)
FastSAM
Xu Zhao et al., CASIA, 2023 https://arxiv.org/pdf/2306.12156.pdf https://github.com/CASIA-IVA-Lab/FastSAM
- Based on YOLOv8-seg, which uses YOLACT for instance segmentation
- YOLO v8 backbone network ->
Comparison of SAM models
- Parameter counts

| Model | Parameters |
| --- | --- |
| SAM-B | 136M |
| FastSAM-s | 11M |
| FastSAM-x (default) | 68M |
| MobileSAM | 9.66M |
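To make the size gap concrete, the parameter counts in the comparison above can be turned into ratios relative to SAM-B. This is only arithmetic on the numbers listed in these notes; the variable names are illustrative.

```python
# Parameter counts in millions, copied from the comparison above.
params_m = {
    "SAM-B": 136.0,
    "FastSAM-s": 11.0,
    "FastSAM-x": 68.0,
    "MobileSAM": 9.66,
}

baseline = params_m["SAM-B"]
# Ratio of SAM-B's parameter count to each model's parameter count.
reduction = {name: baseline / count for name, count in params_m.items()}

for name, factor in sorted(reduction.items(), key=lambda kv: kv[1]):
    print(f"{name:<10} {params_m[name]:>7.2f}M  SAM-B is {factor:.1f}x this size")
```

Parameter count is only a rough proxy for speed and memory use; it says nothing about segmentation quality.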
3D Skeletonization of Complex Grapevines for Robotic Pruning
Eric Schneider et al., CMU Kantor Lab, IROS 2023
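- A purely image-processing-based study
- Camera system
- A series of stereo images is registered into a point cloud, and image segmentation masks are produced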
- A. Silwal, T. Parhar, F. Yandun, H. Baweja, and G. Kantor, “A robust illumination-invariant camera system for agricultural applications,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, pp. 3292–3298.
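- Image processing method
- Used the MMLab segmentation toolkit (MMSegmentation), tried several models, and settled on UNet + geometric augmentation
- Extract the 2D skeleton of the segmentation mask, then apply dilation to the 2D skeleton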
- Make skeletal model
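- Assuming that points within a short distance of each other should be connected, first scan the neighbors within a fixed range of each point (using a sphere), then use a k-d tree
- Minimum Spanning Tree (MST) using Kruskal's algorithm, where Euclidean distance is the edge cost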
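The skeletal-model steps above (radius-limited neighbor search, then an MST with Euclidean edge costs) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the neighbor search is done by brute force as a stand-in for a k-d tree radius query, and the function names, toy points, and radius are all hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]
Edge = Tuple[float, int, int]  # (Euclidean length, index a, index b)


def radius_edges(points: List[Point], radius: float) -> List[Edge]:
    """Candidate edges between all point pairs no farther apart than `radius`.

    Brute force O(n^2) for clarity; a k-d tree radius query would
    return the same pairs faster on large point clouds.
    """
    edges = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d <= radius:
                edges.append((d, i, j))
    return edges


def kruskal(n: int, edges: List[Edge]) -> List[Edge]:
    """Minimum spanning forest via Kruskal's algorithm with union-find."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for d, a, b in sorted(edges):  # cheapest edges first
        ra, rb = find(a), find(b)
        if ra != rb:  # edge joins two components, so it cannot close a cycle
            parent[ra] = rb
            tree.append((d, a, b))
    return tree


# Hypothetical toy points: an L-shaped chain plus one far-away outlier.
points = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (10, 10, 10)]
skeleton_edges = kruskal(len(points), radius_edges(points, radius=1.5))
```

Because edges are only created within the search radius, the graph may be disconnected, in which case Kruskal's algorithm yields a spanning forest: in the toy data the outlier stays unconnected while the four nearby points form a single chain.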