#sam

Segment Anything

Alexander Kirillov et al., Meta, 2023

Code

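  • Initialization:
    from segment_anything import sam_model_registry, SamAutomaticMaskGenerator
    sam_checkpoint = "segment-anything/model/sam_vit_h_4b8939.pth"  # local path to the ViT-H checkpoint
    model_type = "vit_h"
    device = "cuda"
    sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
    sam.to(device=device)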
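  • Call (every argument except model has a default value, so the others can be omitted):
    mask_generator_ = SamAutomaticMaskGenerator(
        model=sam,
        points_per_side=32,
        pred_iou_thresh=0.9,
        stability_score_thresh=0.96,
        crop_n_layers=1,
        crop_n_points_downscale_factor=2,
        min_mask_region_area=100,  # Requires OpenCV to run post-processing
    )
    # image: (h, w, 3) uint8 RGB numpy array
    masks = mask_generator_.generate(image)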
     
  • Returned masks: a list in which each item is a dict describing one segmented mask.
    • segmentation - [np.ndarray] - 2D boolean numpy.ndarray of the original image size (h, w); True marks the pixels that belong to the mask
    • area - [int] - the area of the mask in pixels
    • bbox - [List[int]] - bounding box of the segmented object, in [x, y, w, h] format
    • predicted_iou - [float] - the model's own prediction for the quality of the mask
    • point_coords - [List[List[float]]] - the sampled input point that generated this mask
    • stability_score - [float] - an additional measure of mask quality
    • crop_box - [List[int]] - the crop of the image used to generate this mask, in [x, y, w, h] format
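
A minimal sketch of how the returned list might be post-processed, assuming only the dict keys listed above. The helper names (`bbox_xywh_to_xyxy`, `filter_and_sort_masks`, `masks_to_label_map`, `make_fake_mask`) are hypothetical and not part of segment_anything. The fake masks stand in for real `generate()` output, so the example runs without a checkpoint.

```python
import numpy as np


def bbox_xywh_to_xyxy(bbox):
    """Convert an [x, y, w, h] box to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def filter_and_sort_masks(masks, min_area=0, min_iou=0.0):
    """Keep masks above the area / predicted_iou thresholds, largest first."""
    kept = [
        m for m in masks
        if m["area"] >= min_area and m["predicted_iou"] >= min_iou
    ]
    return sorted(kept, key=lambda m: m["area"], reverse=True)


def masks_to_label_map(masks):
    """Paint all masks into one (h, w) integer map; 0 means background."""
    if not masks:
        raise ValueError("need at least one mask to infer the image size")
    h, w = masks[0]["segmentation"].shape
    labels = np.zeros((h, w), dtype=np.int32)
    # Paint large masks first so that smaller masks stay visible on top.
    ordered = sorted(masks, key=lambda m: m["area"], reverse=True)
    for idx, m in enumerate(ordered, start=1):
        labels[m["segmentation"]] = idx
    return labels


def make_fake_mask(shape, x, y, w, h, iou):
    """Hypothetical stand-in for one entry of the generate() output."""
    seg = np.zeros(shape, dtype=bool)
    seg[y:y + h, x:x + w] = True
    return {
        "segmentation": seg,
        "area": int(seg.sum()),
        "bbox": [x, y, w, h],
        "predicted_iou": iou,
    }
```

With real output, `filter_and_sort_masks(masks, min_area=100, min_iou=0.9)` followed by `masks_to_label_map(...)` gives a single label image that is convenient to visualize; the thresholds here are illustrative values, not recommendations.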

SAM variants

MobileSAM

  • Replaces the image encoder with TinyViT (see the paper "TinyViT: Fast Pretraining Distillation for Small Vision Transformers")

FastSAM

Xu Zhao et al., CASIA, 2023 https://arxiv.org/pdf/2306.12156.pdf https://github.com/CASIA-IVA-Lab/FastSAM

  • Based on YOLOv8-seg, which uses YOLACT (instance segmentation)
  • YOLO v8 backbone network ->

SAM comparison

  • Parameter counts
    Model                 Parameters
    SAM-B                 136M
    FastSAM-s             11M
    FastSAM-x (default)   68M
    MobileSAM             9.66M

3D Skeletonization of Complex Grapevines for Robotic Pruning

Eric Schneider et al., CMU Kantor Lab, IROS 2023

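  • Research purely on the image-processing side
  • Camera system
    • A series of stereo images is registered into a point cloud, and image segmentation masks are produced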
    • A. Silwal, T. Parhar, F. Yandun, H. Baweja, and G. Kantor, “A robust illumination-invariant camera system for agricultural applications,” in 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021, pp. 3292–3298.
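  • Image-processing method
    • Used the MMLab segmentation toolkit, tried several models, and finally chose UNet + geometric augmentation
    • Extract the 2D skeleton of the segmentation mask, then dilate the 2D skeleton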
    • Make skeletal model
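      • Assuming that points within a short distance of each other should be connected, first scan the neighbors within a fixed range of each point (using a sphere), then use a k-d tree
      • Minimum Spanning Tree (MST) using Kruskal's algorithm, where Euclidean distance is the edge cost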
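
The two skeletal-model steps above can be sketched as a radius-limited neighbor graph followed by Kruskal's algorithm with a union-find structure. This is an illustration under stated assumptions, not the paper's implementation: the function names and the radius value are hypothetical, and the neighbor search is brute force, standing in for the k-d tree query mentioned in the notes.

```python
import math


def radius_edges(points, radius):
    """Candidate edges (distance, i, j) between points at most `radius` apart.

    Brute-force O(n^2) search; a k-d tree would replace this loop at scale.
    """
    edges = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])  # Euclidean edge cost
            if d <= radius:
                edges.append((d, i, j))
    return edges


def kruskal_mst(num_points, edges):
    """Kruskal's algorithm: take edges in order of cost, skip those closing a cycle."""
    parent = list(range(num_points))

    def find(a):
        # Follow parent links to the root, halving the path on the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for d, i, j in sorted(edges):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two separate components
            parent[root_i] = root_j
            tree.append((i, j, d))
    return tree
```

If the radius graph is disconnected, for example when an outlier has no neighbor within `radius`, the result is a spanning forest rather than a single tree, so the radius choice determines which points end up in the skeleton.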