Large Model

  • Detic: open-vocabulary object detector

  • Mamba: a state space model (SSM) variant

Infrastructure

Transformer

  • Transformer: built on the self-attention mechanism ("Attention Is All You Need"; multi-head attention)
  • Vision Transformer (ViT): ViT-S (small), ViT-B (base), ViT-L (large), ViT-H (huge)
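
The self-attention mechanism noted above can be illustrated with a minimal single-head sketch in plain Python. This is an illustration under stated assumptions, not a reference implementation: the helper names (softmax, matmul, transpose, self_attention) and the identity projection weights are hypothetical choices made for readability. Multi-head attention runs several such heads in parallel on separate projections and concatenates their outputs.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x is n tokens by d_model; w_q, w_k, w_v are d_model by d_head
    projection matrices (hypothetical weights for illustration).
    Returns (output, attention_weights).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(q[0])
    scores = matmul(q, transpose(k))  # n x n matrix of query-key dot products
    weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
    return matmul(weights, v), weights

# Toy input: 3 tokens with 2 features each, identity projections (assumed values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
```

Each row of weights sums to 1 and says how much one token attends to every token in the sequence, itself included; out is the corresponding weighted mix of the value vectors.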

BERT (Google)

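  • An encoder-only model (not seq2seq) based on the Transformer encoder. Drawback: the attention mechanism is computationally expensive, with cost growing quadratically in the input length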
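
To make the quadratic growth concrete, here is a small sketch that counts the pairwise attention scores in one layer. The function name attention_score_count and the chosen sequence lengths are illustrative assumptions; the count ignores constant factors such as the head dimension.

```python
def attention_score_count(seq_len, num_heads=1):
    """Pairwise attention scores per layer: one per (query, key) pair per head."""
    return num_heads * seq_len * seq_len

# Doubling the sequence length quadruples the number of scores.
for n in (128, 256, 512):
    print(n, attention_score_count(n))
# prints:
# 128 16384
# 256 65536
# 512 262144
```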

GPT (OpenAI)

  • Generative Pre-trained Transformer, based on the Transformer decoder

Large Language Model (LLM)

Vision-Language Model (VLM)

CLIP

SigLIP

SILC