[Artificial Intelligence] Transformers Pipeline (Overview): Minimal-Code Inference with 300k+ Large Models

CSDN 2024-07-11 12:31:01


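Contents

1. Introduction

2. The pipeline library

2.1 Overview

2.2 Instantiating a pipeline object from a task

2.2.1 Instantiating "automatic speech recognition" from a task

2.2.2 Task list

2.2.3 Default model for each task

2.3 Instantiating a pipeline object from a model

2.3.1 Instantiating "automatic speech recognition" from a model

2.3.2 Looking up which task a model belongs to

3. Summary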


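1. Introduction

pipeline is an abstraction in the Hugging Face transformers library for running inference with large models in a minimal way. It groups models into 4 major categories, Audio, Computer Vision, Natural Language Processing (NLP) and Multimodal, with 28 tasks in total, covering about 320,000 models at the time of writing.

This article introduces pipeline as a whole; each later article in this series takes one task as its topic and explains how to use it.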

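2. The pipeline library

2.1 Overview

Pipelines are a simple, convenient way to run inference with a model. They are objects that abstract away most of the library's complex code and expose a simple API dedicated to many tasks, including named entity recognition, masked language modeling, sentiment analysis, feature extraction, and question answering. There are two main ways to create one:

- instantiate a pipeline object from a task;
- instantiate a pipeline object from a model.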

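2.2 Instantiating a pipeline object from a task

2.2.1 Instantiating "automatic speech recognition" from a task

The task name for automatic speech recognition is automatic-speech-recognition: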

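```python
import os

# Download models through the hf-mirror.com mirror; set this before importing transformers
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
# Make only GPU 2 visible to this process (it then appears as cuda:0)
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

from transformers import pipeline

# Decoding an audio file passed by path requires ffmpeg to be installed
speech_file = "./output_video_enhanced.mp3"

# Only the task is given, so transformers loads that task's default model
pipe = pipeline(task="automatic-speech-recognition")
result = pipe(speech_file)
print(result)
```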

2.2.2 Task list

There are 28 tasks in total (translation_xx_to_yy is a parametrized form of translation), sorted by first letter below. To use one, replace the task argument of pipeline in the 2.2.1 code and pass an input suited to that task:

- "audio-classification": returns an AudioClassificationPipeline.
- "automatic-speech-recognition": returns an AutomaticSpeechRecognitionPipeline.
- "depth-estimation": returns a DepthEstimationPipeline.
- "document-question-answering": returns a DocumentQuestionAnsweringPipeline.
- "feature-extraction": returns a FeatureExtractionPipeline.
- "fill-mask": returns a FillMaskPipeline.
- "image-classification": returns an ImageClassificationPipeline.
- "image-feature-extraction": returns an ImageFeatureExtractionPipeline.
- "image-segmentation": returns an ImageSegmentationPipeline.
- "image-to-image": returns an ImageToImagePipeline.
- "image-to-text": returns an ImageToTextPipeline.
- "mask-generation": returns a MaskGenerationPipeline.
- "object-detection": returns an ObjectDetectionPipeline.
- "question-answering": returns a QuestionAnsweringPipeline.
- "summarization": returns a SummarizationPipeline.
- "table-question-answering": returns a TableQuestionAnsweringPipeline.
- "text2text-generation": returns a Text2TextGenerationPipeline.
- "text-classification" (alias "sentiment-analysis"): returns a TextClassificationPipeline.
- "text-generation": returns a TextGenerationPipeline.
- "text-to-audio" (alias "text-to-speech"): returns a TextToAudioPipeline.
- "token-classification" (alias "ner"): returns a TokenClassificationPipeline.
- "translation": returns a TranslationPipeline.
- "translation_xx_to_yy" (xx and yy are language codes, e.g. "translation_en_to_fr"): returns a TranslationPipeline.
- "video-classification": returns a VideoClassificationPipeline.
- "visual-question-answering": returns a VisualQuestionAnsweringPipeline.
- "zero-shot-classification": returns a ZeroShotClassificationPipeline.
- "zero-shot-image-classification": returns a ZeroShotImageClassificationPipeline.
- "zero-shot-audio-classification": returns a ZeroShotAudioClassificationPipeline.
- "zero-shot-object-detection": returns a ZeroShotObjectDetectionPipeline.
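To make the aliases and the parametrized translation task concrete, here is a minimal sketch of how a task string can be normalized to a canonical task name plus an optional language pair. The function `resolve_task` and the `ALIASES` dict are hypothetical, written only from the aliases named in the list above; transformers performs a similar check internally (recent versions expose it as `transformers.pipelines.check_task`), and its exact behavior may differ by version. The sketch does not check that the name is one of the 28 tasks.

```python
from typing import Optional, Tuple

# Hypothetical alias table, copied from the aliases named in the task list
ALIASES = {
    "sentiment-analysis": "text-classification",
    "ner": "token-classification",
    "text-to-speech": "text-to-audio",
}


def resolve_task(task: str) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Map a task string to (canonical task, optional (src, tgt) language pair).

    Illustrative sketch only, not the transformers API.
    """
    # Replace an alias by its canonical task name; other names pass through
    task = ALIASES.get(task, task)
    if task.startswith("translation_"):
        # Expected form: translation_<src>_to_<tgt>, e.g. translation_en_to_fr
        parts = task.split("_")
        if len(parts) != 4 or parts[2] != "to":
            raise KeyError(f"Invalid translation task {task!r}, use 'translation_XX_to_YY'")
        return "translation", (parts[1], parts[3])
    return task, None


if __name__ == "__main__":
    print(resolve_task("sentiment-analysis"))    # ('text-classification', None)
    print(resolve_task("translation_en_to_fr"))  # ('translation', ('en', 'fr'))
    print(resolve_task("fill-mask"))             # ('fill-mask', None)
```

The language pair is what selects among the per-pair defaults in the "translation" entry of SUPPORTED_TASKS.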

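2.2.3 Default model for each task

For every task, pipeline has a preconfigured default model. The defaults are defined in the SUPPORTED_TASKS dictionary in the pipeline source code (transformers/pipelines/__init__.py), reproduced below for the version this article was written against. In each entry, "impl" is the pipeline class, "pt"/"tf" list the PyTorch/TensorFlow auto-model classes, and "default" gives the default model per framework as a (model id, revision) pair; these defaults may change between transformers versions: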

```python
SUPPORTED_TASKS = {
    "audio-classification": {
        "impl": AudioClassificationPipeline,
        "tf": (),
        "pt": (AutoModelForAudioClassification,) if is_torch_available() else (),
        "default": {"model": {"pt": ("superb/wav2vec2-base-superb-ks", "372e048")}},
        "type": "audio",
    },
    "automatic-speech-recognition": {
        "impl": AutomaticSpeechRecognitionPipeline,
        "tf": (),
        "pt": (AutoModelForCTC, AutoModelForSpeechSeq2Seq) if is_torch_available() else (),
        "default": {"model": {"pt": ("facebook/wav2vec2-base-960h", "55bb623")}},
        "type": "multimodal",
    },
    "text-to-audio": {
        "impl": TextToAudioPipeline,
        "tf": (),
        "pt": (AutoModelForTextToWaveform, AutoModelForTextToSpectrogram) if is_torch_available() else (),
        "default": {"model": {"pt": ("suno/bark-small", "645cfba")}},
        "type": "text",
    },
    "feature-extraction": {
        "impl": FeatureExtractionPipeline,
        "tf": (TFAutoModel,) if is_tf_available() else (),
        "pt": (AutoModel,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("distilbert/distilbert-base-cased", "935ac13"),
                "tf": ("distilbert/distilbert-base-cased", "935ac13"),
            }
        },
        "type": "multimodal",
    },
    "text-classification": {
        "impl": TextClassificationPipeline,
        "tf": (TFAutoModelForSequenceClassification,) if is_tf_available() else (),
        "pt": (AutoModelForSequenceClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
                "tf": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
            },
        },
        "type": "text",
    },
    "token-classification": {
        "impl": TokenClassificationPipeline,
        "tf": (TFAutoModelForTokenClassification,) if is_tf_available() else (),
        "pt": (AutoModelForTokenClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
                "tf": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
            },
        },
        "type": "text",
    },
    "question-answering": {
        "impl": QuestionAnsweringPipeline,
        "tf": (TFAutoModelForQuestionAnswering,) if is_tf_available() else (),
        "pt": (AutoModelForQuestionAnswering,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("distilbert/distilbert-base-cased-distilled-squad", "626af31"),
                "tf": ("distilbert/distilbert-base-cased-distilled-squad", "626af31"),
            },
        },
        "type": "text",
    },
    "table-question-answering": {
        "impl": TableQuestionAnsweringPipeline,
        "pt": (AutoModelForTableQuestionAnswering,) if is_torch_available() else (),
        "tf": (TFAutoModelForTableQuestionAnswering,) if is_tf_available() else (),
        "default": {
            "model": {
                "pt": ("google/tapas-base-finetuned-wtq", "69ceee2"),
                "tf": ("google/tapas-base-finetuned-wtq", "69ceee2"),
            },
        },
        "type": "text",
    },
    "visual-question-answering": {
        "impl": VisualQuestionAnsweringPipeline,
        "pt": (AutoModelForVisualQuestionAnswering,) if is_torch_available() else (),
        "tf": (),
        "default": {
            "model": {"pt": ("dandelin/vilt-b32-finetuned-vqa", "4355f59")},
        },
        "type": "multimodal",
    },
    "document-question-answering": {
        "impl": DocumentQuestionAnsweringPipeline,
        "pt": (AutoModelForDocumentQuestionAnswering,) if is_torch_available() else (),
        "tf": (),
        "default": {
            "model": {"pt": ("impira/layoutlm-document-qa", "52e01b3")},
        },
        "type": "multimodal",
    },
    "fill-mask": {
        "impl": FillMaskPipeline,
        "tf": (TFAutoModelForMaskedLM,) if is_tf_available() else (),
        "pt": (AutoModelForMaskedLM,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("distilbert/distilroberta-base", "ec58a5b"),
                "tf": ("distilbert/distilroberta-base", "ec58a5b"),
            }
        },
        "type": "text",
    },
    "summarization": {
        "impl": SummarizationPipeline,
        "tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (),
        "pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (),
        "default": {
            "model": {"pt": ("sshleifer/distilbart-cnn-12-6", "a4f8f3e"), "tf": ("google-t5/t5-small", "d769bba")}
        },
        "type": "text",
    },
    # This task is a special case as it's parametrized by SRC, TGT languages.
    "translation": {
        "impl": TranslationPipeline,
        "tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (),
        "pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (),
        "default": {
            ("en", "fr"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
            ("en", "de"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
            ("en", "ro"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
        },
        "type": "text",
    },
    "text2text-generation": {
        "impl": Text2TextGenerationPipeline,
        "tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (),
        "pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (),
        "default": {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
        "type": "text",
    },
    "text-generation": {
        "impl": TextGenerationPipeline,
        "tf": (TFAutoModelForCausalLM,) if is_tf_available() else (),
        "pt": (AutoModelForCausalLM,) if is_torch_available() else (),
        "default": {"model": {"pt": ("openai-community/gpt2", "6c0e608"), "tf": ("openai-community/gpt2", "6c0e608")}},
        "type": "text",
    },
    "zero-shot-classification": {
        "impl": ZeroShotClassificationPipeline,
        "tf": (TFAutoModelForSequenceClassification,) if is_tf_available() else (),
        "pt": (AutoModelForSequenceClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("facebook/bart-large-mnli", "c626438"),
                "tf": ("FacebookAI/roberta-large-mnli", "130fb28"),
            },
            "config": {
                "pt": ("facebook/bart-large-mnli", "c626438"),
                "tf": ("FacebookAI/roberta-large-mnli", "130fb28"),
            },
        },
        "type": "text",
    },
    "zero-shot-image-classification": {
        "impl": ZeroShotImageClassificationPipeline,
        "tf": (TFAutoModelForZeroShotImageClassification,) if is_tf_available() else (),
        "pt": (AutoModelForZeroShotImageClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("openai/clip-vit-base-patch32", "f4881ba"),
                "tf": ("openai/clip-vit-base-patch32", "f4881ba"),
            }
        },
        "type": "multimodal",
    },
    "zero-shot-audio-classification": {
        "impl": ZeroShotAudioClassificationPipeline,
        "tf": (),
        "pt": (AutoModel,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("laion/clap-htsat-fused", "973b6e5"),
            }
        },
        "type": "multimodal",
    },
    "image-classification": {
        "impl": ImageClassificationPipeline,
        "tf": (TFAutoModelForImageClassification,) if is_tf_available() else (),
        "pt": (AutoModelForImageClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("google/vit-base-patch16-224", "5dca96d"),
                "tf": ("google/vit-base-patch16-224", "5dca96d"),
            }
        },
        "type": "image",
    },
    "image-feature-extraction": {
        "impl": ImageFeatureExtractionPipeline,
        "tf": (TFAutoModel,) if is_tf_available() else (),
        "pt": (AutoModel,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("google/vit-base-patch16-224", "3f49326"),
                "tf": ("google/vit-base-patch16-224", "3f49326"),
            }
        },
        "type": "image",
    },
    "image-segmentation": {
        "impl": ImageSegmentationPipeline,
        "tf": (),
        "pt": (AutoModelForImageSegmentation, AutoModelForSemanticSegmentation) if is_torch_available() else (),
        "default": {"model": {"pt": ("facebook/detr-resnet-50-panoptic", "fc15262")}},
        "type": "multimodal",
    },
    "image-to-text": {
        "impl": ImageToTextPipeline,
        "tf": (TFAutoModelForVision2Seq,) if is_tf_available() else (),
        "pt": (AutoModelForVision2Seq,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("ydshieh/vit-gpt2-coco-en", "65636df"),
                "tf": ("ydshieh/vit-gpt2-coco-en", "65636df"),
            }
        },
        "type": "multimodal",
    },
    "object-detection": {
        "impl": ObjectDetectionPipeline,
        "tf": (),
        "pt": (AutoModelForObjectDetection,) if is_torch_available() else (),
        "default": {"model": {"pt": ("facebook/detr-resnet-50", "2729413")}},
        "type": "multimodal",
    },
    "zero-shot-object-detection": {
        "impl": ZeroShotObjectDetectionPipeline,
        "tf": (),
        "pt": (AutoModelForZeroShotObjectDetection,) if is_torch_available() else (),
        "default": {"model": {"pt": ("google/owlvit-base-patch32", "17740e1")}},
        "type": "multimodal",
    },
    "depth-estimation": {
        "impl": DepthEstimationPipeline,
        "tf": (),
        "pt": (AutoModelForDepthEstimation,) if is_torch_available() else (),
        "default": {"model": {"pt": ("Intel/dpt-large", "e93beec")}},
        "type": "image",
    },
    "video-classification": {
        "impl": VideoClassificationPipeline,
        "tf": (),
        "pt": (AutoModelForVideoClassification,) if is_torch_available() else (),
        "default": {"model": {"pt": ("MCG-NJU/videomae-base-finetuned-kinetics", "4800870")}},
        "type": "video",
    },
    "mask-generation": {
        "impl": MaskGenerationPipeline,
        "tf": (),
        "pt": (AutoModelForMaskGeneration,) if is_torch_available() else (),
        "default": {"model": {"pt": ("facebook/sam-vit-huge", "997b15")}},
        "type": "multimodal",
    },
    "image-to-image": {
        "impl": ImageToImagePipeline,
        "tf": (),
        "pt": (AutoModelForImageToImage,) if is_torch_available() else (),
        "default": {"model": {"pt": ("caidas/swin2SR-classical-sr-x2-64", "4aaedcb")}},
        "type": "image",
    },
}
```
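Because SUPPORTED_TASKS is an ordinary dictionary, a task's default model can also be read programmatically. The sketch below is a hypothetical helper, `default_model`, that walks the structure shown above, including the per-language-pair nesting used by "translation". It runs on a small excerpt copied from that listing (pipeline classes omitted), so it needs no model download; with transformers installed you could pass in `SUPPORTED_TASKS` imported from `transformers.pipelines` instead, though the exact entries depend on the installed version.

```python
from typing import Any, Dict, Optional, Tuple

# Small excerpt copied from the SUPPORTED_TASKS listing (pipeline classes omitted)
SUPPORTED_TASKS_EXCERPT: Dict[str, Dict[str, Any]] = {
    "automatic-speech-recognition": {
        "default": {"model": {"pt": ("facebook/wav2vec2-base-960h", "55bb623")}},
    },
    "summarization": {
        "default": {
            "model": {"pt": ("sshleifer/distilbart-cnn-12-6", "a4f8f3e"), "tf": ("google-t5/t5-small", "d769bba")}
        },
    },
    "translation": {
        "default": {
            ("en", "fr"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
            ("en", "de"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
        },
    },
}


def default_model(
    supported_tasks: Dict[str, Dict[str, Any]],
    task: str,
    framework: str = "pt",
    lang_pair: Optional[Tuple[str, str]] = None,
) -> Tuple[str, str]:
    """Return the (model id, revision) default for a task. Hypothetical helper."""
    defaults = supported_tasks[task]["default"]
    if lang_pair is not None:
        # Parametrized tasks (translation) nest their defaults under a (src, tgt) key
        if lang_pair not in defaults:
            raise ValueError(f"No default model for {task} with languages {lang_pair}")
        defaults = defaults[lang_pair]
    elif "model" not in defaults:
        raise ValueError(f"{task} needs a language pair, e.g. ('en', 'fr')")
    models = defaults["model"]
    if framework not in models:
        raise ValueError(f"No default {framework} model for {task}")
    return models[framework]


if __name__ == "__main__":
    print(default_model(SUPPORTED_TASKS_EXCERPT, "automatic-speech-recognition"))
    print(default_model(SUPPORTED_TASKS_EXCERPT, "summarization", framework="tf"))
    print(default_model(SUPPORTED_TASKS_EXCERPT, "translation", lang_pair=("en", "de")))
```

The framework argument matters because some entries, such as summarization, use different defaults for PyTorch and TensorFlow.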

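2.3 Instantiating a pipeline object from a model

2.3.1 Instantiating "automatic speech recognition" from a model

If you do not want a task's default model, you can specify any model on the Hugging Face Hub. When only model is passed, pipeline infers the task from the model's metadata on the Hub: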

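```python
import os

# Download models through the hf-mirror.com mirror; set this before importing transformers
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
# Make only GPU 2 visible to this process (it then appears as cuda:0)
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

from transformers import pipeline

speech_file = "./output_video_enhanced.mp3"

# Equivalent, with the task stated explicitly (no need to infer it from the Hub):
# transcriber = pipeline(task="automatic-speech-recognition", model="openai/whisper-medium")
# Only the model is given; the task is inferred from the model's Hub metadata
pipe = pipeline(model="openai/whisper-medium")
result = pipe(speech_file)
print(result)
```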

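2.3.2 Looking up which task a model belongs to

Visit https://huggingface.co/tasks to browse the tasks and the models available for each; each model's page on the Hub also shows the task it is tagged with.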
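The same lookup can be done from code. A minimal sketch, assuming the huggingface_hub package (installed alongside transformers) and network access to the Hub or to the mirror set via HF_ENDPOINT: `model_info(...).pipeline_tag` returns the task a model is tagged with (None if untagged), and the Hub's model list can be filtered by task with the `pipeline_tag` query parameter. The helpers `task_of` and `models_for_task_url` are hypothetical names written for this example.

```python
from typing import Optional
from urllib.parse import urlencode


def task_of(model_id: str) -> Optional[str]:
    """Return the Hub task tag (pipeline_tag) of a model; needs network access."""
    # Imported lazily so the URL helper below works without huggingface_hub
    from huggingface_hub import model_info

    return model_info(model_id).pipeline_tag


def models_for_task_url(task: str, endpoint: str = "https://huggingface.co") -> str:
    """Build the Hub URL listing models filtered by a task tag (no network needed)."""
    return f"{endpoint}/models?{urlencode({'pipeline_tag': task})}"


if __name__ == "__main__":
    print(models_for_task_url("automatic-speech-recognition"))
    # https://huggingface.co/models?pipeline_tag=automatic-speech-recognition
```

With network access, `task_of("openai/whisper-medium")` should return "automatic-speech-recognition", which is the task pipeline infers in 2.3.1.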

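3. Summary

This article is part 0 of the transformers pipeline series. Each later article covers one task, 28+ tasks in total. Working through the pipelines for these 28 tasks gives you a practical way to use the 300,000+ open-source models on Hugging Face across audio, computer vision, natural language processing, and multimodal applications.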




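Disclaimer

This article represents only the author's views or is reposted from other websites; this site does not use it for commercial purposes.
If it infringes any rights, please contact this site to have it removed.
When reposting original articles from this site, please credit the source and author.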