【AI】Deploying the Tongyi Qianwen (Qwen) 7B model locally on Ubuntu 22.04 with an RTX 4060 Ti 16 GB

hkNaruto · 2024-07-05 15:01:02

Download the model

git lfs install

git clone https://www.modelscope.cn/qwen/Qwen-7B.git

The download failed partway through with errors, so I manually downloaded the few model files that had not been pulled correctly

and moved them into the repository directory.
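When a `git lfs` clone fails partway, the objects that were not downloaded are left behind as small text "pointer" stubs. A sketch for spotting those stubs and resuming the transfer (`MODEL_DIR` is an assumed variable pointing at the clone):

```shell
MODEL_DIR="${MODEL_DIR:-Qwen-7B}"   # path to the cloned model repo (adjust as needed)
if [ -d "$MODEL_DIR" ]; then
    # Undownloaded LFS objects are left as small text pointer files starting
    # with this version line; list any such stubs
    grep -rIl '^version https://git-lfs.github.com/spec/v1' "$MODEL_DIR" || echo "no stubs found"
    # Resume the download: fetch any LFS objects still missing
    git -C "$MODEL_DIR" lfs pull
fi
```

`git lfs pull` only fetches objects that are still missing, so it is usually simpler than re-downloading files by hand.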

Quickstart

Create a working directory and a virtual environment
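The setup might look like this (the directory path is an example):

```shell
# Create a working directory for the project and an isolated Python environment
mkdir -p "$HOME/ai" && cd "$HOME/ai"
python3 -m venv venv      # requires the python3-venv package on Ubuntu
. venv/bin/activate
python -V                 # the interpreter now comes from ./venv
```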

Requirements and dependency installation

pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed

pip install flash-attn

pip install modelscope
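With the dependencies in place, a minimal generation test can be sketched as follows (based on the Qwen Quickstart; the checkpoint path and prompt are examples, and `trust_remote_code=True` is needed because Qwen v1 ships custom model code):

```python
import os

# Local checkpoint path (an example; point this at your Qwen-7B clone)
MODEL_DIR = os.environ.get("QWEN_DIR", os.path.expanduser("~/Downloads/ai/Qwen-7B"))

def generate(prompt: str, model_dir: str = MODEL_DIR, max_new_tokens: int = 64) -> str:
    """Load the model and complete `prompt`. Heavy imports are deferred so the
    helper can be defined even before the checkpoint is available."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_dir, device_map="auto", trust_remote_code=True
    ).eval()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)

if os.path.isdir(MODEL_DIR):  # only run when the checkpoint is actually present
    print(generate("The capital of Iceland is"))
```

Qwen-7B is a base (non-chat) model, so plain continuation prompts like the one above work better than dialogue-style prompts.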

Testing: extremely slow! GPU memory usage is close to the limit.

Abnormally slow, and tweaking the code only produced out-of-memory errors.

Loading in either fp16 or bf16 errors out as well.

Close to 4 minutes per response; not usable in practice...

Next I will look into quantization methods to reduce the model's VRAM consumption and see how it performs.
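Why fp16 is tight on a 16 GB card, and what quantization would save, follows from simple arithmetic (the ~7.7e9 parameter count for Qwen-7B is approximate; activations, KV cache and the CUDA context add several GB on top of the weights):

```python
# Back-of-envelope VRAM needed just for the weights of a ~7.7e9-parameter model
def weight_vram_gib(n_params: float, bits_per_param: int) -> float:
    """Memory for model weights in GiB at the given precision."""
    return n_params * bits_per_param / 8 / 2**30

N = 7.7e9
print(f"fp16/bf16: {weight_vram_gib(N, 16):.1f} GiB")  # ~14.3 GiB: nearly fills 16 GB
print(f"int8:      {weight_vram_gib(N, 8):.1f} GiB")
print(f"int4:      {weight_vram_gib(N, 4):.1f} GiB")   # why 4-bit quantization helps
```

At fp16 the weights alone consume about 14.3 GiB, which explains both the near-limit VRAM readings and the OOM errors when anything else needs memory; int4 quantization would cut that to roughly 3.6 GiB.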

The 7B model's answer to this question was a jumbled mess.

Deploy the Qwen web demo

Download the source code

git clone https://gh-proxy.com/https://github.com/QwenLM/Qwen

Create a venv and install dependencies

Install the web demo's dependencies

pip install -r requirements_web_demo.txt

Launching it directly fails

python3 web_demo.py

The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "/home/yeqiang/下载/src/Qwen/web_demo.py", line 209, in <module>

    main()

  File "/home/yeqiang/下载/src/Qwen/web_demo.py", line 203, in main

    model, tokenizer, config = _load_model_tokenizer(args)

  File "/home/yeqiang/下载/src/Qwen/web_demo.py", line 41, in _load_model_tokenizer

    tokenizer = AutoTokenizer.from_pretrained(

  File "/home/yeqiang/下载/src/Qwen/venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 773, in from_pretrained

    config = AutoConfig.from_pretrained(

  File "/home/yeqiang/下载/src/Qwen/venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1100, in from_pretrained

    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)

  File "/home/yeqiang/下载/src/Qwen/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 634, in get_config_dict

    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)

  File "/home/yeqiang/下载/src/Qwen/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict

    resolved_config_file = cached_file(

  File "/home/yeqiang/下载/src/Qwen/venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 425, in cached_file

    raise EnvironmentError(

OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like Qwen/Qwen-7B-Chat is not the path to a directory containing a file named config.json.

Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

 

No model path was specified, so the script tried to fetch the model from huggingface.co automatically and failed.

Check the help output

(venv) (base) yeqiang@yeqiang-Default-string:~/Downloads/src/Qwen$ python3 web_demo.py --help

usage: web_demo.py [-h] [-c CHECKPOINT_PATH] [--cpu-only] [--share] [--inbrowser] [--server-port SERVER_PORT] [--server-name SERVER_NAME]

options:

  -h, --help            show this help message and exit

  -c CHECKPOINT_PATH, --checkpoint-path CHECKPOINT_PATH

                        Checkpoint name or path, default to 'Qwen/Qwen-7B-Chat'

  --cpu-only            Run demo with CPU only

  --share               Create a publicly shareable link for the interface.

  --inbrowser           Automatically launch the interface in a new tab on the default browser.

  --server-port SERVER_PORT

                        Demo server port.

  --server-name SERVER_NAME

                        Demo server name.

 

Specify the local model path and launch again

python3 web_demo.py -c /home/yeqiang/Downloads/ai/Qwen-7B --server-port 8080

Wed Apr 10 09:49:48 2024       

+---------------------------------------------------------------------------------------+

| NVIDIA-SMI 535.113.01             Driver Version: 535.113.01   CUDA Version: 12.2     |

|-----------------------------------------+----------------------+----------------------+

| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |

| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |

|                                         |                      |               MIG M. |

|=========================================+======================+======================|

|   0  NVIDIA GeForce RTX 4060 Ti     Off | 00000000:01:00.0  On |                  N/A |

|  0%   35C    P8              13W / 165W |  13219MiB / 16380MiB |     34%      Default |

|                                         |                      |                  N/A |

+-----------------------------------------+----------------------+----------------------+

                                                                                         

+---------------------------------------------------------------------------------------+

| Processes:                                                                            |

|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |

|        ID   ID                                                             Usage      |

|=======================================================================================|

|    0   N/A  N/A      3465      G   /usr/lib/xorg/Xorg                          200MiB |

|    0   N/A  N/A      3617      G   /usr/bin/gnome-shell                         62MiB |

|    0   N/A  N/A     66713      G   ...38243838,2569802313780412916,262144       52MiB |

|    0   N/A  N/A    258826      C   python3                                   12890MiB |

+---------------------------------------------------------------------------------------+

 

The web demo expects the Qwen-7B-Chat model, so download it as well:

git clone https://www.modelscope.cn/qwen/Qwen-7B-Chat.git

Somewhat slow, but the output quality is acceptable (VRAM usage around 90%, GPU utilization around 80%).

References:

ModelScope community


