【AI】Deploying the Qwen (Tongyi Qianwen) 7B model locally on Ubuntu 22.04 with an RTX 4060 Ti 16G
hkNaruto 2024-07-05 15:01:02
Download the model
git lfs install
git clone https://www.modelscope.cn/qwen/Qwen-7B.git
The download failed partway through, so I manually downloaded the few model files that git lfs had not pulled correctly and moved them into the repository directory (an alternative download route via modelscope is sketched after the dependency install below).
Quickstart
Create a working directory and a venv
Requirements and installing dependencies
pip install transformers==4.32.0 accelerate tiktoken einops scipy transformers_stream_generator==0.0.4 peft deepspeed
pip install flash-attn  # note: the PyPI package is flash-attn, not flash-attention
pip install modelscope
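Since the git-lfs clone was flaky, it is worth noting that modelscope itself can fetch the weights with resumable downloads. A minimal sketch (the default cache location is whatever modelscope picks; treat any extra keyword arguments as assumptions):

# Alternative to `git clone`: let modelscope fetch the weights.
# Resumable downloads avoid the git-lfs failures seen above.
from modelscope import snapshot_download

# 'qwen/Qwen-7B' is the same repo id as in the git URL above.
model_dir = snapshot_download('qwen/Qwen-7B')
print(model_dir)  # local path to the downloaded checkpoint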
Testing: extremely slow, with GPU memory close to the limit.
Adjusting the code only produced errors from insufficient VRAM; both fp16 and bf16 error out.
A single response took nearly 4 minutes, which makes this setup impractical.
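For reference, the test followed the standard Transformers loading pattern for this model; a minimal sketch, assuming the local clone at /home/yeqiang/Downloads/ai/Qwen-7B (the prompt and generation length are illustrative):

# Minimal test sketch against the local Qwen-7B clone.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/home/yeqiang/Downloads/ai/Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
# device_map="auto" places the weights on the GPU; on this 16G card
# memory usage sits near the limit.
model = AutoModelForCausalLM.from_pretrained(
    path, device_map="auto", trust_remote_code=True
).eval()

inputs = tokenizer("请介绍一下通义千问。", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))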
Next step: look into quantization to shrink the model's VRAM footprint and see how it performs.
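One route worth trying is on-the-fly 4-bit quantization through bitsandbytes (an extra pip install bitsandbytes); Qwen also publishes a ready-made GPTQ checkpoint, Qwen-7B-Chat-Int4. A sketch of the bitsandbytes route, with the quantization parameters as assumptions, not tested here:

# Sketch: 4-bit NF4 loading via bitsandbytes to cut weight memory roughly 4x.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

path = "/home/yeqiang/Downloads/ai/Qwen-7B"
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 quantization of the weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
).eval()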
The 7B base model's answer to this question was incoherent.
Deploying the Qwen web demo
Download the source code
git clone https://gh-proxy.com/https://github.com/QwenLM/Qwen
Create a venv and install dependencies
Install the web demo's dependencies
pip install -r requirements_web_demo.txt
Launching it directly fails:
python3 web_demo.py
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/yeqiang/下载/src/Qwen/web_demo.py", line 209, in <module>
main()
File "/home/yeqiang/下载/src/Qwen/web_demo.py", line 203, in main
model, tokenizer, config = _load_model_tokenizer(args)
File "/home/yeqiang/下载/src/Qwen/web_demo.py", line 41, in _load_model_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
File "/home/yeqiang/下载/src/Qwen/venv/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 773, in from_pretrained
config = AutoConfig.from_pretrained(
File "/home/yeqiang/下载/src/Qwen/venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1100, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/yeqiang/下载/src/Qwen/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 634, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/home/yeqiang/下载/src/Qwen/venv/lib/python3.10/site-packages/transformers/configuration_utils.py", line 689, in _get_config_dict
resolved_config_file = cached_file(
File "/home/yeqiang/下载/src/Qwen/venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 425, in cached_file
raise EnvironmentError(
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like Qwen/Qwen-7B-Chat is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
No checkpoint path was given, so the demo defaulted to Qwen/Qwen-7B-Chat and tried to fetch it from huggingface.co, which failed. The message suggests offline mode, but since nothing is cached, the real fix is to point the demo at a local checkpoint.
Check the help output:
(venv) (base) yeqiang@yeqiang-Default-string:~/Downloads/src/Qwen$ python3 web_demo.py --help
usage: web_demo.py [-h] [-c CHECKPOINT_PATH] [--cpu-only] [--share] [--inbrowser] [--server-port SERVER_PORT] [--server-name SERVER_NAME]
options:
-h, --help show this help message and exit
-c CHECKPOINT_PATH, --checkpoint-path CHECKPOINT_PATH
Checkpoint name or path, default to 'Qwen/Qwen-7B-Chat'
--cpu-only Run demo with CPU only
--share Create a publicly shareable link for the interface.
--inbrowser Automatically launch the interface in a new tab on the default browser.
--server-port SERVER_PORT
Demo server port.
--server-name SERVER_NAME
Demo server name.
Specify the model path and start again:
python3 web_demo.py -c /home/yeqiang/Downloads/ai/Qwen-7B --server-port 8080
Wed Apr 10 09:49:48 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.113.01 Driver Version: 535.113.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 4060 Ti Off | 00000000:01:00.0 On | N/A |
| 0% 35C P8 13W / 165W | 13219MiB / 16380MiB | 34% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 3465 G /usr/lib/xorg/Xorg 200MiB |
| 0 N/A N/A 3617 G /usr/bin/gnome-shell 62MiB |
| 0 N/A N/A 66713 G ...38243838,2569802313780412916,262144 52MiB |
| 0 N/A N/A 258826 C python3 12890MiB |
+---------------------------------------------------------------------------------------+
The web demo expects the Qwen-7B-Chat model, so download it as well and restart web_demo.py with -c pointing at the new directory:
git clone https://www.modelscope.cn/qwen/Qwen-7B-Chat.git
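Besides the web UI, the chat checkpoint can be exercised directly through the chat() helper that Qwen's remote code provides. A minimal sketch, assuming the clone landed at /home/yeqiang/Downloads/ai/Qwen-7B-Chat (the path is an assumption):

# Sketch: querying Qwen-7B-Chat directly, outside the web demo.
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/home/yeqiang/Downloads/ai/Qwen-7B-Chat"  # assumed clone location
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, device_map="auto", trust_remote_code=True
).eval()

# chat() manages conversation history for the chat-tuned model.
response, history = model.chat(tokenizer, "你好,介绍一下你自己。", history=None)
print(response)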
Responses are a bit slow, but the output quality is acceptable (VRAM around 90%, GPU utilization around 80%).
Reference:
ModelScope community (魔搭社区): https://www.modelscope.cn