[Memo] CogVLM2 environment setup, part 2: a usage problem, triton fails to compile
LateLinux 2024-08-23 15:01:04
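Contents
Background
Problem analysis
1. Aligning the gcc version with the CUDA version
2. The library path is wrong, or the library cannot be found
2.1 Set the correct CUDA path
2.2 Confirm the library exists and its path is correct
Solution
1. Add the CUDA_PATH environment variable and set it to match your CUDA installation path
2. Install plocate and run updatedb to refresh the search index
3. Modify the libcuda_dirs() function in compiler.py of the triton package
Result after the changes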
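Background
For how the base environment was set up, see the first post. The environment did start, but the very first conversation failed with the following error: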
<code>/usr/bin/ld: 找不到 -lcuda: 没有那个文件或目录
collect2: error: ld returned 1 exit status
Traceback (most recent call last):
File "<string>", line 21, in rotary_kernel
KeyError: ('2-.-0-.-0-83ca8b715a9dc5f32dc1110973485f64-d6252949da17ceb5f3a278a70250af13-1af5134066c618146d2cd009138944a0-e177f815e3215f52f6b30ce0392b4ae7-3498c340fd4b6ee7805fd54b882a04f5-e1f133f98d04093da2078dfc51c36b72-b26258bf01f839199e39d64851821f26-d7c06e3b46e708006c15224aac7a1378-f585402118c8a136948ce0a49cfe122c', (torch.bfloat16, torch.bfloat16, torch.bfloat16, torch.bfloat16, None, 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32', 'i32'), (128, False, False, False, False, 4), (True, True, True, True, (False,), (True, False), (False, False), (True, False), (True, False), (False, False), (False, False), (True, False), (True, False), (True, False), (False, True), (True, False), (True, False), (True, False), (False, True)))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/think/CogVLM2-main/basic_demo/cli_demo.py", line 102, in <module>
outputs = model.generate(**inputs, **gen_kwargs)
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/transformers/generation/utils.py", line 1622, in generate
result = self._sample(
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/transformers/generation/utils.py", line 2791, in _sample
outputs = self(
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/think/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B-int4/modeling_cogvlm.py", line 620, in forward
outputs = self.model(
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/think/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B-int4/modeling_cogvlm.py", line 402, in forward
return self.llm_forward(
File "/home/think/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B-int4/modeling_cogvlm.py", line 486, in llm_forward
layer_outputs = decoder_layer(
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/think/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B-int4/modeling_cogvlm.py", line 261, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/think/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B-int4/modeling_cogvlm.py", line 204, in forward
query_states, key_states = self.rotary_emb(query_states, key_states, position_ids=position_ids, max_seqlen=position_ids.max() + 1)
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
output = module._old_forward(*args, **kwargs)
File "/home/think/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B-int4/util.py", line 469, in forward
q = apply_rotary_emb_func(
File "/home/think/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B-int4/util.py", line 329, in apply_rotary_emb
return ApplyRotaryEmb.apply(
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/think/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B-int4/util.py", line 255, in forward
out = apply_rotary(
File "/home/think/.cache/huggingface/modules/transformers_modules/cogvlm2-llama3-chinese-chat-19B-int4/util.py", line 212, in apply_rotary
rotary_kernel[grid](
File "<string>", line 41, in rotary_kernel
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/triton/compiler.py", line 1587, in compile
so_path = make_stub(name, signature, constants)
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/triton/compiler.py", line 1476, in make_stub
so = _build(name, src_path, tmpdir)
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/triton/compiler.py", line 1391, in _build
ret = subprocess.check_call(cc_cmd)
File "/home/think/miniconda3/envs/cogvlm2/lib/python3.10/subprocess.py", line 369, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmps_oxbp69/main.c', '-O3', '-I/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/triton/third_party/cuda/include', '-I/home/think/miniconda3/envs/cogvlm2/include/python3.10', '-I/tmp/tmps_oxbp69', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmps_oxbp69/rotary_kernel.cpython-310-x86_64-linux-gnu.so']' returned non-zero exit status 1.
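Note the first line and the last few lines. The first line is ld reporting that it cannot find -lcuda (no such file or directory). The last lines show that triton fails while compiling (compiler.py): it invokes the system gcc and tries to link against the system CUDA library libcuda.so, the link step exits with status 1 (an abnormal exit), so triton's compilation does not succeed and the run stops.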
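Problem analysis
Since the failure is in a gcc build, I guessed at the following directions:
1. Aligning the gcc version with the CUDA version.
I asked an AI, which said that CUDA 11.8 works best with gcc 7 to 10. The machine is shared, and someone (I don't know who) had installed gcc 13, so I installed gcc-9 alongside it and used the update-alternatives tool to make gcc-9 the system gcc compiler.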
sudo apt-get update
sudo apt-get install gcc-9 g++-9
I won't paste the update-alternatives commands here; there are plenty of examples online, and everyone's environment is different.
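After switching compilers it may help to confirm which gcc major version is actually active. The following is a minimal Python sketch, not part of triton or CogVLM2: the helper names are hypothetical, and the 7 to 10 range is simply the figure quoted above, not an official compatibility matrix.

```python
import re
import subprocess


def gcc_major(version_text: str) -> int:
    """Extract the major version from text such as '9.4.0' or '13'."""
    match = re.match(r"\s*(\d+)", version_text)
    if match is None:
        raise ValueError(f"unrecognized gcc version text: {version_text!r}")
    return int(match.group(1))


def in_suggested_range(major: int, low: int = 7, high: int = 10) -> bool:
    """Check the major version against the range quoted in this post."""
    return low <= major <= high


def current_gcc_major() -> int:
    """Ask the gcc found on PATH for its version (requires gcc to be installed)."""
    text = subprocess.check_output(["gcc", "-dumpversion"]).decode()
    return gcc_major(text)
```

Calling `in_suggested_range(current_gcc_major())` in the same shell that runs the demo shows whether the update-alternatives switch took effect for that session.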
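2. The library path is wrong, or the library cannot be found.
I checked this step by step against the source code in compiler.py that looks up the CUDA paths.
compiler.py is part of the triton package, under <your-env-path>/lib/python3.10/site-packages/triton. If your Python version is not 3.10, adjust the path accordingly.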
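2.1 Set the correct CUDA path
In compiler.py, the _build(name, src, srcdir) function finds the CUDA path by reading the CUDA_PATH environment variable (see the fragment below); if the variable is not set, it falls back to the value returned by default_cuda_dir(). My system did not have this variable, but my CUDA home happened to be the same path that default_cuda_dir() returns. To be safe I added CUDA_PATH anyway (most programs use CUDA_HOME; triton's author is unusual in using CUDA_PATH).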
def _build(name, src, srcdir):
    cuda_lib_dirs = libcuda_dirs()
    cuda_path = os.environ.get('CUDA_PATH', default_cuda_dir())
    cu_include_dir = os.path.join(cuda_path, "include")
    base_dir = os.path.dirname(__file__)
export CUDA_PATH=<your-cuda-home-path>
CUDA installations differ from machine to machine; my CUDA home is /usr/local/cuda, for reference.
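The lookup order described in this step can be sketched outside triton as follows. This is only an illustration: resolve_cuda_path and cuda_include_dir are hypothetical helpers, and the default of /usr/local/cuda is an assumption taken from the author's machine, standing in for whatever default_cuda_dir() returns on yours.

```python
import os


def resolve_cuda_path(environ, default="/usr/local/cuda"):
    """Use CUDA_PATH when it is set, otherwise fall back to the default."""
    return environ.get("CUDA_PATH", default)


def cuda_include_dir(environ, default="/usr/local/cuda"):
    """Directory that would be passed to the compiler as an include path."""
    return os.path.join(resolve_cuda_path(environ, default), "include")


# Inspect what the current shell would resolve to.
print(cuda_include_dir(os.environ))
```

Passing the environment in as an argument keeps the helper easy to check with a plain dict, for example `resolve_cuda_path({"CUDA_PATH": "/opt/cuda"})`.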
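2.2 Confirm the library exists and its path is correct.
Working back from the error message: the command that subprocess runs is called cc_cmd. Searching for that name leads to line 1390, the following fragment: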
cc_cmd = [cc, src, "-O3", f"-I{cu_include_dir}", f"-I{py_include_dir}", f"-I{srcdir}", "-shared", "-fPIC", "-lcuda", "-o", so]
cc_cmd += [f"-L{dir}" for dir in cuda_lib_dirs]
ret = subprocess.check_call(cc_cmd)
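The key here is the list variable cuda_lib_dirs. Tracing it back, it comes from the libcuda_dirs() function, which turns out to be very simple: it calls the Linux whereis command to search for the libcuda.so library. The equivalent shell command is whereis libcuda.so.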
def libcuda_dirs():
    locs = subprocess.check_output(["whereis", "libcuda.so"]).decode().strip().split()[1:]
    return [os.path.dirname(loc) for loc in locs]
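To see how this parsing feeds the link command, here is a small standalone sketch that applies the same string handling to sample whereis output. The helper names and the sample library path are hypothetical; only the parsing expression is taken from the function quoted above. When whereis prints just the name with nothing after it, the directory list is empty and no -L flags are added, which is consistent with the failing gcc command in the log containing -lcuda but no -L entries.

```python
import os


def dirs_from_whereis(output: str) -> list:
    """Parse 'name: path1 path2 ...' the way libcuda_dirs() does."""
    # The first token is the 'libcuda.so:' label, so it is dropped.
    locs = output.strip().split()[1:]
    return [os.path.dirname(loc) for loc in locs]


def link_dir_flags(lib_dirs: list) -> list:
    """Build the -L flags that get appended to the compiler command."""
    return [f"-L{d}" for d in lib_dirs]


# whereis found nothing: only the label is printed.
empty = dirs_from_whereis("libcuda.so:\n")
# whereis found the library (hypothetical location).
found = dirs_from_whereis("libcuda.so: /usr/lib/x86_64-linux-gnu/libcuda.so\n")

print(link_dir_flags(empty))
print(link_dir_flags(found))
```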
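Verifying this myself in bash exposed the problem: whereis libcuda.so finds nothing, which is why linking fails. I then tried to refresh the search index with updatedb, only to find that updatedb was not installed on the machine at all, so I installed the plocate tool (which ships with updatedb). Note that whereis itself searches a fixed set of standard directories rather than the locate index; the index matters for the plocate command used in the fix further down.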
whereis libcuda.so
libcuda.so:
sudo updatedb
sudo: updatedb:找不到命令
sudo apt-get install plocate
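Solution
The fix is actually simple, just three steps:
1. Add the CUDA_PATH environment variable and set it to the correct value for your CUDA installation path.
2. Install plocate, then run updatedb to refresh the search index.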
sudo apt-get install plocate
# once the install finishes, run updatedb to refresh the search index
sudo updatedb
3. Modify the libcuda_dirs() function in compiler.py of the triton package
Change the whereis command to the plocate command:
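@functools.lru_cache()
def libcuda_dirs():
    # Original lookup:
    # locs = subprocess.check_output(["whereis", "libcuda.so"]).decode().strip().split()[1:]
    # plocate prints one matching path per line with no leading "name:" token,
    # so keep every line instead of slicing with [1:].
    locs = subprocess.check_output(["plocate", "libcuda.so"]).decode().strip().splitlines()
    return [os.path.dirname(loc) for loc in locs]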
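One detail worth checking when swapping the two commands: whereis prints a "name:" label before its paths, while plocate prints one matching path per line with no label, and it also matches versioned names such as libcuda.so.1. A [1:] slice that was right for whereis would therefore discard the first real path from plocate. The sketch below parses plocate-style output and removes duplicate directories; the helper name and the sample paths are hypothetical, not taken from the author's machine.

```python
import os


def dirs_from_locate(output: str) -> list:
    """Turn one-path-per-line output into a de-duplicated directory list."""
    dirs = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        directory = os.path.dirname(line)
        if directory not in dirs:  # keep first-seen order, skip repeats
            dirs.append(directory)
    return dirs


sample = (
    "/usr/lib/x86_64-linux-gnu/libcuda.so\n"
    "/usr/lib/x86_64-linux-gnu/libcuda.so.1\n"
    "/usr/local/cuda/lib64/stubs/libcuda.so\n"
)

print(dirs_from_locate(sample))
# For comparison, the whereis-style slice drops the first of the three paths.
print(sample.strip().split()[1:])
```

De-duplicating is optional, since repeated -L flags are harmless, but it keeps the generated gcc command shorter and easier to read in error messages.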
Result after the changes
(cogvlm2) think@xsyDL-4090solo:~/CogVLM2-main/basic_demo$ python cli_demo.py
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
/home/think/miniconda3/envs/cogvlm2/lib/python3.10/site-packages/transformers/quantizers/auto.py:159: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.
warnings.warn(warning_msg)
image path >>>>> demo.jpg
Human:What's in it?
CogVLM2: The image captures a serene and lush landscape. At the center, there is a wooden boardwalk that cuts through a vibrant green meadow. The boardwalk appears weathered and has a rustic charm, inviting one to walk through and explore the surroundings. The meadow is lush with tall grasses that sway gently in the breeze, suggesting a sense of peace and tranquility. The background is a clear blue sky with some wispy clouds, indicating a sunny day. There are no visible animals or people, which adds to the stillness of the scene. The landscape is likely a nature preserve or a park, designed to offer visitors a tranquil walk through a natural setting.
Human:Can you answer in Chinese?
CogVLM2: 图片中有一条木制栈道,位于一片绿色的草地中央。栈道看起来有些老旧,但依然坚固,引导人们走进这片美丽的自然景观。草地上长满了茂盛的植物,它们在微风中轻轻摇曳,营造出一种宁静和谐的氛围。天空是晴朗的蓝色,上面飘着几朵白云,为这片景色增添了几分生动。这里没有明显的动物或人类活动,进一步增强了这份宁静。很可能,这是一个自然保护区或公园,旨在为游客提供一个漫步自然、亲近自然的场所。
Human: