• Setting up the LLaMA large language model on Windows 10


    Background

        With the arrival and maturing of the AI era, large language models are becoming commonplace. They can assist both development and everyday life and raise productivity, so developers should actively embrace the change and keep growing.

    About LLaMA

        LLaMA (Large Language Model Meta AI) is an open-source conversational large language model released by Meta (formerly Facebook). Meta provides four parameter scales: 7B, 13B, 33B and 65B, trained on text in 20 languages. LLaMA is competitive with the best existing large language models.
        Project page: https://github.com/facebookresearch/llama

    Note: this article covers the original LLaMA, not Llama 2; the principle is the same.

    Hardware requirements

    Hardware | Requirement                       | Notes
    Disk     | at least 120 GB free on one drive | the models are large
    RAM      | at least 16 GB                    | 32 GB recommended
    GPU      | optional                          | an NVIDIA GPU helps if you have one

    Software

    Versions used

    Software      | Version      | Notes
    anaconda3     | conda 22.9.0 | https://www.anaconda.com/
    python        | 3.9.16       | bundled with Anaconda
    peft          | 0.2.0        | parameter-efficient fine-tuning
    sentencepiece | 0.1.97       | tokenization algorithm
    transformers  | 4.29.2       | large download, takes a while
    git           | 2.40.1       |
    torch         | 2.0.1        |
    mingw         |              | installed on Windows via scoop
    protobuf      | 3.19.0       |
    cuda          |              | only needed with a GPU; see https://blog.csdn.net/zcs2632008/article/details/127025294

    Installing anaconda3

        Install Anaconda somewhere other than the C: drive unless your C: drive has plenty of space.

    See: https://blog.csdn.net/scorn_/article/details/106591160

    Create the environment
    1. conda create -n llama python=3.9.16
    2. conda init
    Enter the environment
    1. conda info -e
    2. conda activate llama

    Then verify the Python version inside the environment.
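    Once the environment is active, the interpreter version can be sanity-checked from Python itself. A minimal sketch (the exact patch release may differ from 3.9.16):

```python
import sys

def python_version_ok(required=(3, 9)):
    """Return True when the running interpreter is at least `required`."""
    return sys.version_info[:2] >= required

print(sys.version)           # should report a 3.9.x build inside the llama env
print(python_version_ok())
```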

    Installing peft

    pip install peft==0.2.0

    Installing transformers

    Note: this is a large download and takes a while.

    conda install transformers==4.29.2

    Installing git

    See: https://blog.csdn.net/dou3516/article/details/121740303

    Installing torch

    pip install torch==2.0.1

    Installing mingw

    Press Win+R and run powershell.

    If PowerShell refuses to run scripts (skip this step if you see no error), see:

    https://blog.csdn.net/weixin_43999496/article/details/115871373

    Set the execution policy:
    1. get-executionpolicy
    2. set-executionpolicy RemoteSigned
    then answer Y.

    Install scoop (which will be used to install mingw):
    iex "& {$(irm get.scoop.sh)} -RunAsAdmin"

    After it finishes, add the two package buckets:

    scoop bucket add extras
    scoop bucket add main

    Then install mingw:

    scoop install mingw

    Installing protobuf

    pip install protobuf==3.19.0
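    With everything installed, the pinned versions can be verified in one go from Python. A minimal sketch (the pins mirror the version table above; `importlib.metadata` is in the standard library from Python 3.8):

```python
from importlib import metadata

# Pinned versions from the software table above.
REQUIRED = {
    "peft": "0.2.0",
    "sentencepiece": "0.1.97",
    "transformers": "4.29.2",
    "torch": "2.0.1",
    "protobuf": "3.19.0",
}

def version_tuple(v):
    """Parse '4.29.2' into (4, 29, 2) so versions compare numerically."""
    return tuple(int(p) for p in v.split(".") if p.isdigit())

def check_installed():
    """Report each pinned package as OK, MISMATCH, or MISSING."""
    for name, want in REQUIRED.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: MISSING (expected {want})")
            continue
        status = "OK" if version_tuple(have) == version_tuple(want) else f"MISMATCH (have {have})"
        print(f"{name}: {status}")

if __name__ == "__main__":
    check_installed()
```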

    Project setup

    Downloading the code and models

    Two models are needed: the original LLaMA weights and a model extended with Chinese; they will be merged in a later step.

    • Original weights (proxy required): https://ipfs.io/ipfs/Qmb9y5GCkTG7ZzbBWMu2BXwMkzyCKcUjtEKPpgdZ7GEFKm/

    • Mirror: nyanko7/LLaMA-7B at main

      If you cannot download them, follow the [技术趋势] account and reply "llama1".

    Create a working folder, then enable Git LFS:

    git lfs install

    Download the Chinese model

    git clone https://huggingface.co/ziqingyang/chinese-alpaca-lora-7b

    Download the LLaMA weights (a large download)

    First create a folder: path_to_original_llama_root_dir

    Inside it, create a 7B subfolder, and move tokenizer.model into path_to_original_llama_root_dir next to the 7B folder (the converter script expects it there).

    The 7B folder holds the checkpoint files (consolidated.00.pth, params.json and the like).

    The final layout is path_to_original_llama_root_dir containing tokenizer.model and the 7B folder.

    Merging the models

    Download convert_llama_weights_to_hf.py, or save the following code as convert_llama_weights_to_hf.py in the llama directory:

# Copyright 2022 EleutherAI and The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import gc
import json
import math
import os
import shutil
import warnings
import torch
from transformers import LlamaConfig, LlamaForCausalLM, LlamaTokenizer
try:
    from transformers import LlamaTokenizerFast
except ImportError as e:
    warnings.warn(e)
    warnings.warn(
        "The converted tokenizer will be the `slow` tokenizer. To use the fast, update your `tokenizers` library and re-run the tokenizer conversion"
    )
    LlamaTokenizerFast = None
"""
Sample usage:
```
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/downloaded/llama/weights --model_size 7B --output_dir /output/path
```
Thereafter, models can be loaded via:
```py
from transformers import LlamaForCausalLM, LlamaTokenizer
model = LlamaForCausalLM.from_pretrained("/output/path")
tokenizer = LlamaTokenizer.from_pretrained("/output/path")
```
Important note: you need to be able to host the whole model in RAM to execute this script (even if the biggest versions
come in several checkpoints they each contain a part of each weight of the model, so we need to load them all in RAM).
"""
INTERMEDIATE_SIZE_MAP = {
    "7B": 11008,
    "13B": 13824,
    "30B": 17920,
    "65B": 22016,
}
NUM_SHARDS = {
    "7B": 1,
    "13B": 2,
    "30B": 4,
    "65B": 8,
}
def compute_intermediate_size(n):
    return int(math.ceil(n * 8 / 3) + 255) // 256 * 256
def read_json(path):
    with open(path, "r") as f:
        return json.load(f)
def write_json(text, path):
    with open(path, "w") as f:
        json.dump(text, f)
def write_model(model_path, input_base_path, model_size):
    os.makedirs(model_path, exist_ok=True)
    tmp_model_path = os.path.join(model_path, "tmp")
    os.makedirs(tmp_model_path, exist_ok=True)
    params = read_json(os.path.join(input_base_path, "params.json"))
    num_shards = NUM_SHARDS[model_size]
    n_layers = params["n_layers"]
    n_heads = params["n_heads"]
    n_heads_per_shard = n_heads // num_shards
    dim = params["dim"]
    dims_per_head = dim // n_heads
    base = 10000.0
    inv_freq = 1.0 / (base ** (torch.arange(0, dims_per_head, 2).float() / dims_per_head))
    # permute for sliced rotary
    def permute(w):
        return w.view(n_heads, dim // n_heads // 2, 2, dim).transpose(1, 2).reshape(dim, dim)
    print(f"Fetching all parameters from the checkpoint at {input_base_path}.")
    # Load weights
    if model_size == "7B":
        # Not sharded
        # (The sharded implementation would also work, but this is simpler.)
        loaded = torch.load(os.path.join(input_base_path, "consolidated.00.pth"), map_location="cpu")
    else:
        # Sharded
        loaded = [
            torch.load(os.path.join(input_base_path, f"consolidated.{i:02d}.pth"), map_location="cpu")
            for i in range(num_shards)
        ]
    param_count = 0
    index_dict = {"weight_map": {}}
    for layer_i in range(n_layers):
        filename = f"pytorch_model-{layer_i + 1}-of-{n_layers + 1}.bin"
        if model_size == "7B":
            # Unsharded
            state_dict = {
                f"model.layers.{layer_i}.self_attn.q_proj.weight": permute(
                    loaded[f"layers.{layer_i}.attention.wq.weight"]
                ),
                f"model.layers.{layer_i}.self_attn.k_proj.weight": permute(
                    loaded[f"layers.{layer_i}.attention.wk.weight"]
                ),
                f"model.layers.{layer_i}.self_attn.v_proj.weight": loaded[f"layers.{layer_i}.attention.wv.weight"],
                f"model.layers.{layer_i}.self_attn.o_proj.weight": loaded[f"layers.{layer_i}.attention.wo.weight"],
                f"model.layers.{layer_i}.mlp.gate_proj.weight": loaded[f"layers.{layer_i}.feed_forward.w1.weight"],
                f"model.layers.{layer_i}.mlp.down_proj.weight": loaded[f"layers.{layer_i}.feed_forward.w2.weight"],
                f"model.layers.{layer_i}.mlp.up_proj.weight": loaded[f"layers.{layer_i}.feed_forward.w3.weight"],
                f"model.layers.{layer_i}.input_layernorm.weight": loaded[f"layers.{layer_i}.attention_norm.weight"],
                f"model.layers.{layer_i}.post_attention_layernorm.weight": loaded[f"layers.{layer_i}.ffn_norm.weight"],
            }
        else:
            # Sharded
            # Note that in the 13B checkpoint, not cloning the two following weights will result in the checkpoint
            # becoming 37GB instead of 26GB for some reason.
            state_dict = {
                f"model.layers.{layer_i}.input_layernorm.weight": loaded[0][
                    f"layers.{layer_i}.attention_norm.weight"
                ].clone(),
                f"model.layers.{layer_i}.post_attention_layernorm.weight": loaded[0][
                    f"layers.{layer_i}.ffn_norm.weight"
                ].clone(),
            }
            state_dict[f"model.layers.{layer_i}.self_attn.q_proj.weight"] = permute(
                torch.cat(
                    [
                        loaded[i][f"layers.{layer_i}.attention.wq.weight"].view(n_heads_per_shard, dims_per_head, dim)
                        for i in range(num_shards)
                    ],
                    dim=0,
                ).reshape(dim, dim)
            )
            state_dict[f"model.layers.{layer_i}.self_attn.k_proj.weight"] = permute(
                torch.cat(
                    [
                        loaded[i][f"layers.{layer_i}.attention.wk.weight"].view(n_heads_per_shard, dims_per_head, dim)
                        for i in range(num_shards)
                    ],
                    dim=0,
                ).reshape(dim, dim)
            )
            state_dict[f"model.layers.{layer_i}.self_attn.v_proj.weight"] = torch.cat(
                [
                    loaded[i][f"layers.{layer_i}.attention.wv.weight"].view(n_heads_per_shard, dims_per_head, dim)
                    for i in range(num_shards)
                ],
                dim=0,
            ).reshape(dim, dim)
            state_dict[f"model.layers.{layer_i}.self_attn.o_proj.weight"] = torch.cat(
                [loaded[i][f"layers.{layer_i}.attention.wo.weight"] for i in range(num_shards)], dim=1
            )
            state_dict[f"model.layers.{layer_i}.mlp.gate_proj.weight"] = torch.cat(
                [loaded[i][f"layers.{layer_i}.feed_forward.w1.weight"] for i in range(num_shards)], dim=0
            )
            state_dict[f"model.layers.{layer_i}.mlp.down_proj.weight"] = torch.cat(
                [loaded[i][f"layers.{layer_i}.feed_forward.w2.weight"] for i in range(num_shards)], dim=1
            )
            state_dict[f"model.layers.{layer_i}.mlp.up_proj.weight"] = torch.cat(
                [loaded[i][f"layers.{layer_i}.feed_forward.w3.weight"] for i in range(num_shards)], dim=0
            )
        state_dict[f"model.layers.{layer_i}.self_attn.rotary_emb.inv_freq"] = inv_freq
        for k, v in state_dict.items():
            index_dict["weight_map"][k] = filename
            param_count += v.numel()
        torch.save(state_dict, os.path.join(tmp_model_path, filename))
    filename = f"pytorch_model-{n_layers + 1}-of-{n_layers + 1}.bin"
    if model_size == "7B":
        # Unsharded
        state_dict = {
            "model.embed_tokens.weight": loaded["tok_embeddings.weight"],
            "model.norm.weight": loaded["norm.weight"],
            "lm_head.weight": loaded["output.weight"],
        }
    else:
        state_dict = {
            "model.norm.weight": loaded[0]["norm.weight"],
            "model.embed_tokens.weight": torch.cat(
                [loaded[i]["tok_embeddings.weight"] for i in range(num_shards)], dim=1
            ),
            "lm_head.weight": torch.cat([loaded[i]["output.weight"] for i in range(num_shards)], dim=0),
        }
    for k, v in state_dict.items():
        index_dict["weight_map"][k] = filename
        param_count += v.numel()
    torch.save(state_dict, os.path.join(tmp_model_path, filename))
    # Write configs
    index_dict["metadata"] = {"total_size": param_count * 2}
    write_json(index_dict, os.path.join(tmp_model_path, "pytorch_model.bin.index.json"))
    config = LlamaConfig(
        hidden_size=dim,
        intermediate_size=compute_intermediate_size(dim),
        num_attention_heads=params["n_heads"],
        num_hidden_layers=params["n_layers"],
        rms_norm_eps=params["norm_eps"],
    )
    config.save_pretrained(tmp_model_path)
    # Make space so we can load the model properly now.
    del state_dict
    del loaded
    gc.collect()
    print("Loading the checkpoint in a Llama model.")
    model = LlamaForCausalLM.from_pretrained(tmp_model_path, torch_dtype=torch.float16, low_cpu_mem_usage=True)
    # Avoid saving this as part of the config.
    del model.config._name_or_path
    print("Saving in the Transformers format.")
    model.save_pretrained(model_path)
    shutil.rmtree(tmp_model_path)
def write_tokenizer(tokenizer_path, input_tokenizer_path):
    # Initialize the tokenizer based on the `spm` model
    tokenizer_class = LlamaTokenizer if LlamaTokenizerFast is None else LlamaTokenizerFast
    print(f"Saving a {tokenizer_class.__name__} to {tokenizer_path}.")
    tokenizer = tokenizer_class(input_tokenizer_path)
    tokenizer.save_pretrained(tokenizer_path)
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--input_dir",
        help="Location of LLaMA weights, which contains tokenizer.model and model folders",
    )
    parser.add_argument(
        "--model_size",
        choices=["7B", "13B", "30B", "65B", "tokenizer_only"],
    )
    parser.add_argument(
        "--output_dir",
        help="Location to write HF model and tokenizer",
    )
    args = parser.parse_args()
    if args.model_size != "tokenizer_only":
        write_model(
            model_path=args.output_dir,
            input_base_path=os.path.join(args.input_dir, args.model_size),
            model_size=args.model_size,
        )
    spm_path = os.path.join(args.input_dir, "tokenizer.model")
    write_tokenizer(args.output_dir, spm_path)
if __name__ == "__main__":
    main()
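    As a side note, the script's `compute_intermediate_size` helper rounds 8/3 of the hidden size up to a multiple of 256, which reproduces the values in `INTERMEDIATE_SIZE_MAP`: 11008 for 7B (dim 4096) and 13824 for 13B (dim 5120).

```python
import math

def compute_intermediate_size(n):
    # Same formula as in convert_llama_weights_to_hf.py above.
    return int(math.ceil(n * 8 / 3) + 255) // 256 * 256

print(compute_intermediate_size(4096))  # 11008, the 7B entry
print(compute_intermediate_size(5120))  # 13824, the 13B entry
```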
    Run the format conversion
    python convert_llama_weights_to_hf.py --input_dir path_to_original_llama_root_dir --model_size 7B --output_dir path_to_original_llama_hf_dir

    Note: this step takes a long time, and it may print warnings along the way.

    It creates a new directory: path_to_original_llama_hf_dir

    Run the model merge

    Download the following file into the llama directory:

    📎merge_llama_with_chinese_lora.py

    Run the merge command:
    python merge_llama_with_chinese_lora.py --base_model path_to_original_llama_hf_dir --lora_model chinese-alpaca-lora-7b --output_dir path_to_output_dir

    It creates a directory: path_to_output_dir
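    Conceptually, merging a LoRA adapter folds the low-rank update back into each base weight: W' = W + (alpha / r) * B @ A, after which the adapter is no longer needed at inference time. A pure-Python toy sketch of that rule (names and dimensions are illustrative, not the merge script's actual variables):

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def merge_lora(W, A, B, alpha, r):
    """Fold the scaled low-rank product B @ A into the base weight W."""
    BA = matmul(B, A)
    return [[w + (alpha / r) * u for w, u in zip(wr, ur)] for wr, ur in zip(W, BA)]

# 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[0.1, 0.2]]    # r x d down-projection
B = [[0.5], [0.5]]  # d x r up-projection
print(merge_lora(W, A, B, alpha=2, r=1))
```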

    Downloading llama.cpp

    In the llama directory, clone the code:

    git clone https://github.com/ggerganov/llama.cpp

    If the clone fails with a proxy error, clear the proxy setting:

    git config --global --unset http.proxy

    Building and converting the model

    Building

    Note: since mingw was installed through PowerShell (scoop) earlier, the first method is used here.

    1. # enter llama.cpp
    2. cd llama.cpp
    3. # configure with the mingw installed via PowerShell
    4. cmake . -G "MinGW Makefiles"
    5. # build
    6. cmake --build . --config Release

    Alternatively:

    1. # enter llama.cpp
    2. cd llama.cpp
    3. # create a build folder
    4. mkdir build
    5. # enter build
    6. cd build
    7. # configure
    8. cmake ..
    9. # build
    10. cmake --build . --config Release

    Arranging the files

    Inside the llama.cpp directory create a zh-models directory.

    Move consolidated.00.pth and params.json from path_to_output_dir into zh-models/7B/.

    Move tokenizer.model from path_to_output_dir into zh-models/, next to the 7B folder.
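    The file moves above can also be scripted with the standard library. A minimal sketch using the directory names from this walkthrough (adjust paths to your machine):

```python
import shutil
from pathlib import Path

def arrange_zh_models(output_dir, llama_cpp_dir):
    """Copy the merged model files into the zh-models layout llama.cpp expects."""
    src = Path(output_dir)
    dst = Path(llama_cpp_dir) / "zh-models"
    (dst / "7B").mkdir(parents=True, exist_ok=True)
    # Checkpoint files go under zh-models/7B/ ...
    shutil.copy2(src / "consolidated.00.pth", dst / "7B")
    shutil.copy2(src / "params.json", dst / "7B")
    # ...while tokenizer.model sits next to the 7B folder.
    shutil.copy2(src / "tokenizer.model", dst)

# arrange_zh_models("path_to_output_dir", "llama.cpp")
```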

    Converting the format

    Note: run this in the llama.cpp directory.

    Convert the .pth model weights to ggml FP16 format; the output file is zh-models/7B/ggml-model-f16.bin:

    python convert-pth-to-ggml.py zh-models/7B/ 1

    Quantize the FP16 model to 4 bits

    Run:

    D:\ai\llama\llama.cpp\bin\quantize.exe ./zh-models/7B/ggml-model-f16.bin ./zh-models/7B/ggml-model-q4_0.bin 2

    The quantized model is written to zh-models/7B/ggml-model-q4_0.bin
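    4-bit quantization trades precision for size: blocks of weights are stored as small integers plus a shared scale, cutting the file to roughly a quarter of FP16. A simplified symmetric sketch of the idea (not the exact Q4_0 block layout llama.cpp uses):

```python
def quantize_4bit(values):
    """Map floats to ints in [-7, 7] with one shared scale per block."""
    scale = max(abs(v) for v in values) / 7 or 1.0
    q = [max(-7, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from the quantized ints."""
    return [i * scale for i in q]

block = [0.12, -0.48, 0.33, 0.05]
q, scale = quantize_4bit(block)
# Each weight now takes 4 bits plus the block's shared scale;
# reconstruction is close but not exact.
print(q, [round(a, 3) for a in dequantize(q, scale)])
```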

    Running the model

    1. cd D:\ai\llama\llama.cpp
    2. D:\ai\llama\llama.cpp\bin\main.exe -m zh-models/7B/ggml-model-q4_0.bin --color -f prompts/alpaca.txt -ins -c 2048 --temp 0.2 -n 256 --repeat_penalty 1.3

    Result: an interactive chat session against the quantized model.
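    For repeated runs, the launch command can be wrapped in a small helper. A sketch that reuses only the flags shown above (the paths are this walkthrough's; adjust to yours):

```python
import subprocess

def build_main_cmd(main_exe, model, prompt_file):
    """Assemble the llama.cpp main invocation used in this walkthrough."""
    return [
        main_exe,
        "-m", model,
        "--color",
        "-f", prompt_file,
        "-ins",                    # instruction/chat mode
        "-c", "2048",              # context length
        "--temp", "0.2",           # sampling temperature
        "-n", "256",               # tokens to generate
        "--repeat_penalty", "1.3",
    ]

cmd = build_main_cmd(
    r"D:\ai\llama\llama.cpp\bin\main.exe",
    "zh-models/7B/ggml-model-q4_0.bin",
    "prompts/alpaca.txt",
)
# subprocess.run(cmd)  # uncomment to launch the interactive session
print(" ".join(cmd))
```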

    Closing thoughts

         Many readers may feel that learning large models is hard without knowing Python. My advice is to master one language well before picking up others; coming from a Java backend background, I was productive in Python within about two weeks, since the core syntax concepts carry over and a language is ultimately just a tool. Keep one main learning track and branch out from it rather than chasing every new thing. Judge a technology by its value and trajectory, since today's hot tool can fade within months, but don't stand still either: weigh your own situation (stage, ability, motivation, time) against where the industry is heading.

     References:

        https://zhuanlan.zhihu.com/p/617952293

        https://zhuanlan.zhihu.com/p/632102048?utm_id=0

        https://www.bilibili.com/read/cv24984542/

  • Original article: https://blog.csdn.net/qq_16498553/article/details/132798058