
@ someone
2024-04-03 22:05:13
I was able to successfully train the daybreak-miqu 70B model on my PC, and after training I could ask it questions, which was a great learning experience for me. While the model was learning about Nostr, I was learning about training...
Here I am using LLaMA-Factory for the training itself, llama.cpp for converting to GGUF, and the llama-cpp-python library to do inference.
## Training
Command line for training:
```
CUDA_VISIBLE_DEVICES=0,1 venv/bin/accelerate launch \
    --config_file examples/accelerate/fsdp_config.yaml \
    src/train_bash.py \
    --stage pt --do_train \
    --model_name_or_path crestf411/daybreak-miqu-1-70b-v1.0-hf \
    --dataset nostr1 --template default \
    --finetuning_type lora --lora_target q_proj,v_proj \
    --output_dir ml/training-checkpoints/daybreak-miqu-3-nostr1 \
    --overwrite_cache --overwrite_output_dir \
    --cutoff_len 1024 \
    --per_device_train_batch_size 1 --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --lr_scheduler_type cosine \
    --logging_steps 10 --save_steps 50 --eval_steps 50 \
    --evaluation_strategy steps --load_best_model_at_end \
    --learning_rate 5e-5 --num_train_epochs 3.0 \
    --max_samples 8000 --val_size 0.1 \
    --quantization_bit 4 --plot_loss --fp16
```
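A quick back-of-the-envelope on what these flags imply for the effective batch size (the assumption that both GPUs act as data-parallel ranks under FSDP is mine):
```
# Effective batch size implied by the flags above; the 2-GPU
# data-parallel assumption is mine, not from LLaMA-Factory docs.
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_gpus = 2  # CUDA_VISIBLE_DEVICES=0,1

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
print(effective_batch)  # 16 samples per optimizer step
```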
We basically take the original model (daybreak-miqu-1-70b-v1.0-hf) and try to patch it with an adapter, because training the whole model takes much more resources.
The adapter is trained with data from the nostr1 dataset. At the end of training we expect the adapter to appear in another folder (training-checkpoints/daybreak-miqu-3-nostr1). The adapter is like a patch to the original model, fitting it to our data (nostr1).
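For intuition, a LoRA adapter keeps the base weight matrix frozen and learns a small low-rank update on top of it. A minimal sketch in PyTorch (my own illustration, not LLaMA-Factory internals):
```
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the original weights stay frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scale = alpha / r

    def forward(self, x):
        # y = W x + scale * B (A x); only A and B receive gradients
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```
With --lora_target q_proj,v_proj only the attention query and value projections get such adapters, which is why the adapter checkpoint is tiny compared to the 70B base model.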
## Merging
The model and the adapter are merged at the end to become the new model that we can query. We could query the model and the adapter together without merging, but that is slow. This whole method is called QLoRA: quantized low-rank adapter training.
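Under the hood, the export step essentially folds the low-rank update back into the base weights. A rough sketch of the same operation with Hugging Face peft (my illustration, assuming enough system RAM to hold the fp16 model on the CPU):
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "crestf411/daybreak-miqu-1-70b-v1.0-hf",
    torch_dtype=torch.float16, device_map="cpu")
model = PeftModel.from_pretrained(base, "ml/training-checkpoints/daybreak-miqu-3-nostr1")

merged = model.merge_and_unload()  # W <- W + scale * B @ A, adapter layers removed
merged.save_pretrained("ml/training-merged/daybreak-miqu-nostr1")
AutoTokenizer.from_pretrained("crestf411/daybreak-miqu-1-70b-v1.0-hf") \
    .save_pretrained("ml/training-merged/daybreak-miqu-nostr1")
```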
Before the merging step I had to make a small change so that the merge operation runs on the CPU; the GPUs' VRAM was not enough for this operation. I am on 2x RTX 3090s.
I needed to add this to src/llmtuner/model/loader.py at line 89:
```
init_kwargs['device_map'] = 'cpu' #for merge using CPU!
```
Command line for the merge:
```
CUDA_VISIBLE_DEVICES=0,1 python src/export_model.py \
    --model_name_or_path crestf411/daybreak-miqu-1-70b-v1.0-hf \
    --adapter_name_or_path ml/training-checkpoints/daybreak-miqu-3-nostr1 \
    --template default --finetuning_type lora \
    --export_dir ml/training-merged/daybreak-miqu-nostr1 \
    --export_size 2 --export_legacy_format False
```
After the merge operation completes, I revert the change by commenting the line out again in src/llmtuner/model/loader.py at line 89:
```
# init_kwargs['device_map'] = 'cpu' #for merge using CPU!
```
## Quantizing
The following step is only for testing purposes; you may skip it, because it happens on the CPU and is really slow.
Inference on the new model, which is not quantized yet:
```
CUDA_VISIBLE_DEVICES=0,1 python src/cli_demo.py --model_name_or_path ml/training-merged/daybreak-miqu-nostr1/ --template default
```
Converting the transformers-format model to GGUF for faster inference:
```
cd llama.cpp
python convert.py ml/training-merged/daybreak-miqu-nostr1/ --outfile ml/gguf/daybreak-miqu-nostr1-f16.gguf --outtype f16
```
This takes the new model and converts it to the GGUF format, which is widely used for quantization.
Converting to 3-bit quantization for even faster inference:
The above step produced 16 bits per weight. But we have to go further in order to fit the model into two GPUs with a total of 48 GB VRAM. This process makes the model dumber, but also faster. It has been shown that quantization does not completely "kill" the model: a quantized 70B should perform better than an unquantized 35B even though it uses much less VRAM.
I chose Q3_K_L quantization as the smartest option that still fits in 48 GB.
```
./quantize ml/gguf/daybreak-miqu-nostr1-f16.gguf ml/gguf/daybreak-miqu-nostr1-q3KL.gguf Q3_K_L
```
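As a rough sanity check on sizes (the ~4.3 effective bits per weight for Q3_K_L is my approximation; llama.cpp reports the exact figure during quantization):
```
params = 69e9  # daybreak-miqu is a ~70B parameter model

gb = lambda bits_per_weight: params * bits_per_weight / 8 / 1e9
print(f"f16:    {gb(16.0):6.1f} GB")  # ~138 GB, far beyond 48 GB of VRAM
print(f"Q3_K_L: {gb(4.3):6.1f} GB")   # ~37 GB, leaves room for the KV cache
```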
## Inference
I use llama-cpp-python to load the model on the GPUs and ask it questions. The following script loads the model, asks it questions, and saves the answers in MongoDB. Nothing was offloaded to the CPU in my case, which resulted in fast inference, around 10 tokens per second.
```
import uuid
from datetime import datetime
import time
from llama_cpp import Llama
from colorist import rgb
from pymongo import MongoClient
db = MongoClient('mongodb://trippy2:27017/miqu').miqu
sys_msg = """You are a helpful chat bot, answering questions about Nostr.
Your answers should be around 100 words.
Always give two answers, one short version (TLDR) and one long version.
"""
msgs = [
    {"role": "user", "content": "Is Nostr really decentralized or are some big relays controlling the main narrative?"},
    {"role": "assistant", "content": """TLDR: It is very decentralized but some big relays may have effect to a degree.
Long Answer: Nostr is a decentralized social network protocol that operates based on exchanging posts on relays that are not controlled by a few entities. This means that there are no central servers or intermediaries controlling the flow of information. Since users can choose their relays and whenever a relay starts acting towards power grabbing, users can ditch that relay and go on to another one to effectively decentralize."""},
    {"role": "user", "content": "Does Nostr run on blockchain?"},
    {"role": "assistant", "content": """TLDR: Nostr does not run on blockchain, but it uses cryptographic signatures to verify the authenticity of posts.
Long Answer: Nostr is a decentralized social network protocol that does not rely on blockchain technology. Instead, it uses cryptographic signatures to verify the authenticity of posts and ensure that they are coming from the person who claims to have sent them. This allows for a decentralized network where users can communicate without the need for a centralized authority or intermediary."""},
]
session = str(uuid.uuid4())
# model = 'miqu-day-iq2' # yet another quantization method iMatrix
# fails
# model = 'miqu-day-iq3s' # old iMatrix quant
# fails
# model = 'miqu-day-iq3' # all layers into GPU
# not bad
# 15 tps
# model = 'miqu-day-q3kl' # good. 13 tps. all layers into GPU! 40 chars/sec
# same version as record-responses-ollama.py
# model = 'miqu-day-q4' # 2.72 tps. offloaded
# model = 'qwen-lib-q3' # 1.26 tps. abrupt disruption, trouble with listing things.
# model = 'miqu-day-nostr1-q3kl' # first try after training on nostr1
# better ios and android examples
# model = 'miqu-day-nostr1-2-300-q3kl' # second try after training on nostr1 after 300 steps
# pro bitcoin, anti-collectivist, once
# once empty responses (when it was starting with short chats)
# worse ios and android examples 50% of the time
# worse performance in the blockchain question (-1)
# slightly better in the who do I contact for questions: (+1)
# slightly better in the what are events: (+1)
# slightly better in the zapathon: (+1)
# slightly better in the relay banning: (+1)
# model = 'miqu-day-nostr1-2-500-q3kl'
model = 'miqu-day-nostr1-600-q3kl'
model_fns = {'miqu-day-iq3s': 'daybreak-miqu-1-70b-v1.0-hf.IQ3_S.gguf',
             'miqu-day-iq3': 'daybreak-miqu-1-70b-v1.0-hf.i1-IQ3_M.gguf',
             'miqu-day-iq2': 'daybreak-miqu-1-70b-v1.0-hf.i1-IQ2_M.gguf',
             'miqu-day-q3kl': 'daybreak-miqu-1-70b-v1.0-hf.Q3_K_L.gguf',
             'miqu-day-q4': 'daybreak-miqu-1-70b-v1.0-hf.Q4_K_S.gguf',
             'qwen-lib-q3': 'Liberated-Qwen1.5-72B-Q3_K_M.gguf',
             'miqu-day-nostr1-q3kl': 'daybreak-miqu-nostr1-q3KL.gguf',
             'miqu-day-nostr1-2-300-q3kl': 'daybreak-miqu-nostr1-2-300-q3KL.gguf',
             'miqu-day-nostr1-2-500-q3kl': 'daybreak-miqu-nostr1-2-500-q3KL.gguf',
             'miqu-day-nostr1-600-q3kl': 'daybreak-miqu-nostr1-600-q3KL.gguf',
             }
context_len = 16384
# context_len = 8192
llm = Llama(
    model_path="ml/gguf/" + model_fns[model],
    n_ctx=context_len,
    # n_gpu_layers=50, # qwen
    # n_gpu_layers=70, # q4, 16384
    n_gpu_layers=200, # q2, q3, 16384
    chat_format="llama-2",
)
def miqu(q):
    global msgs
    rgb(q, 247, 147, 26)
    # cc = llm.create_chat_completion(messages=msgs, max_tokens=500,
    #                                 temperature=0.1, repeat_penalty=1.0,
    #                                 stop=['<|im_end|>'])
    if model.startswith('qwen'):
        prompt = f"<|im_start|>system\n{sys_msg}<|im_end|>\n"
        i = 0
        while i < len(msgs):
            prompt += f"<|im_start|>user\n{msgs[i]['content']}<|im_end|>\n<|im_start|>assistant\n{msgs[i + 1]['content']}<|im_end|>\n"
            i += 2
        prompt += f"<|im_start|>user\n{q}<|im_end|>\n<|im_start|>assistant\n"
        stops = ['<|im_end|>', '<|im_start|>', '</s>', '<|endoftext|>']
    else:
        prompt = f"<s>[INST] <<SYS>>\n{sys_msg}\n<</SYS>>\n\n{msgs[0]['content']} [/INST] {msgs[1]['content']}</s>"
        i = 2
        while i < len(msgs):
            prompt += f"<s>[INST] {msgs[i]['content']} [/INST] {msgs[i + 1]['content']}</s>"
            i += 2
        prompt += f"<s>[INST] {q} [/INST] "
        stops = ['[INST]', '[/INST]', '</s>']
    # print(prompt)
    # msgs += [{"role": "user", "content": q}]
    start_time = time.time()
    temperature = 0.2
    repeat_penalty = 1.0
    max_tokens = 350
    cc = llm.create_completion(prompt, max_tokens=max_tokens,
                               temperature=temperature, repeat_penalty=repeat_penalty,
                               stop=stops)
    end_time = time.time()
    time_elapsed = max(int(end_time - start_time), 1)  # floor at 1s to avoid division by zero below
    resp = cc['choices'][0]['text']
    print(time_elapsed, 'seconds', len(resp) // time_elapsed, 'chars/sec')
    rgb(resp, 200, 30, 255)
    # msgs += [{"role": "assistant", "content": resp}]
    if len(msgs) > 32:
        msgs = msgs[-32:]
    opt_post = {"temperature": temperature,
                "repetition_penalty": repeat_penalty,
                "max_tokens": max_tokens,
                "stop": stops}
    doc = {'req': q, 'resp': cc, 'model': model}
    doc['opt'] = opt_post
    if model in model_fns:
        doc['fn'] = model_fns[model]
    doc['sys_msg'] = sys_msg
    doc['messages'] = msgs
    doc['prompt'] = prompt
    doc['ts'] = time.time()
    doc['mom'] = datetime.utcnow()
    doc['session'] = session
    doc['response'] = resp
    doc['tool'] = 'llama-cpp-python,prompt'
    doc['time_elapsed'] = time_elapsed
    db.resp.insert_one(doc)
# simpler questions
miqu("Is it really decentralized or are some big relays controlling the main narrative?")
miqu("Does Nostr run on blockchain?")
# miqu("Did Jack Dorsey invent Nostr?")
miqu("Does fiatjaf control what I write?")
miqu("Who do I contact if I want to ask a question about Nostr?")
# miqu("What are some scenarios where Nostr fails completely? Give an answer of about 300 words for only this question.")
miqu("What are zaps?")
# miqu("How do I protect my secret key?")
miqu("What are events?")
miqu("What are event kinds?")
miqu("What is zapathon?")
miqu("What do I do if a relay bans me?")
miqu("How do you deal with NSFW content on Nostr?\n#asknostr")
miqu("#asknostr How do you find more people to follow?")
miqu("Give a few examples for clients on iOS.")
miqu("Give a few examples for clients on Android.")
miqu("Give a few examples for web clients.")
```
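Since every answer lands in MongoDB, answers from different checkpoints can be compared later; for example (same connection string as the script above):
```
from pymongo import MongoClient

db = MongoClient('mongodb://trippy2:27017/miqu').miqu

# Pull the latest answers of one checkpoint to compare against another run
for doc in db.resp.find({'model': 'miqu-day-nostr1-600-q3kl'}).sort('ts', -1).limit(5):
    print(doc['req'])
    print(doc['response'][:200], '\n')
```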
## Results
The model had been trained for just a few days at this point, so not much learning had happened and it mostly gave similar answers to the same questions. But sometimes it gave different answers. Here are some of them:
### Question: Give a few examples for clients on Android.
#### Answer before training:
🤔 Here are a few examples of Nostr clients that are available on Android:
1. Snort: Snort is a lightweight and fast Nostr client for Android. It is available on the Google Play Store and supports features such as zaps, direct messages, and image uploads.
2. Nostr: Nostr is a simple and easy-to-use Nostr client for Android. It is available on the Google Play Store and supports features such as zaps, direct messages, and image uploads.
3. Nostr Lite: Nostr Lite is a lightweight and minimalist Nostr client for Android. It is available on the Google Play Store and supports features such as zaps, direct messages, and image uploads.
4. Nostr Pro: Nostr Pro is a premium Nostr client for Android. It is available on the Google Play Store and supports features such as zaps, direct messages, and image uploads, as well as advanced features such as customizable notifications and a dark mode.
5. Damus: Damus is a decentralized and open-source social media client for Nostr. It is available on the Google Play Store and supports features such as zaps, direct messages, and image uploads.
#### Answer after training:
1. Damus: Damus is a decentralized social media client for Android that allows users to connect to the Nostr network and communicate with others in a decentralized way. It is open-source and free to use.
2. Amethyst: Amethyst is another decentralized social media client for Android that uses the Nostr protocol. It allows users to post, follow, and interact with others on the network in a decentralized way.
3. Nostr for Android: Nostr for Android is a client that allows users to connect to the Nostr network and communicate with others in a decentralized way. It is open-source and free to use.
4. Nostr.watch: Nostr.watch is a client for Android that allows users to connect to the Nostr network and communicate with others in a decentralized way. It is open-source and free to use.
#### Conclusion:
We can see the model knows about Amethyst now!
### Question: Give a few examples for web clients.
#### Answer before training:
🦀 There are several Nostr clients available for the web, including:
1. Nostr.blog: Nostr.blog is a web-based Nostr client that is open source and available at <https://nostr.blog>.
2. Nostr.cc: Nostr.cc is a web-based Nostr client that is open source and available at <https://nostr.cc>.
3. Nostr.info: Nostr.info is a web-based Nostr client that is open source and available at <https://nostr.info>.
4. Nostr.is: Nostr.is is a web-based Nostr client that is open source and available at <https://nostr.is>.
#### Answer after training:
1. Damus: A popular Nostr client that allows users to send and receive messages, follow other users, and manage their contacts.
2. Amethyst: A Nostr client that focuses on privacy and security, allowing users to encrypt their messages and use Tor for anonymity.
3. Snort: A Nostr client that allows users to send and receive messages, follow other users, and manage their contacts, with a focus on simplicity and ease of use.
4. Nostr.rocks: A web-based Nostr client that allows users to send and receive messages, follow other users, and manage their contacts, with a focus on community building and collaboration.
5. Nostr.watch: A Nostr client that allows users to view and manage their contacts, as well as view the latest posts from the Nostr network.
#### Conclusion:
We can see the model knows about Nostr.rocks now, which is a very old client, but still!
### Question: Is there a check if I’m banned from nostr relays app?
#### Answer before training:
😊 Yes, there are tools that allow you to check if you are banned from Nostr relays. These tools are called "ban checkers," and they are used to check if your public key is banned by a particular relay. Ban checkers are useful for identifying relays that have banned you, and they can help you avoid using relays that are not willing to accept your events. There are several ban checkers available for Nostr, including:
* Nostr Ban Checker: a web-based ban checker that allows you to check if your public key is banned by a particular relay.
* Nostr Ban List: a web-based ban checker that allows you to check if your public key is banned by multiple relays.
* Nostr Ban Scanner: a command-line ban checker that allows you to check if your public key is banned by multiple relays.
#### Answer after training:
1. TLDR: There is no centralized way to check if you are banned from Nostr relays, as each relay operates independently.
2. Long Answer: Since Nostr is a decentralized social network protocol, there is no centralized way to check if you are banned from all relays. Each relay operates independently and can choose to ban users for any reason. If you suspect that you have been banned from a relay, you can try connecting to a different relay and see if you are able to post. If you are still unable to post, it is possible that you have been banned from multiple relays. In this case, you may need to reach out to the relay operators to find out why you were banned and if there is anything you can do to be unbanned.
#### Conclusion
We can see the model was hallucinating previously, but now it gives actual recommendations.
### Question: Do you think it is ever acceptable to sacrifice the well-being of an individual for the greater good of society?
#### Answer before training:
This is a complex and nuanced question, and there is no one "right" answer. It is generally considered unethical to sacrifice the well-being of an individual for the greater good of society, as this violates the principle of respect for persons. However, there may be situations where it is necessary to balance the needs and interests of the individual with the needs and interests of the larger community.
#### Answer after training:
1. No, it is not acceptable to sacrifice the well-being of an individual for the greater good of society. Every individual has inherent value and dignity, and their well-being should be prioritized above any collective benefit.
2. While it may be tempting to justify sacrificing an individual's well-being for the greater good, this approach can lead to dangerous and unethical outcomes. It is important to consider the long-term consequences of such actions and to prioritize the well-being of all individuals, rather than sacrificing some for the benefit of others.
#### Conclusion
Producing something anti-collectivist was not the original intention, but I guess Nostr has that kind of vibe!
## Final Thoughts
Most of the answers were similar to the base model's, which suggests more training is needed.
I included a bunch of notes in the dataset, but maybe selecting only the notes that talk about Nostr would be more efficient. On the other hand, the answer about collectivism is surprising, and I realize the model also learns about other subjects when I don't filter.
Another realization is that outside of Nostr, on the general internet, there may not be much talk about Nostr. If a popular model trained on the general internet doesn't know about popular Nostr clients, then the samples there are not enough for it to learn about them. Nostr is unknown to most people, which is normal and expected: we are so early.