From 322c170566367bdef85c28cb7ffef3368a17ab42 Mon Sep 17 00:00:00 2001
From: oobabooga <112222186+oobabooga@users.noreply.github.com>
Date: Tue, 7 Nov 2023 14:45:11 -0800
Subject: [PATCH] Document logits_all

---
 README.md              | 1 +
 docs/04 ‐ Model Tab.md | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/README.md b/README.md
index 95b9a12b..61055db4 100644
--- a/README.md
+++ b/README.md
@@ -327,6 +327,7 @@ Optionally, you can use the following command-line flags:
 | `--tensor_split TENSOR_SPLIT` | Split the model across multiple GPUs. Comma-separated list of proportions. Example: 18,17. |
 | `--llama_cpp_seed SEED` | Seed for llama-cpp models. Default is 0 (random). |
 | `--numa` | Activate NUMA task allocation for llama.cpp. |
+| `--logits_all`| Needs to be set for perplexity evaluation to work. Otherwise, ignore it, as it makes prompt processing slower. |
 | `--cache-capacity CACHE_CAPACITY` | Maximum cache capacity (llama-cpp-python). Examples: 2000MiB, 2GiB. When provided without units, bytes will be assumed. |
 
 #### ExLlama
diff --git a/docs/04 ‐ Model Tab.md b/docs/04 ‐ Model Tab.md
index 20744c5f..d21b74d8 100644
--- a/docs/04 ‐ Model Tab.md
+++ b/docs/04 ‐ Model Tab.md
@@ -110,6 +110,10 @@ To use it, you need to download a tokenizer. There are two options:
 1) Download `oobabooga/llama-tokenizer` under "Download model or LoRA". That's a default Llama tokenizer.
 2) Place your .gguf in a subfolder of `models/` along with these 3 files: `tokenizer.model`, `tokenizer_config.json`, and `special_tokens_map.json`. This takes precedence over Option 1.
 
+It has an additional parameter:
+
+* **logits_all**: Needs to be checked if you want to evaluate the perplexity of the llama.cpp model using the "Training" > "Perplexity evaluation" tab. Otherwise, leave it unchecked, as it makes prompt processing slower.
+
 ### ctransformers
 
 Loads: GGUF/GGML models.
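
For context on what the documented flag enables, below is a minimal sketch of a perplexity computation that needs per-token logits. It assumes llama-cpp-python's `Llama` class with its `logits_all` constructor argument and its `tokenize`, `eval`, and `scores` members; the model path and prompt are placeholders, and this is not the webui's actual "Perplexity evaluation" code.

```python
# Minimal sketch: why logits_all is needed for perplexity evaluation.
# Assumes llama-cpp-python; the model path and prompt are placeholders.
import numpy as np
from llama_cpp import Llama

llm = Llama(
    model_path="models/example.gguf",  # placeholder path
    logits_all=True,  # keep logits for every position, not only the last one
)

tokens = llm.tokenize(b"The quick brown fox jumps over the lazy dog.")
llm.eval(tokens)

# With logits_all=True, llm.scores[i] holds the logits predicting token i+1.
# Perplexity = exp(mean negative log-likelihood of each token given its prefix).
nlls = []
for i in range(len(tokens) - 1):
    logits = llm.scores[i]
    # Numerically stable log-softmax: x - logsumexp(x)
    log_probs = logits - (np.max(logits) + np.log(np.sum(np.exp(logits - np.max(logits)))))
    nlls.append(-log_probs[tokens[i + 1]])

print("perplexity:", float(np.exp(np.mean(nlls))))
```

Without `logits_all`, only the final position's logits are kept, so the per-token log-likelihoods above are unavailable; that is why the README entry tells users to leave the flag off unless they need the "Training" > "Perplexity evaluation" tab, since storing logits for every position slows down prompt processing.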