Update ExLlama.md (#2729)

Add details for configuring exllama
2024-09-20 10:35:10 +02:00 · 2023-06-16 21:46:25 -05:00
commit a1ca1c04a1
parent b27f83c0e9
1 changed file with 5 additions and 1 deletion
--- a/docs/ExLlama.md
+++ b/docs/ExLlama.md
@@ -2,7 +2,7 @@

 ## About

-ExLlama is an extremely optimized GPTQ backend for LLaMA models. It features much lower VRAM usage and much higher speeds due to not relying on unoptimized transformers code.
+ExLlama is an extremely optimized GPTQ backend ("loader") for LLaMA models. It features much lower VRAM usage and much higher speeds due to not relying on unoptimized transformers code.

 ## Installation:

@@ -15,3 +15,7 @@ git clone https://github.com/turboderp/exllama
 ```

 2) Follow the remaining set up instructions in the official README: https://github.com/turboderp/exllama#exllama
+
+3) Configure text-generation-webui to use exllama via the UI or command line:
+   - In the "Model" tab, set "Loader" to "exllama"
+   - Specify `--loader exllama` on the command line
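
For reference, the command-line option added in step 3 might be used roughly as sketched below. Only `--loader exllama` comes from this change. The `server.py` entry point, the `--model` flag, and the model folder name `my-llama-gptq` are assumptions for illustration and may differ in your checkout.

```shell
# Hypothetical model folder name; replace with a folder under models/
MODEL_NAME="my-llama-gptq"

# Assemble the launch command. Only --loader exllama is taken from the docs
# change above; server.py and --model are assumed launch conventions.
LAUNCH_CMD="python server.py --loader exllama --model ${MODEL_NAME}"

# Print the command for review; run it from the text-generation-webui folder
echo "${LAUNCH_CMD}"
```

Printing the command first makes it easy to check the flags before starting the server. The UI route ("Model" tab, "Loader" set to "exllama") should select the same loader.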