Commit graph

303 commits

Author SHA1 Message Date
Ayanami Rei
345b6dee8c refactor quant models loader and add support of OPT 2023-03-13 19:59:57 +03:00
oobabooga
66b6971b61 Update README 2023-03-13 12:44:18 -03:00
oobabooga
ddea518e0f Document --auto-launch 2023-03-13 12:43:33 -03:00
oobabooga
372363bc3d Fix GPTQ load_quant call on Windows 2023-03-13 12:07:02 -03:00
oobabooga
0c224cf4f4 Fix GALACTICA (#285) 2023-03-13 10:32:28 -03:00
oobabooga
2c4699a7e9 Change a comment 2023-03-13 00:20:02 -03:00
oobabooga
0a7acb3bd9 Remove redundant comments 2023-03-13 00:12:21 -03:00
oobabooga
77294b27dd Use str(Path) instead of os.path.abspath(Path) 2023-03-13 00:08:01 -03:00
oobabooga
b9e0712b92 Fix Open Assistant 2023-03-12 23:58:25 -03:00
oobabooga
1ddcd4d0ba Clean up silero_tts
This should only be used with --no-stream.

The shared.still_streaming implementation was faulty by design:
output_modifier should never be called when streaming is already over.
2023-03-12 23:42:49 -03:00
HideLord
683556f411 Adding markdown support and slight refactoring. 2023-03-12 21:34:09 +02:00
oobabooga
cebe8b390d Remove useless "substring_found" variable 2023-03-12 15:50:38 -03:00
oobabooga
4bcd675ccd Add *Is typing...* to regenerate as well 2023-03-12 15:23:33 -03:00
oobabooga
c7aa51faa6 Use a list of eos_tokens instead of just a number
This might be the cause of LLaMA ramblings that some people have experienced.
2023-03-12 14:54:58 -03:00
oobabooga
d8bea766d7 Merge pull request #192 from xanthousm/main
Add text generation stream status to shared module, use for better TTS with auto-play
2023-03-12 13:40:16 -03:00
oobabooga
fda376d9c3 Use os.path.abspath() instead of str() 2023-03-12 12:41:04 -03:00
HideLord
8403152257 Fixing compatibility with GPTQ repo commit 2f667f7da051967566a5fb0546f8614bcd3a1ccd. Expects string and breaks on 2023-03-12 17:28:15 +02:00
oobabooga
f3b00dd165 Merge pull request #224 from ItsLogic/llama-bits
Allow users to load 2, 3 and 4 bit llama models
2023-03-12 11:23:50 -03:00
oobabooga
65dda28c9d Rename --llama-bits to --gptq-bits 2023-03-12 11:19:07 -03:00
oobabooga
fed3617f07 Move LLaMA 4-bit into a separate file 2023-03-12 11:12:34 -03:00
oobabooga
0ac562bdba Add a default prompt for OpenAssistant oasst-sft-1-pythia-12b #253 2023-03-12 10:46:16 -03:00
oobabooga
78901d522b Remove unused imports 2023-03-12 08:59:05 -03:00
Xan
b3e10e47c0 Fix merge conflict in text_generation
- Need to update `shared.still_streaming = False` before the final `yield formatted_outputs`, shifted the position of some yields.
2023-03-12 18:56:35 +11:00
oobabooga
ad14f0e499 Fix regenerate (provisory way) 2023-03-12 03:42:29 -03:00
oobabooga
6e12068ba2 Merge pull request #258 from lxe/lxe/utf8
Load and save character files and chat history in UTF-8
2023-03-12 03:28:49 -03:00
oobabooga
e2da6b9685 Fix You You You appearing in chat mode 2023-03-12 03:25:56 -03:00
oobabooga
bcf0075278 Merge pull request #235 from xanthousm/Quality_of_life-main
--auto-launch and "Is typing..."
2023-03-12 03:12:56 -03:00
Aleksey Smolenchuk
3f7c3d6559 No need to set encoding on binary read 2023-03-11 22:10:57 -08:00
oobabooga
341e135036 Various fixes in chat mode 2023-03-12 02:53:08 -03:00
Aleksey Smolenchuk
3baf5fc700 Load and save chat history in utf-8 2023-03-11 21:40:01 -08:00
oobabooga
b0e8cb8c88 Various fixes in chat mode 2023-03-12 02:31:45 -03:00
unknown
433f6350bc Load and save character files in UTF-8 2023-03-11 21:23:05 -08:00
oobabooga
0bd5430988 Use 'with' statement to better handle streaming memory 2023-03-12 02:04:28 -03:00
oobabooga
37f0166b2d Fix memory leak in new streaming (second attempt) 2023-03-11 23:14:49 -03:00
oobabooga
92fe947721 Merge branch 'main' into new-streaming 2023-03-11 19:59:45 -03:00
oobabooga
2743dd736a Add *Is typing...* to impersonate as well 2023-03-11 10:50:18 -03:00
Xan
96c51973f9 --auto-launch and "Is typing..."
- Added `--auto-launch` arg to open web UI in the default browser when ready.
- Changed chat.py to display user input immediately and "*Is typing...*" as a temporary reply while generating text. Most noticeable when using `--no-stream`.
2023-03-11 22:50:59 +11:00
Xan
33df4bd91f Merge remote-tracking branch 'upstream/main' 2023-03-11 22:40:47 +11:00
draff
28fd4fc970 Change wording to be consistent with other args 2023-03-10 23:34:13 +00:00
draff
001e638b47 Make it actually work 2023-03-10 23:28:19 +00:00
draff
804486214b Re-implement --load-in-4bit and update --llama-bits arg description 2023-03-10 23:21:01 +00:00
ItsLogic
9ba8156a70 remove unnecessary Path() 2023-03-10 22:33:58 +00:00
draff
e6c631aea4 Replace --load-in-4bit with --llama-bits
Replaces --load-in-4bit with a more flexible --llama-bits arg to allow for 2 and 3 bit models as well. This commit also fixes a loading issue with .pt files which are not in the root of the models folder
2023-03-10 21:36:45 +00:00
oobabooga
026d60bd34 Remove default preset that didn't do anything 2023-03-10 14:01:02 -03:00
oobabooga
e9dbdafb14 Merge branch 'main' into pt-path-changes 2023-03-10 11:03:42 -03:00
oobabooga
706a03b2cb Minor changes 2023-03-10 11:02:25 -03:00
oobabooga
de7dd8b6aa Add comments 2023-03-10 10:54:08 -03:00
oobabooga
e461c0b7a0 Move the import to the top 2023-03-10 10:51:12 -03:00
deepdiffuser
9fbd60bf22 add no_split_module_classes to prevent tensor split error 2023-03-10 05:30:47 -08:00
deepdiffuser
ab47044459 add multi-gpu support for 4bit gptq LLaMA 2023-03-10 04:52:45 -08:00
rohvani
2ac2913747 fix reference issue 2023-03-09 20:13:23 -08:00
rohvani
826e297b0e add llama-65b-4bit support & multiple pt paths 2023-03-09 18:31:32 -08:00
oobabooga
9849aac0f1 Don't show .pt models in the list 2023-03-09 21:54:50 -03:00
oobabooga
74102d5ee4 Insert to the path instead of appending 2023-03-09 20:51:22 -03:00
oobabooga
2965aa1625 Check if the .pt file exists 2023-03-09 20:48:51 -03:00
oobabooga
828a524f9a Add LLaMA 4-bit support 2023-03-09 15:50:26 -03:00
oobabooga
59b5f7a4b7 Improve usage of stopping_criteria 2023-03-08 12:13:40 -03:00
oobabooga
add9330e5e Bug fixes 2023-03-08 11:26:29 -03:00
Xan
5648a41a27 Merge branch 'main' of https://github.com/xanthousm/text-generation-webui 2023-03-08 22:08:54 +11:00
Xan
ad6b699503 Better TTS with autoplay
- Adds "still_streaming" to shared module for extensions to know if generation is complete
- Changed TTS extension with new options:
   - Show text under the audio widget
   - Automatically play the audio once text generation finishes
   - manage the generated wav files (only keep files for finished generations, optional max file limit)
   - [wip] ability to change voice pitch and speed
- added 'tensorboard' to requirements, since python sent "tensorboard not found" errors after a fresh installation.
2023-03-08 22:02:17 +11:00
oobabooga
33fb6aed74 Minor bug fix 2023-03-08 03:08:16 -03:00
oobabooga
ad2970374a Readability improvements 2023-03-08 03:00:06 -03:00
oobabooga
72d539dbff Better separate the FlexGen case 2023-03-08 02:54:47 -03:00
oobabooga
0e16c0bacb Remove redeclaration of a function 2023-03-08 02:50:49 -03:00
oobabooga
ab50f80542 New text streaming method (much faster) 2023-03-08 02:46:35 -03:00
oobabooga
8e89bc596b Fix encode() for RWKV 2023-03-07 23:15:46 -03:00
oobabooga
19a34941ed Add proper streaming to RWKV 2023-03-07 18:17:56 -03:00
oobabooga
8660227e1b Add top_k to RWKV 2023-03-07 17:24:28 -03:00
oobabooga
153dfeb4dd Add --rwkv-cuda-on parameter, bump rwkv version 2023-03-06 20:12:54 -03:00
oobabooga
6904a507c6 Change some parameters 2023-03-06 16:29:43 -03:00
oobabooga
20bd645f6a Fix bug in multigpu setups (attempt 3) 2023-03-06 15:58:18 -03:00
oobabooga
09a7c36e1b Minor improvement while running custom models 2023-03-06 15:36:35 -03:00
oobabooga
24c4c20391 Fix bug in multigpu setups (attempt #2) 2023-03-06 15:23:29 -03:00
oobabooga
d88b7836c6 Fix bug in multigpu setups 2023-03-06 14:58:30 -03:00
oobabooga
5bed607b77 Increase repetition frequency/penalty for RWKV 2023-03-06 14:25:48 -03:00
oobabooga
bf56b6c1fb Load settings.json without the need for --settings settings.json
This is for setting UI defaults
2023-03-06 10:57:45 -03:00
oobabooga
e91f4bc25a Add RWKV tokenizer 2023-03-06 08:45:49 -03:00
oobabooga
c855b828fe Better handle <USER> 2023-03-05 17:01:47 -03:00
oobabooga
2af66a4d4c Fix <USER> in pygmalion replies 2023-03-05 16:08:50 -03:00
oobabooga
a54b91af77 Improve readability 2023-03-05 10:21:15 -03:00
oobabooga
8e706df20e Fix a memory leak when text streaming is on 2023-03-05 10:12:43 -03:00
oobabooga
c33715ad5b Move towards HF LLaMA implementation 2023-03-05 01:20:31 -03:00
oobabooga
bd8aac8fa4 Add LLaMA 8-bit support 2023-03-04 13:28:42 -03:00
oobabooga
c93f1fa99b Count the tokens more conservatively 2023-03-04 03:10:21 -03:00
oobabooga
ed8b35efd2 Add --pin-weight parameter for FlexGen 2023-03-04 01:04:02 -03:00
oobabooga
05e703b4a4 Print the performance information more reliably 2023-03-03 21:24:32 -03:00
oobabooga
5a79863df3 Increase the sequence length, decrease batch size
I have no idea what I am doing
2023-03-03 15:54:13 -03:00
oobabooga
a345a2acd2 Add a tokenizer placeholder 2023-03-03 15:16:55 -03:00
oobabooga
5b354817f6 Make chat minimally work with LLaMA 2023-03-03 15:04:41 -03:00
oobabooga
ea5c5eb3da Add LLaMA support 2023-03-03 14:39:14 -03:00
oobabooga
2bff646130 Stop chat from flashing dark when processing 2023-03-03 13:19:13 -03:00
oobabooga
169209805d Model-aware prompts and presets 2023-03-02 11:25:04 -03:00
oobabooga
7bbe32f618 Don't return a value in an iterator function 2023-03-02 00:48:46 -03:00
oobabooga
ff9f649c0c Remove some unused imports 2023-03-02 00:36:20 -03:00
oobabooga
1a05860ca3 Ensure proper no-streaming with generation_attempts > 1 2023-03-02 00:10:10 -03:00
oobabooga
a2a3e8f797 Add --rwkv-strategy parameter 2023-03-01 20:02:48 -03:00
oobabooga
449116a510 Fix RWKV paths on Windows (attempt) 2023-03-01 19:17:16 -03:00
oobabooga
955cf431e8 Minor consistency fix 2023-03-01 19:11:26 -03:00
oobabooga
f3da6dcc8f Merge pull request #149 from oobabooga/RWKV
Add RWKV support
2023-03-01 16:57:45 -03:00
oobabooga
831ac7ed3f Add top_p 2023-03-01 16:45:48 -03:00
oobabooga
7c4d5ca8cc Improve the text generation call a bit 2023-03-01 16:40:25 -03:00
oobabooga
2f16ce309a Rename a variable 2023-03-01 12:33:09 -03:00
oobabooga
9e9cfc4b31 Parameters 2023-03-01 12:19:37 -03:00
oobabooga
0f6708c471 Sort the imports 2023-03-01 12:18:17 -03:00
oobabooga
e735806c51 Add a generate() function for RWKV 2023-03-01 12:16:11 -03:00
oobabooga
659bb76722 Add RWKVModel class 2023-03-01 12:08:55 -03:00
oobabooga
9c86a1cd4a Add RWKV pip package 2023-03-01 11:42:49 -03:00
oobabooga
6837d4d72a Load the model by name 2023-02-28 02:52:29 -03:00
oobabooga
a1429d1607 Add default extensions to the settings 2023-02-28 02:20:11 -03:00
oobabooga
19ccb2aaf5 Handle <USER> and <BOT> 2023-02-28 01:05:43 -03:00
oobabooga
626da6c731 Handle {{user}} and {{char}} in example dialogue 2023-02-28 00:59:05 -03:00
oobabooga
e861e68e38 Move the chat example dialogue to the prompt 2023-02-28 00:50:46 -03:00
oobabooga
f871971de1 Trying to get the chat to work 2023-02-28 00:25:30 -03:00
oobabooga
67ee7bead7 Add cpu, bf16 options 2023-02-28 00:09:11 -03:00
oobabooga
ebd698905c Add streaming to RWKV 2023-02-28 00:04:04 -03:00
oobabooga
70e522732c Move RWKV loader into a separate file 2023-02-27 23:50:16 -03:00
oobabooga
ebc64a408c RWKV support prototype 2023-02-27 23:03:35 -03:00
oobabooga
021bd55886 Better format the prompt when generation attempts > 1 2023-02-27 21:37:03 -03:00
oobabooga
43b6ab8673 Store thumbnails as files instead of base64 strings
This improves the UI responsiveness for large histories.
2023-02-27 13:41:00 -03:00
oobabooga
f24b6e78a3 Fix clear history 2023-02-26 23:58:04 -03:00
oobabooga
8e3e8a070f Make FlexGen work with the newest API 2023-02-26 16:53:41 -03:00
oobabooga
3333f94c30 Make the gallery extension work on colab 2023-02-26 12:37:26 -03:00
oobabooga
633a2b6be2 Don't regenerate/remove last message if the chat is empty 2023-02-26 00:43:12 -03:00
oobabooga
6e843a11d6 Fix FlexGen in chat mode 2023-02-26 00:36:04 -03:00
oobabooga
4548227fb5 Downgrade gradio version (file uploads are broken in 3.19.1) 2023-02-25 22:59:02 -03:00
oobabooga
9456c1d6ed Prevent streaming with no_stream + generation attempts > 1 2023-02-25 17:45:03 -03:00
oobabooga
32f40f3b42 Bump gradio version to 3.19.1 2023-02-25 17:20:03 -03:00
oobabooga
fa58fd5559 Proper way to free the cuda cache 2023-02-25 15:50:29 -03:00
oobabooga
b585e382c0 Rename the custom prompt generator function 2023-02-25 15:13:14 -03:00
oobabooga
700311ce40 Empty the cuda cache at model.generate() 2023-02-25 14:39:13 -03:00
oobabooga
1878acd9f3 Minor bug fix in chat 2023-02-25 09:30:59 -03:00
oobabooga
e71ff959f5 Clean up some unused code 2023-02-25 09:23:02 -03:00
oobabooga
91f5852245 Move bot_picture.py inside the extension 2023-02-25 03:00:19 -03:00
oobabooga
5ac24b019e Minor fix in the extensions implementation 2023-02-25 02:53:18 -03:00
oobabooga
85f914b9b9 Disable the hijack after using it 2023-02-25 02:36:01 -03:00
oobabooga
7e9f13e29f Rename a variable 2023-02-25 01:55:32 -03:00
oobabooga
1741c36092 Minor fix 2023-02-25 01:47:25 -03:00
oobabooga
7c2babfe39 Rename greed to "generation attempts" 2023-02-25 01:42:19 -03:00
oobabooga
2dfb999bf1 Add greed parameter 2023-02-25 01:31:01 -03:00
oobabooga
13f2688134 Better way to generate custom prompts 2023-02-25 01:08:17 -03:00
oobabooga
67623a52b7 Allow for permanent hijacking 2023-02-25 00:55:19 -03:00
oobabooga
111b5d42e7 Add prompt hijack option for extensions 2023-02-25 00:49:18 -03:00
oobabooga
7a527a5581 Move "send picture" into an extension
I am not proud of how I did it for now.
2023-02-25 00:23:51 -03:00
oobabooga
e51ece21c0 Add ui() function to extensions 2023-02-24 19:00:11 -03:00
oobabooga
78ad55641b Remove duplicate max_new_tokens parameter 2023-02-24 17:19:42 -03:00
oobabooga
65326b545a Move all gradio elements to shared (so that extensions can use them) 2023-02-24 16:46:50 -03:00
oobabooga
0817fe1beb Move code back into the chatbot wrapper 2023-02-24 14:10:32 -03:00
oobabooga
8a7563ae84 Reorder the imports 2023-02-24 12:42:43 -03:00
oobabooga
ace74a557a Add some comments 2023-02-24 12:41:27 -03:00
oobabooga
fe5057f932 Simplify the extensions implementation 2023-02-24 10:01:21 -03:00