Mirrors/text-generation-webui

mirror of https://github.com/oobabooga/text-generation-webui.git synced 2024-09-20 18:45:09 +02:00

Author	SHA1	Message	Date
jllllll	bee73cedbd	Streamline GPTQ-for-LLaMa support	2023-08-09 23:42:34 -05:00
oobabooga	e116d31180	Prevent unwanted log messages from modules	2023-05-21 22:42:34 -03:00
oobabooga	9d5025f531	Improve error handling while loading GPTQ models	2023-05-19 11:20:08 -03:00
oobabooga	b667ffa51d	Simplify GPTQ_loader.py	2023-05-17 16:22:56 -03:00
oobabooga	fb91c07191	Minor bug fix	2023-05-17 11:16:37 -03:00
Alex "mcmonkey" Goodwin	1f50dbe352	Experimental jank multiGPU inference that's 2x faster than native somehow (#2100)	2023-05-17 10:41:09 -03:00
oobabooga	2eeb27659d	Fix bug in --cpu-memory	2023-05-12 06:17:07 -03:00
oobabooga	3316e33d14	Remove unused code	2023-05-10 11:59:59 -03:00
oobabooga	dfd9ba3e90	Remove duplicate code	2023-05-10 02:07:22 -03:00
minipasila	334486f527	Added instruct-following template for Metharme (#1679)	2023-05-09 22:29:22 -03:00
Carl Kenner	814f754451	Support for MPT, INCITE, WizardLM, StableLM, Galactica, Vicuna, Guanaco, and Baize instruction following (#1596)	2023-05-09 20:37:31 -03:00
IJumpAround	020fe7b50b	Remove mutable defaults from function signature. (#1663)	2023-05-08 22:55:41 -03:00
Matthew McAllister	d78b04f0b4	Add error message when GPTQ-for-LLaMa import fails (#1871) --------- Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>	2023-05-08 22:29:09 -03:00
camenduru	ba65a48ec8	trust_remote_code=shared.args.trust_remote_code (#1891)	2023-05-07 17:42:44 -03:00
oobabooga	b6ff138084	Add --checkpoint argument for GPTQ	2023-05-04 15:17:20 -03:00
oobabooga	95d04d6a8d	Better warning messages	2023-05-03 21:43:17 -03:00
Wojtab	12212cf6be	LLaVA support (#1487)	2023-04-23 20:32:22 -03:00
oobabooga	7438f4f6ba	Change GPTQ triton default settings	2023-04-22 12:27:30 -03:00
USBhost	e1aa9d5173	Support upstream GPTQ once again. (#1451)	2023-04-21 12:43:56 -03:00
sgsdxzy	b57ffc2ec9	Update to support GPTQ triton commit c90adef (#1229)	2023-04-17 01:11:18 -03:00
oobabooga	39099663a0	Add 4-bit LoRA support (#1200)	2023-04-16 23:26:52 -03:00
oobabooga	a75e02de4d	Simplify GPTQ_loader.py	2023-04-13 12:13:07 -03:00
oobabooga	ca293bb713	Show a warning if two quantized models are found	2023-04-13 12:04:27 -03:00
oobabooga	fde6d06167	Prioritize names with the groupsize in them	2023-04-13 11:27:03 -03:00
oobabooga	f2bf1a2c9e	Add some comments, remove obsolete code	2023-04-13 11:17:32 -03:00
Light	da74cd7c44	Generalized weight search path.	2023-04-13 21:43:32 +08:00
Light	cf58058c33	Change warmup_autotune to a negative switch.	2023-04-13 20:59:49 +08:00
Light	a405064ceb	Better dispatch.	2023-04-13 01:48:17 +08:00
Light	f3591ccfa1	Keep minimal change.	2023-04-12 23:26:06 +08:00
oobabooga	8c6155251a	More robust 4-bit model loading	2023-04-09 23:19:28 -03:00
oobabooga	ea6e77df72	Make the code more like PEP8 for readability (#862)	2023-04-07 00:15:45 -03:00
EyeDeck	39f3fec913	Broaden GPTQ-for-LLaMA branch support (#820)	2023-04-06 12:16:48 -03:00
oobabooga	3d6cb5ed63	Minor rewrite	2023-04-05 01:21:40 -03:00
oobabooga	f3a2e0b8a9	Disable pre_layer when the model type is not llama	2023-04-05 01:19:26 -03:00
catalpaaa	4ab679480e	allow quantized model to be loaded from model dir (#760)	2023-04-04 23:19:38 -03:00
OWKenobi	ee4547cd34	Detect "vicuna" as llama model type (#772)	2023-04-04 13:23:27 -03:00
oobabooga	1cb9246160	Adapt to the new model names	2023-03-29 21:47:36 -03:00
oobabooga	010b259dde	Update documentation	2023-03-28 17:46:00 -03:00
oobabooga	0bec15ebcd	Reorder imports	2023-03-28 17:34:15 -03:00
Maya Eary	41ec682834	Disable kernel threshold for gpt-j	2023-03-28 22:45:38 +03:00
Maya Eary	1c075d8d21	Fix typo	2023-03-28 20:43:50 +03:00
Maya Eary	c8207d474f	Generalized load_quantized	2023-03-28 20:38:55 +03:00
oobabooga	49c10c5570	Add support for the latest GPTQ models with group-size (#530) Warning: old 4-bit weights will not work anymore! See here how to get up to date weights: https://github.com/oobabooga/text-generation-webui/wiki/LLaMA-model#step-2-get-the-pre-converted-weights	2023-03-26 00:11:33 -03:00
EyeDeck	dcfd866402	Allow loading of .safetensors through GPTQ-for-LLaMa	2023-03-23 21:31:34 -04:00
oobabooga	db4219a340	Update comments	2023-03-20 16:40:08 -03:00
oobabooga	7618f3fe8c	Add -gptq-preload for 4-bit offloading (#460) This works in a 4GB card now: `python server.py --model llama-7b-hf --gptq-bits 4 --gptq-pre-layer 20`	2023-03-20 16:30:56 -03:00
oobabooga	9a3bed50c3	Attempt at fixing 4-bit with CPU offload	2023-03-20 15:11:56 -03:00
askmyteapot	53b6a66beb	Update GPTQ_Loader.py Correcting decoder layer for renamed class.	2023-03-17 18:34:13 +10:00
oobabooga	265ba384b7	Rename a file, add deprecation warning for --load-in-4bit	2023-03-14 07:56:31 -03:00

49 commits