Stop generation immediately when using "Maximum tokens/second" (#3952)

---------

Co-authored-by: oobabooga <112222186+oobabooga@users.noreply.github.com>
BadisG 2023-09-18 19:27:06 +02:00 committed by GitHub
parent b7c55665c1
commit 893a72a1c5
WARNING! Although there is a key with this ID in the database it does not verify this commit! This commit is SUSPICIOUS.
GPG key ID: 4AEE18F83AFDEB23

@@ -96,7 +96,7 @@ def _generate_reply(question, state, stopping_strings=None, is_chat=False, escap
                     last_update = cur_time
                     yield reply

-        if stop_found:
+        if stop_found or (state['max_tokens_second'] > 0 and shared.stop_everything):
             break

     if not is_chat: