Mirror of https://github.com/DrHo1y/ezrknn-llm.git (synced 2026-03-24 01:26:44 +07:00)
CHANGELOG
v1.0.1
- Reduce memory usage during model conversion
- Reduce memory usage during inference
- Increase prefill speed
- Reduce initialization time
- Improve quantization accuracy
- Add support for Gemma, ChatGLM3, MiniCPM, InternLM2, and Phi-3
- Add server-mode invocation
- Add an interface for interrupting inference
- Add logprob and token_id to the return value
v1.0.0
- Supports the conversion and deployment of LLM models on RK3588/RK3576 platforms
- Compatible with Hugging Face model architectures
- Currently supports the models Llama, Qwen, Qwen2, and Phi-2
- Supports quantization with w8a8 and w4a16 precision