# CHANGELOG

## v1.0.1

- Reduce memory usage during model conversion
- Reduce memory usage during inference
- Increase prefill speed
- Reduce initialization time
- Improve quantization accuracy
- Add support for Gemma, ChatGLM3, MiniCPM, InternLM2, and Phi-3
- Add server invocation
- Add an inference interruption interface
- Add logprob and token_id to the return value

## v1.0.0

- Support conversion and deployment of LLM models on RK3588/RK3576 platforms
- Compatible with Hugging Face model architectures
- Currently supported models: Llama, Qwen, Qwen2, and Phi-2
- Support quantization at w8a8 and w4a16 precision