Mirror of https://github.com/DrHo1y/ezrknn-llm.git (synced 2026-03-24 01:26:44 +07:00)
CHANGELOG
v1.0.1
- Reduce memory usage during model conversion
- Reduce memory usage during inference
- Increase prefill speed
- Reduce initialization time
- Improve quantization accuracy
- Add support for Gemma, ChatGLM3, MiniCPM, InternLM2, and Phi-3
- Add server-mode invocation
- Add an interface for interrupting inference
- Add logprob and token_id to the return value
v1.0.0
- Supports the conversion and deployment of LLM models on RK3588/RK3576 platforms
- Compatible with Hugging Face model architectures
- Currently supports the models Llama, Qwen, Qwen2, and Phi-2
- Supports quantization with w8a8 and w4a16 precision