RWKV (pronounced RwaKuv) is an RNN with GPT-level LLM performance that can also be trained directly like a GPT transformer (parallelizable).
It combines the best of RNNs and transformers: great performance, fast inference, fast training, low VRAM usage, "infinite" context length, and free text embedding. Moreover, it is 100% attention-free.
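The dual RNN/transformer nature can be illustrated with the core WKV mixing term: the same output can be computed either over the whole sequence at once (training-style, parallelizable) or step by step with a constant-size state (inference-style RNN). Below is a minimal NumPy sketch of a simplified single-channel version of that recurrence; the function names and the scalar decay `w` / bonus `u` parameters are illustrative assumptions, not the actual multi-channel implementation.

```python
import numpy as np

def wkv_parallel(w, u, k, v):
    # Training-style view: for each position t, attend over all past
    # positions with exponential decay w, plus a bonus u for the
    # current token. O(T^2) work, but parallelizable over t.
    T = len(k)
    out = np.zeros(T)
    for t in range(T):
        num, den = 0.0, 0.0
        for i in range(t):
            e = np.exp(-(t - 1 - i) * w + k[i])
            num += e * v[i]
            den += e
        e = np.exp(u + k[t])
        out[t] = (num + e * v[t]) / (den + e)
    return out

def wkv_recurrent(w, u, k, v):
    # Inference-style view: the same outputs from a running
    # (numerator, denominator) state -- O(1) memory per step,
    # which is why long-context inference stays cheap.
    a, b = 0.0, 0.0
    out = []
    for kt, vt in zip(k, v):
        e = np.exp(u + kt)
        out.append((a + e * vt) / (b + e))
        decay = np.exp(-w)
        a = decay * a + np.exp(kt) * vt
        b = decay * b + np.exp(kt)
    return np.array(out)
```

Both functions produce identical outputs for the same inputs; the real model applies this per channel with learned vectors for `w` and `u`, and uses a numerically stable formulation to avoid overflow in the exponentials.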
RWKV GUI with one-click install and API
Fast CPU/cuBLAS/CLBlast inference: int4/int8/fp16/fp32
Fastest GPU inference API with vulkan (good for nvidia/amd/intel)
Fast GPU inference with cuda/amd/vulkan
Training RWKV
LoRA finetuning
Chat with RWKV (in console)
Community wiki (with guide and FAQ)
Official RWKV pip package
Latest RWKV models on Hugging Face
Introduction in Chinese, with chat and novel-writing demos