Thanks to AWQ, TinyChat can deliver more efficient responses with LLM/VLM chatbots through 4-bit inference. TinyChat on RTX 4090 (3.4x faster than FP16): TinyChat on Jetson Orin (3.2x faster than FP16 ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results