Do you do any kind of before/after testing of these to measure performance/accuracy changes? I've always wondered if there is some way to generalize the expected performance changes at different quantizations.
You can get the resulting PPL, but that's only going to get you a sanity check at best. In an ideal world we'd have something like lmsys' chat arena where you could compare unquantized vs quantized side by side, but that doesn't exist yet.
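For anyone who wants to run that PPL sanity check themselves, here's a rough sketch using the standard sliding-window perplexity recipe with Hugging Face transformers. The model ID is a placeholder, and the bitsandbytes 4-bit load just stands in for whichever quant (exl2, GPTQ, GGUF...) you actually want to compare against the full-precision baseline:

```python
# A minimal before/after PPL check, sketched with Hugging Face transformers.
# MODEL_ID is a hypothetical example; swap in your own model.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "meta-llama/Llama-2-7b-hf"  # placeholder model

def perplexity(model, tokenizer, max_length=2048, stride=512):
    """Sliding-window perplexity over wikitext-2, the usual sanity-check text."""
    test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
    encodings = tokenizer("\n\n".join(test["text"]), return_tensors="pt")
    seq_len = encodings.input_ids.size(1)

    nlls, prev_end = [], 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        trg_len = end - prev_end                 # new tokens scored this window
        input_ids = encodings.input_ids[:, begin:end].to(model.device)
        target_ids = input_ids.clone()
        target_ids[:, :-trg_len] = -100          # mask overlap with the last window

        with torch.no_grad():
            nlls.append(model(input_ids, labels=target_ids).loss)

        prev_end = end
        if end == seq_len:
            break
    return torch.exp(torch.stack(nlls).mean()).item()

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Unquantized baseline.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)
print("fp16 PPL :", perplexity(model, tokenizer))
del model
torch.cuda.empty_cache()

# Quantized run -- bitsandbytes 4-bit here purely as an example quant.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
print("4-bit PPL:", perplexity(model, tokenizer))
```

(llama.cpp also ships a `perplexity` binary that does the same kind of measurement for GGUF quants.) A small PPL delta is exactly the sanity check described above, not proof the quant chats as well as the original.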
upload them to ollama so we can also use them
Does ollama even support exl2?