LocalLLaMA

2249 readers

Community to discuss LLaMA, the large language model created by Meta AI.

This is intended to be a replacement for r/LocalLLaMA on Reddit.

founded 1 year ago

MODERATORS

pax@sh.itjust.works

SkySyrup@sh.itjust.works

noneabove1182@sh.itjust.works

Beginner questions thread (sh.itjust.works)

submitted 1 year ago by noneabove1182@sh.itjust.works to c/localllama@sh.itjust.works

25 comments

Trying something new: I'm going to pin this thread as a place for beginners to ask what may or may not be stupid questions, to encourage both asking and answering.

Depending on activity level, I'll either make a new one once in a while or just leave this one up forever as a place to learn and ask.

When asking a question, try to make it clear what your current knowledge level is and where you may have gaps; that should help people give more useful, concise answers!

doodlebob@lemmy.world 2 points 9 months ago

I have two 3090 Turbo GPUs and it seems like oobabooga doesn't split the load between the two cards when I try to run TheBloke/dolphin-2.7-mixtral-8x7b-AWQ.

Does anyone know how to make text-generation-webui use both cards? Do I need an NVLink bridge between the two cards?

noneabove1182@sh.itjust.works 4 points 9 months ago

You shouldn't need NVLink. I'm wondering if it's something to do with AWQ, since I know that exllamav2 and llama.cpp both support splitting a model across GPUs in oobabooga.
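For anyone wondering why NVLink isn't required: a layer-wise split gives each card a contiguous block of layers, so only the hidden state for each token is handed from one card to the next. Below is a toy Python model of that idea, not code from oobabooga, exllamav2, or llama.cpp. The helper names `split_layers` and `boundary_bytes_per_token` are hypothetical, the proportional rounding rule is an assumption (real loaders size splits by their own rules, e.g. VRAM budgets), and the figures (32 layers, hidden size 4096, 2-byte fp16 activations) are assumed to roughly match Mixtral-8x7B.

```python
# Toy model of a layer-wise multi-GPU split (hypothetical helpers, not any loader's code).


def split_layers(n_layers: int, ratios: list[float]) -> list[range]:
    """Give each GPU a contiguous block of layers, sized in proportion to `ratios`.

    The rounding rule (round each cumulative boundary) is an assumption;
    real loaders size their splits by their own rules (e.g. VRAM budgets).
    """
    if n_layers < 0 or not ratios or any(r < 0 for r in ratios) or sum(ratios) <= 0:
        raise ValueError("need n_layers >= 0 and non-negative ratios with a positive sum")
    total = sum(ratios)
    bounds = [0]
    running = 0.0
    for r in ratios[:-1]:
        running += r
        # Boundary after this GPU's share of the layers.
        bounds.append(round(n_layers * running / total))
    bounds.append(n_layers)  # the last GPU takes whatever remains
    return [range(bounds[i], bounds[i + 1]) for i in range(len(ratios))]


def boundary_bytes_per_token(hidden_size: int, bytes_per_value: int = 2, n_gpus: int = 2) -> int:
    """Bytes handed between GPUs per token in a layer split.

    Weights and each GPU's KV cache stay on their own card; only one
    hidden-state vector crosses each of the (n_gpus - 1) boundaries.
    """
    return hidden_size * bytes_per_value * (n_gpus - 1)


if __name__ == "__main__":
    # Assumed Mixtral-8x7B-like shape: 32 layers, hidden size 4096, fp16 activations.
    for gpu, layers in enumerate(split_layers(32, [1, 1])):
        print(f"GPU {gpu}: layers {layers.start}-{layers.stop - 1}")
    print(boundary_bytes_per_token(4096), "bytes per token")
```

Under those assumptions, about 8 KiB per generated token crosses between the two 3090s (a prompt of N tokens moves N such vectors), which ordinary PCIe handles without trouble. NVLink matters more for schemes that exchange data between cards at every layer, such as tensor parallelism.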

doodlebob@lemmy.world 2 points 9 months ago

I think you're right. I saw a post on Reddit describing basically the same behavior I'm seeing.

It looks like AutoAWQ supports multi-GPU splitting, but it might be an issue with how oobabooga implements it or something...