I've set up variations of Stable Diffusion alongside image training, but how is the text side of things? How does the knowledge base compare to, say, ChatGPT, primarily for language and vocabulary? I currently have a 12900K and will be upgrading to a 14900K when those drop, alongside my current RTX 4090, and I'll be going from 32GB of RAM to 64GB. How long do prompts take to generate and respond to on your system? Being able to host my own sounds splendid, so I don't have to wait for a slot to open up during peak hours on OpenAI.
Is the setup particularly difficult? I don't mind building from source, but if light debugging is needed to get it working on my system, that's where my knowledge falls apart.
Apologies in advance for bombarding you with questions.
They are fairly similar. It is easier to train and play with modifiers like LoRAs in SD than it is for text, but for text you're more likely to want to modify the model loader code, whereas for SD you're more likely to want to modify the model itself.
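As a rough sketch of what that text-side "model loader" layer looks like, here is a minimal example assuming llama-cpp-python and a local GGUF file; the model path and settings below are placeholders, not my actual setup:

```python
# Minimal sketch: load a local quantized model and run one completion.
# Assumes llama-cpp-python is installed and a GGUF file exists at the path below.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-70b.Q5_K_M.gguf",  # placeholder path to a quantized model
    n_ctx=4096,        # context window size
    n_gpu_layers=40,   # offload some layers to the GPU if VRAM allows
)

out = llm("Q: What is prefix notation? A:", max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"])
```

This is the layer I mean by "model loader code"; for text models, that is usually where the interesting tweaking happens.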
Not really the right question, but by inference: the model size and quantization settings determine how likely the model is to synthesize accurate responses. They will all boldly lie about anything, because truth does not exist for them; the most likely token is always just the most likely next token. You can tune things somewhat, but in my experience size matters for accuracy. I can write Python code using a Llama2 70B with 5-bit quantization at the same level as me searching Stack Overflow for code snippets, except I can do it offline and an order of magnitude faster. The snippets will work around 80% of the time, and for another 15% I can prompt with the errors and it will generate the fix. This base model is not trained specifically on code, so it can't do complex code generation like "write me a function that opens a file, reads every line as UTF-8, removes all newline characters, and returns an array of sentences."
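For reference, the kind of function that prompt is asking for is only a few lines of Python; this is just a hand-written illustration of the request, not model output:

```python
def read_sentences(path):
    """Open a file, read every line as UTF-8, strip newline characters,
    and return the lines as a list of strings."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\r\n") for line in f]
```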
Another test I like is to ask a model to do prefix, infix, and postfix arithmetic (+ 3 3, 3 + 3, 3 3 +). The 70B is the only model that has done all three forms well.
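If the three notations aren't familiar, here is a tiny hand-written illustration of what those prompts ask for (binary expressions only, nothing to do with any particular model):

```python
# Evaluate the same expression written in prefix, infix, and postfix notation.
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def eval_prefix(expr):   # e.g. "+ 3 3"
    op, a, b = expr.split()
    return OPS[op](float(a), float(b))

def eval_infix(expr):    # e.g. "3 + 3"
    a, op, b = expr.split()
    return OPS[op](float(a), float(b))

def eval_postfix(expr):  # e.g. "3 3 +"
    a, b, op = expr.split()
    return OPS[op](float(a), float(b))

for fn, s in [(eval_prefix, "+ 3 3"), (eval_infix, "3 + 3"), (eval_postfix, "3 3 +")]:
    print(s, "=", fn(s))
```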
I usually follow this by asking the model what the Forth programming language is and who invented it, then asking it to generate a hello world message in ANS Forth. No model I've tested to date seems to have been trained on Forth; most don't even have basic info on the language. What I'm looking at is how well the model can say it doesn't know the answer, or preface its output with a warning that it is unreliable. The 70B warns about not knowing but still gets most of the basic Forth syntax correct. One 7B I tested thinks Forth is an alias for Swift.
I have never used the proprietary services, so I can't compare and contrast with them. I am on a 12700 and 3080 Ti laptop running Fedora. The 70B averages 2-3 tokens a second. That is a slow reading pace, and about as slow as I care to go. I don't find it annoying or problematic, but it is not fast enough to combine with other tools for realtime interaction; it would likely get annoying as part of a complex bot/agent. If I had more memory I could likely run an even larger model.
If I could buy something now, more memory is maybe more betterer, but I can't say how the system memory controller bottleneck will impact speed with bigger models. I would like to look into running a workstation/server with 2+ physical CPUs that support the AVX-512 instruction, since that instruction and its subsets are supported by much of the software; this is the specific use case the instruction was made for. A true server workstation is probably the most economical way to access even larger models, because a serious enterprise GPU costs many thousands of dollars.
If you don't know: GPU RAM does not have a memory address manager like system memory does, so the amount of GPU RAM is tied directly to the amount of compute hardware. System memory has a much smaller bus and a separate controller that only shows the CPU part of the total memory at any given point in time. That overhead is why CPUs are slower than GPUs at AI, because tensor math is an enormous amount of parallel math operations, and the key slowdown is how fast the caches can be filled and flushed. AVX-512 is an x86 assembly instruction that operates on a 512-bit-wide word. It won't replace a GPU for parallelism, but from what I can tell, a few grand spent on a workstation with max cores, 256-512GB of system memory, and something like a consumer 24GB-VRAM GPU is likely the most bang for the buck at that kind of price.
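If you want to check whether a CPU you're eyeing (or already own) exposes AVX-512, it shows up in the CPU feature flags on Linux; here is a small check that just reads /proc/cpuinfo (Linux-only, so it won't work elsewhere):

```python
# Report whether the CPU advertises any AVX-512 feature flags (Linux only).
def avx512_flags():
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return sorted({flag for flag in line.split() if flag.startswith("avx512")})
    return []

if __name__ == "__main__":
    flags = avx512_flags()
    print("AVX-512 support:", ", ".join(flags) if flags else "not detected")
```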
Most stuff is pretty easy for me. The 70B makes it quite easy to do more than I ever have in the past when it comes to compiling and coding. I've written a dozen scripts in Python and done a good bit of bash functions and scripts. I will say even the 70B can't figure out the horrific Linux find command, whose man page exceeds the size of some entire programming languages' documentation.
I recommend trying the offline stuff and then start looking into ways to prompt it into giving you your "full user profile." It takes creativity, and you'll get some BS; a profile doesn't actually exist as far as I'm aware, but given enough context history, the data that can be synthesized is eye opening. This tech is more than capable of twiddling the twitches in a user's head if it is pushed to do so; the cause and effect it could create would make no sense to anyone, but the results would be undeniable. That data and the generating process combined are not something I want to share with stalkerware companies looking to manipulate me for the highest bidder. I'm regularly impressed by what a 70B or smaller model can deduce about me without me telling it directly.