this post was submitted on 15 Jul 2023
499 points (95.8% liked)

Technology

59446 readers
4490 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related content.
  3. Be excellent to each another!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, to ask if your bot can be added please contact us.
  9. Check for duplicates before posting, duplicates may be removed

Approved Bots


founded 1 year ago
MODERATORS
 

ChatGPT use declines as users complain about ‘dumber’ answers, and the reason might be AI’s biggest threat for the future::AI for the smart guy?

you are viewing a single comment's thread
view the rest of the comments
[–] dulce_3t_decorum_3st@lemmy.world 26 points 1 year ago (6 children)

Nonsense. Less people are using it because there are viable alternatives and the broader novelty has worn off.

I use it every day in my job and the quality of answers only drops off when prompts are poorly crafted.

By and large, the average user doesn’t understand the fundamentals of prompt engineering.

The suggestion that “answers are increasingly dumber” is embarrassing.

[–] Zeth0s@lemmy.world 59 points 1 year ago (2 children)

Unfortunately I don't agree with you. Different things have changed over time:

  • For chatgpt 3.5 they moved to a "lighter" and faster (distilled) version, gpt-3.5-turbo. Distillation came with a performance price, particularly on advanced and less common cases.
  • newer chatgpt-4 versions have likely been "lighten" for performance reasons
  • context has been halved for chatgpt-4 on webui, meaning that the model forget more easily and can use half information to create text
  • heavy control has been implemented on jailbreaking and hallucinations, that results in models less prone to follow complex instructions (limiting prompt engineering) and that prefer simplified answers than providing wrong ones (overall decreasing the chance of getting high quality answers).

All these changes have made working with gpt less pleasant, and more difficult for very advanced and specialized case, particularly with gpt-4 which at the beginning was particularly good.

[–] mikkL@lemmy.world 1 points 1 year ago (1 children)

This was really enlightening. Do you have some articles that elaborate? ☺️

[–] Zeth0s@lemmy.world 13 points 1 year ago* (last edited 1 year ago) (1 children)

Regarding 3.5 turbo you can check the documentation, the old 3.5 models are defined as "legacy". Regarding max number of tokens of gpt-4 you can try yourself. It used to be >8k, it is now >4k from webui.

There is a talk from openai cio (if I recall correctly) where he describes that reinforcement learning from human feedback (rlhf) actually decreased performance of the models when it comes to programming. I cannot find it now, but it is around on YouTube.

The additional safeguard against jailbreaking, it is what OpenAI has been focusing the past months with heavy use of rlhf. You can google official statements regarding "safety" of the model. I have a bunch of standard pre-prompt I have been using to initialize my chats since the beginning, and with time you could see how the model followed the instructions less strictly.

Problem with openai is that they never released exact number of parameters they are using and detailed benchmarks. And benchmarks you find online refer to APIs that behave differently than the chat webui (for instance you have longer context, you set temperature and system prompt, they are probably even different models, who knows... All is closed)

Measuring performances of llm is pretty tricky, minimal changes can have big effects (see https://huggingface.co/blog/evaluating-mmlu-leaderboard), and unfortunately I haven't found good resources to properly track chatgpt performances (from web ui) over time, across iterations

[–] mikkL@lemmy.world 2 points 1 year ago

Thank you for the detailed reply 👍🏻

[–] YeastForTheYeastGod@sh.itjust.works 22 points 1 year ago (1 children)

I was skeptical at first but I've seen enough evidence now. There are definitely times when it's dumb as a brick, whether the filters just get in the way too much, or whether they've implemented other changes idk. I'd really love the unchained version.

[–] Kelly@lemmy.world 1 points 1 year ago

dumb as a brick

On 23rd of March 2023 I asked a family member to give me a prompt and they asked "what day is 19th of April?".

It answered "The 19th of April falls on a Tuesday.", which was true last year but completely misleading if I thought we were taling about the coming month.

Was it wrong or just unclear? Either way it wasn't helpful.

[–] Touching_Grass@lemmy.world 8 points 1 year ago

I use it daily too and haven't had any of the issues I see written about it

[–] DogMuffins@discuss.tchncs.de 5 points 1 year ago

I used the chatgpt site twice. Since then the Bing integration.

Is it rude to ask what you use it for?

I use it every day in my job and the quality of answers only drops off when prompts are poorly crafted.

Same. It saves me a lot of time both at work and when I'm working on my personal projects. But you need to ask proper questions to get proper answers.