this post was submitted on 24 May 2024

605 points (97.2% liked)

Study Finds That 52 Percent of ChatGPT Answers to Programming Questions Are Wrong (futurism.com)

submitted 5 months ago by ekZepp@lemmy.world to c/technology@lemmy.ml

[–] vzq@lemmy.blahaj.zone 110 points 5 months ago (5 children)

You should see 52% of the first version of my code.

It doesn’t have to be right to be useful.

[–] restingboredface@sh.itjust.works 89 points 5 months ago (11 children)

Yeah, but the non-tech savvy business leaders see they can generate code with AI and think 'why do I need a developer if I have this AI?' and have no idea whether the code it produces is right or not. This stat should be shared broadly so leaders don't overestimate the capability and fire people they will desperately need.

[–] piecat@lemmy.world 36 points 5 months ago (2 children)

I say let it happen. If someone is dumb enough to fire all their workers... They deserve what will happen next

[–] homesweethomeMrL@lemmy.world 27 points 5 months ago

Well, the firing’s happening, so I guess let's hope you’re right about the other part.

[–] TheRaven@lemmy.ca 12 points 5 months ago

It won’t happen like that. Leadership will just under-hire and expect all their developers to be way more efficient. Working will be really stressful with increased deadlines and people questioning why you couldn’t meet them.

[–] scrubbles@poptalk.scrubbles.tech 19 points 5 months ago

Yeah management are all for this, the first few years here are rough with them immediately hitting the "fire the engineers we have ai now". They won't realize their fuckup until they've been promoted away from it

[–] dsemy@lemm.ee 34 points 5 months ago (4 children)

Yeah cause my favorite thing to do when programming is debugging someone else's broken code.

[–] Supervisor194@lemmy.world 10 points 5 months ago* (last edited 5 months ago) (1 children)

I think where it shines is in helping you write code you've never written before. I never touched Swift before and I made a fully functional iOS app in a week. Also, even with stuff I have done before, I can say "write me a function that does x" and it will and it usually works.

Like just yesterday I asked it to write me a function that would generate and serve up an .ics file based on a selected date and extrapolate the date of a recurring monthly meeting based on the day of the week picked and its position (1st week, 2nd week, etc) within the month and then make the .ics file reflect all that. I could have generated that code myself by hand but it would have probably taken me an hour or two. It did it in about five seconds and it worked perfectly.
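For illustration, the recurring-meeting logic described above can be sketched roughly as follows. This is not the code ChatGPT produced; it is a minimal Python sketch assuming an all-day event. The function names (`week_position`, `nth_weekday`, `monthly_meeting_ics`), the `PRODID` string, and the `example.invalid` UID domain are all hypothetical.

```python
import calendar
from datetime import date, datetime, timedelta, timezone

# iCalendar weekday codes, indexed like date.weekday() (Monday = 0).
ICS_DAYS = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]


def week_position(d: date) -> int:
    """Return which occurrence of its weekday `d` is within its month (1-based)."""
    return (d.day - 1) // 7 + 1


def nth_weekday(year: int, month: int, weekday: int, n: int) -> date:
    """Return the n-th `weekday` (Monday = 0) of the given month."""
    first_weekday, days_in_month = calendar.monthrange(year, month)
    day = 1 + (weekday - first_weekday) % 7 + 7 * (n - 1)
    if day > days_in_month:
        raise ValueError("that occurrence does not exist in this month")
    return date(year, month, day)


def monthly_meeting_ics(selected: date, summary: str, stamp: datetime) -> str:
    """Build a minimal all-day .ics event recurring on the same n-th weekday.

    Assumption: `summary` contains no characters that need iCalendar escaping.
    A 5th occurrence does not exist in every month, so such rules skip months.
    """
    byday = f"{week_position(selected)}{ICS_DAYS[selected.weekday()]}"
    start = selected.strftime("%Y%m%d")
    end = (selected + timedelta(days=1)).strftime("%Y%m%d")
    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//monthly-meeting-sketch//EN",  # hypothetical id
        "BEGIN:VEVENT",
        f"UID:{start}-{byday}@example.invalid",  # hypothetical UID scheme
        f"DTSTAMP:{stamp.astimezone(timezone.utc).strftime('%Y%m%dT%H%M%SZ')}",
        f"DTSTART;VALUE=DATE:{start}",
        f"DTEND;VALUE=DATE:{end}",
        f"RRULE:FREQ=MONTHLY;BYDAY={byday}",
        f"SUMMARY:{summary}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"
```

Picking 24 May 2024 (the fourth Friday of that month) would yield a rule of `BYDAY=4FR`, and `nth_weekday(2024, 6, 4, 4)` gives the matching date in June. Serving the string over HTTP with a `text/calendar` content type is left out here.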

Yeah, you have to know what you're doing in general and there's a lot of babysitting involved, but anyone who thinks it's just useless is plain wrong. It's fucking amazing.

Edit: lol the article is referring to a study that was using GPT 3.5, which is all but useless for coding. 4.0 has been out for a year blowing everybody's minds. Clickbait trash.

[–] Boozilla@lemmy.world 54 points 5 months ago* (last edited 5 months ago) (4 children)

It's been a tremendous help to me as I relearn how to code on some personal projects. I have written 5 little apps that are very useful to me for my hobbies.

It's also been helpful at work with some random database type stuff.

But it definitely gets stuff wrong. A lot of stuff.

The funny thing is, if you point out its mistakes, it often does better on subsequent attempts. It's more like an iterative process of refinement than one prompt gives you the final answer.

[–] Downcount@lemmy.world 31 points 5 months ago (5 children)

The funny thing is, if you point out its mistakes, it often does better on subsequent attempts.

Or it gets stuck in an endless loop of two different but wrong solutions.

Me: This is my system, version x. I want to achieve this.

ChatGpt: Here's the solution.

Me: But this only works with Version y of given system, not x

ChatGpt: Try this.

Me: This is using a method that never existed in the framework.

ChatGpt:

[–] mozz@mbin.grits.dev 14 points 5 months ago

"Oh, I see the problem. In order to correct (what went wrong with the last implementation), we can (complete code re-implementation which also doesn't work)"
Goto 1

[–] UberMentch@lemmy.world 8 points 5 months ago (1 children)

I used to have this issue more often as well. I've had good results recently by **not** pointing out mistakes in replies, but by going back to the message before GPT's response and saying "do not include y."

[–] mozz@mbin.grits.dev 19 points 5 months ago (2 children)

It’s incredibly useful for learning. ChatGPT was what taught me to unlearn, essentially, writing C in every language, and how to write idiomatic Python and JavaScript.

It is very good for boilerplate code or fleshing out a big module without you having to do the typing. My experience was just like yours; once you’re past a certain (not real high) level of complexity you’re looking at multiple rounds of improvement or else just doing it yourself.

[–] Boozilla@lemmy.world 6 points 5 months ago

Exactly. And for me, being in middle age, it's a big help with recalling syntax. I generally know how to do stuff, but need a little refresher on the spelling, parameters, etc.

[–] tristan@aussie.zone 11 points 5 months ago* (last edited 5 months ago)

I was recently asked to make a small Android app using flutter, which I had never touched before

I used chatgpt at first and it was so painful to get correct answers, but then made an agent or whatever it's called where I gave it instructions saying it was a flutter Dev and gave it a bunch of specifics about what I was working on

Suddenly it became really useful. I could throw it chunks of code and it would just straight away tell me where the error was and what I needed to change

I could ask it to write me an example method for something that I could then easily adapt for my use

One thing I would do would be ask it to write a method to do X, while I was writing the part that would use that method.

This wasn't a big project and the whole thing took less than 40 hours, but for me to pick up a new language, set up the development environment, and make a working app for a specific task in 40 hours was a huge deal to me... I think without chatgpt, just learning all the basics and debugging would have taken more than 40 hours alone

[–] WalnutLum@lemmy.ml 7 points 5 months ago (1 children)

This is because all LLMs function primarily based on the token context you feed it.

The best way to use any LLM is to completely fill up its history with relevant context, then ask your question.

[–] dgmib@lemmy.world 51 points 5 months ago (4 children)

Sometimes ChatGPT/copilot’s code predictions are scary good. Sometimes they’re batshit crazy. If you have the experience to be able to tell the difference, it’s a great help.

[–] EatATaco@lemm.ee 7 points 5 months ago

Due to confusing business domain terms, we often name variables in the form of XY and YX.

One time copilot autogenerated about two hundred lines of a class that was like: XY; YX; XXY; XYX; XYXY; ..... XXYYXYXYYYXYXYYXY;

It was pretty hilarious.

But that being said, it's a great tool that has definitely proven to be worth the cost... but like with a co-op, you have to check its work.

[–] 0x01@lemmy.ml 48 points 5 months ago (13 children)

I'm a 10 year pro, and I've changed my workflows completely to include both chatgpt and copilot. I have found that for the mundane, simple, common patterns copilot's accuracy is close to 9/10 correct, especially in my well maintained repos.

It seems like the accuracy of simple answers is directly proportional to the precision of my function and variable names.

I haven't typed a full for loop in a year thanks to copilot, I treat it like an intent autocomplete.

Chatgpt on the other hand is remarkably useful for super well laid out questions, again with extreme precision in the terms you lay out. It has helped me in greenfield development with unique and insightful methodologies to accomplish tasks that would normally require extensive documentation searching.

Anyone who claims llms are a nothingburger is frankly wrong. With the right guidance my output has increased dramatically and my error rate has dropped slightly. I used to be able to put out about 1000 quality lines of change in a day (a poor metric, but a useful one) and my output has expanded to at least double that using the tools we have today.

Are LLMs miraculous? No, but they are incredibly powerful tools in the right hands.

Don't throw out the baby with the bathwater.

[–] LyD@lemmy.ca 14 points 5 months ago (1 children)

On the other hand, using ChatGPT for your Lemmy comments sticks out like a sore thumb

[–] FaceDeer@fedia.io 10 points 5 months ago

If you're careless with your prompting, sure. The "default style" of ChatGPT is widely known at this point. If you want it to sound different you'll need to provide some context to tell it what you want it to sound like.

Or just use one of the many other LLMs out there to mix things up a bit. When I'm brainstorming I usually use Chatbot Arena to bounce ideas around, it's a page where you can send a prompt to two randomly-selected LLMs and then by voting on which gave a better response you help rank them on a leaderboard. This way I get to run my prompts through a lot of variety.

[–] TrickDacy@lemmy.world 9 points 5 months ago

Refreshing to see a reasonable response to coding with AI. Never used chatgpt for it but my copilot experience mirrors yours.

I find it shocking how many developers seem to think so many negative thoughts about programming with AI. Some guy recently said "everyone in my shop finds it useless". Hard for me to believe they actually tried copilot if they think that

[–] sturlabragason@lemmy.world 26 points 5 months ago

For someone doing a study on LLM they don’t seem to know much about LLMs.

They don’t even mention which model was used…

Here’s the study used for this clickbait garbage:

https://dl.acm.org/doi/pdf/10.1145/3613904.3642596

[–] muhyb@programming.dev 26 points 5 months ago (1 children)

Ask "are you sure?" and it will apologize right away.

[–] Lemongrab@lemmy.one 13 points 5 months ago* (last edited 5 months ago)

And then agree with whatever you said, even if it was wrong.

[–] Crisps@lemmy.world 24 points 5 months ago (1 children)

In the short term it really helps productivity, but in the end the reward for working faster is more work. Just doing the hard parts all day is going to burn developers out.

[–] birbs@lemmy.world 6 points 5 months ago (1 children)

I program for a living and I think of it more as doing the interesting tasks all day, rather than the mundane and repetitive. Chat GPT and GitHub Copilot are great for getting something roughly right that you can tweak to work the way you want.

[–] Epzillon@lemmy.ml 20 points 5 months ago

I worked for a year developing in Magento 2 (an open source e-commerce suite which was later bought up by Adobe; it is not well maintained and it is just all around not nice to work with). I tried to ask some Magento 2 questions to ChatGPT to figure out some solutions to my problems but clearly the only data it was trained with was a lot of really bad solutions from forum posts.

The solutions did kinda work some of the time but the way it was suggesting them was absolutely horrifying. We're talking opening so many vulnerabilities, breaking many parts of the suite as a whole or just editing database tables. If you do not know enough about the tools you are working with, implementing solutions from ChatGPT can be disastrous, even if they end up working.

[–] jsomae@lemmy.ml 17 points 5 months ago (3 children)

Sure, but by randomly guessing code you'd get 0%. Getting 48% right is actually very impressive for an LLM compared to just a few years ago.

[–] xthexder@l.sw0.com 23 points 5 months ago (2 children)

Just useful enough to become incredibly dangerous to anyone who doesn't know what they're doing. Isn't it great?

[–] jsomae@lemmy.ml 7 points 5 months ago (9 children)

Now non-coders can finally wield the foot-gun once reserved only for coders! /s

Truth be told, computer engineering should really be something that one needs a licence to do commercially, just like regular engineering. In this modern era where software can be ruinous to someone's life just like shoddy engineering, why is it not like this already?

[–] ulterno@lemmy.kde.social 17 points 5 months ago

You forgot the "at least" before the 52%.

[–] floofloof@lemmy.ca 16 points 5 months ago* (last edited 5 months ago) (2 children)

What's especially troubling is that many human programmers seem to prefer the ChatGPT answers. The Purdue researchers polled 12 programmers — admittedly a small sample size — and found they preferred ChatGPT at a rate of 35 percent and didn't catch AI-generated mistakes at 39 percent.

Why is this happening? It might just be that ChatGPT is more polite than people online.

It's probably more because you can ask it your exact question (not just search for something more or less similar) and it will at least give you a lead that you can use to discover the answer, even if it doesn't give you a perfect answer.

Also, who does a survey of 12 people and publishes the results? Is that normal?

[–] B0rax@feddit.de 14 points 5 months ago

Even this Lemmy thread has more participants than the survey

[–] brbposting@sh.itjust.works 6 points 5 months ago

I have 13 friends who are researchers and they publish surveys like that all the time.

(You can trust this comment because I peer reviewed it.)

[–] ech@lemm.ee 12 points 5 months ago (4 children)

For the umpteenth time - an llm just puts words together, it isn't a magic answer machine.

[–] Evotech@lemmy.world 10 points 5 months ago (5 children)

Probably more than 52% of what programmers type is wrong too

[–] paddirn@lemmy.world 10 points 5 months ago

I wonder if the AI is using bad code pulled from threads where people are asking questions about why their code isn’t working, but ChatGPT can’t tell the difference and just assumes all code is good code.

[–] eerongal@ttrpg.network 10 points 5 months ago (4 children)

Worth noting this study was done on gpt 3.5, and 4 is leagues better than 3.5. I'd be interested to see how this number has changed

[–] Samueru@lemmy.ml 8 points 5 months ago

I find it funny that the thumbnail has a "fail" on it. I'm actually surprised that it got 48% right.

[–] Thcdenton@lemmy.world 7 points 5 months ago (1 children)

It was pretty good for a while! They lowered the power of it like immortan joe. Do not become addicted to AI

[–] finestnothing@lemmy.world 6 points 5 months ago (1 children)

I use chatgpt semi-often... For generating stuff in a repeating pattern. Any time I have used it to make code, I don't save any time because I have to debug most of the generated code anyway. My main use case lately is making python dicts with empty keys (e.g. key1, key2... becomes "key1": "", "key2": "",...) or making a gold/prod level SQL view by passing in the backend names and frontend names (e.g. value_1, value_2... Value 1, Value 2,... Becomes value_1 as Value 1,...).
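For comparison, both of those repetitive patterns can also be generated deterministically with a few lines of Python. This is only a sketch: the helper names `empty_dict` and `select_aliases` are hypothetical, and quoting the alias in double quotes is an assumption (standard SQL for identifiers containing spaces; some databases use other quoting).

```python
def empty_dict(keys):
    """Map every key to an empty string, e.g. key1 -> "key1": ""."""
    return {key: "" for key in keys}


def select_aliases(backend_names, frontend_names):
    """Build the column list for a view: backend_name AS "Frontend Name"."""
    if len(backend_names) != len(frontend_names):
        raise ValueError("backend and frontend name lists differ in length")
    # Assumption: aliases are double-quoted because they may contain spaces.
    return ",\n".join(
        f'{backend} AS "{frontend}"'
        for backend, frontend in zip(backend_names, frontend_names)
    )


if __name__ == "__main__":
    print(empty_dict(["key1", "key2"]))
    print(select_aliases(["value_1", "value_2"], ["Value 1", "Value 2"]))
```

The output of `select_aliases` is meant to be pasted between `SELECT` and `FROM` in the view definition; the length check catches the easy mistake of mismatched name lists.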

[–] ramirezmike@programming.dev 11 points 5 months ago (1 children)

I know this is gonna sound annoying but I just use vim for stuff like this. Even notepad++ has a macro thing too, right? My coworkers keep saying how much of a productivity boost it is but all I see it do is mess up stuff like this that only takes a few seconds in vim to set up and I know it'll be correct every time
