this post was submitted on 10 May 2024
50 points (100.0% liked)
Technology
A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.
Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.
This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.
Rule of headlines? 🙄
No, it's not peaked out.
There are still a lot of ways to improve data acquisition on the table; it isn't going to stop at creating large corpora and having humans fine-tune them.
this has "draw the rest of the fucking owl" vibes to it. especially step 3
It's a "push as much data as a baby gets to train its NN" step, which is several orders of magnitude more, and more focused, than any training dataset in existence right now.
Even with diminishing returns, it's bound to get better results.
that's not how asymptotes work.
That's not how watching the video or reading the paper works either.
Whatever.
Training data already has multiple labels.
An entire point of the paper and video is that massive increases in training set size are showing diminishing returns.
🤡