this post was submitted on 19 Nov 2023
40 points (68.5% liked)

Technology


This is an idea I’ve been toying with for a bit. There is a ton of media that includes unimportant information that doesn’t need to be stored pixel-perfect. Storing large portions of the image data as text will save substantial amounts of storage, and as on-device image generation becomes commonplace, digital memories will become the main way people capture the world around them. I think this will inevitably be the next form of media capture (photography and video), not replacing other methods/formats, but I could see things like phone cameras defaulting to saving images as digital memories to save on storage.

all 45 comments
[–] harry315@feddit.de 63 points 1 year ago (3 children)

currently, storage space is significantly cheaper than all the cpu power needed to generate the images from a text description. also, what if you actually wanted to view the background of the object? and where's the advantage besides an at best 40% increase in storage efficiency? after all, people are taking pictures to actually capture the moment. else they would do voice memos all the time.

[–] duncesplayed@lemmy.one 5 points 1 year ago (1 children)

after all, people are taking pictures to actually capture the moment

Depending on what you mean by "the moment", I don't think that's really true. Modern cell phone photography doesn't really give you what the sensors have picked up. You take a picture of your friend with his eyes closed and the phone will change the picture to have his eyes open. You take a blurry picture of the moon and your phone will enhance it to make a better picture of the moon. I mean some people hate it but a lot of people do actually like it.

And they like it because they don't really take pictures for the purpose of posterity. They don't take a picture of their friend because they need to look back 20 years from now and remember exactly how that one plastic bag 30m in the distance was crumpled. They take the picture because they want to post to Instagram, get some likes from their friends, and maybe look back 20 years from now to remember the general vibe, and if their phone can "enhance" that for them, all the better.

If people could record a voice memo and have their phone actually make a really decent Instagram post out of it for them, I 1000% believe people would do it instead of taking an actual picture. Posting pictures is more about socializing than it is about posterity.

[–] omnissiah@iusearchlinux.fyi 3 points 1 year ago

People still photograph analog

[–] SkybreakerEngineer@lemmy.world 47 points 1 year ago

The mother of all lossy compression

[–] qantravon@lemmy.world 37 points 1 year ago (2 children)

I'm sorry, but no. Not only does that invoke a ton of extraneous processing on both ends (when saving and when recalling the image), but the rest of the image is still important, too! Can you imagine taking a photo at a family gathering, and then coming back later to see randomly generated people in the background? A photograph isn't just about the "subject", it's often about a moment in time.

[–] Crow@lemmy.world -3 points 1 year ago (2 children)

You wouldn’t use it if you didn’t want to, but I actually think family photos are great for this. The focus is family in the photo, not what the background details are. Not a lot of people care how many tree tops are in a graduation photo. Our memories work the same by storing the most important parts in the best detail, and redrawing other details when accessing the memory. And as handheld computation improves it will be a small task to render each image on view.

[–] qantravon@lemmy.world 13 points 1 year ago (2 children)

You misunderstand. You take a picture of, say, your dad at a family reunion, and in the background the rest of your family is just milling around. That's not the subject, and so the AI model saves it as "people doing stuff" or whatever. When you load that photograph, the people in the background will be generated, and they won't be your family.

This is all beside the fact that the AI may decide your subject is different from what you think it is.

This is just an extremely unreliable form of data compression, and extremely unnecessary. Phones and cameras can currently save hundreds or thousands of photographs locally, and cloud storage can save millions for free, and even more for extremely cheap. You're solving a non-existent problem by shoehorning AI image generation in where it's not needed.

[–] jaykay@lemmy.zip 4 points 1 year ago

Imagine going through a photo album, and each time the image is different. Instead of enjoying the photos, you’re looking for what the AI changed this time.

[–] bpm@lemmy.ml 1 points 1 year ago

What you think of as important may change over time, as well. With the solution as written, you'd need to decide what the "subject" is at compress time, but what if you later realise that's the last ever photo of grandma, or the AI decides that you were wearing different shoes than you actually were? Worst case, you need to rely on some detail in a photo later, like to absolve you of a crime.

[–] stoy@lemmy.zip 8 points 1 year ago

You are missing the point of those photos though, they are memories, made to help you remember the exact moment you took a photo of.

You say it yourself, our memories ARE fuzzy, that is why this is dumb, it would mean getting gaslit by the very things we created to help us remember.

I am an IT guy, and I feel confident in saying that for the general consumer, storage is a solved problem: if you need more space, buy a new hard drive. Consumers in the vast majority of cases do not need instant access to all data, so external drives are fine. If you want more protection, get a NAS with a decent RAID setup; that is enough for most people. If you want even more protection, get two NAS units and set them to sync changes every night.

[–] hardware26@discuss.tchncs.de 25 points 1 year ago

I don't think this will work well and others already explained why, but thanks for using this community to pitch your idea. We should have more of these discussions here rather than CEO news and tech gossip.

[–] trash80@lemmy.dbzer0.com 21 points 1 year ago (1 children)

You should go pitch this to an audiophile community as a new lossless codec:

Freebird is only 2 minutes and 30 seconds long, because we replaced the guitar solo with text that says [10 minute guitar solo], so you're not losing anything.

[–] Crow@lemmy.world -1 points 1 year ago (1 children)

I think the music version would be MIDI.

[–] stoy@lemmy.zip 5 points 1 year ago (1 children)

No, just, no.

MIDI is awesome in some ways, but I would never replace an actual recording of an instrumental song with a midi file.

MIDI is too dependent on the decoder, and won't replicate the sound accurately.
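To make the "dependent on the decoder" point concrete, here is a minimal sketch of what a MIDI 1.0 note-on message contains. The helper name `note_on` is made up for illustration; the three-byte layout (status byte, note number, velocity) is the standard one. The message says which note to play and how hard, and nothing about what the instrument sounds like, so the result depends entirely on the synth that receives it.

```python
# A MIDI 1.0 note-on message is three bytes: status, note number, velocity.
# It describes *what* to play, not the sound itself.

def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Build a raw note-on message (hypothetical helper for illustration)."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    if not (0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("note and velocity must be 0-127")
    # 0x90 is the note-on status nibble; the low nibble carries the channel.
    return bytes([0x90 | channel, note, velocity])


# One second of uncompressed CD audio: 44.1 kHz, 2 channels, 2 bytes per sample.
CD_BYTES_PER_SECOND = 44_100 * 2 * 2

middle_c = note_on(channel=0, note=60, velocity=100)
print(middle_c.hex())         # 903c64
print(len(middle_c))          # 3
print(CD_BYTES_PER_SECOND)    # 176400
```

Three bytes versus roughly 176 KB per second is the appeal, and also the problem: none of the recorded performance is in those three bytes.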

[–] Crow@lemmy.world 0 points 1 year ago (2 children)
[–] steakmeout@lemmy.world 2 points 1 year ago

Have you? Do you understand what MIDI2 offers? What you’re describing is unrelated to controlling synths and samplers. What you’re describing seems more like a mod file system where the samples are generated on the fly, and that has all the same limits that any generative art does.

[–] stoy@lemmy.zip 1 points 1 year ago

Don't get me wrong, MIDI is awesome, as a music creation tool, but as a recording tool, it just won't work.

I love trackers, and have thousands of C64/Amiga remixes made on trackers (you can get those songs themselves at Remix64), but a midi or mod tracker is just fundamentally the wrong technology to use to record a copy of music. I am sure that there are programs to build midis from a recording, but that's a new work based on an old one, not a recording.

[–] Wes_Dev@lemmy.ml 11 points 1 year ago

It could be an interesting idea, but would be terrible to implement for anything where accuracy mattered.

Generally when you're doing video or image editing, you don't want the image to change after you're done saving it. That would be a loss of hundreds of hours of work in some cases. And if you're working on something where small details matter, those might get lost in translation.

[–] tsonfeir@lemm.ee 11 points 1 year ago (1 children)

So I take a photo of a friend and then the AI changes them and it’s no longer them.

Actually sounds like a Black Mirror episode. So… congrats on that?

[–] Bitrot@lemmy.sdf.org 1 points 1 year ago (1 children)

As the “object” the friend would stay the same in this proposal, but everything behind them would vary.

[–] tsonfeir@lemm.ee 1 points 1 year ago

I mean… that’s gonna fuck with people’s memory. And what of the software that provides this service? What things will it decide aren’t important? A child playing in the background? A beautiful sunset that should be remembered as it was?

I think it’s an interesting idea, but the power it takes to make this happen every time you view the image and the wild inaccuracy that would inevitably happen would create significantly more problems than just improving image compression or investing time in increasing the capacity of existing storage devices.

Realistically, in the future we won’t have to worry at all about our disk space. It’s almost to that point already. My NAS is way more than I’ll ever use, and it was pretty cheap.

[–] stoy@lemmy.zip 10 points 1 year ago (1 children)

As someone who enjoys photography, this seems dumb.

This just adds another abstraction layer, on top of all the other abstraction layers, and for what?

Saving storage?

Storage is fairly cheap these days, processing power, less so.

We already have image compression; you don't need to save every raw file (though I save both the raw and the jpg for every shot I get from my camera). If space is running out, get another hard drive.

I also don't believe that an AI would be able to recreate a picture exactly the same way every time, even from the same prompt.

You would need to describe the image in excruciating detail to get the AI to draw the same picture every time, and it would also take time to generate the image every time you want to see it. Sure, caches exist, but they take up storage and/or RAM.
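On the "same picture every time" point, a hedged aside: generators that start from random noise generally only repeat themselves when the seed is stored along with the prompt, and real models may still drift across model versions or hardware. The sketch below is a toy, not a real model. `toy_generate` is a made-up stand-in that returns a few fake pixel values so the seed behaviour can be seen in isolation.

```python
import hashlib
import random
from typing import Optional


def toy_generate(prompt: str, seed: Optional[int] = None, size: int = 8) -> list:
    """Hypothetical stand-in for an image generator.

    The prompt picks a base value and the seed drives the noise, loosely
    mirroring how a sampler starts from seeded random noise.
    """
    base = hashlib.sha256(prompt.encode("utf-8")).digest()[0]
    rng = random.Random(seed)  # seed=None means a fresh, unpredictable stream
    return [(base + rng.randrange(256)) % 256 for _ in range(size)]


prompt = "trees behind a graduation photo"

# Same prompt, same stored seed: the output is reproducible.
same_a = toy_generate(prompt, seed=1234)
same_b = toy_generate(prompt, seed=1234)
print(same_a == same_b)  # True

# Same prompt, no stored seed: the two results will almost certainly differ.
loose_a = toy_generate(prompt)
loose_b = toy_generate(prompt)
print(loose_a, loose_b)
```

So a stable result means saving at least the prompt, the seed, and the exact model, which eats into the storage savings and still costs generation time on every view.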

[–] qantravon@lemmy.world 6 points 1 year ago (1 children)

The pitch is that everything surrounding the subject is extra, and so it doesn't matter if it's the same every time. It's literally throwing that information away in favor of a simplified description. It's extremely processor-intensive data compression.
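A minimal sketch of why this counts as throwing information away, using the "people doing stuff" example from earlier in the thread. Everything here is hypothetical: `DigitalMemory`, `describe`, and `compress` are invented names, and the captioner is a trivial placeholder for whatever model would write the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalMemory:
    subject_pixels: bytes          # the subject, kept exactly
    background_description: str    # everything else, reduced to text


def describe(background: bytes) -> str:
    """Placeholder captioner: collapses any non-empty background to one label."""
    return "people doing stuff" if background else "empty"


def compress(subject: bytes, background: bytes) -> DigitalMemory:
    """Keep the subject verbatim and replace the background with a caption."""
    return DigitalMemory(subject, describe(background))


family = compress(b"dad", b"aunt, cousins, grandma")
strangers = compress(b"dad", b"random crowd")

# Two different photos are stored as the same record, so no decoder,
# however good, can tell which background was really there.
print(family == strangers)  # True
```

A real captioner would be far more detailed than this placeholder, but the same limit applies to anything the description leaves out.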

[–] stoy@lemmy.zip 4 points 1 year ago* (last edited 1 year ago)

That is completely terrible, the background is often critical to the photo, there are only a tiny number of photos where the background might not matter.

The author claims to want to help preserve memories, but to me, it seems like this concept would change existing memories.

I use my photo collection as a way to remember events and places, making the memories clearer when I look at them. I can't imagine a time when I would ever want parts of the image to change on their own from viewing to viewing.

The only kind of photos that this could work for would be stock photos, where the customer won't care if the photo conveys a memory or not as long as it conveys the message they want.

This is a dumb concept, less dumb in some specific areas, but still dumb, it feels kinda like that woman who tried to restore the painting of Jesus in Spain...

[–] paysrenttobirds@sh.itjust.works 6 points 1 year ago (1 children)

I love this and have had similar thoughts in relation to my non-verbal kid wanting to keep memories in a way they can point out different parts and link together multiple things to make new stories or comments or hypotheticals. Important to have the context and the parts and named things all relating together. I don't know much about it, but there is a thing called a "sidecar" file that can be associated with media. There are some moves to make EXIF data more standardized. So there's a chance this could be done in an open format.
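For anyone curious what a sidecar could look like, here is a rough sketch that keeps the photo untouched and writes the context to a separate file next to it. It uses plain JSON for brevity (sidecars in the wild are often XMP files), and the field names and the `write_sidecar` / `read_sidecar` helpers are made up for illustration, not part of any standard.

```python
import json
import tempfile
from pathlib import Path


def sidecar_path(photo: Path) -> Path:
    """reunion.jpg -> reunion.jpg.json, stored alongside the photo."""
    return photo.with_name(photo.name + ".json")


def write_sidecar(photo: Path, metadata: dict) -> Path:
    """Write the metadata as JSON next to the photo and return the path."""
    path = sidecar_path(photo)
    path.write_text(json.dumps(metadata, indent=2, ensure_ascii=False),
                    encoding="utf-8")
    return path


def read_sidecar(photo: Path) -> dict:
    """Load the metadata written by write_sidecar."""
    return json.loads(sidecar_path(photo).read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    photo = Path(tmp) / "reunion.jpg"
    photo.write_bytes(b"")  # placeholder; a real photo would already exist

    sidecar = write_sidecar(photo, {
        "people": ["dad", "grandma"],          # hypothetical field names
        "place": "family reunion",
        "links": ["graduation.jpg"],           # related memories
    })
    sidecar_name = sidecar.name
    loaded = read_sidecar(photo)

print(sidecar_name)       # reunion.jpg.json
print(loaded["people"])   # ['dad', 'grandma']
```

The pixels stay exactly as captured, and the named parts and links live in a file any tool can read.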

[–] mkhoury@lemmy.ca 6 points 1 year ago

OP sounds like he's making a data compression pitch, but I think you have the better idea. I think surrounding the picture with a lot of contextual data about when/why/how this picture was taken will absolutely help recall and connecting to related concepts.

[–] Mandarbmax@lemmy.world 5 points 1 year ago (2 children)

This is a really cool idea. Other posters here have explained why it isn't a good idea, but I still think it is neat. Maybe there is a niche edge case use for such a thing? If nothing else it is very sci-fi.

[–] sumofchemicals@lemmy.world 3 points 1 year ago

Agree it's fun to think about even if not practical. If anything, it reminds me of how my own memory works, where it's more like a description of what I saw than an image.

[–] Elgenzay@lemmy.ml 2 points 1 year ago

One use case might be for asset store pages, to show the object in various environments.

Maybe even have a field where you can input a description of the environment where you plan to use the asset

[–] TootSweet@lemmy.world 5 points 1 year ago (2 children)
[–] Rozz@lemmy.sdf.org 1 points 1 year ago

That's a neat project. Mostly art, but I can see it, without the required location data, as some sort of photo tour for housebound people. Maybe pair it with a maps street view tour.

[–] Crow@lemmy.world 0 points 1 year ago

That is cool.

[–] blazera@kbin.social 3 points 1 year ago

How is subject determined? Did you manually cut out that image?

[–] dullbananas@lemmy.ca 3 points 1 year ago (1 children)
[–] Mandarbmax@lemmy.world 14 points 1 year ago (1 children)

Op, please patent it so no other company can use it for the next 20 years

[–] andruid@lemmy.ml 2 points 1 year ago

I like the idea. Basically turning B-roll and background info into reproducible info. So you could, for example, get a pixel-perfect 8K view of, say, the main subject and edit around that instead of needing actual 8K of unimportant background scene.

I think an added angle would be trying to explore more with latent space to see how precise one might be able to get with the AI-compressed details.