this post was submitted on 28 Nov 2024
91 points (96.0% liked)

it's so confusing that the order changes when adding IDENTICAL strings to BOTH filenames. Is this really how it's supposed to be?

[–] folekaule@lemmy.world 65 points 1 month ago

Yes. The periods are just part of the name like any other letter, so 5 is compared to m, and numbers sort before letters. You can add something like '.0' to make it sort more naturally. Look up an ASCII table to get a feeling for how strings are sorted.
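A quick sketch of that comparison in Python, using two of the filenames from this thread. Python compares strings by code point, which matches the ASCII table for these characters; whether a given file manager does exactly the same is an assumption here.

```python
# Two filenames from the thread. They are identical up to and including
# "S01E05.", so the first differing characters are "5" and "m".
names = ["Link Click S01E05.mkv", "Link Click S01E05.5.mkv"]

# Code points of the two characters that decide the order.
code_points = {ch: ord(ch) for ch in "5m"}

# sorted() on str compares code point by code point.
ordered = sorted(names)

print(code_points)
print(ordered)
```

Since the digit has the lower code point, the "5.5" file ends up ahead of the plain "05" file.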

[–] Lysergid@lemmy.ml 37 points 1 month ago* (last edited 1 month ago)

What you're expecting is called natural sorting. Mac employed natural sorting back in the 90s. What you're getting is legitimate alphabetical sorting, which is used by Linux and Windows. Natural sorting parses the string into tokens and compares those. Alphabetical sorting compares two strings by comparing the individual characters at the same index (position). Alphabetical sorting is quite common because it is simpler to implement (or rather harder to screw up) and yields predictable results.

One of the many Python libraries that implement natural sorting: https://github.com/SethMMorton/natsort
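To illustrate the token idea without any third-party package, here is a minimal sketch of a natural sort key. It is not how natsort is implemented; the function name `natural_key` and the sample chapter names are made up for the example.

```python
import re

def natural_key(name):
    """Split a string into alternating text and integer tokens."""
    # re.split with a capturing group keeps the digit runs, so the parts
    # always alternate: text, digits, text, ... (text may be empty).
    parts = re.split(r"(\d+)", name)
    # Odd positions are digit runs; convert them so 10 compares as ten.
    return [int(p) if i % 2 else p for i, p in enumerate(parts)]

chapters = ["ch5.11", "ch5.9", "ch5.10"]

alphabetical = sorted(chapters)               # character by character
natural = sorted(chapters, key=natural_key)   # token by token

print(alphabetical)
print(natural)
```

Because tokens at the same position are always the same type, the lists compare cleanly. Note that a key this simple still would not put S01E05.mkv before S01E05.5.mkv, because "." is a prefix of ".mkv" and so sorts first.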

[–] slazer2au@lemmy.world 30 points 1 month ago

Humans order by strings, computers order by characters.

[–] Dave@lemmy.nz 22 points 1 month ago (1 children)

I believe it's correct. If you sort say "A", "AA", "AAA" then you get

  1. A
  2. AA
  3. AAA

Because the first characters are compared, which are all the same, and then the second. The first one has no second character, so it comes first. The second has no third character, so it comes before the third item.

In your scenario, you have:

  1. 5
  2. 5.5

The first characters are the same, so it looks at the second character. Item 1 has no second character so it comes first.

Scenario 2:

  1. 5.5 A
  2. 5 A

The first character is the same, so it looks at the second character. The second characters are "." and " ". The "." comes first in the character ranking so is shown first.
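The character-by-character procedure described above can be sketched like this. The `compare` helper is hypothetical and ranks characters by plain code point, which may not be the ranking the file manager actually uses.

```python
def compare(a, b):
    """Return -1, 0 or 1 by comparing two strings character by character."""
    for x, y in zip(a, b):
        if x != y:
            # First differing position decides; rank by code point.
            return -1 if ord(x) < ord(y) else 1
    # Every shared position matched: the shorter string sorts first.
    return (len(a) > len(b)) - (len(a) < len(b))

print(compare("A", "AA"))        # prefix rule
print(compare("5", "5.5"))       # prefix rule again
print(compare("5 A", "5.5 A"))   # space versus period
print(ord(" "), ord("."))
```

One caveat: by code point the space (32) ranks before the period (46), so this sketch puts "5 A" first. The order seen in the post, with "5.5 A" first, suggests the file manager uses a locale-aware collation instead of raw code points, though that is a guess on my part.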

[–] dysprosium@lemmy.dbzer0.com 4 points 1 month ago* (last edited 1 month ago) (2 children)

yes yes I get what you're saying but it's still odd. Didn't humans do this differently in the old analog days? I'm sure any human working with a real paper archive in front of them would order 5 A before 5.5 A. Perhaps it has something to do with viewing 5 as 5.0 and 5.00, since they are mathematically equivalent and come before 5.5. Although humans would also be inconsistent, because they would order 5.9 before 5.11 if the context were chapters going 5.9 -> 5.10 -> 5.11. But if these papers represented values, humans would order 5.9 AFTER 5.11. And computers obviously don't make exceptions based on context like humans do.

edit: if I understand correctly, it'd be cleaner if spaces came first in the character ranking, before ANY other character. Perhaps that'd make it more human readable.

[–] atzanteol@sh.itjust.works 27 points 1 month ago

Humans aren't sorting this though. A computer is.

How should 5.2 and 5.12 be sorted? Numerically 5.12 is less than 5.2. But if it's a version then it's "five dot twelve" and thus 5.12 is greater.

These are contextual things that are very difficult for a computer to know. And trying to guess often just makes things weirder. So they often sort in a way that is at least consistent.
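The two readings of 5.2 and 5.12 can be shown side by side. This is only a sketch; `version_tuple` is a made-up helper, not a standard function.

```python
a, b = "5.2", "5.12"

# Reading 1: decimal numbers. 5.12 is smaller than 5.2.
numeric_b_first = float(b) < float(a)

def version_tuple(text):
    """Turn '5.12' into (5, 12) so each dotted part compares as an integer."""
    return tuple(int(part) for part in text.split("."))

# Reading 2: versions. "five dot twelve" is greater than "five dot two".
version_b_last = version_tuple(b) > version_tuple(a)

print(numeric_b_first, version_b_last)
```

Both comparisons are valid; the string alone does not say which one is meant.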

[–] Dave@lemmy.nz 12 points 1 month ago* (last edited 1 month ago) (1 children)

If a person was ordering them, they would do it in numerical order. Despite these being numbers, the computer is still ordering in alphabetical order.

Doing it the way a person would requires the file manager to understand context, which requires a lot more logic for arguably little benefit.

I note that your season and episode start with 0 as well (S01E05), in order to ensure the alphabetical ordering works. Perhaps you should use 5.0 to solve this in the same way.

[–] Deckweiss@lemmy.world 10 points 1 month ago (1 children)

"Doing it the way a person would requires the file manager to understand context, which requires a lot more logic for arguably little benefit."

I'm so glad KDE Dolphin has a "natural sorting" option. Not sure about this specific case, but I have never been surprised by the order with that setting.

Would be interesting to check the code behind it.

[–] BluesF@lemmy.world 13 points 1 month ago

It's an API call which emails a guy who just does it real fast by hand

[–] wobfan@lemmy.zip 10 points 1 month ago (1 children)

i'm not quite sure whether i understood your question but this seems to be right. the 5 from S01E05.*5*.mkv is higher in the alphabet than m from S01E05.*m*kv so it belongs above that entry.

[–] mathemachristian@hexbear.net 4 points 1 month ago* (last edited 1 month ago) (1 children)

Check the top: apparently "5" < "5.5" but "5 A" > "5.5 A". It's probably because a substring is lexicographically before the string containing it.

But when comparing 5 A with 5.5 A the second characters ' ' and '.' get compared and apparently '.' < ' '.

agree on your point though.

[–] dysprosium@lemmy.dbzer0.com 2 points 1 month ago (1 children)

I see, but wouldn't it make more "human readable sense" to order spaces before any other character? Any human working with analog archives would rank 5 A before 5.5 A, since they think 5 is 5.0 in their head

[–] mathemachristian@hexbear.net 2 points 1 month ago* (last edited 1 month ago)

I don't know the exact reasoning for it. I would guess it's because '.' is also used to delimit file types: "A book on the.pdf" < "A book on the pdf.pdf" or "Book.pdf" < "Book sequel.pdf"

I guess your confusion has in part to do with reading digits as numbers, but within a string they are part of an alphabet: every character stands on its own and has no relation to the characters around it. There is no difference between "5 1", "5.1" and "511"; you just pick an ordering of the alphabet and then sort each character accordingly.

[–] todd_bonzalez@lemm.ee 8 points 1 month ago

'5' does come before 'M'.

[–] juliebean@lemm.ee 8 points 1 month ago

that all makes sense to me. how would you want it to work?

[–] xeekei@lemm.ee 7 points 1 month ago

I've always felt like numbers should be ordered after letters. A-Z then 0-9.

[–] sovietknuckles@hexbear.net 4 points 1 month ago* (last edited 1 month ago)

What you're looking for is version sort. Here's how ls -1v sorts those files in the terminal, for example:

Link Click S01E04.mkv
Link Click S01E05.mkv
Link Click S01E05.5.mkv
Link Click S01E06.mkv

Nemo might be able to support version sort by way of a plugin, but I have not found one. The nnn CLI file manager supposedly supports version sort.
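For anyone who wants to reproduce that order in a script, here is a rough Python approximation. It is not the algorithm GNU ls uses for -v; it just sets the extension aside and compares digit runs as integers. The name `version_like_key` is invented for this sketch.

```python
import os
import re

def version_like_key(filename):
    """Sort key: compare the stem with numeric tokens, then the extension."""
    stem, ext = os.path.splitext(filename)
    # Split the stem into alternating text and digit runs.
    parts = re.split(r"(\d+)", stem)
    tokens = [int(p) if i % 2 else p for i, p in enumerate(parts)]
    return (tokens, ext)

files = [
    "Link Click S01E06.mkv",
    "Link Click S01E05.5.mkv",
    "Link Click S01E05.mkv",
    "Link Click S01E04.mkv",
]

for name in sorted(files, key=version_like_key):
    print(name)
```

Setting the extension aside is what lets S01E05 sort ahead of S01E05.5: the stem "…S01E05" is a prefix of "…S01E05.5", and a prefix sorts first.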

[–] Brewchin@lemmy.world 4 points 1 month ago

I encounter this mostly with manga. (I'll not rehash what others have said).

FWIW, and in that use case, I deal with it by renaming x5 to x5.0 so it will sort before x5.5. And then usually put both into an x5 directory and then zip that into a CBZ.

[–] Natanael@slrpnk.net 4 points 1 month ago* (last edited 1 month ago)

The correct solution is to make sure all files to be sorted have equivalent numerical structure, like 5.0 and 5.5

Same with e.g. 05 and 10
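A small sketch of why the padding helps, using made-up episode numbers and the "5 A" / "5.5 A" names from the post:

```python
episodes = [10, 5, 9]

# Without padding, "E10" sorts before "E5" because "1" < "5".
unpadded = sorted(f"E{n}" for n in episodes)

# With two-digit padding, alphabetical order matches numeric order.
padded = sorted(f"E{n:02d}" for n in episodes)

# Same idea for the decimal case: write 5 as 5.0 so it lines up with 5.5.
decimals = sorted(f"{value:.1f} A" for value in (5.5, 5))

print(unpadded)
print(padded)
print(decimals)
```

Once every name has the same numeric structure, plain character-by-character sorting gives the order a person would expect.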

[–] Frederic@beehaw.org 3 points 1 month ago

Always been weird, this is why for instance my ll alias is:

alias ll='LC_COLLATE=C ls -alFh'
alias ls='ls --color=auto'