this post was submitted on 06 Sep 2023

Hacker News

There is a discussion on Hacker News, but feel free to comment here as well.

[–] lvxferre@lemmy.ml 2 points 1 year ago* (last edited 1 year ago) (1 children)

He did try it, but in a rather dumb way: by comparing whole strings, e.g. "123 Main St, Brooklyn, NY 11217" vs. "124 Main St, Brooklyn, NY 11217".

It's silly because his whole approach to the problem rests on an assumption: that the program must always give an answer. It's fine to say "I don't know", or to code a program that says it. And yet he's trying to dichotomise the program's output into "same" vs. "different".
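To show what a non-dichotomous output could look like, here's a rough Python sketch. Everything in it is made up for illustration (the `Verdict` enum, `normalise`, `house_number`, `compare`), and it assumes that two addresses with different leading house numbers are different places. It is not what the article's author wrote, just one way to let a program say "dunno".

```python
import re
from enum import Enum
from typing import Optional


class Verdict(Enum):
    SAME = "same"
    DIFFERENT = "different"
    UNKNOWN = "dunno"


def normalise(address: str) -> str:
    # Lowercase and collapse whitespace; deliberately no abbreviation handling.
    return " ".join(address.lower().split())


def house_number(address: str) -> Optional[str]:
    # Hypothetical helper: take the leading run of digits as the house number.
    match = re.match(r"\s*(\d+)\b", address)
    return match.group(1) if match else None


def compare(a: str, b: str) -> Verdict:
    if normalise(a) == normalise(b):
        return Verdict.SAME
    num_a, num_b = house_number(a), house_number(b)
    # Assumption: differing house numbers mean different addresses.
    if num_a is not None and num_b is not None and num_a != num_b:
        return Verdict.DIFFERENT
    # Anything else is left undecided instead of being forced into a bucket.
    return Verdict.UNKNOWN


print(compare("123 Main St, Brooklyn, NY 11217", "124 Main St, Brooklyn, NY 11217"))
print(compare("Ant Street", "Aunt Street"))
```

The point is the third branch: when the cheap, reliable checks don't settle it, the program hands the case to a human instead of guessing.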

[–] superfes@beehaw.org 2 points 1 year ago (1 children)

I've never done Levenshtein on numbers; it seems like a silly thing to do.

Somehow I had skipped over that part of the text, thanks.

[–] lvxferre@lemmy.ml 1 points 1 year ago* (last edited 1 year ago)

Yup, it's stupid. The catch is that the text is yet another example of people hyping generative bots and trying to "sell" them as the solution for everything and a bit more. One way to do that is to make the alternatives look worse than they are, for example by misusing the other tools at your disposal.

Even then, I wouldn't use fuzzy string matching here; it's bound to introduce more false positives than it's worth, such as "Ant Street" and "Aunt Street" matching (Levenshtein distance = 1). In those cases it's simply better to say "dunno".
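For anyone who wants to check the numbers, here's a minimal Python sketch of the textbook Levenshtein distance. The `fuzzy_match` helper and its `max_distance=1` threshold are hypothetical, only there to show how a threshold-based matcher would treat both examples from this thread.

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance, keeping one row at a time.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + (char_a != char_b)  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_match(a: str, b: str, max_distance: int = 1) -> bool:
    # Hypothetical matcher: "same" whenever the edit distance is small enough.
    return levenshtein(a, b) <= max_distance


pairs = [
    ("Ant Street", "Aunt Street"),
    ("123 Main St, Brooklyn, NY 11217", "124 Main St, Brooklyn, NY 11217"),
]
for left, right in pairs:
    print(left, "|", right, "|", levenshtein(left, right), fuzzy_match(left, right))
```

Both pairs come out at distance 1, so any threshold loose enough to forgive a one-letter typo also merges two different streets and two neighbouring houses.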