this post was submitted on 23 Jul 2024
31 points (100.0% liked)


I wrote a TUI application to help you practice Python regular expressions. There are more than 100 exercises covering both the built-in re module and the third-party regex module.

If you have pipx, use pipx install regexexercises to install the app. See the repo for source code and other details.

top 3 comments
alyth@lemmy.world 1 points 3 months ago (1 children)

Thanks for sharing this. I took the time to read through the documentation of the re module. Here's my review of the functions.

Useful:

  • re.finditer returns an iterator over all Match objects.
  • re.search returns the first Match object, or None if there are no matches.
  • r'' use raw strings for patterns so you don't have to worry about backslashes.
  • the optional flags argument modifies the behaviour (case-insensitive, multiline).
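
A minimal sketch of the four points above. The sample text and pattern are made up for illustration:

```python
import re

text = "Alpha beta\nALPHA gamma\nalpha delta"

# Raw string: \b reaches the regex engine as a word boundary,
# instead of being read by Python as a backspace character.
pattern = r"^alpha\b"

# re.search returns the first Match object, or None when nothing matches.
first = re.search(pattern, text, flags=re.IGNORECASE)
first_text = first.group(0) if first else None

# re.finditer lazily yields one Match object per non-overlapping match.
# re.MULTILINE lets ^ match at the start of every line.
flags = re.IGNORECASE | re.MULTILINE
starts = [m.start() for m in re.finditer(pattern, text, flags=flags)]

no_match = re.search(r"omega", text)

print(first_text)  # Alpha
print(starts)      # [0, 11, 23]
print(no_match)    # None
```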

Utility:

  • re.sub replaces each match in the string.
  • re.split splits a string by a regular expression.
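
A short sketch of both utilities, with made-up input strings:

```python
import re

line = "apples, pears;plums ,  figs"

# re.split: split on a comma or semicolon plus any surrounding whitespace.
fruits = re.split(r"\s*[,;]\s*", line)

# re.sub: replace every match; \1 and \2 refer to the capturing groups.
swapped = re.sub(r"(\w+)@(\w+)", r"\2 at \1", "alice@example bob@test")

# The replacement may also be a function that receives each Match object.
doubled = re.sub(r"\d+", lambda m: str(int(m.group(0)) * 2), "3 cats, 10 dogs")

print(fruits)   # ['apples', 'pears', 'plums', 'figs']
print(swapped)  # example at alice test at bob
print(doubled)  # 6 cats, 20 dogs
```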

The Match object:

  • match.group(0) returns the portion of text matched by the pattern
  • match.group(1) returns the first capturing group
  • match.group(2) returns the second capturing group, and so on
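
For reference, a small sketch of pulling pieces out of a Match object. Note that the singular `group(n)` selects one group, while the plural `groups()` returns a tuple of all capturing groups. The input string is made up:

```python
import re

m = re.search(r"(\d+)-(\d+)", "range 10-25 inclusive")

whole = m.group(0)       # entire matched text
first = m.group(1)       # first capturing group
second = m.group(2)      # second capturing group
all_groups = m.groups()  # tuple of all capturing groups, excluding group 0

print(whole, first, second)  # 10-25 10 25
print(all_groups)            # ('10', '25')
```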

I don't understand why these exist:

  • re.match like 'search', but only matches at the beginning of the string. Why not just use '^' or '\A' in the pattern you pass to 'search'?
  • re.fullmatch like 'search', but only if the full string matches. Why not just use '\A' and '\Z' in the pattern you pass to 'search'?
  • re.findall returns all matches. It seems like a shitty version of 'finditer'. The shape of the return value depends on the pattern you pass to the function: a list of whole-match strings (no capturing groups), a list of group strings (one group), or a list of tuples (several groups). Who wants to work with that?
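
To make the findall complaint concrete, here is a sketch of how the result changes with the number of capturing groups in the pattern, next to finditer, which always yields Match objects. The input string is made up:

```python
import re

text = "a1 b2 c3"

no_groups = re.findall(r"[a-z]\d", text)       # list of whole matches
one_group = re.findall(r"[a-z](\d)", text)     # list of that group's strings
two_groups = re.findall(r"([a-z])(\d)", text)  # list of tuples, one per match

# finditer yields Match objects regardless of how many groups there are.
via_finditer = [m.group(0) for m in re.finditer(r"([a-z])(\d)", text)]

print(no_groups)     # ['a1', 'b2', 'c3']
print(one_group)     # ['1', '2', '3']
print(two_groups)    # [('a', '1'), ('b', '2'), ('c', '3')]
print(via_finditer)  # ['a1', 'b2', 'c3']
```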
hades@lemm.ee 4 points 3 months ago (1 children)

I would argue that having distinct match and search helps readability. The difference between match('((([0-9]+-[0-9]+)|([0-9]+))[,]?)+[^,]', s) and search('((([0-9]+-[0-9]+)|([0-9]+))[,]?)+[^,]', s) is clear without my having to parse the regular expression myself.

It also helps code reuse. Suppose you have PHONE_NUMBER_REGEX defined somewhere. If you only had a method to "search" but not to "match", you would have to write something like search(rf"\A{PHONE_NUMBER_REGEX}\Z", s), which is error-prone and less readable. Most likely you would end up with at least two sets of precompiled regex objects (i.e. PHONE_NUMBER_REGEX and PHONE_NUMBER_FULLMATCH_REGEX).

Separate methods are also fairly common practice in other languages' regex libraries (cf. [1,2]). Golang, which is usually very reserved in the number of ways to express the same thing, has 16 different matching methods[3].
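
A sketch of the reuse argument, assuming a deliberately simplistic PHONE_NUMBER_REGEX (the real pattern is not shown in the comment). The last two lines show one way hand-wrapping with anchors can go wrong, which is my illustration and not necessarily the failure the commenter had in mind:

```python
import re

# Hypothetical, simplistic pattern for illustration; not a real validator.
PHONE_NUMBER_REGEX = r"\d{3}-\d{4}|\d{7}"
phone = re.compile(PHONE_NUMBER_REGEX)

s = "call 555-1234 now"

found_anywhere = phone.search(s) is not None            # appears somewhere
at_start = phone.match(s) is not None                   # s starts with "call"
whole_string = phone.fullmatch("555-1234") is not None  # entire string matches

# Without a group, the alternation splits the anchors:
# the pattern means  \A\d{3}-\d{4}  OR  \d{7}\Z.
naive = re.search(rf"\A{PHONE_NUMBER_REGEX}\Z", "555-1234 extra")
grouped = re.search(rf"\A(?:{PHONE_NUMBER_REGEX})\Z", "555-1234 extra")

print(found_anywhere, at_start, whole_string)  # True False True
print(naive is not None)                       # True (unintended match)
print(grouped is not None)                     # False
```

With one compiled pattern, search, match, and fullmatch cover all three cases without touching the pattern text.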

Regarding re.findall, I see what you mean, but I don't agree with your conclusions. I think it is a useful convenience method that improves readability in many cases. I found these usages in my own code, and I'm quite happy that this method was available[4]:

digits = [digit_map[digit] for digit in re.findall("(?=(one|two|three|four|five|six|seven|eight|nine|[0-9]))", line)]
[(minutes, seconds)] = re.findall(r"You have (?:(\d+)m )?(\d+)s left to wait", text)
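
A runnable sketch of those two lines. The digit_map contents and both input strings are assumptions, since the comment does not show them:

```python
import re

# Hypothetical mapping; the original digit_map is not shown in the comment.
words = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
digit_map = {word: str(i) for i, word in enumerate(words, start=1)}
digit_map.update({str(i): str(i) for i in range(10)})

# The lookahead consumes no characters, so overlapping words are all found:
# "eight" and "two" share the letter "t" in this made-up input.
line = "eightwo3"
digits = [
    digit_map[digit]
    for digit in re.findall(
        r"(?=(one|two|three|four|five|six|seven|eight|nine|[0-9]))", line
    )
]

# Two capturing groups, so findall returns a list of tuples. The unpacking
# raises ValueError unless there is exactly one match.
text = "You have 5s left to wait"
[(minutes, seconds)] = re.findall(r"You have (?:(\d+)m )?(\d+)s left to wait", text)

print(digits)                       # ['8', '2', '3']
print(repr(minutes), repr(seconds)) # '' '5'
```

The optional minutes group comes back as an empty string when it does not take part in the match.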

[1] https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html

[2] https://en.cppreference.com/w/cpp/regex

[3] https://pkg.go.dev/regexp

[4] https://github.com/search?q=repo%3Ahades%2Faoc23%20findall&type=code

alyth@lemmy.world 3 points 3 months ago

Thank you for the very thorough reply! This is the kind of high-quality stuff you love to see on Lemmy. Your use cases seem very valid.