🏢 Must-know building blocks: Regex Day 3

Welcome back to Pandas Daily! Your daily 5-minute boost to becoming confident in Python.

Yesterday we saw how to extract emails using regex. Today let's build on that, then go through the core characters that power any regex (know these and you know 80% of regex).

I can't emphasize enough how powerful regex is, which is why we are covering it over multiple days. Tomorrow might be the last.

🎯 match() to check the validity of an email

Refer to yesterday's issue to understand the pattern below (the ^ and $ explanations are coming up later in this issue).

  In:

    import re

    email = "hello@site.com"

    pattern = r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"

    print(bool(re.match(pattern, email)))

  Out:
  True
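
It helps to see the same pattern reject bad input too. Here is a minimal sketch: the helper name is_valid_email and the sample addresses are made up for illustration, and the pattern is the one from above.

```python
import re

# Same pattern as above; ^ and $ force the whole string to fit the pattern.
EMAIL_PATTERN = r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$"

def is_valid_email(email):
    """Hypothetical helper: True if the string fits the email pattern."""
    return bool(re.match(EMAIL_PATTERN, email))

samples = ["hello@site.com", "hello@site", "hello site@x.com", "@site.com"]
for s in samples:
    print(s, is_valid_email(s))
```

Only the first sample passes. The others fail because of a missing ".com"-style ending, a space in the name, and an empty name before the @.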

✨ Extract domains

  In:

    text = "Emails: user1@gmail.com, user2@yahoo.com"

    pattern = r"@([A-Za-z0-9.-]+)"

    print(re.findall(pattern, text))

  Out:
  ['gmail.com', 'yahoo.com']
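
If you want the username along with the domain, one option is to add a second pair of parentheses. This is a small sketch, not the only way to do it; the variable name pairs is just for illustration. With two groups, findall returns one tuple per match.

```python
import re

text = "Emails: user1@gmail.com, user2@yahoo.com"

# Two groups: (username) @ (domain).
pattern = r"([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+)"

pairs = re.findall(pattern, text)
print(pairs)  # [('user1', 'gmail.com'), ('user2', 'yahoo.com')]
```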

Core Regex Characters

\d matches any digit (0-9). \D matches any non-digit.

  In:

    print(re.findall(r"\d+", "Pandas Daily is no. 1 newsletter of 2025"))

    print(re.findall(r"\D+", "Pandas Daily is no. 1 newsletter of 2025"))

  Out:

  ['1', '2025']

  ['Pandas Daily is no. ', ' newsletter of ']
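
A common next step is turning the matched digits into real numbers. The sketch below reuses the same sentence; the variable name numbers is just for illustration. It also shows what happens when you drop the +.

```python
import re

text = "Pandas Daily is no. 1 newsletter of 2025"

# findall returns strings, so convert each match with int().
numbers = [int(n) for n in re.findall(r"\d+", text)]
print(numbers)  # [1, 2025]

# Without +, \d matches one digit at a time.
print(re.findall(r"\d", text))  # ['1', '2', '0', '2', '5']
```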

\w matches letters, digits, and underscores ( _ ). \W matches everything else: symbols, punctuation, and whitespace.

  In:

    print(re.findall(r"\w+", "Pandas Daily! - Provides value worth $$"))

    print(re.findall(r"\W+", "Pandas Daily! - Provides value worth $$"))

  Out:

  ['Pandas', 'Daily', 'Provides', 'value', 'worth']

  [' ', '! - ', ' ', ' ', ' $$']
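
To make the underscore and digit part concrete, here is a quick sketch with a made-up string. Notice that user_name42 stays in one piece, while the slash splits 9 and 10.

```python
import re

text = "user_name42 scored 9/10!"

# Letters, digits, and underscores all count as word characters.
print(re.findall(r"\w+", text))  # ['user_name42', 'scored', '9', '10']

# Everything else, including spaces, is a non-word character.
print(re.findall(r"\W+", text))  # [' ', ' ', '/', '!']
```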

^ anchors the match to the start of the string

  In:

    print(re.findall(r"^Hello", "Hello World"))

    print(re.findall(r"^Hello", "World Hello"))

  Out:

  ['Hello']

  []

$ anchors the match to the end of the string

  In:

    print(re.findall(r"World$", "Hello World"))

    print(re.findall(r"World$", "World Hello"))

  Out:

  ['World']

  []
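
Put ^ and $ together and the pattern has to cover the whole string, which is exactly what the email pattern at the top relies on. A minimal sketch, with a hypothetical helper named is_all_digits:

```python
import re

def is_all_digits(s):
    """Hypothetical helper: True if ^\\d+$ matches the string."""
    # One caveat: $ also matches just before a trailing newline.
    return bool(re.match(r"^\d+$", s))

print(is_all_digits("2025"))       # True
print(is_all_digits("year 2025"))  # False
print(is_all_digits("2025 AD"))    # False
```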

\s matches whitespace (space, tab \t, newline \n). \S matches any non-whitespace character.

  In:

    text = "Too   many spaces\non two lines"

    print(re.findall(r"\s+", text))

    print(re.findall(r"\S+", text))

  Out:

  ['   ', ' ', '\n', ' ', ' ']

  ['Too', 'many', 'spaces', 'on', 'two', 'lines']

👆 Collapse any extra whitespace into a single space

  In:
  
    print(re.sub(r"\s+", " ", text))

  Out:
  Too many spaces on two lines
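
If the string also has spaces at the very start or end, re.sub leaves a single space there. One way to handle that is to add strip(). A small sketch, with a hypothetical helper named clean_spaces:

```python
import re

def clean_spaces(s):
    """Hypothetical helper: collapse whitespace runs and trim the ends."""
    return re.sub(r"\s+", " ", s).strip()

print(clean_spaces("  Too   many spaces\non two lines \t"))
# Too many spaces on two lines
```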

⭐📣 That's it for today! If you liked it, please share it with anyone who might find it useful, and send us your feedback 🐼
