📩 Regex Day 2: Words, Boundaries and Emails


Welcome back to Pandas Daily! Your daily 5-minute boost to becoming confident in Python.


Yesterday we kicked off our regex journey. Regex (short for regular expressions) is a widely used tool for matching patterns in text.

After nailing phone numbers, today we push further and extract real text patterns, like email addresses and people's names. Recruiters can use the same trick to scan resumes and grab contact details in seconds.


🚀 \w+ to find all words

\w matches a single word character (a letter, digit or underscore); + means one or more of them

In:
import re
text = "Regex makes text easy!"
print(re.findall(r"\w+", text))
Out: ['Regex', 'makes', 'text', 'easy']
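\w is a little broader than "letters": in Python 3 it also matches digits and underscores (and non-English letters), but not apostrophes or hyphens. A quick sketch with a made-up sentence shows how that splits words:

```python
import re

text = "user_1 can't re-run it"
# \w+ grabs runs of letters, digits and underscores;
# the apostrophe and the hyphen are not word characters, so they split words
words = re.findall(r"\w+", text)
print(words)  # ['user_1', 'can', 't', 're', 'run', 'it']
```

So if your text has contractions like "can't", plain \w+ will break them into two pieces.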


🚧 \b marks a word boundary: the pattern must match as a separate word

🥶 Brain freeze: with \b on both sides, the pattern 'cat' matches the standalone word 'cat' but not the 'cat' hiding inside 'concatenate'

In:
text = "I love my cat but not concatenate"
pattern = r"\bcat\b"
print(re.findall(pattern, text))
Out: ['cat']
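To see why \b matters, run the same search without it. This sketch uses re.finditer and each match's .start() to show *where* every 'cat' was found:

```python
import re

text = "I love my cat but not concatenate"
# re.finditer yields match objects; .start() is the index where each match begins
plain = [m.start() for m in re.finditer(r"cat", text)]
bounded = [m.start() for m in re.finditer(r"\bcat\b", text)]
print(plain)    # [10, 25]  (the word 'cat' plus the one inside 'concatenate')
print(bounded)  # [10]      (only the standalone word)
```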


🚫 \B does the opposite of \b: \Bcat\B matches 'cat' only when other letters surround it inside a longer word, never as a complete word

In:
text = "I love my cat"
pattern = r"\Bcat\B"
print(re.findall(pattern, text))
Out: []

Let's double-check with a word that has 'cat' in the middle:


In:
text = "I love my concatenate" # What does this sentence mean, lol
pattern = r"\Bcat\B"
print(re.findall(pattern, text))
Out: ['cat']
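You can also mix the two. As a small sketch, \Bcat\b (non-boundary before, boundary after) finds 'cat' only at the end of a longer word, like 'bobcat', skipping both the standalone 'cat' and the one inside 'concatenate':

```python
import re

text = "cat bobcat concatenate"

# \B before 'cat': a word character must come right before it
# \b after 'cat':  the word must end right after the 't'
print(re.findall(r"\Bcat\b", text))       # ['cat'] (only the one in 'bobcat')

# Same idea, with \b\w+ in front to capture the whole word
print(re.findall(r"\b\w+\Bcat\b", text))  # ['bobcat']
```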


💪 Words starting with capital letters

  • [A-Z] ➜ first letter must be an uppercase letter
  • [a-z]+ ➜ followed by one or more lowercase letters
In:
text = "Alice and Bob went to New York"
pattern = r"\b[A-Z][a-z]+\b"
print(re.findall(pattern, text))
Out: ['Alice', 'Bob', 'New', 'York']
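Notice that 'New York' comes out as two separate words. One way to keep runs of capitalized words together, sketched here with a non-capturing group (?:...), is to allow repeats of "space + capitalized word". Treat it as a heuristic: it will also grab any capitalized word that simply starts a sentence.

```python
import re

text = "Alice and Bob went to New York"
# (?:\s[A-Z][a-z]+)* = zero or more extra " Capitalized" words glued on
# (?:...) groups without capturing, so findall still returns the full match
pattern = r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)*\b"
names = re.findall(pattern, text)
print(names)  # ['Alice', 'Bob', 'New York']
```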


⛏️ Extract Emails

  • [A-Za-z0-9._%+-]+ ➜ Before @: letters, digits, dots, underscores, %, + or -
  • @[A-Za-z0-9.-]+ ➜ After @: the domain name (letters, digits, dots, hyphens)
  • \.[A-Za-z]{2,} ➜ A literal dot (escaped as \.) followed by at least 2 letters (.com, .org, etc.)
In:
text = "Contact us: support@mail.com or sales@company.org"
pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
print(re.findall(pattern, text))
Out: ['support@mail.com', 'sales@company.org']
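Wrapping part of the pattern in parentheses creates a capture group, and when a pattern has exactly one group, re.findall returns just that group's text. A small sketch that pulls out only the domains from the same text:

```python
import re

text = "Contact us: support@mail.com or sales@company.org"
# Parentheses around the domain part = one capture group,
# so re.findall returns only the text matched inside it
pattern = r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})"
domains = re.findall(pattern, text)
print(domains)  # ['mail.com', 'company.org']
```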


👉 Want to verify whether it's a valid email, or find out if it's Gmail, Outlook or Yahoo? Tune in tomorrow for more fun regex stuff.


β­πŸ“£ That's it for today! If you liked it, please share it with anyone who will find it useful and share your feedback below 🐼


Pandas Daily

Beginner to Expert in Python in just 5 minutes
