URL regex pattern in Rails

John Gruber writes about a liberal regex for matching URLs and Alan Storm posts a follow-up with explanation and improvement.

I’d like to share a URL auto-link regex that I wrote and that now lives in Rails core:

%r{
  ( https?:// | www\. )  # URLs start with "http://" or "www."
  [^\s<]+                # allow all non-whitespace chars until an opening HTML tag
}x

Yes, the regex itself is simple. Too simple: experienced programmers will notice that it doesn’t handle punctuation at the end of a URL, Wikipedia-style bracket pairs, and so on.
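To make that limitation concrete, here is a small check of what the pattern matches on its own. The constant name LIBERAL_URL_RE and the sample text are made up for this illustration.

```ruby
# The pattern from above, under a hypothetical constant name.
LIBERAL_URL_RE = %r{
  ( https?:// | www\. )  # URLs start with "http://" or "www."
  [^\s<]+                # allow all non-whitespace chars until an opening HTML tag
}x

sample = "See http://example.com/page, then www.example.org<br>"

# Collect whole matches; scan would otherwise return only the captured group.
matches = []
sample.scan(LIBERAL_URL_RE) { matches << $& }

p matches
# => ["http://example.com/page,", "www.example.org"]
```

The second match stops cleanly at the opening tag, but the first one swallows the trailing comma, which is exactly the kind of case the regex alone does not deal with.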

The truth is, we handle these edge cases in Ruby code.

Yes, that’s a fair amount of Ruby code, but it handles all the cases we needed it to, including the one where the URL is already linked in the input text. It just shows that you don’t have to try to handle everything with a regex; use your language too.
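As a rough illustration of that division of labor, here is a minimal standalone sketch. It is not the Rails implementation: the names SKETCH_LINK_RE, BRACKETS, escape_html and auto_link_sketch are hypothetical, and it assumes that stripping at most one trailing punctuation character is enough.

```ruby
SKETCH_LINK_RE = %r{
  ( https?:// | www\. )
  [^\s<]+
}x

# Closing bracket => matching opening bracket.
BRACKETS = { "]" => "[", ")" => "(", "}" => "{" }.freeze

HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def escape_html(string)
  string.gsub(/[&<>"]/, HTML_ESCAPES)
end

def auto_link_sketch(text)
  text.gsub(SKETCH_LINK_RE) do
    # Capture match data first; the regex checks below overwrite the globals.
    href, left, right = $&, $`, $'

    # A match that sits inside an HTML tag (for example an href attribute)
    # is assumed to be linked already and is returned unchanged.
    next href if left =~ /<[^>]+\z/ && right =~ /\A[^>]*>/

    trailing = ""
    if href =~ /[^\w\/-]\z/
      candidate = href[-1]
      rest = href[0...-1]
      opening = BRACKETS[candidate]
      # Keep a closing bracket only when it balances an opening one in the URL.
      balanced = opening && rest.scan(opening).size > rest.scan(candidate).size
      href, trailing = rest, candidate unless balanced
    end

    # "www." matches have no scheme, so add one for the href attribute.
    url = href.start_with?("http") ? href : "http://#{href}"
    %(<a href="#{escape_html(url)}">#{escape_html(href)}</a>#{trailing})
  end
end

puts auto_link_sketch("Visit http://example.com/page.")
puts auto_link_sketch("http://en.wikipedia.org/wiki/Ruby_(programming_language)")
puts auto_link_sketch('<a href="http://example.com">example</a>')
```

The regex only finds candidates. Deciding where a URL really ends, and whether it should be touched at all, is left to ordinary string handling around each match.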

A contributor has posted a patch for the one last bit of functionality I had left out.

But I do understand that most regex patterns that people design for matching URLs are too large, often trying to whitelist characters. Gruber certainly proved a point when he showed that such patterns should be more liberal.

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.”
Now they have two problems.

Jamie Zawinski