Using a Regex to extract the domain from a URL

Discussion:

Adam Wenham

2014-06-05 15:03:24 UTC

Hi guys,

I'm having some problems on good old Codewars, writing a method that can
take a URL and return just the domain.

I've managed to create a Regex in Rubular (http://rubular.com/r/C7wAZRq8OA)
that passes my tests, but I'm having trouble implementing it properly.

Here are my tests:
Test.assert_equals(domain_name("http://github.com/carbonfive/raygun"),
"github")
Test.assert_equals(domain_name("http://www.zombie-bites.com"),
"zombie-bites")
Test.assert_equals(domain_name("https://www.cnet.com"), "cnet")

Here's my method:
def domain_name(url)
  url.match(/https*:\/\/w*\.*(\w*\-*\w*)./)
end

As far as I can tell, this should work. Any ideas on what I'm doing wrong?
Thanks!

--
== People often come up to me and ask "What the heck are you doing in my
shed!?" ==

Panagiotis Atmatzidis

2014-06-05 15:44:27 UTC


Hello,

I tested your code and it returns the 'https://' part too, so your regexp is wrong:
--
$ cat test.rb && ruby test.rb

def domain_name(url)
  puts url.match(/https*:\/\/w*\.*(\w*\-*\w*)./).to_s
end

list = %w{https://github.com/carbonfive/raygun http://www.zombie-bites.com https://www.cnet.com}

list.each {|x| domain_name(x)}

=> https://github.
=> http://www.zombie-bites.
=> https://www.cnet.
--
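
For what it's worth, the output above is the whole match: String#match returns a MatchData object, and calling to_s on it gives the full matched text, scheme included. The part inside the parentheses is available at index 1. A minimal sketch that keeps the original regexp untouched (returning nil when nothing matches is an assumption, not something the tests ask for):

```ruby
def domain_name(url)
  # match returns a MatchData object, or nil when the pattern does not match
  m = url.match(/https*:\/\/w*\.*(\w*\-*\w*)./)
  # index 1 is the first capture group, i.e. the part in parentheses
  m && m[1]
end

puts domain_name("http://www.zombie-bites.com")
```

This passes the three tests from the original post, but the leading w* also swallows the first letter of any domain that starts with "w" (wikipedia.org comes back as "ikipedia"), so the pattern itself still needs tightening.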

You could adjust the regexp to match a given set of domain names[1], but in the general case the problem is unsolvable using regular expressions alone[2]. In theory you would need a list of all the available domain suffixes (maybe a Google search will even give you a TXT file) and then write a complicated set of instructions to match against it. Only this way might you get a complete solution, IMHO.
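
The list-based idea above could be sketched roughly as follows. The SUFFIXES constant is hypothetical and deliberately tiny; a real list would be far longer and would have to come from an external source. The host is pulled out with URI.parse from Ruby's standard library rather than with a regexp, which is a substitution on my part and not something the thread prescribes.

```ruby
require "uri"

# Hypothetical, deliberately tiny suffix list (an assumption for illustration).
SUFFIXES = %w[co.uk com net org].freeze

def domain_name(url)
  host = URI.parse(url).host
  return nil if host.nil?

  # Try longer suffixes first so that "co.uk" is preferred over a shorter one.
  suffix = SUFFIXES.sort_by { |s| -s.length }.find { |s| host.end_with?(".#{s}") }
  return nil if suffix.nil?

  # Cut off ".suffix", then keep the last label that remains.
  host[0...-(suffix.length + 1)].split(".").last
end

puts domain_name("http://www.bbc.co.uk/news")
```

Anything whose suffix is not in the list returns nil here, which makes the gap visible instead of guessing.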

Best regards,

[1] http://stackoverflow.com/questions/12772423/regex-match-main-domain-name

[2] http://stackoverflow.com/a/12772473/577133

Panagiotis (atmosx) Atmatzidis

email: ***@convalesco.org
URL: http://www.convalesco.org
GnuPG ID: 0x1A7BFEC5
gpg --keyserver pgp.mit.edu --recv-keys 1A7BFEC5

"As you set out for Ithaca, hope the voyage is a long one, full of adventure, full of discovery [...]" - C. P. Cavafy

Stu

2014-06-05 16:46:11 UTC


/^https?:\/\/(www\.)?[a-zA-Z0-9_-]*\.(com|net|org)\/?((([a-zA-Z\/0-9_-])+)?)$/
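
The pattern above was posted without a capture around the domain label, so here is one possible way to plug it into the method. This is a sketch under assumptions: the named group "domain" is my addition, the dot after www is escaped, and the other groups are made non-capturing so they don't get in the way. It only knows about .com, .net and .org, exactly like the pattern it is based on.

```ruby
# Variant of the pattern with a named capture around the domain label.
DOMAIN_RE = /^https?:\/\/(?:www\.)?(?<domain>[a-zA-Z0-9_-]*)\.(?:com|net|org)\/?(?:[a-zA-Z\/0-9_-]+)?$/

def domain_name(url)
  m = DOMAIN_RE.match(url)
  # m is nil for URLs the pattern does not cover (other suffixes, query strings, ...)
  m && m[:domain]
end

puts domain_name("http://github.com/carbonfive/raygun")
```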