Discussion:
Tools for debugging memory leak in ruby process
timlen tse
2016-09-14 05:35:32 UTC
Permalink
Hi all,
I have written a Ruby program that fetches web pages from a batch of URLs.
After it has run for a while, it uses so much memory that the system kills
it unexpectedly. I want to debug this problem. Are there any good tools
that let me see all the objects (including each object's type) that have
not been released by the GC while the program is running?
Any suggestions are welcome.

Best regards
A Berger
2016-09-14 06:48:04 UTC
Permalink
Hi,
I don't know of such tools yet, but I have seen that some gems use an
enormous amount of memory (which seems to be "normal"), maybe because the
whole website is being parsed.

I have seen many -g flags when compiling the sources, so gdb could help
(but that's the hard way).
Another option is ruby-debug, but I haven't used it yet.
What does your program do?
(Fetching alone shouldn't use much memory.)

Bye
Berg
timlen tse
2016-09-14 07:35:24 UTC
Permalink
I think it shouldn't, but it does.
In my program, I iterate through a table (named urls) using ActiveRecord,
get each row's url (a field), fetch the web page using HTTParty, parse it
with Nokogiri, then extract the target tag's content and store it in the
database.
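
A rough first check for a loop like this (fetch, parse, store) is whether Ruby objects actually pile up from one batch to the next. The sketch below is not from the thread: live_slot_growth is a hypothetical helper that forces a GC before and after a block and compares GC.stat(:heap_live_slots), which assumes Ruby 2.2 or later. The retained array only simulates a leak.

```ruby
# Hypothetical helper (not from the thread): returns how many heap slots
# are still live after the block compared with before it.
def live_slot_growth
  GC.start
  before = GC.stat(:heap_live_slots)
  yield
  GC.start # collect short-lived garbage so only survivors are counted
  GC.stat(:heap_live_slots) - before
end

retained = [] # simulated leak: every "page" stays referenced
growth = live_slot_growth do
  10_000.times { |i| retained << "page #{i}" }
end
puts "live slots grew by #{growth}"
```

If the number stays close to zero across batches while the process still grows, the growth is probably outside the Ruby object heap (for example memory held by a native extension), and counting Ruby objects will not show it.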
Jérémy SEBAN
2016-09-14 09:06:57 UTC
Permalink
Can you show us your code (you can use pastebin, http://pastebin.com/)?
Maybe there's a bottleneck in it that we could identify.
timlen tse
2016-09-14 09:47:50 UTC
Permalink
Here is my code, a rake task:


desc "Daily update"
task :daily_update do
  crawler = Search.new
  machine_count = 4
  machine_id = ENV["MACHINE_ID"] || 1
  loop do
    # Each machine takes the unfinished items whose id falls in its shard.
    items = Item.select(:id, :product_id, :sold_history)
                .where("done=0 and id % #{machine_count}=#{machine_id}")
                .take(100)
    break if items.empty?
    items.each do |item|
      begin
        data = crawler.fetch_item_info(item.product_id)
        sold_history = JSON.parse(item.sold_history).push(data[:sold])
        sold_history = JSON.generate(sold_history)
        item.update(data.merge({sold_history: sold_history, :done => 1}))
      rescue => e
        warn "Error in daily update #{e.message}"
        next
      end
    end
  end
end


and the following are the custom classes:



class Search
  def initialize
    @user_agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
  end

  def fetch_item_info(product_id)
    webpage = get("http://list.qoo10.sg/g/#{product_id}")
    DetailParser.parse(webpage)
  end

  private

  def get(url)
    HTTParty.get(url, headers: {"User-Agent" => @user_agent}).body
  end
end


class DetailParser
  def self.parse(webpage)
    webpage = Nokogiri::HTML(webpage)
    price = webpage.xpath("//strong[@data-price!='']")[-1]
    if price.nil?
      price = webpage.at("//div[@id='ctl00_ctl00_MainContentHolder_MainContentHolderNoForm_retailPricePanel']/dl/dd")
    end
    price = price.nil? ? 0.00 : price.text[/(\d|\.){1,}/]
    price ||= 0.00
    sold = webpage.at("//span[@class='sold']/strong")
    sold = sold.nil? ? 0 : sold.text.to_i
    ship_from = webpage.at("//tr[@class='shpng']/td/text()") || ''
    img_url = webpage.at("//input[@id='basic_image']")
    img_url = img_url.nil? ? "" : URI.decode(img_url['value'])

    {sold: sold, pic: img_url, shipping_from: ship_from.to_s, price: price}
  end
end
Robert Klemme
2016-09-14 14:09:43 UTC
Permalink
On Wed, Sep 14, 2016 at 9:35 AM, timlen tse <***@gmail.com> wrote:

Here's a very simplistic approach to counting objects which might or
might not help:


class MemDiff
  def dump
    counts = ObjectSpace.each_object.each_with_object(Hash.new(0)) { |o, h| h[o.class] += 1 }

    if @last
      counts.keys.sort_by { |c| c.name || c.inspect }.each do |c|
        diff = counts[c] - @last[c]
        printf "%-30s %20d\n", c, diff if diff != 0
      end
    end

    @last = counts
    self
  end
end


md = MemDiff.new

md.dump

10.times.map(&:to_s)

md.dump

This is of course not a real memory debugger, as it won't give you the
allocation site. But the type of object that is increasing might give
you an indication.
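
For the allocation site itself, the objspace extension that ships with Ruby (2.1 and later) can record where each object was created. The following is a minimal sketch, not part of the original post: surviving_allocation_sites is a hypothetical helper, and the retained array only simulates a leak. It assumes that recorded sites stay readable after tracing is stopped and until they are cleared.

```ruby
require "objspace"

# Hypothetical helper (not from the thread): counts objects that were
# allocated inside the block and are still alive afterwards, per file:line.
def surviving_allocation_sites
  ObjectSpace.trace_object_allocations_start
  begin
    yield
    GC.start # sweep short-lived garbage while tracing is still on
  ensure
    ObjectSpace.trace_object_allocations_stop
  end

  sites = Hash.new(0)
  ObjectSpace.each_object do |obj|
    file = ObjectSpace.allocation_sourcefile(obj)
    next if file.nil? # no site recorded: allocated outside the traced block
    sites["#{file}:#{ObjectSpace.allocation_sourceline(obj)}"] += 1
  end
  ObjectSpace.trace_object_allocations_clear
  sites.sort_by { |_site, count| -count }
end

retained = [] # simulated leak: every string stays referenced
report = surviving_allocation_sites do
  1_000.times { |i| retained << "page body #{i}" }
end
report.first(5).each { |site, count| printf "%-50s %8d\n", site, count }
```

Tracing slows allocation down, so a sketch like this is better wrapped around a single batch than left on for the whole run.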

Kind regards

robert
--
[guy, jim, charlie].each {|him| remember.him do |as, often| as.you_can
- without end}
http://blog.rubybestpractices.com/

Matthew Kerwin
2016-09-14 06:52:51 UTC
Permalink
There are these:
https://rubygems.org/search?utf8=%E2%9C%93&query=memory+profiler

I can't vouch for anything there, BTW; it was just a search off the top of
my head.
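
Without installing a gem, the objspace extension bundled with Ruby can also dump the whole heap as one JSON object per line, which covers the "all objects, including type" part of the question. This is only a sketch under assumptions: heap_type_counts is a hypothetical helper, and dumping to a string is convenient for a small script but may be too large for a big process (ObjectSpace.dump_all also accepts output: :file).

```ruby
require "json"
require "objspace"

# Hypothetical helper (not from the thread): dumps the heap and tallies
# the "type" field of each JSON line.
def heap_type_counts
  counts = Hash.new(0)
  ObjectSpace.dump_all(output: :string).each_line do |line|
    begin
      counts[JSON.parse(line)["type"]] += 1
    rescue JSON::ParserError, EncodingError
      next # skip any line that cannot be parsed
    end
  end
  counts
end

heap_type_counts.sort_by { |_type, count| -count }.first(10).each do |type, count|
  printf "%-12s %10d\n", type, count
end
```

Taking two such tallies some batches apart and comparing them gives a coarse view of which kinds of objects are accumulating.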

Cheers
--
Matthew Kerwin
http://matthew.kerwin.net.au/