Here is my code. First, the rake task:
desc "Daily update"
task :daily_update do
  crawler = Search.new
  machine_count = 4
  machine_id = (ENV["MACHINE_ID"] || 1).to_i  # ENV values are strings

  loop do
    # Each machine takes the rows whose id falls in its shard.
    items = Item.select(:id, :product_id, :sold_history)
                .where("done = 0 AND id % #{machine_count} = #{machine_id}")
                .take(100)
    break if items.empty?

    items.each do |item|
      begin
        data = crawler.fetch_item_info(item.product_id)
        sold_history = JSON.parse(item.sold_history).push(data[:sold])
        sold_history = JSON.generate(sold_history)
        item.update(data.merge(sold_history: sold_history, done: 1))
      rescue => e
        warn "Error in daily update: #{e.message}"
        next
      end
    end
  end
end
and the following are the custom classes it uses:
class Search
  def initialize
    @user_agent = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
  end

  def fetch_item_info(product_id)
    webpage = get("http://list.qoo10.sg/g/#{product_id}")
    DetailParser.parse(webpage)
  end

  private

  def get(url)
    HTTParty.get(url, headers: { "User-Agent" => @user_agent }).body
  end
end
class DetailParser
  def self.parse(webpage)
    webpage = Nokogiri::HTML(webpage)

    # The price lives in a <strong data-price=...> tag; fall back to the
    # retail-price panel when that tag is missing.
    price = webpage.xpath("//strong[@data-price!='']")[-1]
    if price.nil?
      price = webpage.at("//div[@id='ctl00_ctl00_MainContentHolder_MainContentHolderNoForm_retailPricePanel']/dl/dd")
    end
    price = price.nil? ? 0.00 : price.text[/[\d.]+/]
    price ||= 0.00  # the regexp returns nil when the text has no digits

    sold = webpage.at("//span[@class='sold']/strong")
    sold = sold.nil? ? 0 : sold.text.to_i

    ship_from = webpage.at("//tr[@class='shpng']/td/text()") || ''

    img_url = webpage.at("//input[@id='basic_image']")
    img_url = img_url.nil? ? "" : URI.decode(img_url['value'])

    { sold: sold, pic: img_url, shipping_from: ship_from.to_s, price: price }
  end
end
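Since the thread is about runaway memory, one quick check on this code is whether the parsed Nokogiri documents are what is accumulating: force a GC, then count the documents that survived it. A minimal sketch using only ObjectSpace from the standard library (the helper name and where you call it are mine, not part of the program above):

require 'nokogiri'

# Count the Nokogiri documents still alive after a forced collection.
# If this number climbs steadily across batches, something is holding
# references to the parsed pages.
def live_nokogiri_documents
  GC.start
  ObjectSpace.each_object(Nokogiri::HTML::Document).count
end

# Illustrative use inside the daily_update loop:
#   warn "live docs: #{live_nokogiri_documents}"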
Post by Jérémy SEBAN:
Can you show us your code (you can use pastebin http://pastebin.com/)?
Maybe there's a bottleneck in it we could identify.
I thought it shouldn't, but it did.
In my program, I iterate through a table (named urls) using ActiveRecord,
get its url field, fetch the webpage with HTTParty, parse it with
Nokogiri, then extract the target tag content and store it in the
database.
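A way to confirm that this loop is the leak is to log heap statistics every
so often while it runs. A minimal sketch, assuming a Url model with a url
column (the model and the extraction step are placeholders, not the actual
program; GC.stat(:heap_live_slots) is MRI 2.1+):

require 'httparty'
require 'nokogiri'

Url.find_each.with_index do |record, i|
  page = Nokogiri::HTML(HTTParty.get(record.url).body)
  # ... extract the target tags and store them ...

  # Every 100 rows, force a collection and print the live slot count.
  # Steady growth here means objects are being retained, not just churned.
  if (i % 100).zero?
    GC.start
    puts "row #{i}: live slots = #{GC.stat(:heap_live_slots)}"
  end
end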
Hi
I don't know these tools yet, but I have seen that some gems use an
enormous amount of memory (which seems to be "normal"), maybe because
the whole website is being parsed.
I have seen many -g flags when compiling the sources, so gdb could help
(but that's the hard way).
Another thing is ruby-debug, but I haven't used it yet.
What do you do in your program?
(Fetching alone shouldn't use much memory.)
Bye
Berg
I wrote a Ruby program that fetches webpages from a batch of urls. But
after the program had run for a while, it used so much memory that it was
killed by the system unexpectedly. I want to debug this problem. Are
there any good tools that would let me see all the objects (including
each object's type) that have not been released by the GC while the
program is running?
Any suggestions are welcome.
Best Regards
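For "see all the objects the GC has not released", MRI can do this out of
the box: ObjectSpace.each_object walks every live object, and the objspace
standard library (shipped with MRI 2.1+) can record where each one was
allocated. A minimal sketch (standard library only; the top-10 cutoff is
arbitrary):

require 'objspace'

# Record file/line for every allocation from here on.
ObjectSpace.trace_object_allocations_start

# ... run the suspect code here ...

GC.start  # collect first, so only retained objects remain

# Histogram of live objects by class, largest first.
counts = Hash.new(0)
ObjectSpace.each_object { |obj| counts[obj.class] += 1 }
counts.sort_by { |_, n| -n }.first(10).each do |klass, n|
  puts "#{klass}: #{n}"
end

# For one suspicious object, ask where it came from:
#   ObjectSpace.allocation_sourcefile(obj)  # => "crawler.rb"
#   ObjectSpace.allocation_sourceline(obj)  # => 42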