Discussion:
[PATCH] mwrap 2.0.0 mwrap - LD_PRELOAD malloc wrapper for Ruby
Eric Wong
2018-07-20 09:25:16 UTC
mwrap is designed to answer the question:

Which lines of Ruby are hitting malloc the most?

mwrap wraps all malloc-family calls to trace the Ruby source
location of such calls and bytes allocated at each callsite.
As of mwrap 2.0.0, it can also function as a leak detector
and show live allocations at every call site. Depending on
your application and workload, the overhead is roughly a 50%
increase in memory and runtime.

It works best for allocations under GVL, but tries to track
numeric caller addresses for allocations made without GVL so you
can get an idea of how much memory usage certain extensions and
native libraries use.

It requires the concurrent lock-free hash table from the
Userspace RCU project: https://liburcu.org/

It does not require recompiling or rebuilding Ruby, but only
supports Ruby trunk (2.6.0dev+) on a few platforms:

* GNU/Linux
* FreeBSD (tested 11.1)

It may work on NetBSD, OpenBSD and DragonFly BSD.


Changes in 2.0.0:

This release includes significant changes to track live
allocations and frees. It can find memory leaks from malloc
with less overhead than valgrind's leakchecker and there is a
new Rack endpoint (MwrapRack) which can display live allocation
stats.

API additions:

* Mwrap[] - https://80x24.org/mwrap/Mwrap.html#method-c-5B-5D
* Mwrap::SourceLocation - https://80x24.org/mwrap/Mwrap/SourceLocation.html
* MwrapRack - https://80x24.org/mwrap/MwrapRack.html
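
As a rough sketch of how the new Mwrap[] lookup might be used to count
live allocations at one call site: the helper name live_allocations and
the "file:line" key format are assumptions for illustration and are not
part of mwrap; only Mwrap[] and the allocations/frees accessors of
Mwrap::SourceLocation come from this release. The lookup argument is
duck-typed so the snippet also runs when mwrap is not loaded.

```ruby
# Hypothetical helper (not part of mwrap): number of allocations made at
# +location+ that have not been freed yet.  +lookup+ is anything that
# responds to #[] and returns an object with #allocations and #frees
# (Mwrap itself when running under mwrap), or nil for an unknown location.
def live_allocations(lookup, location)
  loc = lookup[location]
  return 0 unless loc
  loc.allocations - loc.frees
end

# Only touch Mwrap when the process was actually started under mwrap.
if defined?(Mwrap)
  here = "#{__FILE__}:#{__LINE__}" # assumed "file:line" key format
  puts "live allocations at #{here}: #{live_allocations(Mwrap, here)}"
end
```

A call site whose count keeps growing across requests is a candidate
for a leak; a count that stays flat is probably fine.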

Incompatible changes:

* Mwrap.clear is now an alias for Mwrap.reset, as it is unsafe
to implement the new Mwrap[] API otherwise:
https://80x24.org/mwrap-public/20180716211933.5835-12-***@80x24.org/

26 changes since v1.0.0:

README: improve usage example
MANIFEST: add .document
add benchmark
use __attribute__((weak)) instead of dlsym
Mwrap.dump: do not segfault on invalid IO arg
bin/mwrap: support LISTEN_FDS env from systemd
support per-allocation headers for per-alloc tracking
mwrap: use malloc to do our own memalign
hold RCU read lock to insert each allocation
realloc: do not copy if allocation failed
internal_memalign: do not assume real_malloc succeeds
ensure ENOMEM is preserved in errno when appropriate
memalign: check alignment on all public functions
reduce stack usage from file names
resolve real_malloc earlier for C++ programs
allow analyzing live allocations via Mwrap[location]
alias Mwrap.clear to Mwrap.reset
implement accessors for SourceLocation
mwrap_aref: quiet -Wshorten-64-to-32 warning
fixes for FreeBSD 11.1...
use memrchr to extract address under glibc
do not track allocations for constructor and Init_
disable memalign tracking by default
support Mwrap.quiet to temporarily disable allocation tracking
mwrap_rack: Rack app to track live allocations
documentation updates for 2.0.0 release

Unsubscribe: <mailto:ruby-talk-***@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-talk>
Eric Wong
2018-07-20 09:34:16 UTC
Oops, forgot to include links, and title is [ANN] :x

Mailing list:

https://80x24.org/mwrap-public/
nntp://80x24.org/inbox.comp.lang.ruby.mwrap
mailto:mwrap-***@80x24.org (no HTML mail, please)

git clone https://80x24.org/mwrap.git

homepage + rdoc: https://80x24.org/mwrap/

Sam Saffron
2018-07-26 01:36:25 UTC
I am using mwrap to debug a small leak at the moment. One feature
request I do have, though, is a tally of totals.

It would be nice if it could keep track of total allocated and total
released. That way, if my RSS is bloating, I can quickly tell whether it
is due to fragmentation or to a genuine leak.

Sam Saffron
2018-07-26 01:37:41 UTC
Just to clarify here, I mean two global totals, not a per-row
kind of thing.
Eric Wong
2018-07-26 02:46:01 UTC
Sam Saffron wrote:
> Just to clarify here, I mean 2 single global totals, not a per row
> kind of thing.
>
> > I am using mwrap to debug a little leak at the moment, one feature
> > request I do have though is a tally of totals.
> > It would be nice if it could keep track of total allocated and total
> > released. That way if my RSS is bloating I can tell if it is due to
> > fragmentation or if it is due to a genuine leak really quick.

Something like the patch below? (Barely tested)

Since mwrap doesn't track its own memory usage, this might be
useful if you have a lot of cold code paths doing allocations,
since RSS might not stabilize quickly in that case.

Also, if the leaker goes through a malloc wrapper like Ruby's
xmalloc (e.g. https://bugs.ruby-lang.org/issues/14929 ), mwrap
won't make it easy to track down, since it can only safely see
one level up the call stack (using GCC's __builtin_return_address
with a non-zero level isn't safe).


diff --git a/ext/mwrap/mwrap.c b/ext/mwrap/mwrap.c
index acc8960..9bb44d0 100644
--- a/ext/mwrap/mwrap.c
+++ b/ext/mwrap/mwrap.c
@@ -32,6 +32,8 @@ extern size_t __attribute__((weak)) rb_gc_count(void);
 extern VALUE __attribute__((weak)) rb_cObject;
 extern VALUE __attribute__((weak)) rb_yield(VALUE);
 
+static size_t total_bytes_inc, total_bytes_dec;
+
 /* true for glibc/dlmalloc/ptmalloc, not sure about jemalloc */
 #define ASSUMED_MALLOC_ALIGNMENT (sizeof(void *) * 2)
 
@@ -327,6 +329,8 @@ static struct src_loc *update_stats_rcu_lock(size_t size, uintptr_t caller)
 	if (caa_unlikely(!totals)) return 0;
 	if (locating++) goto out; /* do not recurse into another *alloc */
 
+	uatomic_add(&total_bytes_inc, size);
+
 	rcu_read_lock();
 	if (has_ec_p()) {
 		int line;
@@ -390,6 +394,7 @@ void free(void *p)
 		if (l) {
 			size_t age = generation - h->as.live.gen;
 
+			uatomic_add(&total_bytes_dec, h->size);
 			uatomic_set(&h->size, 0);
 			uatomic_add(&l->frees, 1);
 			uatomic_add(&l->age_total, age);
@@ -710,12 +715,16 @@ static VALUE mwrap_dump(int argc, VALUE * argv, VALUE mod)
 	return Qnil;
 }
 
+/* The whole operation is not remotely atomic... */
 static void *totals_reset(void *ign)
 {
 	struct cds_lfht *t;
 	struct cds_lfht_iter iter;
 	struct src_loc *l;
 
+	uatomic_set(&total_bytes_inc, 0);
+	uatomic_set(&total_bytes_dec, 0);
+
 	rcu_read_lock();
 	t = rcu_dereference(totals);
 	cds_lfht_for_each_entry(t, &iter, l, hnode) {
@@ -1033,6 +1042,16 @@ static VALUE mwrap_quiet(VALUE mod)
 	return rb_ensure(rb_yield, SIZET2NUM(cur), reset_locating, 0);
 }
 
+static VALUE total_inc(VALUE mod)
+{
+	return SIZET2NUM(total_bytes_inc);
+}
+
+static VALUE total_dec(VALUE mod)
+{
+	return SIZET2NUM(total_bytes_dec);
+}
+
 /*
  * Document-module: Mwrap
  *
@@ -1084,6 +1103,8 @@ void Init_mwrap(void)
 	rb_define_singleton_method(mod, "each", mwrap_each, -1);
 	rb_define_singleton_method(mod, "[]", mwrap_aref, 1);
 	rb_define_singleton_method(mod, "quiet", mwrap_quiet, 0);
+	rb_define_singleton_method(mod, "total_bytes_allocated", total_inc, 0);
+	rb_define_singleton_method(mod, "total_bytes_freed", total_dec, 0);
 	rb_define_method(cSrcLoc, "each", src_loc_each, 0);
 	rb_define_method(cSrcLoc, "frees", src_loc_frees, 0);
 	rb_define_method(cSrcLoc, "allocations", src_loc_allocations, 0);
diff --git a/test/test_mwrap.rb b/test/test_mwrap.rb
index 8425c35..d112b4e 100644
--- a/test/test_mwrap.rb
+++ b/test/test_mwrap.rb
@@ -272,4 +272,15 @@ class TestMwrap < Test::Unit::TestCase
       res == :foo or abort 'Mwrap.quiet did not return block result'
     end;
   end
+
+  def test_total_bytes
+    assert_separately(+"#{<<~"begin;"}\n#{<<~'end;'}")
+    begin;
+      require 'mwrap'
+      Mwrap.total_bytes_allocated > 0 or abort 'nothing allocated'
+      Mwrap.total_bytes_freed > 0 or abort 'nothing freed'
+      Mwrap.total_bytes_allocated > Mwrap.total_bytes_freed or
+        abort 'freed more than allocated'
+    end;
+  end
 end

Sam Saffron
2018-07-26 05:02:59 UTC
Yes, this patch looks right to me.

Even if we don't have perfect fidelity here, it will give absolute
clarity on "leak" vs "fragmentation-related bloat". Even though
jemalloc tries hard to compensate for fragmentation, bloat is still
possible.
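
To make the comparison concrete, here is a minimal sketch, assuming the
two global counters end up exposed as Mwrap.total_bytes_allocated and
Mwrap.total_bytes_freed as in the patch. Sample, diagnose, current_rss
and the 0.5 threshold are made up for illustration and are not part of
mwrap; RSS is read from /proc, so that part is Linux-only.

```ruby
# Hypothetical snapshot of the two global counters plus RSS, all in bytes.
Sample = Struct.new(:allocated, :freed, :rss) do
  # Bytes malloc has handed out that have not been freed yet.
  def live
    allocated - freed
  end
end

# Compare two snapshots.  If most of the RSS growth is matched by growth in
# live bytes, suspect a leak; otherwise suspect fragmentation.  The 0.5
# threshold is an arbitrary assumption, not something mwrap defines.
def diagnose(first, last, threshold: 0.5)
  rss_growth = last.rss - first.rss
  return :no_growth unless rss_growth.positive?
  live_growth = last.live - first.live
  live_growth >= rss_growth * threshold ? :probable_leak : :probable_fragmentation
end

# Linux-only: resident set size of this process in bytes, or nil elsewhere.
def current_rss
  line = File.foreach('/proc/self/status').find { |l| l.start_with?('VmRSS:') }
  line && line.split[1].to_i * 1024
rescue SystemCallError
  nil
end

# Only read the counters when running under a mwrap build that has them.
if defined?(Mwrap) && Mwrap.respond_to?(:total_bytes_allocated)
  s = Sample.new(Mwrap.total_bytes_allocated, Mwrap.total_bytes_freed, current_rss)
  puts "live=#{s.live} rss=#{s.rss.inspect}"
end
```

Taking one Sample at startup and another once RSS has bloated, then
passing both to diagnose, gives a first guess; it ignores mwrap's own
memory use, so treat the result as a hint rather than a verdict.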

For full context, here is a dump from when I started the process (it was at 500 MB RSS):

https://transfer.sh/Q9zQS/start.txt

Here is how it looks now (1.2 GB RSS):

https://transfer.sh/14fokY/now.txt


The script I use to generate this stuff is:

https://github.com/discourse/discourse/blob/master/script/mwrap_sidekiq

Eric Wong
2018-07-26 06:21:50 UTC
Sam Saffron wrote:
> Yes, this patch looks right to me.

OK, I've just pushed it out to RubyGems.org as a prerelease:

mwrap-2.0.0.4.gd1ea.gem

Sam Saffron wrote:
> Even if we don't have perfect fidelity here it will give absolute
> clarity on "leak" vs "fragmentation related bloat". Even though
> jemalloc tries hard to compensate for fragmentation bloat is still
> possible.

Just wondering, are you still on jemalloc 3.6.0 or one of the
newer versions? I seem to remember 3.6.0 interacting badly with
cross-thread frees (from another project years ago), and mwrap
relies on call_rcu to free memory, which happens in another thread...

Maybe narenas:1 with jemalloc, or even glibc malloc with
MALLOC_ARENA_MAX=1, might make it easier to discern a real leak
from fragmentation.
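
A small sketch of launching a child Ruby with a single-arena setting,
in case it helps: single_arena_env and run_with_single_arena are
hypothetical helpers, not part of mwrap. MALLOC_ARENA_MAX is the glibc
malloc variable and MALLOC_CONF is the usual jemalloc one, though a
jemalloc built with a symbol prefix may read a differently named
variable.

```ruby
require 'rbconfig'

# Hypothetical helper: environment overrides that ask the allocator to use
# a single arena.  MALLOC_ARENA_MAX is read by glibc malloc; MALLOC_CONF is
# read by jemalloc (builds with a symbol prefix may use a different name).
def single_arena_env(allocator)
  case allocator
  when :glibc    then { 'MALLOC_ARENA_MAX' => '1' }
  when :jemalloc then { 'MALLOC_CONF' => 'narenas:1' }
  else raise ArgumentError, "unknown allocator: #{allocator.inspect}"
  end
end

# Run a child Ruby with the override applied and return what it printed.
def run_with_single_arena(allocator, script)
  IO.popen([single_arena_env(allocator), RbConfig.ruby, '-e', script], &:read)
end
```

The same environment hash can be passed to Kernel#spawn or
Kernel#system when starting the real workload.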

In any case, one technique I've used in the past which never
required special debugging tools (aside from source access) was
a bisection search over the code path. I kept disabling/skipping
half of the remaining code until the leak could no longer be
reproduced, which narrowed down where it happened.
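
The idea can be sketched roughly like this, assuming the code path can
be split into independently skippable steps, that exactly one step
leaks, and that the leak still reproduces with only a subset enabled.
bisect_leak and its block are hypothetical names for illustration; the
block stands in for "run with only these steps enabled and report
whether the leak reproduces".

```ruby
# Hypothetical bisection over a list of steps.  The block receives a
# subset of steps and must return true if the leak reproduces when only
# that subset is enabled.  Assumes exactly one leaking step.
def bisect_leak(steps)
  candidates = steps.dup
  while candidates.size > 1
    half = candidates.first(candidates.size / 2)
    # Keep whichever half still contains the leak.
    candidates = yield(half) ? half : candidates.drop(half.size)
  end
  candidates.first
end
```

For example, `bisect_leak(%i[parse render upload]) { |subset| leaks_with?(subset) }`
would return the one step for which the (hypothetical) leaks_with?
check keeps reproducing the leak, using about log2(n) runs instead of n.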
