Difference between << and += for Strings and Arrays. Bug?

Discussion:

Pieter Hugo

2009-11-24 14:46:28 UTC

Hi
I have been programming with Ruby for about 1 year and am loving it. I
cam across something that I can't figure out why it works the way it
does:

Eg:
a=[1,2]
b=a
b << 3

The value of b is now [1,2,3], but the value of a is also [1,2,3]????
And yet:

a=[1,2]
b=a
b += [3]

The value of b is still [1,2,3] but a remains [1,2]. This just doesnt
make sense to. The same happens with Strings. Why would the append
operator (<<) change the original variable, but other operators dont
seem to. PS What I really want to know is why the original variable is
changed at all?

--
Posted via http://www.ruby-forum.com/.

Rick DeNatale

2009-11-24 15:22:11 UTC

Permalink

Post by Pieter Hugo
Hi
I have been programming with Ruby for about 1 year and am loving it. I
cam across something that I can't figure out why it works the way it
a=[1,2]
b=a
b << 3
The value of b is now [1,2,3], but the value of a is also [1,2,3]????
a=[1,2]
b=a
b += [3]
The value of b is still [1,2,3] but a remains [1,2]. This just doesnt
make sense to. The same happens with Strings. Why would the append
operator (<<) change the original variable, but other operators dont
seem to. PS What I really want to know is why the original variable is
changed at all?

This is another example of the classic confusion between variables and objects.

variables only reference objects, the aren't objects themselves

a = [1, 2]
a.object_id # => 2148166040
b = a
b.object_id # => 2148166040
b << 3
b.object_id # => 2148166040
a # => [1, 2, 3]
b # => [1, 2, 3]

b += [4]
a # => [1, 2, 3]
b # => [1, 2, 3, 4]
a.object_id # => 2148166040
b.object_id # => 2148162660

a<< whatever

sends the << message to the object referenced by a, which changes the object.

This change will be visible no matter which variable referencing the object

The assignment operator changes the binding of the variable on the
left hand side so that it references the object which is the result of
evaluating the right hand side.

And x += y

is the same as

x = x + y

So in the

b += [3]
case

The variable b is changed to reference a new object which is the
result of b + [3]

HTH.

--
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

b***@gmail.com

2009-11-24 15:25:10 UTC

Permalink

http://ruby-doc.org/core/classes/Array.html#M002167
"Append—Pushes the given object on to the end of this array."

Post by Pieter Hugo
a=[1,2]
b=a
b += [3]
The value of b is still [1,2,3] but a remains [1,2]. This just doesnt
make sense to. The same happens with Strings. Why would the append
operator (<<) change the original variable, but other operators dont
seem to.

http://ruby-doc.org/core/classes/Array.html#M002209
"Concatenation—Returns a new array built by concatenating the two
arrays together to produce a third array."

Post by Pieter Hugo
PS What I really want to know is why the original variable is
changed at all?

After b=a, the variables 'a' and 'b' both reference the same object.
+= creates a new array (the + part) and changes b to reference that
new array (the = part). With << the object referenced by b is
directly modified, it appears that a is also changed because a
references the same object as b. It's not really that b was changed
by <<, the object referenced by b is what changed.

Tom Stuart

2009-11-24 15:26:02 UTC

Permalink

Post by Pieter Hugo
Hi

Hi,

Post by Pieter Hugo
I have been programming with Ruby for about 1 year and am loving it. I
cam across something that I can't figure out why it works the way it
a=[1,2]
b=a
b << 3
The value of b is now [1,2,3], but the value of a is also [1,2,3]????

b=a means 'the variable b points to the same object as the variable
a'. So calling a method on b which changes the underlying object means
that the underlying object is changed, regardless of whether you call
it a or b. The << method changes the array. As the docs describe it:
"Append—Pushes the given object on to the end of this array."
(http://ruby-doc.org/core/classes/Array.html#M002167).

Post by Pieter Hugo
a=[1,2]
b=a
b += [3]
The value of b is still [1,2,3] but a remains [1,2]. This just doesnt
make sense to.

'b += [3]' is syntactic sugar, or shorthand, for 'b = b + [3]'. i.e.
This calls the '+' method on b 'under the hood'. '+' does not change
the array, but returns a new array. From the docs again:
"Concatenation—Returns a new array built by concatenating the two
arrays together to produce a third array."
(http://ruby-doc.org/core/classes/Array.html#M002209). So here you're
getting a completely separate object and saying that the variable b
should point to this new object. The old object pointed to by b (and
also by a) is unaffected.

Post by Pieter Hugo
The same happens with Strings.

As per the docs for Strings on the equivalent methods.

Post by Pieter Hugo
Why would the append
operator (<<) change the original variable, but other operators dont
seem to. PS What I really want to know is why the original variable is
changed at all?

I hope my explanations make some kind of sense and help answer these questions?

Cheers,

Tom

Pieter Hugo

2009-11-24 16:11:44 UTC

Permalink

Hi Guys

Thanks for the quick responses and insights. While I understand the
issue now I must confess that it perturbs my 'least surprise'
expectation. Why was it decided that '<<' should reference the original
object but '+' should create a new object. When one works with a
variable one shouldn't need to worry about what other stuff is being
modified due to a unintentional reference? What other operators are
doing this? I noticed that the '.pop' array method was also modifying
the referenced array

a = [1,2]
b = a
b.pop

results in a and be being [1]

I worked around with

a = [1,2]
b = []
b += a
b.pop

Now b == [1] and a == [1,2], but its not as sleek. Is there maybe
another way?

--
Posted via http://www.ruby-forum.com/.

Tom Stuart

2009-11-24 16:16:12 UTC

Permalink

Post by Pieter Hugo
I worked around with
a = [1,2]
b = []
b += a
b.pop
Now b == [1] and a == [1,2], but its not as sleek. Is there maybe
another way?

Depending on your needs, try

a = [1, 2]
b = a.dup
b.pop

or

a = [1, 2]
a.last

Cheers,

Tom

b***@gmail.com

2009-11-24 16:36:22 UTC

Permalink

Post by Pieter Hugo
Thanks for the quick responses and insights. While I understand the
issue now I must confess that it perturbs my 'least surprise'
expectation. Why was it decided that '<<' should reference the original
object but '+' should create a new object.

Consider that b+=a is just a nice shorthand for b = b + a (or b = b.+(a)).

a = [1, 2]
b = [3, 4]
c = b + a

Would it be desirable for the :+ message to modify the object represented by b?

Marnen Laibow-Koser

2009-11-24 17:19:35 UTC

Permalink

Post by Pieter Hugo
Hi Guys
Thanks for the quick responses and insights. While I understand the
issue now I must confess that it perturbs my 'least surprise'
expectation.

According to Matz, POLS is only meant to apply within Ruby, not with
respect to other languages.

Post by Pieter Hugo
Why was it decided that '<<' should reference the original
object but '+' should create a new object.

That's the way those two operators work in most classes where they're
both defined. It's not that surprising.

Post by Pieter Hugo
When one works with a
variable one shouldn't need to worry about what other stuff is being
modified due to a unintentional reference?

Actually, keeping this straight is a fundamental part of programming in
Ruby. You can always use dup or clone to be safe.

Post by Pieter Hugo
What other operators are
doing this?

Ask the code, not us.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
***@marnen.org

--
Posted via http://www.ruby-forum.com/.

Brian Candler

2009-11-24 18:06:52 UTC

Permalink

Post by Pieter Hugo
Why was it decided that '<<' should reference the original
object but '+' should create a new object.

Perhaps because of the way C++ uses << with streams:

cout << "Hello world!";

Post by Pieter Hugo
When one works with a
variable one shouldn't need to worry about what other stuff is being
modified due to a unintentional reference?

That's what mutable objects are all about. Yes, it certainly can be a
source of bugs if you don't take care, but mutating state expected in
OOP.

If you don't like this, you could always try Erlang instead...

Post by Pieter Hugo
What other operators are
doing this? I noticed that the '.pop' array method was also modifying
the referenced array

Lots of operators modify the object. Look at 'ri String' and 'ri Array'
and then check out each of the methods. Examples: String#replace,
String#upcase!, String#chomp!

Sometimes they come in pairs which modify in-place or create a new
object for you. For example, str.upcase is really just a shortcut for
str.dup.upcase!

--
Posted via http://www.ruby-forum.com/.

Robert Klemme

2009-11-25 09:31:36 UTC

Permalink

<< does not always modify the receiver

irb(main):001:0> n = 1
=> 1
irb(main):002:0> m = n << 5
=> 32
irb(main):003:0> n.equal? m
=> false
irb(main):004:0>

The semantic of "appending" something to an object was indeed
inherited from C++ as Brian explained.

Post by Pieter Hugo
When one works with a
variable one shouldn't need to worry about what other stuff is being
modified due to a unintentional reference?

OO is all about modifying state of objects. The state change is a
side effect of a method call - and this is intentional. That way you
can model how real world things behave. A bank account does not
change as the effect of you depositing 10 bucks. If it would, you
would get a new bank account number after every deposit or withdrawal.
For obvious reasons this is a bad idea. Objects work the same way:
you change their state but you do not get a new one every time. There
are of course special cases (such as Fixnum and Bignum in Ruby) where
objects are immutable. This does make sense as well (just think about
keys in a Hash, you would have to rehash all the time if key state
would change).

This means in turn that it is your task as a programmer in an OO
language to think about what classes and objects you need, what state
transitions of an object you want to reasonably allow and which not.
You create your application by picking deliberate choices. For some
classes someone else did that for you: that's all the built in stuff
and the standard library code - actually, every library's code that
you are using. In those cases you need to adjust your usage of that
code to the semantics and conventions determined by that code.

Post by Pieter Hugo
What other operators are
doing this? I noticed that the '.pop' array method was also modifying
the referenced array
a = [1,2]
b = a
b.pop
results in a and be being [1]
I worked around with
a = [1,2]
b = []
b += a
b.pop
Now b == [1] and a == [1,2], but its not as sleek. Is there maybe
another way?

Why do you want to have two copies of your Array? #push and #pop are
intended to allow usage of an Array as stack (AKA LIFO). If you want
to use an array as LIFO there is no point in creating two copies
during every push and pop operation because you want only a single
stack and you want everybody who sees the stack have the same idea of
it. Otherwise you would be able to pop the same top element off the
stack multiple times. That does not make sense.

If you only need to know what the last element is, you can use #last or a[-1].

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Pieter Hugo

2009-11-26 07:39:44 UTC

Permalink

Hi Robert (and everyone else) - thanks for the well reasoned responses.
I'll get the hang of it I'm sure

Post by Robert Klemme
Why do you want to have two copies of your Array?

I have a setup where I have Folder objects. A folder can have many other
children (members) as sub folders, but it can also have many other
folders as parents (groups it belongs to). I get this done via a
crosslink table. When creating a new parent-child relationship I need to
make sure that the child is not somehow an ancestor or the parent I am
trying to subordinate it to (as I need to avoid circular reference)

So I wrote the folder function:

def ancestors
ancestors = self.groups #all the immediate parents are obviously
ancestors
scanfolders = [] #set up a stack to iterate through,
#looking for grandparents etc
scanfolders += ancestors #the stack starts with the current ancestors
if !scanfolders.nil? then
while scanfolders.length > 0 do # while there are items on the stack
scanitem = scanfolders.pop # get the last one and reduce the stack
if scanitem then
if !scanitem.groups.nil? then #if this item has parents
#add them to the stack
scanfolders += scanitem.groups
scanfolders.uniq!
ancestors += scanitem.groups #and record this item as an
#ancestor
ancestors.uniq!
end
end
end
end
return ancestors
end

So - to answer the question - I need to arrays that are initially the
same (direct parents), But the one will eventually contain all ancestors
and the other will be empty after iterating through all ancestors and
testing them for further ancestors.

I should just replace

scanfolders = []
scanfolders += ancestors

with

scanfolders = ancestors.dup

Regards

Pieter

--
Posted via http://www.ruby-forum.com/.

Brian Candler

2009-11-26 09:17:22 UTC

Permalink

Post by Pieter Hugo
def ancestors

For me the natural way to do this would be recursively, but an iterative
solution is of course fine, and probably more efficient anyway.

As for your problem of not modifying the 'groups' array you are looking
at, I'd suggest:

scanfolders = []
scanfolders.concat(ancestors)
...
scanfolders.concat(scanitem.groups)

This creates a new array (scanfolders), then mutates that array by
tacking on the contents of some other array, without modifying that
other array. The first two lines could also be scanfolders =
ancestors.dup of course.

Not that there's anything really wrong with scanfolders += foo. It's
just that if you're going to be mutating this array (with pop), then you
might as well go for mutation across the board. It's also more efficient
if this array becomes large, as there's less array copying and object
creation going on.

Post by Pieter Hugo
scanfolders = [] #set up a stack to iterate through,
#looking for grandparents etc
scanfolders += ancestors #the stack starts with the current ancestors
if !scanfolders.nil? then

I observe that this condition is *always* true, since scanfolders is
never nil, so can be removed.

Post by Pieter Hugo
while scanfolders.length > 0 do # while there are items on the stack

one option: until scanfolders.empty?

Post by Pieter Hugo
while scanfolders.length > 0 do # while there are items on the stack
scanitem = scanfolders.pop # get the last one and reduce the stack
if scanitem then

If your scanfolders never contains any nil items (and I don't see why it
should), then all three lines could shrink to

while scanitem = scanfolders.pop

Post by Pieter Hugo
if !scanitem.groups.nil? then #if this item has parents
#add them to the stack

if scanitem.groups

Post by Pieter Hugo
scanfolders += scanitem.groups
scanfolders.uniq!
ancestors += scanitem.groups #and record this item as an
#ancestor
ancestors.uniq!

The problem I see here is that you might already have scanned a folder
and popped it from scanfolders, then add it back again, therefore
scanning it extra times unnecessarily. I can't convince myself whether
there is a possibility of ending up in an infinite loop.

One solution is to keep an explicit mark of folders you have traversed
(e.g. a hash of folder => true)

But since everything you've added to scanfolders has been added to
ancestors already, it may be sufficient to do:

scanfolders += scanitem.groups
scanfolders -= ancestors
ancestors += scanitem.groups

Or something like that. Actually I'm not sure if there's a mutating
version of -=

Regards,

Brian.

--
Posted via http://www.ruby-forum.com/.

Robert Klemme

2009-11-26 09:28:24 UTC

Permalink

Post by Pieter Hugo
Hi Robert (and everyone else) - thanks for the well reasoned responses.
I'll get the hang of it I'm sure

Post by Robert Klemme
Why do you want to have two copies of your Array?

I have a setup where I have Folder objects. A folder can have many other
children (members) as sub folders, but it can also have many other
folders as parents (groups it belongs to). I get this done via a
crosslink table. When creating a new parent-child relationship I need to
make sure that the child is not somehow an ancestor or the parent I am
trying to subordinate it to (as I need to avoid circular reference)
def ancestors
ancestors = self.groups #all the immediate parents are obviously
ancestors
scanfolders = [] #set up a stack to iterate through,
#looking for grandparents etc
scanfolders += ancestors #the stack starts with the current ancestors
if !scanfolders.nil? then
while scanfolders.length > 0 do # while there are items on the stack
scanitem = scanfolders.pop # get the last one and reduce the stack
if scanitem then
if !scanitem.groups.nil? then #if this item has parents
#add them to the stack
scanfolders += scanitem.groups
scanfolders.uniq!
ancestors += scanitem.groups #and record this item as an
#ancestor
ancestors.uniq!
end
end
end
end
return ancestors
end
So - to answer the question - I need to arrays that are initially the
same (direct parents), But the one will eventually contain all ancestors
and the other will be empty after iterating through all ancestors and
testing them for further ancestors.

As far as I can see you only need an inclusion check not the complete
list of ancestors. A simple iterative solution with a BFS could do
the job for you

require 'set'

def ancestor?(candidate)
visited = Set.new
queue = [self]

until queue.empty?
n = queue.shift

if visited.add? n
return true if candidate == n
queue.concat(n.ancestors)
end
end

false
end

Note: I prefer a BFS over a DFS in these cases because the stack depth
is far more limited than the memory:

10:26:50 ~$ ruby19 -e 'def r(x) p x; r(x+1) end; r 0' | tail -3
-e:1:in `r': stack level too deep (SystemStackError)
from -e:1:in `r'
from -e:1:in `r'
from -e:1:in `r'
from -e:1:in `r'
from -e:1:in `r'
from -e:1:in `r'
from -e:1:in `r'
from -e:1:in `r'
... 8175 levels...
from -e:1:in `r'
from -e:1:in `r'
from -e:1:in `r'
from -e:1:in `<main>'
8184
8185
8186
10:27:04 ~$ ruby -e 'def r(x) p x; r(x+1) end; r 0' | tail -3
-e:1:in `inspect': stack level too deep (SystemStackError)
from -e:1:in `p'
from -e:1:in `r'
from -e:1:in `r'
from -e:1
12623
12624
12625
10:27:08 ~$

Kind regards

robert

http://en.wikipedia.org/wiki/Breadth-first_search

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Robert Klemme

2009-11-26 10:16:40 UTC

Permalink

Post by Robert Klemme
As far as I can see you only need an inclusion check not the complete
list of ancestors. A simple iterative solution with a BFS could do
the job for you

Here's a more modularized version:

require 'set'

def ancestor_bfs
visited = Set.new
queue = [self]

until queue.empty?
n = queue.shift

if visited.add? n
yield n
queue.concat(n.ancestors)
end
end

false
end

def ancestor?(candidate)
ancestor_bfs {|n| return true if candidate == n}
false
end

# or even without a method but still fast exit when the node is found:
folder.to_enum(:ancestor_bfs).any? {|n| candidate == n}

# fully modularized

def bfs(next_meth, start = self)
visited = Set.new
queue = [start]

until queue.empty?
n = queue.shift

if visited.add? n
yield n
queue.concat(n.send(next_meth))
end
end

false
end

def bfs_ancestors(&b)
bfs(:ancestors, &b)
end

Now you can do

class Integer
def n; [self * 2, self * 2 + 1] end
end

irb(main):057:0> bfs(:n, 3) {|x| p x; break if x > 20}
3
6
7
12
13
14
15
24
=> nil
irb(main):058:0>

:-)

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/