Ruby forgets local variables during a while loop?

0 votes
asked Oct 31, 2009 by monojohnny

I'm processing a record-based text file: so I'm looking for a starting string which constitutes the start of a record: there is no end-of-record marker, so I use the start of the next record to delimit the last record.

So I have built a simple program to do this, but I see something which suprises me: it looks like Ruby is forgetting local variables exist - or have I found a programming error ? [although I don't think I have : if I define the variable 'message' before my loop I don't see the error].

Here's a simplified example with example input data and error message in comments:

flag=false
# message=nil # this is will prevent the issue.
while line=gets do
    if line =~/hello/ then
        if flag==true then
            puts "#{message}"
        end
        message=StringIO.new(line);
        puts message
        flag=true
    else
        message << line
    end
end

# Input File example:
# hello this is a record
# this is also part of the same record
# hello this is a new record
# this is still record 2
# hello this is record 3 etc etc
# 
# Error when running: [nb, first iteration is fine]
# <StringIO:0x2e845ac>
# hello
# test.rb:5: undefined local variable or method `message' for main:Object (NameError)
#

6 Answers

0 votes
answered Oct 31, 2009 by richj

I think this is because message is defined inside the loop. At the end of the loop iteration "message" goes out of scope. Defining "message" outside of the loop stops the variable from going out of scope at the end of each loop iteration. So I think you have the right answer.

You could output the value of message at the beginning of each loop iteration to test whether my suggestion is correct.

0 votes
answered Oct 31, 2009 by ennuikiller

Why do you think this is a bug? The interpreter is telling you that message may be undefined when that particular piece of code executes.

0 votes
answered Oct 31, 2009 by jrl

From the Ruby Programming Language:

alt text http://bks0.books.google.com/books?id=jcUbTcr5XWwC&printsec=frontcover&img=1&zoom=5&sig=ACfU3U1rnYKha_p7vEkpPm1Ow3o9RAM0nQ

Blocks and Variable Scope

Blocks define a new variable scope: variables created within a block exist only within that block and are undefined outside of the block. Be cautious, however; the local variables in a method are available to any blocks within that method. So if a block assigns a value to a variable that is already defined outside of the block, this does not create a new block-local variable but instead assigns a new value to the already-existing variable. Sometimes, this is exactly the behavior we want:

total = 0   
data.each {|x| total += x }  # Sum the elements of the data array
puts total                   # Print out that sum

Sometimes, however, we do not want to alter variables in the enclosing scope, but we do so inadvertently. This problem is a particular concern for block parameters in Ruby 1.8. In Ruby 1.8, if a block parameter shares the name of an existing variable, then invocations of the block simply assign a value to that existing variable rather than creating a new block-local variable. The following code, for example, is problematic because it uses the same identifier i as the block parameter for two nested blocks:

1.upto(10) do |i|         # 10 rows
  1.upto(10) do |i|       # Each has 10 columns
    print "#{i} "         # Print column number
  end
  print " ==> Row #{i}\n" # Try to print row number, but get column number
end

Ruby 1.9 is different: block parameters are always local to their block, and invocations of the block never assign values to existing variables. If Ruby 1.9 is invoked with the -w flag, it will warn you if a block parameter has the same name as an existing variable. This helps you avoid writing code that runs differently in 1.8 and 1.9.

Ruby 1.9 is different in another important way, too. Block syntax has been extended to allow you to declare block-local variables that are guaranteed to be local, even if a variable by the same name already exists in the enclosing scope. To do this, follow the list of block parameters with a semicolon and a comma-separated list of block local variables. Here is an example:

x = y = 0            # local variables
1.upto(4) do |x;y|   # x and y are local to block
                     # x and y "shadow" the outer variables
  y = x + 1          # Use y as a scratch variable
  puts y*y           # Prints 4, 9, 16, 25
end
[x,y]                # => [0,0]: block does not alter these

In this code, x is a block parameter: it gets a value when the block is invoked with yield. y is a block-local variable. It does not receive any value from a yield invocation, but it has the value nil until the block actually assigns some other value to it. The point of declaring these block-local variables is to guarantee that you will not inadvertently clobber the value of some existing variable. (This might happen if a block is cut-and-pasted from one method to another, for example.) If you invoke Ruby 1.9 with the -w option, it will warn you if a block-local variable shadows an existing variable.

Blocks can have more than one parameter and more than one local variable, of course. Here is a block with two parameters and three local variables:

hash.each {|key,value; i,j,k| ... }
0 votes
answered Oct 31, 2009 by telemachus

I'm not sure why you're surprised: on line 5 (assuming that the message = nil line isn't there), you potentially use a variable that the interpreter has never heard of before. The interpreter says "What's message? It's not a method I know, it's not a variable I know, it's not a keyword..." and then you get an error message.

Here's a simpler example to show you what I mean:

while line = gets do
  if line =~ /./ then
    puts message # How could this work?
    message = line
  end
end

Which gives:

telemachus ~ $ ruby test.rb < huh 
test.rb:3:in `<main>': undefined local variable or method `message' for main:Object (NameError)

Also, if you want to prepare the way for message, I would initialize it as message = '', so that it's a string (rather than nil). Otherwise, if your first line doesn't match hello, you end up tring to add line to nil - which will get you this error:

telemachus ~ $ ruby test.rb < huh 
test.rb:4:in `<main>': undefined method `<<' for nil:NilClass (NoMethodError)
0 votes
answered Oct 19, 2011 by kannan

you can simply do this :

message=''

while line=gets do
   if line =~/hello/ then
      # begin a new record 
      p message unless message == ''
      message = String.new(line)
   else
     message << line
  end
end

# hello this is a record
# this is also part of the same record
# hello this is a new record
# this is still record 2
# hello this is record 3 etc etc
0 votes
answered Oct 22, 2014 by kelvin

Contrary to some of the other answers, while loops don't actually create a new scope. The problem you're seeing is more subtle.

To help show the contrast, blocks passed to a method call DO create a new scope, such that a newly assigned local variable inside the block disappears after the block exits:

### block example - provided for contrast only ###
[0].each {|e| blockvar = e }
p blockvar  # NameError: undefined local variable or method

But while loops (like your case) are different:

arr = [0]
while arr.any?
  whilevar = arr.shift
end
p whilevar  # prints 0

The reason you get the error in your case is because the line that uses message:

puts "#{message}"

appears before any code that assigns message.

It's the same reason this code raises an error if a wasn't defined beforehand:

# Note the single (not double) equal sign.
# At first glance it looks like this should print '1',
#  because the 'a' is assigned before (time-wise) the puts.
puts a if a = 1

Not Scoping, but Parsing-visibility

The so-called "problem" - i.e. local-variable visibility within a single scope - is due to ruby's parser. Since we're only considering a single scope, scoping rules have nothing to do with the problem. At the parsing stage, the parser decides at which source locations a local variable is visible, and this visibility does not change during execution.

When determining if a local variable is defined (i.e. defined? returns true) at any point in the code, the parser checks the current scope to see if any code has assigned it before, even if that code has never run (the parser can't know anything about what has or hasn't run at the parsing stage). "Before" meaning: on a line above, or on the same line and to the left-hand side.

An exercise to determine if a local is defined (i.e. visible)

Note that the following only applies to local variables, not methods. (Determining whether a method is defined in a scope is more complex, because it involves searching included modules and ancestor classes.)

A concrete way to see the local variable behavior is to open your file in a text editor. Suppose also that by repeatedly pressing the left-arrow key, you can move your cursor backward through the entire file. Now suppose you're wondering whether a certain usage of message will raise the NameError. To do this, position your cursor at the place you're using message, then keep pressing left-arrow until you either:

  1. reach the beginning of the current scope (you must understand ruby's scoping rules in order to know when this happens)
  2. reach code that assigns message

If you've reached an assignment before reaching the scope boundary, that means your usage of message won't raise NameError. If you don't reach any assignment, the usage will raise NameError.

Other considerations

In the case a variable assignment appears in the code but isn't run, the variable is initialized to nil:

# a is not defined before this
if false
  # never executed, but makes the binding defined/visible to the else case
  a = 1
else
  p a  # prints nil
end 

While loop test case

Here's a small test case to demonstrate the oddness of the above behavior when it happens in a while loop. The affected variable here is dest_arr.

arr = [0,1]
while n = arr.shift
  p( n: n, dest_arr_defined: (defined? dest_arr) )

  if n == 0
    dest_arr = [n]
  else
    dest_arr << n
    p( dest_arr: dest_arr )
  end
end

which outputs:

{:n=>0, :dest_arr_defined=>nil}
{:n=>1, :dest_arr_defined=>nil}
{:dest_arr=>[0, 1]}

The salient points:

  • The first iteration is intuitive, dest_arr is initialized to [0].
  • But we need to pay close attention in the second iteration (when n is 1):
    • At the beginning, dest_arr is undefined!
    • But when the code reaches the else case, dest_arr is visible again, because the interpreter sees that it was defined beforehand (2 lines up).
    • Notice also, that dest_arr is only hidden at the start of the loop; its value is never lost.

This also explains why assigning your local before the while loop fixes the problem. The assignment doesn't need to be executed; it only needs to appear in the source code.

Lambda example

f1 = ->{ f2 }
f2 = ->{ f1 }
p f2.call()
# Fails because the body of f1 tries to access f2 before an assignment for f2 was seen by the parser.
p f1.call()  # undefined local variable or method `f2'.

Fix this by putting an f2 assignment before f1's body. Remember, the assignment doesn't actually need to be executed!

f2 = nil  # Could be replaced by: if false; f2 = nil; end
f1 = ->{ f2 }
f2 = ->{ f1 }
p f2.call()
p f1.call()  # ok

Method-masking gotcha

Things get really hairy if you have a local variable with the same name as a method:

def dest_arr
  :whoops
end

arr = [0,1]
while n = arr.shift
  p( n: n, dest_arr: dest_arr )

  if n == 0
    dest_arr = [n]
  else
    dest_arr << n
    p( dest_arr: dest_arr )
  end
end

Outputs:

{:n=>0, :dest_arr=>:whoops}
{:n=>1, :dest_arr=>:whoops}
{:dest_arr=>[0, 1]}

A local variable assignment in a scope will "mask"/"shadow" a method call of the same name. (You can still call the method by using explicit parentheses or an explicit receiver.) So this is similar to the previous while loop test, except that instead of becoming undefined above the assignment code, the dest_arr method becomes "unmasked"/"unshadowed" so that the method is callable w/o parentheses. But any code after the assignment will see the local variable.

Some best-practices we can derive from all this

  • Don't name local variables the same as method names in the same scope
  • Don't put the initial assignment of a local variable in the body of a while or for loop, or anything that causes execution to jump around within a scope (calling lambdas or Continuation#call can do this too). Put the assignment before the loop.
Welcome to Q&A, where you can ask questions and receive answers from other members of the community.
Website Online Counter

...