A very basic tail -f implementation

Discussion:

Daniel Berger

2002-07-29 20:15:25 UTC

Hi all,

I've been playing with a Ruby implementation of the *nix 'tail' command. Here's
what I've come up with so far. I thought I'd put it out there for comment and see
what people think.

It doesn't do anything fancy - it's not nearly as smart as its Perl equivalent yet,
for example. But, it works.

Requires 1.7.2 (for the sysseek method).

# While this is running, try doing
# "echo 'sometext' >> test.txt" once in a while
require 'tail'
t = Tail.new("test.txt") # Or whatever

# Ctrl-C to exit
while true
  puts t.read
end

# Da code
class Tail

  attr_accessor :file, :buffer, :interval

  def initialize(file, buffer=8192, interval=10)
    @file = file
    @buffer = buffer
    @interval = interval

    @fh = File.open(file,"r")
    @fh.sysseek(-1,2)
  end

  def read
    begin
      return @fh.sysread(buffer).tr("\n","")
    rescue EOFError
      @fh.sysseek(-1,2)
      sleep interval
      retry
    end
  end
end

Paul Brannan

2002-07-30 13:35:03 UTC

Post by Daniel Berger
# Ctrl-C to exit
while true
puts t.read
end

I have an implementation of tail -f that looks similar, except it
doesn't seek to the end of file at start, and it's called like this:

File.open(filename) do |input|
  tail_f(input) do |line|
    puts line
  end
end

Two advantages here:
1) If an exception is raised from inside the block, the file will be
properly closed.
2) I get my input line-by-line, which imo is much more useful than
buffersize-by-buffersize.

Some advantages of your implementation:
1) It is an object, and acts somewhat like an IO object, thus
2) If RCR#65 (http://www.rubygarden.org/article.php?sid=179) were
accepted, it could easily be modified to inherit from IO and make
this class act just like a pipe.
3) The sleep interval and buffer size are adjustable.

My implementation:

def clearerr(io)
  offset = io.tell
  io.seek(offset)
end

def tail_f(input)
  loop do
    line = input.gets
    yield line if not line.nil?
    if input.eof? then
      sleep 1
      clearerr(input)
    end
  end
end

Paul

Berger, Daniel

2002-07-30 14:23:34 UTC

-----Original Message-----
I have an implementation of tail -f that looks similar, except it
doesn't seek to the end of file at start

I guess to truly imitate the *nix tail -f command, it should read the 10
last lines and then start its wait cycle. Should it be that faithful?

File.open(filename) do |input|
  tail_f(input) do |line|
    puts line
  end
end

1) If an exception is raised from inside the block, the file will be
properly closed.
2) I get my input line-by-line, which imo is much more useful than
buffersize-by-buffersize.

Regarding the file closure, I think I was relying on scope to kill it. I
wasn't really sure how egregious it was to not politely close a file. In
Perl, it's recommended but not required.

As for the input method, I think I like your line by line approach better.
I used the buffersize approach because, well, that's how the Perl module did
it. :)

1) It is an object, and acts somewhat like an IO object, thus
2) If RCR#65 (http://www.rubygarden.org/article.php?sid=179) were
accepted, it could easily be modified to inherit from IO and make
this class act just like a pipe.

I read the RCR. I think it's a good idea, though it also sounds like quite
a bit of work. Any idea on the status of this? Sounds like a good topic
for a Ruby gathering. :)

3) The sleep interval and buffer size are adjustable.

def clearerr(io)
  offset = io.tell
  io.seek(offset)
end

def tail_f(input)
  loop do
    line = input.gets
    yield line if not line.nil?
    if input.eof? then
      sleep 1
      clearerr(input)
    end
  end
end

Nice. I'm gonna work on this some today, so I'll see if perhaps I
can/should use this approach instead.

Regards,

Dan

Berger, Daniel

2002-07-30 21:15:08 UTC

Sent: Tuesday, July 30, 2002 8:35 AM
Subject: Re: A very basic tail -f implementation

Post by Daniel Berger
# Ctrl-C to exit
while true
puts t.read
end

I have an implementation of tail -f that looks similar, except it

Well, it looks like Florian Frank went and released his own version of tail
today.

Coincidence? Ich denke nicht.

I'll try it out I guess. No documentation....sigh.

Regards,

Dan

Florian Frank

2002-07-31 12:02:04 UTC

Post by Berger, Daniel
Well, it looks like Florian Frank went and released his own version of
tail today.
Coincidence? Ich denke nicht.

This is funny: I don't know if it is a coincidence. I've released it
because someone (rubyhacker was his nick IIRC) asked on the openproject
IRC channel for an implementation of a File::Tail module similar to the
Perl module. I was AFK, so I couldn't immediately answer him and he left
the channel before I was back. If you are rubyhacker, then it's no
coincidence at all. If you aren't rubyhacker, it's coincidental.

I've searched for a File::Tail module when I tried to upgrade my old
ipchains logfile prettifier (written in perl) to iptables logfiles a
while ago, because I wanted to port it to ruby at this opportunity. I
found nothing and so I implemented my own module. Since then it was
lying around on my hard disk. I obviously was too lazy to bring it into a
releasable form for quite a long time. :)

Post by Berger, Daniel
I'll try it out I guess. No documentation....sigh.

I could try to document a little more in the best english I'm capable
of. That is, it will probably be quite awful. ;)

--
It is of course always best to be led by god, and have him personally whisper
into your ear. Only, when it is the devil talking he will tell you he is god,
for the devil is a crafty liar. So you never know who is talking to you.
-- Franz Bibfeldt

James F.Hranicky

2002-07-31 20:50:17 UTC

On Wed, 31 Jul 2002 21:02:04 +0900

Post by Florian Frank
I could try to document a little more in the best english I'm capable
of. That is, it will probably be quite awful. ;)

A couple of suggestions:

- Check for rotation by checking for changes in the inode number

- Allow tailing from arbitrary points in the file by lines, e.g

log.wind(10)    # skip 10 lines from the beginning, print
                # the rest of the file, then tail -f
log.rewind(-10) # print last 10 lines, then tail -f

Something like this:

def wind(lines)

  seek(0, IO::SEEK_SET)

  numlines = 0
  0.upto(stat.size) { |filepos|
    seek(filepos, IO::SEEK_SET)
    return if (numlines == lines)
    i = getc
    c = sprintf("%c", i)
    numlines += 1 if (c == "\n")
  }

end

def rewind(lines)

  seek(0, IO::SEEK_END)

  lines = lines.abs
  numlines = 0
  size = stat.size - 1

  size.downto(0) { |filepos|
    next if (size == filepos)
    seek(filepos, IO::SEEK_SET)
    i = getc
    c = sprintf("%c", i)
    numlines += 1 if (c == "\n")
    return if (numlines == lines)
  }
end

----------------------------------------------------------------------
| Jim Hranicky, Senior SysAdmin UF/CISE Department |
| E314D CSE Building Phone (352) 392-1499 |
| ***@cise.ufl.edu http://www.cise.ufl.edu/~jfh |
----------------------------------------------------------------------

"Given a choice between a complex, difficult-to-understand, disconcerting
explanation and a simplistic, comforting one, many prefer simplistic
comfort if it's remotely plausible, especially if it involves blaming
someone else for their problems."
-- Bob Lewis, _Infoworld_

Paul Brannan

2002-08-02 13:55:02 UTC

Post by James F.Hranicky
def rewind(lines)
seek(0, IO::SEEK_END)
lines = lines.abs
numlines = 0
size = stat.size - 1
size.downto(0) { |filepos|
next if (size == filepos)
seek(filepos, IO::SEEK_SET)
i = getc
c = sprintf("%c", i)
numlines += 1 if (c == "\n")

Could this be replaced with:
numlines += 1 if getc() == ?\n

Post by James F.Hranicky
return if (numlines == lines)
}
end

Paul

Josh Huber

2002-08-02 14:56:44 UTC

Hmmm...I haven't run across "?\n" before -- what does that do?

It returns the character code for the following character:

irb(main):005:0> ?A
65
irb(main):006:0> ?B
66
irb(main):007:0> ?C
67

--
Josh Huber

James F.Hranicky

2002-08-02 15:05:21 UTC

On Fri, 2 Aug 2002 23:56:44 +0900

Hmmm...I haven't run across "?\n" before -- what does that do?

Then it's much better :->

Jim

Lyle Johnson

2002-08-02 15:14:23 UTC

Hmmm...I haven't run across "?\n" before -- what does that do?

The "?" operator returns the ASCII code of the following character:

?\n --> 10
?a --> 97
?A --> 65
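
A note for anyone trying this on a newer interpreter: the answers in this thread describe Ruby as it was in 2002, where ?x evaluates to an Integer. From Ruby 1.9 on, the same literal evaluates to a one-character String, and getc returns a String too, so the suggested comparison still counts newlines. A minimal sketch, assuming Ruby 1.9 or later (the StringIO input is only there to make the example self-contained):

```ruby
require 'stringio'

# On Ruby 1.9 and later, ?\n is the one-character String "\n".
# Call ord to get the character code that older versions returned directly.
newline = ?\n
code = newline.ord

# getc also returns a String on these versions, so comparing its
# result against ?\n still detects newlines.
io = StringIO.new("one\ntwo\n")
numlines = 0
while (c = io.getc)
  numlines += 1 if c == ?\n
end
```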

HTH,

Lyle

James F.Hranicky

2002-08-02 14:48:24 UTC

On Fri, 2 Aug 2002 22:55:02 +0900

Post by Paul Brannan

Post by James F.Hranicky
i = getc
c = sprintf("%c", i)
numlines += 1 if (c == "\n")

numlines += 1 if getc() == ?\n

Hmmm...I haven't run across "?\n" before -- what does that do?

Jim

James F.Hranicky

2002-08-02 14:58:24 UTC

On Thu, 1 Aug 2002 05:50:17 +0900

Post by James F.Hranicky
On Wed, 31 Jul 2002 21:02:04 +0900

Post by Florian Frank
I could try to document a little more in the best english I'm capable
of. That is, it will probably be quite awful. ;)

One more suggestion: checkpoint:

log.next { |line|
  process(line)
  log.checkpoint # write out inode and filepos to a file
                 # or something
}

Something like

def checkpoint
  open_checkpoint_file
  write_inode_and_pos
  sync_checkpoint_file
  close_checkpoint_file
end

So, after a crash or reboot, you could start back up where you
left off, unless the inode has changed, and then you'd start at
the beginning of the new file.
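
The checkpoint idea above could be sketched roughly as follows. This is only an illustration, not code from any released library: write_checkpoint, resume_from_checkpoint and the one-line "device inode position" file format are all made up here, and the device number is stored alongside the inode as an extra assumption.

```ruby
# Hypothetical helpers: save and restore a tail position.
# The checkpoint file holds "device inode position" on a single line.

def write_checkpoint(log, checkpoint_path)
  st = log.stat
  File.open(checkpoint_path, "w") do |cp|
    cp.puts "#{st.dev} #{st.ino} #{log.pos}"
    cp.fsync                     # push the record to disk before returning
  end
end

def resume_from_checkpoint(log, checkpoint_path)
  return 0 unless File.exist?(checkpoint_path)
  dev, ino, pos = File.read(checkpoint_path).split.map { |s| Integer(s) }
  st = log.stat
  # Same file, and not truncated below the saved offset: carry on from there.
  if st.dev == dev && st.ino == ino && pos <= st.size
    log.seek(pos, IO::SEEK_SET)
  else
    log.seek(0, IO::SEEK_SET)    # different or shortened file: start at the top
  end
  log.pos
end
```

Calling write_checkpoint after each processed line is the safest variant but also the slowest, since every call syncs to disk; checkpointing every N lines would be a reasonable trade-off.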

Jim

Berger, Daniel

2002-08-01 12:23:09 UTC

-----Original Message-----
Sent: Wednesday, July 31, 2002 3:50 PM
Subject: Re: A very basic tail -f implementation
On Wed, 31 Jul 2002 21:02:04 +0900

Post by Florian Frank
I could try to document a little more in the best english I'm capable
of. That is, it will probably be quite awful. ;)

- Check for rotation by checking for changes in the inode number

This is proving difficult in practice because, as the Pickaxe says, the
File::Stat info is recorded at the moment the File::Stat object is created;
changes made to the file after that point will not be reflected. This
applies to filehandles as well, as experimentation has shown. Simply
calling the class method won't work either because that requires the
filename which, by that point, has already changed.

If I'm doing something wrong, I'd love to know how to make this work!

- Allow tailing from arbitrary points in the file by lines, e.g

Heh - funny you should mention it. I already submitted a very similar
approach to Florian. :)

Regards,

Dan

James F.Hranicky

2002-08-01 12:50:03 UTC

On Thu, 1 Aug 2002 21:23:09 +0900

Post by Berger, Daniel

Post by James F.Hranicky
- Check for rotation by checking for changes in the inode number

This is proving difficult in practice because, as the Pickaxe says, the
File::Stat info is recorded at the moment the File::Stat object is created;
changes made to the file after that point will not be reflected. This
applies to filehandles as well, as experimentation has shown. Simply
calling the class method won't work either because that requires the
filename which, by that point, has already changed.
If I'm doing something wrong, I'd love to know how to make this work!

That's true, you just have to stat the actual filename every time and
check against the current filehandle.

I wrote my own very simple tail -f thingy in Ruby a few months back,
and was recently working on improving it when I noticed that Florian
released his version. Here's what I have so far. Anyone can use any
of it however they want.

Curt Sampson

2002-08-01 13:01:09 UTC

Post by James F.Hranicky
On Thu, 1 Aug 2002 21:23:09 +0900

Post by James F.Hranicky
- Check for rotation by checking for changes in the inode number

...
That's true, you just have to stat the actual filename every time and
check against the current filehandle.

Right. You also ought to check the device number, in case the file
is somehow moved to another device and ends up with the same inode
number. At least, that's what I did when I added the '-F' option
to tail(1) in NetBSD. Maybe I'm just anal-retentive. :-)

Also, you might want to check to see if the file has been shortened,
in case the writer truncates it and starts writing again at the
beginning.
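
Putting the three checks from this sub-thread together (inode, device, shrinking size), a rough sketch might look like this. The helper name log_state and the symbols it returns are invented for illustration; the key point is that the path is stat'ed afresh on every call and compared with the handle that is already open.

```ruby
# Hypothetical helper: decide what happened to a tailed file.
# `path` is the name being followed, `fh` the handle opened on it earlier.
def log_state(path, fh)
  begin
    on_disk = File.stat(path)    # fresh stat of whatever the name points to now
  rescue Errno::ENOENT
    return :missing              # moved away and not recreated yet
  end
  open_file = fh.stat            # stat of the file we actually hold open
  if on_disk.ino != open_file.ino || on_disk.dev != open_file.dev
    :rotated                     # the name now refers to a different file
  elsif on_disk.size < fh.pos
    :truncated                   # the writer cut the file short
  else
    :unchanged
  end
end
```

A caller would typically reopen the path on :rotated, seek back to offset 0 on :truncated, and simply keep waiting on :missing or :unchanged.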

cjs

--
Curt Sampson <***@cynic.net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC

Berger, Daniel

2002-08-01 13:11:29 UTC

-----Original Message-----

Post by James F.Hranicky
On Thu, 1 Aug 2002 21:23:09 +0900

Post by James F.Hranicky
- Check for rotation by checking for changes in the inode number
...
That's true, you just have to stat the actual filename every time and
check against the current filehandle.

Heh. I came across that while reading some usenet posts on a similar
subject.

Also, you might want to check to see if the file has been shortened,
in case the writer truncates it and starts writing again at the
beginning.

Wow. Thorough.

Well, here's what I had come up with so far, using one of Paul Brannan's
posts as a starting point. Between all of us, we should have something
pretty good and thorough soon. :)

class Tail < File

  def initialize(file,sleeptime=10)
    @file = File.open(file,"r")
    @sleeptime = sleeptime
  end

  ##############################################################
  # In block form, read a line at a time in a loop. Otherwise,
  # just return the number of lines specified as an array.
  ##############################################################
  def tail(max=10)
    if block_given?
      yield get_lines(max)
      loop do
        line = @file.gets
        yield line unless line.nil?
        if @file.eof?
          @file.seek(@file.pos)
          # Sleep only at EOF; sleeping after every line would
          # delay each line of a backlog by @sleeptime seconds.
          sleep @sleeptime
        end
      end
    else
      get_lines(max)
    end
  end

  #######################################################################
  # Grab the last 'max' lines (default is 10). This method was created
  # for two reasons. First, even a "tail -f" reads the last 10 lines.
  # Second, it's separated from the tail() method in the event the
  # programmer just wants a "plain" tail.
  #######################################################################
  def get_lines(max)

    @file.seek(0,IO::SEEK_END)
    newline_count = 0
    while newline_count < max

      begin
        @file.pos -= 2
      rescue Errno::EINVAL
        break
      end

      break if @file.eof?

      if @file.getc.chr == "\n"
        newline_count += 1
      end
    end

    @file.readlines
  end
end

=begin
= Description
   Tail - A pure-Ruby implementation of the 'tail' command. Tail is a
   subclass of File.
= Overview
   t = Tail.new("somefile.txt")

   # To imitate 'tail -f', use a block
   t.tail do |line|
     puts line
   end

   # To simply grab the last X lines, don't use a block
   a = t.tail(20)
   puts a

   # or even
   t.tail(20).each{ |line| puts line }
=end

Florian Frank

2002-08-01 22:43:57 UTC

Post by James F.Hranicky
On Wed, 31 Jul 2002 21:02:04 +0900

Post by Florian Frank
I could try to document a little more in the best english I'm capable
of. That is, it will probably be quite awful. ;)

- Check for rotation by checking for changes in the inode number

Good idea. This should handle the rotation-by-moving case much faster.
I have implemented this.

BTW: If the filesize suddenly shrinks, copy and truncate could have
happened. I'm not sure what has to be done in this case. Rewinding to
the top of the file would perhaps be reasonable, because it doesn't make
much sense for a logfile to be truncated to any other filesize but 0.

Post by James F.Hranicky
- Allow tailing from arbitrary points in the file by lines, e.g
log.wind(10) # skip 10 lines from the beginning, print
# the rest of the file, then tail -f
log.rewind(-10) # print last 10 lines, then tail -f

In my implementation the latter is done by log.last(10).

Post by James F.Hranicky
def wind(lines)
seek(0, IO::SEEK_SET)
numlines = 0
0.upto(stat.size) { |filepos|
seek(filepos, IO::SEEK_SET)
return if (numlines == lines)
i = getc
c = sprintf("%c", i)
numlines += 1 if (c == "\n")
}
end

Maybe I am missing something, but couldn't this be done much simpler
like this:

def wind(lines)

  @fileh.seek(0, IO::SEEK_SET) # just to be sure

  until @fileh.eof? or lines <= 0
    @fileh.readline
    lines -= 1
  end
end

My approach is much more complicated because I use a buffer of an
arbitrary size, to spare some seek-calls and to have fewer
explicit iterations. Perhaps it doesn't make much of a difference and I
could use this simpler method. I should probably benchmark both methods
to find this out.

James F.Hranicky

2002-08-02 14:46:50 UTC

On Fri, 2 Aug 2002 07:43:57 +0900

Post by Florian Frank
BTW: If the filesize suddenly shrinks, copy and truncate could have
happened. I'm not sure what has to be done in this case. Rewinding to
the top of the file would perhaps be reasonable, because it doesn't make
much sense for a logfile to be truncated to any other filesize but 0.

This makes sense to me.

Post by Florian Frank

Post by James F.Hranicky
def wind(lines)

[ ... ]

Post by Florian Frank
Maybe I am missing something, but couldn't this be done much simpler
def wind(lines)

[use readlines]

Post by Florian Frank
end
end

No, that's much better. I was thinking in terms of doing the opposite of
what I did for rewind, missing the easier solution.

Post by Florian Frank

Post by James F.Hranicky
def rewind(lines)

[ ... ]

Post by Florian Frank

Post by James F.Hranicky
end

If you want to tail beginning at an arbitrary position in the file,
that will work, but many will probably want to specify the # of lines
from the end.

You could seek to the end, then seek backwards in chunks, read in each
chunk, then count backwards through the chunk counting newlines and
keeping track of filepos, and once you hit the # lines you want, seek to
that position and then read from there. This would cut down on the #
of seeks and reads in my method above, probably resulting in much
better performance.
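
The chunked approach described above might look something like the following. It is only a sketch under a few assumptions: the name seek_to_last_lines and its parameters are invented, the io is assumed to be a seekable regular file, and String#rindex is used to walk each chunk backwards rather than comparing one character at a time.

```ruby
# Hypothetical helper: position `io` at the start of its last `n` lines,
# reading backwards in chunks instead of one byte at a time.
# Returns the resulting file offset.
def seek_to_last_lines(io, n, chunk_size = 4096)
  size = io.stat.size
  if n <= 0
    io.seek(0, IO::SEEK_END)
    return io.pos
  end
  pos = size
  # A newline that ends the file terminates the last line rather than
  # starting a new one, so it is left out of the count.
  if size > 0
    io.seek(size - 1, IO::SEEK_SET)
    pos -= 1 if io.read(1) == "\n"
  end
  found = 0
  while pos > 0
    start = [pos - chunk_size, 0].max
    io.seek(start, IO::SEEK_SET)
    chunk = io.read(pos - start)
    idx = chunk.length
    # Walk the chunk backwards, newline by newline.
    while idx > 0 && (idx = chunk.rindex("\n", idx - 1))
      found += 1
      if found == n
        io.seek(start + idx + 1, IO::SEEK_SET)  # first byte after that newline
        return io.pos
      end
    end
    pos = start
  end
  io.seek(0, IO::SEEK_SET)  # fewer than n lines in the file: start at the top
  io.pos
end
```

After the call, an ordinary gets loop on the same handle prints the last n lines and can then fall through into the usual wait-and-retry cycle.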

Jim

Berger, Daniel

2002-08-02 14:55:42 UTC

-----Original Message-----

<snip>

If you want to tail beginning at an arbitrary position in the file,
that will work, but many will probably want to specify the # of lines
from the end.

I agree. I don't think the *nix tail command even allows you to tail from a
specific byte position, though I could be wrong. I can't see much use for
that myself.

You could seek to the end, then seek backwards in chunks, read in each
chunk, then count backwards through the chunk counting newlines and
keeping track of filepos, and once you hit the # lines you want, seek to
that position and then read from there. This would cut down on the #
of seeks and reads in my method above, probably resulting in much
better performance.

I proposed this in my last solution, though it doesn't grab a block of data
first. I know the source for tail does this, but if we're counting char by
char anyway, how does first grabbing a 4k block (or whatever) help?

# max default is 10
def get_lines(max)

  @fh.seek(0,IO::SEEK_END)
  newline_count = 0
  while newline_count < max

    begin
      @fh.pos -= 2
    rescue Errno::EINVAL
      break
    end

    break if @fh.eof?

    if @fh.getc.chr == "\n"
      newline_count += 1
    end
  end

  @fh.readlines
end

Regards,

Dan

James F.Hranicky

2002-08-02 15:04:05 UTC

On Fri, 2 Aug 2002 23:55:42 +0900

Post by Berger, Daniel
I proposed this in my last solution, though it doesn't grab a block of data
first. I know the source for tail does this, but if we're counting char by
char anyway, how does first grabbing a 4k block (or whatever) help?

Say you wanted to tail 100,000 lines before your tail -f . Given 80 chars
per line, that's 8,000,000 seeks and reads, vs 2000 seeks and reads for
reading in 4k chunks. The comparison for c == "\n" gets done the same
# of time in each version, so unless reading in 4k chunks with one
seek/read vs reading in the same with 4000 seeks/reads is slower
(low memory, perhaps), I'd think the chunked version should be much
faster.

Or am I off base here?
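
For reference, the arithmetic behind those figures, written out as a small Ruby calculation. The 80-byte line length is the assumption made in the post, and "4k" is taken to mean 4096 bytes:

```ruby
lines       = 100_000
line_length = 80     # assumed average bytes per line, as in the post
chunk_size  = 4096   # "4k"

bytes          = lines * line_length                    # total bytes to scan
per_byte_reads = bytes                                  # one seek + read per character
chunked_reads  = (bytes + chunk_size - 1) / chunk_size  # one per chunk, rounded up

puts "per byte: #{per_byte_reads}, chunked: #{chunked_reads}"
```

That works out to 8,000,000 seek/read pairs against 1,954, in line with the rough 2000 quoted above.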

Berger, Daniel

2002-08-02 15:14:07 UTC

-----Original Message-----
On Fri, 2 Aug 2002 23:55:42 +0900

Post by Berger, Daniel
I proposed this in my last solution, though it doesn't grab a block of data
first. I know the source for tail does this, but if we're counting char by
char anyway, how does first grabbing a 4k block (or whatever) help?

Ok. I had it in my head that no one would tail more than 4k anyway, though
now that I think about it, it's obviously a possibility.

Now I'm wondering if there's a good way to dynamically compute a buffer size
for such an operation with each 'grab', rather than a fixed buffer size. Is
that a dumb idea? Am I getting into the "premature optimization" stage?

----------------------------------------------------------------------
| Jim Hranicky, Senior SysAdmin UF/CISE Department |
| E314D CSE Building Phone (352) 392-1499 |
----------------------------------------------------------------------

You're a Gator?! Go Seminoles!! ;-P

Regards,

Dan

James F.Hranicky

2002-08-02 15:31:45 UTC

On Sat, 3 Aug 2002 00:14:07 +0900

Post by Berger, Daniel
Ok. I had it in my head that no one would tail more than 4k anyway, though
now that I think about it, it's obviously a possibility.

I do it all the time :->

Post by Berger, Daniel
Now I'm wondering if there's a good way to dynamically compute a buffer size
for such an operation with each 'grab', rather than a fixed buffer size. Is
that a dumb idea? Am I getting into the "premature optimization" stage?

Actually, I was wondering that, too...some kind of percentage of the #
of lines requested? Not sure...

Post by Berger, Daniel
You're a Gator?! Go Seminoles!! ;-P

I grew up in Tallahassee -- go both! :->

Jim

Ned Konz

2002-08-02 15:50:27 UTC

Post by Berger, Daniel
Now I'm wondering if there's a good way to dynamically compute a
buffer size for such an operation with each 'grab', rather than a
fixed buffer size. Is that a dumb idea? Am I getting into the
"premature optimization" stage?

file.stat.blksize

will return the block size of the underlying file; multiples of this
would probably be most efficient.
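
One way this suggestion might be used is sketched below. The helper name tail_buffer_size, the default of four blocks and the 4096-byte fallback are all arbitrary choices for illustration; the fallback matters because File::Stat#blksize returns nil on platforms that do not report a block size.

```ruby
# Hypothetical helper: choose a read buffer that is a multiple of the
# preferred I/O block size reported for the file.
def tail_buffer_size(file, blocks = 4, fallback = 4096)
  blksize = file.stat.blksize    # nil where the platform does not report it
  blksize = fallback if blksize.nil? || blksize <= 0
  blksize * blocks
end
```

Whether a multiple of the block size actually beats a fixed 4k or 8k buffer is something only a benchmark on the target system can show.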

--
Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE

Florian Frank

2002-08-03 13:01:54 UTC

Post by James F.Hranicky
If you want to tail beginning at an arbitrary position in the file,
that will work, but many will probably want to specify the # of lines
from the end.

Yes, of course. I was referring to my implementation in file/tail:

def last(n = 0, bufsize = 4096)
  if n <= 0
    seek(0, File::SEEK_END)
    return
  end
  size = stat.size
  begin
    if bufsize < size
      seek(0, File::SEEK_END)
      while n > 0 and tell > 0 do
        start = tell
        seek(-bufsize, File::SEEK_CUR)
        buffer = read(bufsize)
        n -= buffer.count("\n")
        seek(-bufsize, File::SEEK_CUR)
      end
    else
      seek(0, File::SEEK_SET)
      buffer = read(size)
      n -= buffer.count("\n")
      seek(0, File::SEEK_SET)
    end
  rescue Errno::EINVAL
    size = tell
    retry
  end
  pos = -1
  while pos and n < 0 # forward if we are too far back
    pos = buffer.index("\n", pos + 1)
    n += 1
  end
  seek(pos + 1, File::SEEK_CUR)
end

I'm using buffer.count("\n") to count all the newlines in a buffer. I
didn't want to reverse the string first, because this would not be very
performant either. So I search forward in the buffer to find
the right newline in the last while-loop, if I am too far back in the
file.

Post by James F.Hranicky
You could seek to the end, then seek backwards in chunks, read in each
chunk, then count backwards through the chunk counting newlines and
keeping track of filepos, and once you hit the # lines you want, seek to
that position and then read from there. This would cut down on the #
of seeks and reads in my method above, probably resulting in much
better performance.

Yes. This is pretty similar to my implementation above. I think one
bottleneck in scripting languages exists if you copy lots of data
between the scripting level and the c-level. To do most of the things on
the c-level and then copy the results back at the end is usually much
faster. That's (and to spare a lot of method calls) why I used count
instead of buffer[x]. Perhaps I should waste a few rindex calls to
search the buffer backwards because it probably doesn't make much of a
difference in practice.
