Discussion:
[ruby-core:87574] Yes idiom to shorten a long regex
Benoit Daloze
2018-06-21 15:33:19 UTC
Hello,

This question is probably better suited to ruby-talk (to which I am
forwarding this email).

Anyway, one way to deduplicate is to use Regexp interpolation:

NUMBER = /(?:\d+\.\d+)/
LONG_REGEXP = /^#{NUMBER}: I:\d+\s+\(\s+#{NUMBER}.../
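For illustration, the interpolation idea can be carried one step further so that the "three times, then six times" structure is visible in the pattern itself. This is only a sketch: the helper name `number_group` and the constant `LINE` are hypothetical, not part of the snippet above, and the repetition is built with string joining because a `{3}` quantifier around a capture group would keep only the last captured value.

```ruby
NUMBER = /\d+\.\d+/

# Hypothetical helper: regex source for a parenthesised group holding
# `count` whitespace-separated numbers, each in its own capture group.
def number_group(count)
  captures = Array.new(count) { "(#{NUMBER.source})" }.join('\s+')
  "\\(\\s+#{captures}\\s+\\)"
end

# Offset, then a group of three numbers, then a group of six.
LINE = /^(#{NUMBER}): I:\d+\s+#{number_group(3)}.+#{number_group(6)}$/

line = "9.028: I:4551 ( 0.095 0.096 0.136 ) T:4551 ( 0.095 0.096 0.098 0.117 0.136 0.136 )"
values = LINE.match(line).captures.map(&:to_f)
p values
# => [9.028, 0.095, 0.096, 0.136, 0.095, 0.096, 0.098, 0.117, 0.136, 0.136]
```
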
Hello,
I have some data that I would like to parse with a regex. The raw data
9.028: I:4551 ( 0.095 0.096 0.136 ) T:4551 ( 0.095 0.096 0.098 0.117 0.136 0.136 )
14.066: I:4601 ( 0.095 0.096 5.344 ) T:9152 ( 0.095 0.096 0.098 0.119 4.352 5.344 )
19.099: I:4609 ( 0.094 0.096 0.132 ) T:13761 ( 0.094 0.096 0.098 0.123 4.352 5.344 )
24.033: I:4528 ( 0.093 0.095 0.130 ) T:18289 ( 0.094 0.096 0.098 0.124 3.344 5.344 )
^(\d+\.\d+): I:\d+\s+\(\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+\).+\(\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+\)$
^(?<offset_secs>\d+\.\d+): I:\d+\s+\(\s+(?<median_stall>\d+\.\d+)\s+(?<p90_stall>\d+\.\d+)\s+(?<max_stall>\d+\.\d+)\s+\).+\(\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+\)$
But that's a long, ugly regex that repeats the {capture+whitespace} element
(\d+\.\d+)\s+ three times, then six times.
Is there a Ruby regex way to simplify/clarify the regex so that it
explicitly shows that the capture+whitespace is repeated three times, then
six?
thanks,
Peter
Andy Jones
2018-06-21 15:56:02 UTC
I can extract the data that I want with the following:

^(\d+\.\d+): I:\d+\s+\(\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+\).+\(\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+\)$
<<<<<<<<

Well, you know what they say: I used to have a problem, then I used Regex. Now I have two problems…


Personally, I would split the parsing operation up into separate statements. Given a string like:

0.095 0.096 0.136


Ruby is more than capable of parsing this into numbers using String#split.

So I would be tempted to parse your line with something like /^(.*?): I:(.*?) \((.*?)\) T:(.*?) \((.*?)\)/ and then deal with the five strings that result afterwards.

(Please forgive my Regexp, entirely untested and I’m sure you are better at that than I am.)
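As a rough sketch of that split-it-up approach: a coarse regex (the pattern suggested above, with the final group's parentheses balanced) pulls out the five pieces, and String#split turns the two space-separated lists into numbers. The hash keys (`offset`, `i_count`, `i_stats`, `t_count`, `t_stats`) are names invented here for illustration; nothing in the thread says what the fields mean beyond the poster's own capture names.

```ruby
# Coarse pattern: capture each field lazily, parse the details afterwards.
COARSE = /^(.*?): I:(.*?) \((.*?)\) T:(.*?) \((.*?)\)/

line = "9.028: I:4551 ( 0.095 0.096 0.136 ) T:4551 ( 0.095 0.096 0.098 0.117 0.136 0.136 )"
offset, i_count, i_stats, t_count, t_stats = COARSE.match(line).captures

parsed = {
  offset:  offset.to_f,
  i_count: i_count.to_i,
  # String#split with no argument splits on runs of whitespace and
  # ignores the leading/trailing spaces inside the parentheses.
  i_stats: i_stats.split.map(&:to_f),
  t_count: t_count.to_i,
  t_stats: t_stats.split.map(&:to_f)
}
p parsed
```

The trade-off is that the coarse pattern does not check how many numbers each group contains; that check, if wanted, would be a separate `size` comparison on the split arrays.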


Robert Klemme
2018-07-04 06:33:26 UTC
Post by Andy Jones
^(\d+\.\d+): I:\d+\s+\(\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+\).+\(\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+(\d+\.\d+)\s+\)$
Well, you know what they say: I used to have a problem, then I used Regex. Now I have two problems…
Depends on how good your regex-fu is, I'd say.
Post by Andy Jones
0.095 0.096 0.136
…Ruby is more than capable of parsing this into numbers using String#split.
So I would be tempted to parse your line with something like /^(.*?): I:(.*?) \((.*?)\) T:(.*?) \((.*?)\)/ and then deal with the five strings that result afterwards.
Another approach: use a short regex to ensure the line conforms to the
expected format (not shown below) and then use #scan with a negative
lookbehind to exclude the number after "I:" and "T:":

$ cat x
9.028: I:4551 ( 0.095 0.096 0.136 ) T:4551 ( 0.095 0.096 0.098 0.117 0.136 0.136 )
14.066: I:4601 ( 0.095 0.096 5.344 ) T:9152 ( 0.095 0.096 0.098 0.119 4.352 5.344 )
19.099: I:4609 ( 0.094 0.096 0.132 ) T:13761 ( 0.094 0.096 0.098 0.123 4.352 5.344 )
24.033: I:4528 ( 0.093 0.095 0.130 ) T:18289 ( 0.094 0.096 0.098 0.124 3.344 5.344 )
$ irb
irb(main):001:0> File.foreach("x") {|l| p l.scan(/(?<!:)\d+\.\d+/).map(&:to_f)}
[9.028, 0.095, 0.096, 0.136, 0.095, 0.096, 0.098, 0.117, 0.136, 0.136]
[14.066, 0.095, 0.096, 5.344, 0.095, 0.096, 0.098, 0.119, 4.352, 5.344]
[19.099, 0.094, 0.096, 0.132, 0.094, 0.096, 0.098, 0.123, 4.352, 5.344]
[24.033, 0.093, 0.095, 0.13, 0.094, 0.096, 0.098, 0.124, 3.344, 5.344]
=> nil
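
The validation step that was left out above might look roughly like the following. This is a sketch under assumptions: the `FORMAT` pattern and the `parse_line` helper are hypothetical, written only from the four sample lines. Because the validation regex does not need to capture anything, it can use `{3}` and `{6}` quantifiers, which states the "three numbers, then six" layout directly.

```ruby
NUM = /\d+\.\d+/

# Hypothetical format check: offset, "I:" count, a group of three numbers,
# "T:" count, a group of six numbers. Non-capturing groups allow {n}.
FORMAT = /\A#{NUM}: I:\d+\s+\((?:\s+#{NUM}){3}\s+\)\s+T:\d+\s+\((?:\s+#{NUM}){6}\s+\)\z/

# Returns the ten floats of a well-formed line, or nil otherwise.
def parse_line(line)
  line = line.chomp
  return nil unless FORMAT.match?(line)
  # The integers after "I:" and "T:" contain no decimal point, so
  # \d+\.\d+ skips them; the lookbehind is an extra guard.
  line.scan(/(?<!:)\d+\.\d+/).map(&:to_f)
end

good = "24.033: I:4528 ( 0.093 0.095 0.130 ) T:18289 ( 0.094 0.096 0.098 0.124 3.344 5.344 )\n"
p parse_line(good)
# => [24.033, 0.093, 0.095, 0.13, 0.094, 0.096, 0.098, 0.124, 3.344, 5.344]
p parse_line("not a data line")
# => nil
```
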

Kind regards

robert
--
[guy, jim, charlie].each {|him| remember.him do |as, often| as.you_can
- without end}
http://blog.rubybestpractices.com/
