1. Because the doubled-word problem must work even when the doubled
words are split across lines, I can’t use the normal line-by-line processing I
used with the mail utility example. Setting the special variable $/ (yes, that’s
a variable) as shown puts the subsequent <> into a magic mode such that it
returns not single lines, but more-or-less paragraph-sized chunks. The value
returned is just one string, but a string that could potentially contain many of
what we would consider to be logical lines.
2. Did you notice that I don’t assign the value from <> to anything? When used
as the conditional of a while like this, <> magically assigns the string to a
special default variable.† That same variable holds the default string that
s/…/…/ works on, and that print displays. Using these defaults makes the
program less cluttered, but also less understandable to someone new to the
language, so I recommend using explicit operands until you’re comfortable.
3. The next unless before the substitute command has Perl abort processing on
the current string (to continue with the next) if the substitution doesn’t
actually do anything. There’s no need to continue working on a string in which
no doubled words are found.
4. The replacement string is really just "$1$2$3" with intervening ANSI escape
sequences that provide highlighting to the two doubled words, but not to
whatever separates them. These escape sequences are \e[7m to begin
highlighting, and \e[m to end it. (\e is Perl’s regex and string shorthand for the
ASCII escape character, which begins these ANSI escape sequences.)
Looking at how the parentheses in the regex are laid out, you’ll realize that
"$1$2$3" represents exactly what was matched in the first place. So, other
than adding in the escape sequences, this whole substitute command is
essentially a (slow) no-op.
We know that $1 and $3 represent matches of the same word (the whole
point of the program!), so I could probably get by with using just one or the
other in the replacement. However, since they might differ in capitalization, I
use both variables explicitly.
5. The string may contain many logical lines, but once the substitution has
marked all the doubled words, we want to keep only logical lines that have
an escape character. Removing those that don’t leaves only the lines of
interest in the string. Since we used the enhanced line anchor match mode (the
/m modifier) with this substitution, the regex ^([^\e]+\n)+ can find logical
lines of non-escapes. Use of this regex in the substitute causes those
sequences to be removed. The result is that only logical lines that have an
escape remain, which means that only logical lines that have doubled words
in them remain.‡
6. The variable $ARGV magically provides the name of the input file. Combined
with /m and /g, this substitution tacks the input filename to the beginning of
each logical line remaining in the string. Cool!
Finally, the print spits out what’s left of the string, escapes and all. The while
loop repeats the same processing for all the strings (paragraph-sized chunks of
text) that are read from the input.
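
To see the six steps working together, here is a minimal sketch rather than the program's actual listing. Several things in it are assumptions made for illustration: the helper name find_doubled, the use of an in-memory filehandle instead of <>, the value ".\n" given to $/, the $name parameter standing in for $ARGV, and a simplified doubled-word pattern that allows only whitespace between the two words. Following the advice in note 2, it uses an explicit $chunk variable instead of the default variable.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original program): returns the report
# for $text, labelling every reported line with $name (stand-in for $ARGV).
sub find_doubled {
    my ($text, $name) = @_;
    my $report = '';

    # Read from an in-memory filehandle so the sketch is self-contained.
    open my $fh, '<', \$text or die "cannot read in-memory text: $!";

    # Step 1: chunk mode. The value ".\n" is an assumption; with it, each
    # read returns everything up to and including a period-newline.
    local $/ = ".\n";

    # Step 2: explicit variable instead of the default one.
    while (my $chunk = <$fh>) {
        # Steps 3 and 4: highlight both copies of a doubled word, leaving
        # the separator ($2) plain; skip chunks where nothing was marked.
        # Simplified pattern (assumption): whitespace-only separator.
        next unless $chunk =~
            s/\b([a-z]+)(\s+)(\1\b)/\e[7m$1\e[m$2\e[7m$3\e[m/ig;

        # Step 5: remove runs of logical lines that contain no escape.
        $chunk =~ s/^(?:[^\e]+\n)+//mg;

        # Step 6: put the name at the start of each remaining logical line.
        $chunk =~ s/^/${name}: /mg;

        $report .= $chunk;
    }
    close $fh;
    return $report;
}

# Usage: "the" is doubled across a line break, so the first two lines are
# reported and the unmarked third line is dropped.
print find_doubled("Paris in the\nthe spring\nis lovely.\n", "doc.txt");
```

Returning a string instead of printing inside the loop is a choice made only so that the result is easy to inspect; printing each chunk directly would behave the same way from the reader's point of view.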
- Note: this is from book <Mastering Regular Expressions>