commons-dev mailing list archives

Subject Re: Proposed Contribution to Apache Commons,
Date Sat, 24 Oct 2015 15:14:57 GMT
My colleague, Jeff Rothenberg, and I are retired computer scientists and are
no strangers to regular expression theory and practice. Both of us have used
regular expressions for decades and have taught many other programmers how to
use them. Stephen Kleene, the inventor of regular expressions, and I were both
doctoral students of Alonzo Church. Rothenberg used SNOBOL3 and SNOBOL4 (more
powerful than all but a few of the most recent versions of regular
expressions) extensively in his graduate work in Artificial Intelligence in
the late 1960s and early 1970s.

In our experience, although skilled programmers can write regular expressions
that solve a wide range of problems, for all but the simplest tasks regular
expressions quickly become "write only". That is, once they have aged for a
while, no one other than their authors (and, in our experience, often not even
they) can understand them well enough to verify, modify, debug, or maintain
them without considerable effort. Analogous low-level programming formalisms,
such as machine code and assembly language, have been replaced by
higher-level, more readable and modular languages to produce programs that
have proven easier and more cost-effective to debug, verify, maintain, reuse,
and extend.

In a similar fashion, Naomi is a means of "taming" complex regular
expressions, as well as offering an easier alternative for those who are
unfamiliar with them. Naomi makes pattern matching programs more readable,
modular, and therefore verifiable, maintainable, and extensible. Naomi
ultimately generates regular expressions, and it can do everything they can
do, but it provides a higher-level API that uses object-oriented constructs to
define complex, modular, parameterized patterns and subpatterns.
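
To make the idea of generating a regular expression from named, parameterized
subpatterns concrete, here is a minimal Java sketch. It is an illustration
only: the class PatternSketch, the interface Pat, and the helpers literal,
digits, seq, named, and timeOfDay are hypothetical names invented for this
example and are not Naomi's API. Only java.util.regex.Pattern is a real
library class.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only: none of these names come from Naomi.
public class PatternSketch {
    // A reusable subpattern that can render itself as regular expression text.
    public interface Pat { String regex(); }

    // Matches the given text literally, with any metacharacters quoted.
    public static Pat literal(String text) { return () -> Pattern.quote(text); }

    // Matches between min and max decimal digits.
    public static Pat digits(int min, int max) {
        return () -> "\\d{" + min + "," + max + "}";
    }

    // Matches each part in order.
    public static Pat seq(Pat... parts) {
        return () -> {
            StringBuilder sb = new StringBuilder();
            for (Pat p : parts) sb.append(p.regex());
            return sb.toString();
        };
    }

    // Wraps a subpattern in a named capturing group.
    public static Pat named(String name, Pat inner) {
        return () -> "(?<" + name + ">" + inner.regex() + ")";
    }

    // A parameterized subpattern: hours and minutes with a caller-chosen separator.
    public static Pat timeOfDay(String separator) {
        return seq(named("hour", digits(2, 2)),
                   literal(separator),
                   named("minute", digits(2, 2)));
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(timeOfDay(":").regex());
        System.out.println(p.pattern());                   // the generated regex text
        System.out.println(p.matcher("15:14").matches());  // whether "15:14" matches
    }
}
```

The point of the sketch is that timeOfDay can be read, tested, and reused on
its own, while the regular expression it produces is an implementation detail.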

Naomi's advantages over bare regular expressions become apparent only for
larger scale pattern matching tasks. Whereas regular expressions are highly
compact and terse, this virtue becomes a vice for complex patterns. Coupled
with the extensive use of metacharacters and escape sequences, this makes even
moderately complex regular expressions effectively unreadable for all but the
most experienced and practiced regular expression programmers. Newer features
that go beyond the original regular expression formalism--such as namable
groups, built-in names for common character classes, comments, and free white
space--make regular expressions less terse. But their use is not enough to
render complex regular expressions easily readable. These extensions are
analogous to replacing binary machine language by assembly language coding. It
is only necessary to consider a complex problem--such as that of parsing the
e-mail date-time specification of RFC 2822 in src/--to appreciate
the obscurity of regular expressions and to understand Naomi's advantages.
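
As a rough illustration of the extensions mentioned above, the following Java
sketch writes one pattern twice using java.util.regex: once in compact form,
and once with named groups, comments, and free white space (Pattern.COMMENTS).
The pattern covers only a deliberately simplified subset of the RFC 2822
date-time syntax (optional day of week, numeric zone only, no folding white
space or obsolete forms). The class name DateTimeSketch and the group names
are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified subset of RFC 2822 date-time; not a complete implementation.
public class DateTimeSketch {
    // Compact form: correct for the subset, but hard to read or modify.
    public static final String TERSE =
        "(?:(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun),\\s+)?(\\d{1,2})\\s+"
      + "(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\\s+(\\d{4})\\s+"
      + "(\\d{2}):(\\d{2})(?::(\\d{2}))?\\s+([+-]\\d{4})";

    // Same pattern with named groups, comments, and free white space.
    public static final Pattern READABLE = Pattern.compile(
        "(?: (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) , \\s+ )?  # optional day of week\n"
      + "(?<day>    \\d{1,2} ) \\s+                     # day of month\n"
      + "(?<month>  Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec ) \\s+\n"
      + "(?<year>   \\d{4} ) \\s+                       # four-digit year\n"
      + "(?<hour>   \\d{2} ) : (?<minute> \\d{2} )      # hours and minutes\n"
      + "(?: : (?<second> \\d{2} ) )? \\s+              # optional seconds\n"
      + "(?<zone>   [+-] \\d{4} )                       # numeric zone offset\n",
        Pattern.COMMENTS);

    public static void main(String[] args) {
        Matcher m = READABLE.matcher("Sat, 24 Oct 2015 15:14:57 +0000");
        if (m.matches()) {
            System.out.println(m.group("day") + " " + m.group("month") + " "
                + m.group("year") + " at " + m.group("hour") + ":"
                + m.group("minute") + " zone " + m.group("zone"));
        }
    }
}
```

Even with names and comments, the second form is still a single flat
expression: its parts cannot be reused or parameterized separately.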

    Norman Shapiro
