jakarta-oro-dev mailing list archives

From "Noel J. Bergman" <n...@devtech.com>
Subject RE: Wanted: Regex[Input|Output]Stream
Date Sat, 20 Sep 2003 03:07:00 GMT

Sorry for the time gap, but I've been consumed by some other projects, and
am just getting back to this topic.  I have two issues that I need to resolve.

  (1) I have data arriving on a feed that I want to
      read and process, but I would like to check it
      against multiple expressions (see 2) in real-
      time and recognize when one has been matched.

  (2) I need to have multiple expressions, and am
      not seeing support for that in ORO or any of
      the other Java regex packages.  Processing a
      gigabyte of data for every 1 megabyte of real
      data because I have 1K expressions does not
      seem overly efficient.

If there is a ready-made solution to (2), please let me know.  Or is the
easy way to avoid searching for:

  R1, R2, R3, ... Rn

to do:


(Yes, I realize that one limitation is that I wouldn't get to know which Rn
has matched.)  If so, what is ORO's limit?  Or do you have a better way that
I'm missing?
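
For illustration only, here is a minimal sketch of folding many expressions
into one alternation while still recovering which one matched.  It assumes
java.util.regex (Java 7+ for named groups) rather than ORO, and the class
name CombinedPatterns and the group names p0, p1, ... are hypothetical.  It
also assumes the individual expressions use no numbered backreferences and
no group names of their own that collide with pN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: compiles R0, R1, ... Rn into (?<p0>R0)|(?<p1>R1)|... */
class CombinedPatterns {
    private final Pattern combined;
    private final int count;

    CombinedPatterns(List<String> expressions) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < expressions.size(); i++) {
            if (i > 0) {
                sb.append('|');
            }
            // Wrap each expression in its own named group so it can be identified later.
            sb.append("(?<p").append(i).append('>').append(expressions.get(i)).append(')');
        }
        this.combined = Pattern.compile(sb.toString());
        this.count = expressions.size();
    }

    /** Returns one {expressionIndex, start, end} triple per match, in input order. */
    List<int[]> findAll(CharSequence input) {
        List<int[]> results = new ArrayList<>();
        Matcher m = combined.matcher(input);
        while (m.find()) {
            for (int i = 0; i < count; i++) {
                // Only the alternative that actually matched has a non-null group.
                if (m.group("p" + i) != null) {
                    results.add(new int[] {i, m.start(), m.end()});
                    break;
                }
            }
        }
        return results;
    }
}
```

With the expressions "cat" and "\\d+" and the input "x42cat", findAll
reports expression 1 at [1,3) and expression 0 at [3,6).  Note that a
backtracking engine tries the alternatives in order at each position, so an
earlier expression shadows a later one that could match at the same place,
and the input is still scanned by a single combined pattern rather than by a
true multi-pattern automaton.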

An example of (1) is doing real-time pattern matching in the incoming stream
of a mail server while we process the data.  Now, another way of doing it
would be for me to use push-processing (think java.nio) and push blocks of
data into the regex matcher before pushing the data into the protocol
handler.  Yes, to answer one of your other messages, I would want matching
to continue from where it left off.  If a Listener approach is used, I just
want it to fire off a RegexNotificationEvent as each pattern is recognized.

The FSA should assume that there is more data to arrive until told
otherwise.  And, yes, the matcher would have to preserve the state of its
FSA.  With respect to NFA vs. DFA, for every NFA there exists an equivalent
DFA, albeit with potentially exponentially more states, but perhaps I am
missing your point.

One other stipulation.  I am assuming a long-lived recognizer with a lot of
data, so I am quite willing to expend more cycles up front to compile a
relatively optimized FSA in exchange for efficient processing during the
main duty cycle.
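
To make the shape of such a recognizer concrete, here is a rough sketch of a
push-style matcher that builds its automaton up front, keeps its state
between blocks of data, and notifies a listener as each pattern is
recognized.  It is deliberately limited to literal keywords and swaps in the
Aho-Corasick construction in place of a general regex-to-DFA compiler; it is
not ORO code.  The names MatchListener and StreamingKeywordMatcher are
hypothetical, and the keywords are assumed to be non-empty.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical callback, standing in for the listener idea described above. */
interface MatchListener {
    /** endOffset is the count of characters consumed when the keyword ended. */
    void matched(int keywordIndex, long endOffset);
}

/** Hypothetical push-style matcher for literal keywords (Aho-Corasick automaton). */
class StreamingKeywordMatcher {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        final List<Integer> hits = new ArrayList<>(); // keywords ending at this state
        Node fail;                                    // longest proper suffix state
    }

    private final Node root = new Node();
    private final MatchListener listener;
    private Node state = root; // automaton state preserved between push() calls
    private long offset = 0;   // characters consumed so far, across all blocks

    StreamingKeywordMatcher(List<String> keywords, MatchListener listener) {
        this.listener = listener;
        // Up-front cost, part 1: build a trie of all keywords.
        for (int i = 0; i < keywords.size(); i++) {
            Node n = root;
            for (char c : keywords.get(i).toCharArray()) {
                n = n.next.computeIfAbsent(c, k -> new Node());
            }
            n.hits.add(i);
        }
        // Up-front cost, part 2: breadth-first computation of failure links.
        Queue<Node> queue = new ArrayDeque<>();
        root.fail = root;
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node n = queue.remove();
            for (Map.Entry<Character, Node> e : n.next.entrySet()) {
                Node child = e.getValue();
                Node f = n.fail;
                while (f != root && !f.next.containsKey(e.getKey())) {
                    f = f.fail;
                }
                Node target = f.next.get(e.getKey());
                child.fail = (target != null) ? target : root;
                // A state also reports every keyword that ends at its suffix state.
                child.hits.addAll(child.fail.hits);
                queue.add(child);
            }
        }
    }

    /** Feeds one block of data; matching resumes exactly where the last block ended. */
    void push(CharSequence block) {
        for (int i = 0; i < block.length(); i++) {
            char c = block.charAt(i);
            while (state != root && !state.next.containsKey(c)) {
                state = state.fail;
            }
            Node n = state.next.get(c);
            if (n != null) {
                state = n;
            }
            offset++;
            for (int k : state.hits) {
                listener.matched(k, offset);
            }
        }
    }
}
```

For example, with the keywords "he", "she", and "hers", pushing the blocks
"us", "he", and "rs" in turn reports "she" and "he" at offset 4 and "hers" at
offset 6, even though "she" straddles a block boundary.  Each input character
is examined once regardless of how many keywords are loaded, which is the
property wanted for (2); extending this to full regular expressions would
need a real NFA-to-DFA compilation step that this sketch does not attempt.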

Does this better explain what I'm looking for, and where I'm coming from?

	--- Noel

