lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Multiline Regex with Lucene
Date Tue, 28 Jul 2009 16:25:26 GMT
I doubt you're thinking in terms of tokens. Your input stream is broken up
into tokens (think of them as words, depending upon the analyzer), and
regex searches are confined to those *tokens*. So the concept of a
multi-line regex in a search is kind of ...odd...
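
To make the token point concrete, here is a minimal sketch in plain Java with no Lucene involved. The whitespace-splitting `tokenize` method is only a stand-in for whatever analyzer is actually in use, and testing each token with `matches()` is a simplifying assumption about how a regex is applied to a term. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TokenRegexDemo {
    // Stand-in for an analyzer (assumption): split on whitespace and lowercase.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase());
            }
        }
        return tokens;
    }

    // True if at least one single token matches the regex in full.
    static boolean anyTokenMatches(String text, String regex) {
        Pattern p = Pattern.compile(regex);
        for (String token : tokenize(text)) {
            if (p.matcher(token).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String content = "hello world\nThis is to be searched\n#";
        // A regex that fits inside one token can match.
        System.out.println(anyTokenMatches(content, "hel.*"));
        // A regex that needs a line break can never match, because
        // no token produced by this tokenizer contains whitespace.
        System.out.println(anyTokenMatches(content, "hello.*\\n.*searched"));
    }
}
```

The first call succeeds because the token `hello` matches on its own; the second fails because the newline the pattern requires is discarded during tokenization.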

You could possibly index your input as UN_TOKENIZED, but
I really have no clue what Lucene would do with that. I think
you're off in uncharted territory here.
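
If the whole text did end up as a single term, the regex would presumably be applied to one multi-line string; what Lucene itself would do in that case is not tested here. The sketch below uses only `java.util.regex` and shows one thing worth checking first: `(?m)` changes only how `^` and `$` behave, while `.` still stops at line terminators unless `(?s)` is set. The class name, the `found` helper, and the shortened patterns are made up for illustration.

```java
import java.util.regex.Pattern;

public class WholeTextRegexDemo {
    // Hypothetical helper: does the regex match anywhere in the text?
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "hello world\nThis is to be searched\n#";
        // MULTILINE (?m) affects only ^ and $; '.' does not cross the newline.
        System.out.println(found("(?m)hello.*searched", text));
        // DOTALL (?s) lets '.' match line terminators as well.
        System.out.println(found("(?s)hello.*searched", text));
    }
}
```

Under these assumptions the first pattern finds nothing and the second finds a match, so a pattern meant to span lines needs either `(?s)` or an explicit line-terminator class that matches the terminators actually present in the text.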

Perhaps a better thing would be for you to explain *why* you
want to do this, and perhaps folks can come up with some
suggestions. I suspect this may be an XY problem; see
http://www.perlmonks.org/index.pl?node_id=542341

Best
Erick

On Sun, Jul 26, 2009 at 9:52 AM, ba3 <sbadhrinath@gmail.com> wrote:

>
> I was trying to do a regex search with Lucene and
> JavaUtilRegexCapabilities.
> The code used is:
> RegexQuery query = new RegexQuery(new
> Term("contents","(?m)hello.*(\r[^#]*)This is to be searched.*(\r[^#]*)#"));
> query.setRegexImplementation(new JavaUtilRegexCapabilities());
>
> I verified the regex in : http://www.gskinner.com/RegExr/  [with the multi
> line checked]
> In Lucene, though, there are no hits. Can you please point me in the right
> direction?
>
> -- Rgds
> Ba3
>
> Regex :
> hello.*(\r[^#]*)This is to be searched.*(\r[^#]*)#
>
> Content :
> hello world
> This is to be searched
> #
> Test line should not be selected
> hello
> This should not work
> some other lines
> #
> Not to be selected
> hello world
> Some lines
> This is to be searched
> Some lines
> #
> hello earth
> some lines
> #
