lucene-java-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Spaces in regular expressions
Date Sat, 13 Feb 2016 18:34:24 GMT
Obviously you wouldn't need a regex for simple terms like foo and bar:
just use simple term queries and a quoted phrase to match "foo bar". If you
really do need complex pattern regexes that match across adjacent terms, your
best bet is to keep a copy of the source text in a separate string field (not
a tokenized text field); then you can run a complex regex that spans terms
(and only do that if normal span queries don't do what you need).
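To make the difference concrete, here is a toy model in plain Java with no Lucene dependency. It is only a sketch under stated assumptions: java.util.regex stands in for Lucene's own RegExp syntax, String.split stands in for the whitespace analyzer, and the class and method names are hypothetical. It contrasts three things: a per-term regex (which is all a regexp query can do), a span-like check that matches one pattern per adjacent position, and a regex run against an untokenized copy of the whole text. In real Lucene the last two roughly correspond to wrapping a RegexpQuery in SpanMultiTermQueryWrapper inside a SpanNearQuery, and to indexing the text a second time in a StringField and running a RegexpQuery on that field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Toy model only: java.util.regex approximates Lucene's RegExp syntax and
// String.split approximates a whitespace analyzer. Names are hypothetical.
public class CrossTermRegexDemo {

    // Approximates whitespace tokenization: split on runs of whitespace.
    public static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // Per-term regex: the pattern must match one entire term, so a pattern
    // containing a space can never match when terms never contain spaces.
    public static boolean anyTermMatches(List<String> terms, String regex) {
        Pattern p = Pattern.compile(regex);
        for (String term : terms) {
            if (p.matcher(term).matches()) {
                return true;
            }
        }
        return false;
    }

    // Span-like check: one pattern per position, terms adjacent and in order
    // (loosely modelled on a SpanNearQuery with slop 0 over regex clauses).
    public static boolean adjacentTermsMatch(List<String> terms, String... regexes) {
        List<Pattern> patterns = new ArrayList<>();
        for (String r : regexes) {
            patterns.add(Pattern.compile(r));
        }
        for (int start = 0; start + patterns.size() <= terms.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < patterns.size() && ok; i++) {
                ok = patterns.get(i).matcher(terms.get(start + i)).matches();
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    // Untokenized copy: the whole text is a single value, so the regex can
    // span the spaces between words.
    public static boolean wholeTextMatches(String text, String regex) {
        return Pattern.compile(regex).matcher(text).matches();
    }

    public static void main(String[] args) {
        String doc = "foo bar";
        List<String> terms = tokenize(doc);
        System.out.println(terms);                                   // [foo, bar]
        System.out.println(anyTermMatches(terms, "foo"));            // true
        System.out.println(anyTermMatches(terms, "foo bar"));        // false
        System.out.println(adjacentTermsMatch(terms, "fo+", "ba.")); // true
        System.out.println(wholeTextMatches(doc, "foo bar"));        // true
    }
}
```

The trade-off in this model mirrors the advice above: the span-like check keeps working on the tokenized terms but each position is still limited to one term, while the whole-text match can express any cross-term pattern at the cost of keeping and scanning a second, untokenized copy of the text.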

What does your typical cross-term regex actually look like?


-- Jack Krupansky

On Sat, Feb 13, 2016 at 1:25 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi,
>
> That's very easy to explain: Regexp queries only work on terms, you
> already said it in your introduction. There is no phrase query in Lucene
> that accepts regular expressions.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: Kudrettin Güleryüz [mailto:kudrettin@gmail.com]
> > Sent: Saturday, February 13, 2016 7:14 PM
> > To: java-user@lucene.apache.org
> > Subject: Spaces in regular expressions
> >
> > Hello,
> >
> > I am using the standard whitespace analyzer to index a source code document
> > using Lucene 5.
> >
> > I understand that a document with content foo bar would have only two
> > terms: foo and bar.  When I search for "foo bar" it normally matches the
> > document. Similarly a regexp query /foo/ or /bar/ also matches the
> > document.
> >
> > Can you help me understand why a regexp query like /foo bar/ doesn't
> > match the document?
> >
> > Thank you,
> > Kudret
