lucene-java-user mailing list archives

From Kudrettin Güleryüz <kudret...@gmail.com>
Subject Re: Spaces in regular expressions
Date Sat, 13 Feb 2016 21:29:37 GMT
As mentioned, the document is source code. As you know, all of the statements
below are equivalent:
A = foo() {
A=foo(){
A= foo(){
...
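
To make the term counts concrete, here is a minimal sketch of how the three variants above break into terms when split on whitespace. It uses plain `String.split` as a stand-in for a whitespace tokenizer, not Lucene's analyzer classes; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Lucene: approximates whitespace
// tokenization by splitting on runs of whitespace.
class WhitespaceTerms {

    // Returns the whitespace-separated terms of the given text.
    static List<String> terms(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        String[] variants = {"A = foo() {", "A=foo(){", "A= foo(){"};
        for (String variant : variants) {
            List<String> t = terms(variant);
            // Prints the number of terms and the terms themselves.
            System.out.println(t.size() + " term(s): " + t);
        }
    }
}
```

The same statement yields four, one, and two terms respectively, so no single phrase or single-term pattern covers all three spellings.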

With the standard whitespace analyzer in action, the statement I want to
match can span anywhere from one to five terms in this case. If the spacing
were fixed, I could use either a phrase search or a regexp. Any suggestions
for this case?
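
One way to picture the suggestion quoted below (keeping an untokenized copy of the source text and matching a regex against it) is the following sketch. It uses `java.util.regex` rather than Lucene's `RegexpQuery`, whose syntax is not identical, and the pattern and all names are assumptions chosen for this example. The per-term check only approximates how a term-level regexp must match one whole term.

```java
import java.util.regex.Pattern;

// Hypothetical illustration using java.util.regex, not Lucene's query classes.
class SpacingTolerantMatch {

    // Optional whitespace (\s*) between every token of "A = foo() {".
    static final Pattern ASSIGN_FOO =
            Pattern.compile("A\\s*=\\s*foo\\s*\\(\\s*\\)\\s*\\{");

    // Matches against the raw, untokenized text.
    static boolean matchesRaw(String raw) {
        return ASSIGN_FOO.matcher(raw).find();
    }

    // Approximates a term-level regexp: the pattern must match one
    // whole whitespace-separated term, never a run of adjacent terms.
    static boolean matchesAnyTerm(String raw) {
        for (String term : raw.trim().split("\\s+")) {
            if (ASSIGN_FOO.matcher(term).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] variants = {"A = foo() {", "A=foo(){", "A= foo(){"};
        for (String variant : variants) {
            System.out.println(variant
                    + " raw=" + matchesRaw(variant)
                    + " perTerm=" + matchesAnyTerm(variant));
        }
    }
}
```

Against the raw text all three spellings match. Term by term, only the spelling without any spaces matches, because it is the only one that ends up as a single term.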



On Sat, Feb 13, 2016 at 1:34 PM Jack Krupansky <jack.krupansky@gmail.com>
wrote:

> Obviously you wouldn't need a regex for simple terms like foo and bar
> - just use simple terms and a quoted phrase to match "foo bar". If you really
> do need to do complex pattern regexes and match across adjacent terms, your
> best bet is to keep a copy of the source text in a separate string (not
> tokenized text) field and then you can do a complex regex that spans terms
> (and only do that if normal span queries don't do what you need.)
>
> What does your typical cross-term regex actually look like?
>
>
> -- Jack Krupansky
>
> On Sat, Feb 13, 2016 at 1:25 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>
> > Hi,
> >
> > That's very easy to explain: Regexp queries only work on terms, you
> > already said it in your introduction. There is no phrase query in Lucene
> > that accepts regular expressions.
> >
> > Uwe
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> > > -----Original Message-----
> > > From: Kudrettin Güleryüz [mailto:kudrettin@gmail.com]
> > > Sent: Saturday, February 13, 2016 7:14 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Spaces in regular expressions
> > >
> > > Hello,
> > >
> > > I am using standard whitespace analyzer to index a source code document
> > > using Lucene 5.
> > >
> > > I understand that a document with content foo bar would have only two
> > > terms: foo and bar.  When I search for "foo bar" it normally matches
> > > the document. Similarly a regexp query /foo/ or /bar/ also matches the
> > > document.
> > >
> > > Can you help me understand why a regexp query like /foo bar/
> > > doesn't match the document?
> > >
> > > Thank you,
> > > Kudret
