lucene-java-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Spaces in regular expressions
Date Sat, 13 Feb 2016 22:09:39 GMT
Just to be clear, the whitespace tokenizer would treat "A=foo(){" as a
single token. I presume you want "A" and "foo" to be separate terms.
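
As a rough illustration of that point (plain Python, not Lucene code), splitting on whitespace alone can be modelled with `str.split()`. The helper name `whitespace_tokens` is made up for this sketch, and it only approximates what a whitespace tokenizer does:

```python
# Hypothetical stand-in for a whitespace tokenizer: split on runs of whitespace only.
def whitespace_tokens(text):
    return text.split()

# The three equivalent statements from the quoted message below.
statements = ["A = foo() {", "A=foo(){", "A= foo(){"]

for s in statements:
    # Each spacing variant produces a different term sequence.
    print(repr(s), "->", whitespace_tokens(s))
```

Under this model the three spellings yield four, one, and two terms respectively, so no single phrase query lines up with all of them.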

You still haven't indicated what regex you were considering. Try explaining
your query in plain English. I mean, do you want to search for two keywords
with any operator sequence between them? Or... do you want to match on
operators as well but simply want to ignore whitespace?

Generally, the standard analyzer/tokenizer is better/easier - you can
simply query "A foo" and it will match all three of your statements.
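
To make that concrete, here is a minimal sketch in plain Python (again, not Lucene code). It assumes a word-style tokenizer can be approximated by "keep runs of word characters, drop punctuation"; the real standard tokenizer follows Unicode word-break rules, and an analyzer may also apply lowercasing and stop-word filtering, none of which is modelled here. The names `word_tokens` and `phrase_matches` are invented for the example.

```python
import re

# Rough approximation of a word-style tokenizer: keep runs of letters,
# digits and underscores, and discard operators and brackets.
def word_tokens(text):
    return re.findall(r"\w+", text)

# Simplified phrase match: the query terms must appear consecutively
# in the document's term sequence.
def phrase_matches(query, document):
    q, d = word_tokens(query), word_tokens(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

statements = ["A = foo() {", "A=foo(){", "A= foo(){"]

for s in statements:
    print(repr(s), "->", word_tokens(s), phrase_matches("A foo", s))
```

With punctuation dropped, every variant reduces to the same two terms, which is why one phrase covers all three.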

-- Jack Krupansky

On Sat, Feb 13, 2016 at 4:29 PM, Kudrettin Güleryüz <kudrettin@gmail.com>
wrote:

> As mentioned, the document is source code. As you know, all of the
> statements below are equivalent:
> A = foo() {
> A=foo(){
> A= foo(){
> ...
>
> With the whitespace analyzer in use, the statement I want to match can
> span anywhere from one to five terms. If the spacing were fixed, I could
> use either a phrase search or a regexp. Any suggestions for this case?
>
>
>
> On Sat, Feb 13, 2016 at 1:34 PM Jack Krupansky <jack.krupansky@gmail.com>
> wrote:
>
> > Obviously you wouldn't need a regex for simple terms like foo and bar
> > - just use simple terms and a quoted phrase to match "foo bar". If you
> > really do need complex pattern regexes that match across adjacent
> > terms, your best bet is to keep a copy of the source text in a separate
> > string (not tokenized text) field and then you can do a complex regex
> > that spans terms (and only do that if normal span queries don't do what
> > you need.)
> >
> > What does your typical cross-term regex actually look like?
> >
> >
> > -- Jack Krupansky
> >
> > On Sat, Feb 13, 2016 at 1:25 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
> >
> > > Hi,
> > >
> > > That's very easy to explain: Regexp queries only work on terms, you
> > > already said it in your introduction. There is no phrase query in
> > > Lucene that accepts regular expressions.
> > >
> > > Uwe
> > >
> > > -----
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > http://www.thetaphi.de
> > > eMail: uwe@thetaphi.de
> > >
> > > > -----Original Message-----
> > > > From: Kudrettin Güleryüz [mailto:kudrettin@gmail.com]
> > > > Sent: Saturday, February 13, 2016 7:14 PM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Spaces in regular expressions
> > > >
> > > > Hello,
> > > >
> > > > I am using standard whitespace analyzer to index a source code
> > > > document using Lucene 5.
> > > >
> > > > I understand that a document with content foo bar would have only two
> > > > terms: foo and bar.  When I search for "foo bar" it normally matches
> > > > the document. Similarly a regexp query /foo/ or /bar/ also matches the
> > > > document.
> > > >
> > > > Can you help me understand why a regexp query like /foo bar/
> > > > doesn't match the document?
> > > >
> > > > Thank you,
> > > > Kudret
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
>
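
The two points from the quoted thread (regexp queries see one term at a time, and an untokenized copy of the text lets a regex span what would otherwise be several terms) can be sketched in plain Python. This is only a model under stated assumptions: it uses Python's `re` syntax, which differs from Lucene's regexp syntax, and it treats an untokenized field as a single term holding the whole value. The helper `term_regexp_matches` is hypothetical.

```python
import re

# Model of a term-level regexp query: the pattern must match one
# indexed term in full; it never sees two terms at once.
def term_regexp_matches(pattern, terms):
    return any(re.fullmatch(pattern, t) for t in terms)

doc = "foo bar"
tokenized = doc.split()   # two terms: 'foo' and 'bar'
untokenized = [doc]       # whole value kept as a single term

print(term_regexp_matches(r"foo", tokenized))        # a single term matches
print(term_regexp_matches(r"foo bar", tokenized))    # no single term contains the space
print(term_regexp_matches(r"foo bar", untokenized))  # the whole value is one term

# A whitespace-tolerant pattern applied to the untokenized statements.
spaced = r"A\s*=\s*foo\s*\(\)\s*\{"
for s in ["A = foo() {", "A=foo(){", "A= foo(){"]:
    print(repr(s), term_regexp_matches(spaced, [s]))
```

The trade-off is that the pattern now has to account for everything in the stored value, so this approach is best reserved for cases where simple terms, phrases, or span queries fall short.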
