lucene-java-user mailing list archives

From Milind <mili...@gmail.com>
Subject Re: Why does this search fail?
Date Wed, 27 Aug 2014 14:51:21 GMT
Yes.  If you search Google for alphare and then for alphare*, you get 2
different results.  Sorry for the contrived example.  I just tried searching
for alpharetta and then deleted characters from the end one at a time.
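
For what it's worth, here is a rough sketch of the behaviour discussed in the
quoted messages below, in plain Java with no Lucene classes at all.  Everything
in it is an assumption made for illustration: the names WildcardTokenDemo,
analyze, prefixMatches and analyzedPrefixMatches are made up, and the period
rule is only a crude approximation of what StandardAnalyzer is described as
doing in this thread.  It shows why a single prefix term cannot match two
indexed terms, and what "analyze the query first" could look like if the last
piece is kept as a prefix.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT Lucene code and does not use any Lucene API.
class WildcardTokenDemo {

    // Crude stand-in for index-time analysis: lowercase the text and split on
    // a period unless it sits between two letters or between two digits.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String s = text.toLowerCase();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean keep = Character.isLetterOrDigit(c);
            if (c == '.' && i > 0 && i + 1 < s.length()) {
                char prev = s.charAt(i - 1);
                char next = s.charAt(i + 1);
                keep = (Character.isLetter(prev) && Character.isLetter(next))
                        || (Character.isDigit(prev) && Character.isDigit(next));
            }
            if (keep) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Models the prefix query from the thread: the query text is lowercased
    // but not tokenized, so it must be a prefix of one single indexed term.
    static boolean prefixMatches(List<String> indexedTerms, String prefix) {
        String p = prefix.toLowerCase();
        for (String term : indexedTerms) {
            if (term.startsWith(p)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical workaround: analyze the text before the trailing '*', then
    // require the pieces to line up with consecutive indexed terms, matching
    // all but the last piece exactly and the last piece as a prefix.
    static boolean analyzedPrefixMatches(List<String> indexedTerms, String query) {
        String text = query.endsWith("*") ? query.substring(0, query.length() - 1) : query;
        List<String> parts = analyze(text);
        if (parts.isEmpty()) {
            return false;
        }
        for (int start = 0; start + parts.size() <= indexedTerms.size(); start++) {
            boolean ok = true;
            for (int j = 0; j < parts.size() && ok; j++) {
                String term = indexedTerms.get(start + j);
                boolean last = (j == parts.size() - 1);
                ok = last ? term.startsWith(parts.get(j)) : term.equals(parts.get(j));
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> terms = analyze("C0001.DevNm001");
        System.out.println(terms);                                          // [c0001, devnm001]
        System.out.println(prefixMatches(terms, "DevNm00"));                // true
        System.out.println(prefixMatches(terms, "C0001.DevNm00"));          // false
        System.out.println(analyzedPrefixMatches(terms, "C0001.DevNm00*")); // true
    }
}
```

The last call is the behaviour end users would probably expect.  Whether the
real query parser can be made to do this is a separate question; the sketch
only models the matching logic.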


On Wed, Aug 27, 2014 at 10:01 AM, Benson Margulies <benson@basistech.com>
wrote:

> Does google actually support "*"?
>
>
>
> On Wed, Aug 27, 2014 at 9:54 AM, Milind <milindr@gmail.com> wrote:
>
> > I see.  This is going to be extremely difficult to explain to end users.
> > It doesn't work as they would expect.  Some of the tokenizing rules are
> > already somewhat confusing.  Their expectation is that it should work the
> > way their searches work in Google.
> >
> > It's difficult enough to recognize that because the period is surrounded
> > by a digit and a letter (as opposed to 2 digits or 2 letters), it gets
> > tokenized.  So I'd have expected that C0001.DevNm00* would effectively
> > become a search for C0001 OR DevNm00*.  But now, because of the presence
> > of the wildcard, the query is considered as 1 term and the period is not
> > treated as a token separator.  That's actually good, but the fact that
> > the indexed value is still split into 2 terms makes wildcard searches
> > very unintuitive.  I don't suppose that I can do anything about making a
> > wildcard search use multiple terms if they are joined together by a
> > token separator.  But is there any way that I can force the query to go
> > through an analyzer prior to doing the search?
> >
> >
> >
> >
> > On Tue, Aug 26, 2014 at 4:21 PM, Jack Krupansky
> > <jack@basetechnology.com> wrote:
> >
> > > Sorry, but you can only use a wildcard on a single term.
> > > "C0001.DevNm001" gets indexed as two terms, "c0001" and "devnm001", so
> > > your wildcard won't match any term (at least in this case.)
> > >
> > > Also, if your query term includes a wildcard, it will not be fully
> > > analyzed. Some filters such as lower case are defined as "multi-term",
> > > so they will be performed, but the standard tokenizer is not being
> > > called, so the dot remains and this whole term is treated as one term,
> > > unlike the index analysis.
> > >
> > > -- Jack Krupansky
> > >
> > > -----Original Message----- From: Milind
> > > Sent: Tuesday, August 26, 2014 12:24 PM
> > > To: java-user@lucene.apache.org
> > > Subject: Why does this search fail?
> > >
> > >
> > > I have a field with the value C0001.DevNm001.  If I search for
> > >
> > >    C0001.DevNm001 --> Get Hit
> > >    DevNm00*       --> Get Hit
> > >    C0001.DevNm00*  --> Get No Hit
> > >
> > > The field gets tokenized on the period since it's surrounded by a
> > > letter and a number.  The query gets evaluated as a prefix query.  I'd
> > > have thought that this should have found the document.  Any clues on
> > > why this doesn't work?
> > >
> > > The full code is below.
> > >
> > >        // Index one document whose field is analyzed by StandardAnalyzer.
> > >        Directory theDirectory = new RAMDirectory();
> > >        Version theVersion = Version.LUCENE_47;
> > >        Analyzer theAnalyzer = new StandardAnalyzer(theVersion);
> > >        IndexWriterConfig theConfig =
> > >            new IndexWriterConfig(theVersion, theAnalyzer);
> > >        IndexWriter theWriter = new IndexWriter(theDirectory, theConfig);
> > >
> > >        String theFieldName = "Name";
> > >        String theFieldValue = "C0001.DevNm001";
> > >        Document theDocument = new Document();
> > >        theDocument.add(
> > >            new TextField(theFieldName, theFieldValue, Field.Store.YES));
> > >        theWriter.addDocument(theDocument);
> > >        theWriter.close();
> > >
> > >        // Parse the wildcard query with the same analyzer and search.
> > >        String theQueryStr = theFieldName + ":C0001.DevNm00*";
> > >        Query theQuery =
> > >            new QueryParser(theVersion, theFieldName, theAnalyzer)
> > >                .parse(theQueryStr);
> > >        System.out.println(theQuery.getClass() + ", " + theQuery);
> > >        IndexReader theIndexReader = DirectoryReader.open(theDirectory);
> > >        IndexSearcher theSearcher = new IndexSearcher(theIndexReader);
> > >        TopScoreDocCollector collector =
> > >            TopScoreDocCollector.create(10, true);
> > >        theSearcher.search(theQuery, collector);
> > >        ScoreDoc[] theHits = collector.topDocs().scoreDocs;
> > >        System.out.println("Hits found: " + theHits.length);
> > >
> > > Output:
> > >
> > > class org.apache.lucene.search.PrefixQuery, Name:c0001.devnm00*
> > > Hits found: 0
> > >
> > >
> > > --
> > > Regards
> > > Milind
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
> >
> > --
> > Regards
> > Milind
> >
>



-- 
Regards
Milind
