lucene-java-user mailing list archives

From Benson Margulies <ben...@basistech.com>
Subject Re: Why does this search fail?
Date Wed, 27 Aug 2014 14:01:55 GMT
Does Google actually support "*"?



On Wed, Aug 27, 2014 at 9:54 AM, Milind <milindr@gmail.com> wrote:

> I see.  This is going to be extremely difficult to explain to end users.
> It doesn't work as they would expect.  Some of the tokenizing rules are
> already somewhat confusing.  Their expectation is that it should work the
> way their searches work in Google.
>
> It's difficult enough to recognize that because the period is surrounded by
> a digit and a letter (as opposed to two digits or two letters), the value
> gets tokenized.  So I'd have expected that C0001.DevNm00* would effectively
> become a search for C0001 OR DevNm00*.  But now, because of the presence of
> the wildcard, it's considered as 1 term and the period is not treated as a
> token separator.  That's actually good, but the fact that the indexed value
> is still split into 2 terms makes wildcard searches very unintuitive.  I
> don't suppose that I can do anything about making a wildcard search use
> multiple terms if they are joined together with a separator.  But is there
> any way that I can force the query to go through an analyzer prior to doing
> the search?
>
>
>
>
> On Tue, Aug 26, 2014 at 4:21 PM, Jack Krupansky <jack@basetechnology.com>
> wrote:
>
> > Sorry, but you can only use a wildcard on a single term. "C0001.DevNm001"
> > gets indexed as two terms, "c0001" and "devnm001", so your wildcard won't
> > match any term (at least in this case.)
> >
> > Also, if your query term includes a wildcard, it will not be fully
> > analyzed. Some filters such as lower case are defined as "multi-term", so
> > they will be performed, but the standard tokenizer is not being called, so
> > the dot remains and this whole term is treated as one term, unlike the
> > index analysis.
> >
> > -- Jack Krupansky
> >
> > -----Original Message----- From: Milind
> > Sent: Tuesday, August 26, 2014 12:24 PM
> > To: java-user@lucene.apache.org
> > Subject: Why does this search fail?
> >
> >
> > I have a field with the value C0001.DevNm001.  If I search for
> >
> >    C0001.DevNm001 --> Get Hit
> >    DevNm00*       --> Get Hit
> >    C0001.DevNm00*  --> Get No Hit
> >
> > The field gets tokenized on the period since it's surrounded by a letter
> > and a number.  The query gets evaluated as a prefix query.  I'd have
> > thought that this should have found the document.  Any clues on why this
> > doesn't work?
> >
> > The full code is below.
> >
> >        Directory theDirectory = new RAMDirectory();
> >        Version theVersion = Version.LUCENE_47;
> >        Analyzer theAnalyzer = new StandardAnalyzer(theVersion);
> >        IndexWriterConfig theConfig =
> >                            new IndexWriterConfig(theVersion, theAnalyzer);
> >        IndexWriter theWriter = new IndexWriter(theDirectory, theConfig);
> >
> >        String theFieldName = "Name";
> >        String theFieldValue = "C0001.DevNm001";
> >          Document theDocument = new Document();
> >          theDocument.add(new TextField(theFieldName, theFieldValue,
> >                                        Field.Store.YES));
> >          theWriter.addDocument(theDocument);
> >        theWriter.close();
> >
> >        String theQueryStr = theFieldName + ":C0001.DevNm00*";
> >        Query theQuery =
> >            new QueryParser(theVersion, theFieldName, theAnalyzer).parse(theQueryStr);
> >        System.out.println(theQuery.getClass() + ", " + theQuery);
> >        IndexReader theIndexReader = DirectoryReader.open(theDirectory);
> >        IndexSearcher theSearcher = new IndexSearcher(theIndexReader);
> >        TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
> >        theSearcher.search(theQuery, collector);
> >        ScoreDoc[] theHits = collector.topDocs().scoreDocs;
> >        System.out.println("Hits found: " + theHits.length);
> >
> > Output:
> >
> > class org.apache.lucene.search.PrefixQuery, Name:c0001.devnm00*
> > Hits found: 0
> >
> >
> > --
> > Regards
> > Milind
> >
> >
> >
>
>
> --
> Regards
> Milind
>
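
For readers following the thread, here is a rough sketch of the behaviour Jack describes, written in plain Java with no Lucene dependency. Everything in it is an assumption made for illustration: `analyze` is a simplified stand-in for index-time analysis (it only splits on the period and lowercases, which is all this one value needs), `prefixMatches` mimics a prefix query whose text is lowercased but not tokenized, and `analyzedPrefixMatches` is a hypothetical workaround that tokenizes the query text first. None of these names are Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WildcardTermSketch {

    // Simplified stand-in for index-time analysis (an assumption, not
    // StandardAnalyzer itself): split on '.', then lowercase each piece.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.split("\\.")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Prefix query as described in the thread: the prefix is lowercased but
    // NOT tokenized, and it is compared against each indexed term on its own.
    static boolean prefixMatches(List<String> indexedTerms, String rawPrefix) {
        String prefix = rawPrefix.toLowerCase(Locale.ROOT);
        for (String term : indexedTerms) {
            if (term.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical workaround: tokenize the prefix text first, require every
    // leading token to be an indexed term, and treat only the last token as a
    // prefix. This is AND semantics and ignores term positions.
    static boolean analyzedPrefixMatches(List<String> indexedTerms, String rawPrefix) {
        List<String> parts = analyze(rawPrefix);
        if (parts.isEmpty()) {
            return false;
        }
        for (int i = 0; i < parts.size() - 1; i++) {
            if (!indexedTerms.contains(parts.get(i))) {
                return false;
            }
        }
        return prefixMatches(indexedTerms, parts.get(parts.size() - 1));
    }

    public static void main(String[] args) {
        List<String> terms = analyze("C0001.DevNm001");
        System.out.println(terms);                                          // [c0001, devnm001]
        System.out.println(prefixMatches(terms, "DevNm00"));                // true
        System.out.println(prefixMatches(terms, "C0001.DevNm00"));          // false
        System.out.println(analyzedPrefixMatches(terms, "C0001.DevNm00"));  // true
    }
}
```

The third call fails for the reason given earlier in the thread: no single indexed term starts with "c0001.devnm00". Whether the analyze-first variant is the right behaviour for end users is a separate design question, since it silently turns one wildcard term into several clauses.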
