lucenenet-user mailing list archives

From "Lingam, ChandraMohan J" <chandramohan.j.lin...@intel.com>
Subject RE: SPAM-HIGH: Disparity between API usage and Luke
Date Wed, 27 Jun 2012 03:28:02 GMT
Interestingly, the query generated from var query = queryParser.Parse("Id:BAUER*") is
converted to lower case ("bauer*") even though you are using KeywordAnalyzer.  I am not sure
if this is the intended behavior of the keyword analyzer.

So the best option to make this example work is to index in lowercase:
            document.Add(new Field("Id", "bauerrevenue", Field.Store.YES, Field.Index.NOT_ANALYZED));
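
Alternatively, if the lowercasing comes from the query parser rather than from KeywordAnalyzer
(QueryParser lowercases wildcard and prefix terms by default, as far as I know), it may be
possible to keep the uppercase index and turn that behavior off.  A minimal sketch, assuming
the Lucene.Net 2.9.x method-style setter SetLowercaseExpandedTerms (later releases expose it
as a LowercaseExpandedTerms property, I believe); the class and method names are made up for
illustration:

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Version = Lucene.Net.Util.Version;

// Hypothetical helper, not part of the original test.
public static class CaseSensitivePrefixParsing
{
    public static Query ParseKeepingCase(string queryText)
    {
        var analyzer = new KeywordAnalyzer();

        // "Id" is used as the default field here; the original test passed "content".
        var parser = new QueryParser(Version.LUCENE_29, "Id", analyzer);

        // Assumption: this is the 2.9.x spelling of the setting that controls
        // whether wildcard/prefix terms are lowercased before the query is built.
        parser.SetLowercaseExpandedTerms(false);

        // With lowercasing off, "Id:BAUER*" should keep its original case.
        return parser.Parse(queryText);
    }
}
```

Printing query.ToString() before and after the change is a quick way to check whether the
prefix term still comes out as "bauer*".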

Also, the assert will always fail: even when the query matches, the hit count will be 1, since
there is only one document with several values associated with the field.  You would need to
iterate through the fields.  If you want to match 6 documents, then you have to add them as six
separate documents instead of one document with all the values.
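
A rough sketch of the separate-documents approach, using an in-memory RAMDirectory and a
PrefixQuery instead of the parser so that no lowercasing is involved.  The class and method
names are hypothetical, and TotalHits follows the spelling used in the original post; under
these assumptions the count should come out as 6, since NOTBAUER does not start with BAUER:

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical helper, written against the Lucene.Net 2.9.x API used in this thread.
public static class SeparateDocumentsSketch
{
    public static int CountPrefixMatches()
    {
        var directory = new RAMDirectory();
        var writer = new IndexWriter(directory, new KeywordAnalyzer(), true,
                                     IndexWriter.MaxFieldLength.LIMITED);

        string[] ids =
        {
            "BAUERREVENUE", "BAUERLOCATION", "BAUERPRODUCT", "BAUERPRODUCTLINE",
            "BAUERSTATE", "BAUERTOTAL", "NOTBAUER"
        };

        // One document per Id value, so each matching value counts as its own hit.
        foreach (var id in ids)
        {
            var document = new Document();
            document.Add(new Field("Id", id, Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(document);
        }
        writer.Close();

        IndexReader reader = IndexReader.Open(directory, true);
        var searcher = new IndexSearcher(reader);

        // PrefixQuery takes the bare prefix; no trailing "*" is needed.
        var query = new PrefixQuery(new Term("Id", "BAUER"));
        var topDocs = searcher.Search(query, 10);
        int total = topDocs.TotalHits;

        searcher.Close();
        reader.Close();
        return total;
    }
}
```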




-----Original Message-----
From: Rob Cecil [mailto:rob.cecil@gmail.com] 
Sent: Tuesday, June 26, 2012 6:55 PM
To: lucene-net-user@lucene.apache.org
Subject: Re: SPAM-HIGH: Disparity between API usage and Luke

Sure, this is self-contained:

[Test]
        public void QueryNonAnalyzedField()
        {
            var indexPath = Path.Combine(Environment.CurrentDirectory, "testindex");
            var directory = FSDirectory.Open(new DirectoryInfo(indexPath));
            var analyzer = new KeywordAnalyzer();
            var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.LIMITED);
            var document = new Document();
            document.Add(new Field("Id", "BAUERREVENUE", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERLOCATION", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERPRODUCT", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERPRODUCTLINE", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERSTATE", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "BAUERTOTAL", Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.Add(new Field("Id", "NOTBAUER", Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(document);
            writer.Optimize();
            writer.Close();

            IndexReader reader = IndexReader.Open(directory, true);
            var queryParser = new QueryParser(Version.LUCENE_29, "content", analyzer);
            var query = queryParser.Parse("Id:BAUER*");
            var indexSearch = new IndexSearcher(reader);
            var hits = indexSearch.Search(query);
            Assert.AreEqual(6, hits.Length());
        }


On Tue, Jun 26, 2012 at 6:35 PM, Lingam, ChandraMohan J < chandramohan.j.lingam@intel.com>
wrote:

> Just did a simple test, and KeywordAnalyzer does indeed work like a
> prefix query if you put a star at the end. I agree with Simon.  Most
> likely Luke was using the keyword analyzer and the UI somehow was not reflecting it?
>
> Please post a small snippet of your index code and query code...
>
> -----Original Message-----
> From: Rob Cecil [mailto:rob.cecil@gmail.com]
> Sent: Tuesday, June 26, 2012 5:25 PM
> To: lucene-net-user@lucene.apache.org
> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>
> Thanks, and there is no equivalent QueryParser syntax for that?
>
> On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J < 
> chandramohan.j.lingam@intel.com> wrote:
>
> > Actually, that makes sense. The keyword analyzer would try for an exact match.
> > Since you are looking for a prefix-based search, your best option is
> > to simply use PrefixQuery, and there is no need to put a "*" for PrefixQuery.
> >
> > -----Original Message-----
> > From: Rob Cecil [mailto:rob.cecil@gmail.com]
> > Sent: Tuesday, June 26, 2012 4:57 PM
> > To: lucene-net-user@lucene.apache.org
> > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> >
> > That is correct. I've verified in Luke 1.0.1 that both analyzers 
> > produce the same results.
> >
> > To make it interesting, back in my code, I switched over to using 
> > the KeywordAnalyzer, and I'm still not getting any results against 
> > that NOT_ANALYZED field.
> >
> > ?
> >
> > On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J < 
> > chandramohan.j.lingam@intel.com> wrote:
> >
> > > > Luke using the keyword analyzer as default makes sense. However, in
> > > > the original post, there was a link to a Luke output screenshot
> > > > which showed that the standard analyzer was in use for query parsing.
> > >
> > > -----Original Message-----
> > > From: Simon Svensson [mailto:sisve@devhost.se]
> > > Sent: Tuesday, June 26, 2012 2:56 PM
> > > To: lucene-net-user@lucene.apache.org
> > > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> > >
> > > > Luke defaults to KeywordAnalyzer, which won't change your term in any way.
> > > > The QueryParser will still break up your query, so "Name:Jack Bauer"
> > > > would become (Name:Jack DefaultField:Bauer). I believe you can
> > > > have per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer
> > > > for everything else) using a PerFieldAnalyzerWrapper.
> > >
> > > On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
> > > > > QueryParser has no knowledge of how data was indexed.  For your
> > > > > scenario, I don't believe you would be able to use QueryParser
> > > > > with the standard analyzer when the data was originally indexed with
> > > > > the Field.Index.NOT_ANALYZED option.
> > > > >
> > > > > The interesting question is why Luke is working/finding the match.
> > > > > I would have expected Luke to not find any matches.
> > > >
> > > >
> > > > -----Original Message-----
> > > > From: Rob Cecil [mailto:rob.cecil@gmail.com]
> > > > Sent: Tuesday, June 26, 2012 12:54 PM
> > > > To: lucene-net-user@lucene.apache.org
> > > > Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
> > > >
> > > > > I can definitely try that. I just expected QueryParser would
> > > > > respect the case of the source string. I was hoping to avoid using the Query
> > > > > API per se, and just let the parser do the work for me.
> > > >
> > > > > On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
> > > > > chandramohan.j.lingam@intel.com> wrote:
> > > >
> > > >>>> var query = _parser.Parse("Id:BAUER*");
> > > > >> In your code, most likely, the value got converted to lower
> > > > >> case (i.e. bauer*) by the parse statement,
> > > > >> whereas the indexed value is in upper case as it is not analyzed
> > > > >> (from the screenshot).
> > > >>
> > > >> Can you explicitly try using prefix query?
> > > >>
> > > >>
> > > >>
> > > >>> Same results, apparently, when I use Luke 1.0.1.
> > > >>>
> > > > >>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
> > > > >>> custom app, zero.
> > > >>>
> > > > >>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse
> > > > >>> <rvesse@dotnetrdf.org> wrote:
> > > > >>>> You appear to be using Luke 3.5, which per the information on
> > > > >>>> the Luke homepage (http://code.google.com/p/luke/) uses
> > > > >>>> Lucene 3.5.
> > > > >>>>
> > > > >>>> Since Lucene.Net is currently on 2.9.4, I wouldn't be
> > > > >>>> surprised to see different behavior between the API and executing in Luke.
> > > > >>>>
> > > > >>>> If you use a version of Luke which more closely aligns with
> > > > >>>> the version of Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1, which should be
> > > > >>>> close enough since the 2.9.x releases were previews of the
> > > > >>>> 3.0.x releases as I understood it), what behavior do you see?
> > > > >>>>
> > > >>>> Hope this helps,
> > > >>>>
> > > >>>> Rob
> > > >>>>
> > > > >>>> On 6/26/12 10:50 AM, "Rob Cecil" <rob.cecil@gmail.com> wrote:
> > > >>>>
> > > > >>>>> If I run a query against my index using QueryParser to query
> > > > >>>>> a field:
> > > >>>>>
> > > >>>>>                 var query = _parser.Parse("Id:BAUER*");
> > > >>>>>                 var topDocs = searcher.Search(query, 10);
> > > >>>>>                 Assert.AreEqual(count, topDocs.TotalHits);
> > > >>>>>
> > > > >>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase
> > > > >>>>> yields 15 results. What am I doing wrong? I use the
> > > > >>>>> StandardAnalyzer both to create the index and to query.
> > > >>>>>
> > > >>>>> The field is defined as:
> > > >>>>>
> > > >>>>> new Field("Id", myObject.Id, Field.Store.YES,
> > > >>>>> Field.Index.NOT_ANALYZED)
> > > >>>>>
> > > > >>>>> and is a string field. The result set back from Luke looks
> > > > >>>>> like (screencap):
> > > >>>>>
> > > >>>>> http://screencast.com/t/NooMK2Rf
> > > >>>>>
> > > >>>>> Thanks!
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > >
> > >
> > >
> >
>
