lucenenet-user mailing list archives

From Simon Svensson <si...@devhost.se>
Subject Re: SPAM-HIGH: Disparity between API usage and Luke
Date Wed, 27 Jun 2012 05:44:48 GMT
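Call queryParser.SetLowercaseExpandedTerms(false); before parsing. By default
the QueryParser lowercases prefix and wildcard terms, so "Id:BAUER*" is
searched as "Id:bauer*" and never matches the upper-case NOT_ANALYZED values.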
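
For anyone finding this thread later, here is a minimal sketch of the two options discussed in it: turning off term lowercasing on the parser, and building a PrefixQuery directly. It assumes the Lucene.Net 2.9.x method-style API used elsewhere in this thread. The class name, the method names and the in-memory RAMDirectory are my own choices for illustration, not anything from Rob's code. Note that it indexes one document per Id value; the quoted test adds all seven values to a single document, which can produce at most one hit.

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

// Hypothetical helper class; written against the Lucene.Net 2.9.x API.
public static class PrefixSearchSketch
{
    // Sample values copied from the quoted test.
    static readonly string[] Ids =
    {
        "BAUERREVENUE", "BAUERLOCATION", "BAUERPRODUCT", "BAUERPRODUCTLINE",
        "BAUERSTATE", "BAUERTOTAL", "NOTBAUER"
    };

    // Builds an in-memory index with one document per Id (an assumption;
    // the quoted test puts every value into a single document).
    static Lucene.Net.Store.Directory BuildIndex()
    {
        var directory = new RAMDirectory();
        var writer = new IndexWriter(directory, new KeywordAnalyzer(), true,
            IndexWriter.MaxFieldLength.LIMITED);
        foreach (var id in Ids)
        {
            var document = new Document();
            document.Add(new Field("Id", id, Field.Store.YES,
                Field.Index.NOT_ANALYZED));
            writer.AddDocument(document);
        }
        writer.Close();
        return directory;
    }

    // Runs a query and returns the total hit count
    // (TopDocs.TotalHits, as used in the original post).
    static int Count(Query query)
    {
        var reader = IndexReader.Open(BuildIndex(), true);
        var searcher = new IndexSearcher(reader);
        var total = searcher.Search(query, 10).TotalHits;
        searcher.Close();
        reader.Close();
        return total;
    }

    // Option 1: keep the QueryParser, but control whether it lowercases
    // the prefix term before building the query.
    public static int CountWithParser(bool lowercaseExpandedTerms)
    {
        var parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29,
            "content", new KeywordAnalyzer());
        parser.SetLowercaseExpandedTerms(lowercaseExpandedTerms);
        return Count(parser.Parse("Id:BAUER*"));
    }

    // Option 2: skip the parser and build the PrefixQuery by hand.
    // No trailing "*" is needed, and no lowercasing takes place.
    public static int CountWithPrefixQuery()
    {
        return Count(new PrefixQuery(new Term("Id", "BAUER")));
    }
}
```

Under those assumptions, CountWithParser(false) and CountWithPrefixQuery() should both report 6, while CountWithParser(true), which is the parser's default behaviour, should report 0.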

On 2012-06-27 03:55, Rob Cecil wrote:
> Sure, this is self-contained:
>
> [Test]
> public void QueryNonAnalyzedField()
> {
>     var indexPath = Path.Combine(Environment.CurrentDirectory, "testindex");
>     var directory = FSDirectory.Open(new DirectoryInfo(indexPath));
>     var analyzer = new KeywordAnalyzer();
>     var writer = new IndexWriter(directory, analyzer, true,
>         IndexWriter.MaxFieldLength.LIMITED);
>     var document = new Document();
>     document.Add(new Field("Id", "BAUERREVENUE",
>         Field.Store.YES, Field.Index.NOT_ANALYZED));
>     document.Add(new Field("Id", "BAUERLOCATION",
>         Field.Store.YES, Field.Index.NOT_ANALYZED));
>     document.Add(new Field("Id", "BAUERPRODUCT",
>         Field.Store.YES, Field.Index.NOT_ANALYZED));
>     document.Add(new Field("Id", "BAUERPRODUCTLINE",
>         Field.Store.YES, Field.Index.NOT_ANALYZED));
>     document.Add(new Field("Id", "BAUERSTATE",
>         Field.Store.YES, Field.Index.NOT_ANALYZED));
>     document.Add(new Field("Id", "BAUERTOTAL",
>         Field.Store.YES, Field.Index.NOT_ANALYZED));
>     document.Add(new Field("Id", "NOTBAUER",
>         Field.Store.YES, Field.Index.NOT_ANALYZED));
>     writer.AddDocument(document);
>     writer.Optimize();
>     writer.Close();
>
>     IndexReader reader = IndexReader.Open(directory, true);
>     var queryParser = new QueryParser(Version.LUCENE_29, "content", analyzer);
>     var query = queryParser.Parse("Id:BAUER*");
>     var indexSearch = new IndexSearcher(reader);
>     var hits = indexSearch.Search(query);
>     Assert.AreEqual(6, hits.Length());
> }
>
>
> On Tue, Jun 26, 2012 at 6:35 PM, Lingam, ChandraMohan J <
> chandramohan.j.lingam@intel.com> wrote:
>
>> Just did a simple test and KeywordAnalyzer does indeed work like a prefix
>> query if you put a star at the end. Agree with Simon. Most likely Luke was
>> using the keyword analyzer and somehow the UI was not reflecting it?
>>
>> Please post a small snippet of your index code and query code...
>>
>> -----Original Message-----
>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>> Sent: Tuesday, June 26, 2012 5:25 PM
>> To: lucene-net-user@lucene.apache.org
>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>
>> Thanks, and there is no equivalent QueryParser syntax for that?
>>
>> On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J <
>> chandramohan.j.lingam@intel.com> wrote:
>>
>>> Actually, that makes sense. Keyword analyzer would try for an exact match.
>>> Since you are looking for a prefix-based search, your best option is to
>>> simply use PrefixQuery; there is no need to add a "*" with PrefixQuery.
>>>
>>> -----Original Message-----
>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>> Sent: Tuesday, June 26, 2012 4:57 PM
>>> To: lucene-net-user@lucene.apache.org
>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>
>>> That is correct. I've verified in Luke 1.0.1 that both analyzers
>>> produce the same results.
>>>
>>> To make it interesting, back in my code, I switched over to using the
>>> KeywordAnalyzer, and I'm still not getting any results against that
>>> NOT_ANALYZED field.
>>>
>>> ?
>>>
>>> On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J <
>>> chandramohan.j.lingam@intel.com> wrote:
>>>
>>>> Luke using keyword analyzer as default makes sense. However, in the
>>>> original post, there was a link to a Luke output screenshot which
>>>> showed that standard analyzer was in use for query parsing.
>>>>
>>>> -----Original Message-----
>>>> From: Simon Svensson [mailto:sisve@devhost.se]
>>>> Sent: Tuesday, June 26, 2012 2:56 PM
>>>> To: lucene-net-user@lucene.apache.org
>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>
>>>> Luke defaults to KeywordAnalyzer, which won't change your term in any way.
>>>> The QueryParser will still break up your query, so "Name:Jack Bauer"
>>>> would become (Name:Jack DefaultField:Bauer). I believe you can have
>>>> per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for
>>>> everything else) using a PerFieldAnalyzerWrapper.
>>>>
>>>> On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
>>>>> QueryParser has no knowledge of how data was indexed. For your
>>>>> scenario, I don't believe you would be able to use QueryParser with
>>>>> StandardAnalyzer when the data was originally indexed with the
>>>>> Field.Index.NOT_ANALYZED option.
>>>>>
>>>>> The interesting question is why Luke is finding the match. I would
>>>>> have expected Luke to not find any matches.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>>>> Sent: Tuesday, June 26, 2012 12:54 PM
>>>>> To: lucene-net-user@lucene.apache.org
>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>
>>>>> I can definitely try that. I just expected QueryParser would respect the
>>>>> case of the source string. I was hoping to avoid using the Query API
>>>>> per se, and just let the parser do the work for me.
>>>>>
>>>>> On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J <
>>>>> chandramohan.j.lingam@intel.com> wrote:
>>>>>>>> var query = _parser.Parse("Id:BAUER*");
>>>>>> In your code, most likely, the value got converted to lower case (i.e.
>>>>>> bauer*) by the Parse call, whereas the indexed value is in upper case
>>>>>> because it is not analyzed (from the screenshot).
>>>>>>
>>>>>> Can you explicitly try using prefix query?
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Same results, apparently, when I use Luke 1.0.1.
>>>>>>>
>>>>>>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
>>>>>>> custom app, zero.
>>>>>>>
>>>>>>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse <rvesse@dotnetrdf.org> wrote:
>>>>>>>> You appear to be using Luke 3.5, which per the information on
>>>>>>>> the Luke homepage (http://code.google.com/p/luke/) uses Lucene 3.5.
>>>>>>>>
>>>>>>>> Since Lucene.Net is currently on 2.9.4, I wouldn't be surprised
>>>>>>>> to see different behavior between the API and executing in Luke.
>>>>>>>>
>>>>>>>> If you use a version of Luke which more closely aligns with the
>>>>>>>> version of Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1, which should be
>>>>>>>> close enough since the 2.9.x releases were previews of the 3.0.x
>>>>>>>> releases as I understood it), what behavior do you see?
>>>>>>>>
>>>>>>>> Hope this helps,
>>>>>>>>
>>>>>>>> Rob
>>>>>>>>
>>>>>>>> On 6/26/12 10:50 AM, "Rob Cecil" <rob.cecil@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> If I run a query against my index using QueryParser to query a field:
>>>>>>>>>
>>>>>>>>>     var query = _parser.Parse("Id:BAUER*");
>>>>>>>>>     var topDocs = searcher.Search(query, 10);
>>>>>>>>>     Assert.AreEqual(count, topDocs.TotalHits);
>>>>>>>>>
>>>>>>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase
>>>>>>>>> yields 15 results. What am I doing wrong? I use the
>>>>>>>>> StandardAnalyzer both to create the index and to query.
>>>>>>>>>
>>>>>>>>> The field is defined as:
>>>>>>>>>
>>>>>>>>> new Field("Id", myObject.Id, Field.Store.YES,
>>>>>>>>> Field.Index.NOT_ANALYZED)
>>>>>>>>>
>>>>>>>>> and is a string field. The result set back from Luke looks
>>>>>>>>> like (screencap):
>>>>>>>>>
>>>>>>>>> http://screencast.com/t/NooMK2Rf
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>
>>>>


