lucenenet-user mailing list archives

From Rob Cecil <rob.ce...@gmail.com>
Subject Re: SPAM-HIGH: Disparity between API usage and Luke
Date Wed, 27 Jun 2012 16:05:59 GMT
Thanks Simon that works - even with StandardAnalyzer! :)

On Tue, Jun 26, 2012 at 11:44 PM, Simon Svensson <sisve@devhost.se> wrote:

> Set queryParser.SetLowercaseExpandedTerms(false);
>
>
> On 2012-06-27 03:55, Rob Cecil wrote:
>
>> Sure, this is self-contained:
>>
>> [Test]
>> public void QueryNonAnalyzedField()
>> {
>>     var indexPath = Path.Combine(Environment.CurrentDirectory, "testindex");
>>     var directory = FSDirectory.Open(new DirectoryInfo(indexPath));
>>     var analyzer = new KeywordAnalyzer();
>>     var writer = new IndexWriter(directory, analyzer, true,
>>         IndexWriter.MaxFieldLength.LIMITED);
>>     var document = new Document();
>>     document.Add(new Field("Id", "BAUERREVENUE",
>>         Field.Store.YES, Field.Index.NOT_ANALYZED));
>>     document.Add(new Field("Id", "BAUERLOCATION",
>>         Field.Store.YES, Field.Index.NOT_ANALYZED));
>>     document.Add(new Field("Id", "BAUERPRODUCT",
>>         Field.Store.YES, Field.Index.NOT_ANALYZED));
>>     document.Add(new Field("Id", "BAUERPRODUCTLINE",
>>         Field.Store.YES, Field.Index.NOT_ANALYZED));
>>     document.Add(new Field("Id", "BAUERSTATE",
>>         Field.Store.YES, Field.Index.NOT_ANALYZED));
>>     document.Add(new Field("Id", "BAUERTOTAL",
>>         Field.Store.YES, Field.Index.NOT_ANALYZED));
>>     document.Add(new Field("Id", "NOTBAUER",
>>         Field.Store.YES, Field.Index.NOT_ANALYZED));
>>     writer.AddDocument(document);
>>     writer.Optimize();
>>     writer.Close();
>>
>>     IndexReader reader = IndexReader.Open(directory, true);
>>     var queryParser = new QueryParser(Version.LUCENE_29, "content", analyzer);
>>     var query = queryParser.Parse("Id:BAUER*");
>>     var indexSearch = new IndexSearcher(reader);
>>     var hits = indexSearch.Search(query);
>>     Assert.AreEqual(6, hits.Length());
>> }
>>
>>
>> On Tue, Jun 26, 2012 at 6:35 PM, Lingam, ChandraMohan J
>> <chandramohan.j.lingam@intel.com> wrote:
>>
>>> Just did a simple test, and KeywordAnalyzer does indeed work like a prefix
>>> query if you put a star at the end. I agree with Simon. Most likely Luke
>>> was using KeywordAnalyzer and somehow the UI was not reflecting it?
>>>
>>> Please post a small snippet of your index code and query code...
>>>
>>> -----Original Message-----
>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>> Sent: Tuesday, June 26, 2012 5:25 PM
>>> To: lucene-net-user@lucene.apache.org
>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>
>>> Thanks, and there is no equivalent QueryParser syntax for that?
>>>
>>> On Tue, Jun 26, 2012 at 6:21 PM, Lingam, ChandraMohan J
>>> <chandramohan.j.lingam@intel.com> wrote:
>>>
>>>> Actually, that makes sense. KeywordAnalyzer would try for an exact match.
>>>>
>>>> Since you are looking for a prefix-based search, your best option is to
>>>> simply use PrefixQuery, and there is no need to put a "*" for
>>>> PrefixQuery.
>>>>
>>>> -----Original Message-----
>>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>>> Sent: Tuesday, June 26, 2012 4:57 PM
>>>> To: lucene-net-user@lucene.apache.org
>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>
>>>> That is correct. I've verified in Luke 1.0.1 that both analyzers
>>>> produce the same results.
>>>>
>>>> To make it interesting, back in my code, I switched over to using the
>>>> KeywordAnalyzer, and I'm still not getting any results against that
>>>> NOT_ANALYZED field.
>>>>
>>>> ?
>>>>
>>>> On Tue, Jun 26, 2012 at 5:52 PM, Lingam, ChandraMohan J
>>>> <chandramohan.j.lingam@intel.com> wrote:
>>>>
>>>>> Luke using KeywordAnalyzer as its default makes sense. However, in the
>>>>> original post, there was a link to a Luke output screenshot which
>>>>> showed that StandardAnalyzer was in use for query parsing.
>>>>>
>>>>> -----Original Message-----
>>>>> From: Simon Svensson [mailto:sisve@devhost.se]
>>>>> Sent: Tuesday, June 26, 2012 2:56 PM
>>>>> To: lucene-net-user@lucene.apache.org
>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>
>>>>> Luke defaults to KeywordAnalyzer, which won't change your term in any
>>>>> way. The QueryParser will still break up your query, so "Name:Jack Bauer"
>>>>> would become (Name:Jack DefaultField:Bauer). I believe you can have
>>>>> per-field analyzers (KeywordAnalyzer for Id, StandardAnalyzer for
>>>>> everything else) using a PerFieldAnalyzerWrapper.
>>>>>
>>>>> On 2012-06-26 23:06, Lingam, ChandraMohan J wrote:
>>>>>
>>>>>> QueryParser has no knowledge of how data was indexed. For your
>>>>>> scenario, I don't believe you would be able to use QueryParser with
>>>>>> StandardAnalyzer when the data was originally indexed with the
>>>>>> Field.Index.NOT_ANALYZED option.
>>>>>>
>>>>>> The interesting question is why Luke is working and finding the match.
>>>>>> I would have expected Luke to not find any matches.
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Rob Cecil [mailto:rob.cecil@gmail.com]
>>>>>> Sent: Tuesday, June 26, 2012 12:54 PM
>>>>>> To: lucene-net-user@lucene.apache.org
>>>>>> Subject: Re: SPAM-HIGH: Disparity between API usage and Luke
>>>>>>
>>>>>> I can definitely try that. I just expected QueryParser would
>>>>>> respect the case of the source string. I was hoping to avoid using
>>>>>> the Query API per se, and just let the parser do the work for me.
>>>>>
>>>>>> On Tue, Jun 26, 2012 at 1:19 PM, Lingam, ChandraMohan J
>>>>>> <chandramohan.j.lingam@intel.com> wrote:
>>>>>
>>>>>>>> var query = _parser.Parse("Id:BAUER*");
>>>>>>>
>>>>>>> In your code, most likely, the value got converted to lower case
>>>>>>> (i.e. bauer*) by the parse statement, whereas the indexed value is in
>>>>>>> upper case because it is not analyzed (from the screenshot).
>>>>>>>
>>>>>>> Can you explicitly try using a prefix query?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Same results, apparently, when I use Luke 1.0.1.
>>>>>>>>
>>>>>>>> When I search for "Id:BAUER*" I get 15 hits in Luke, but in my
>>>>>>>> custom app, zero.
>>>>>>>>
>>>>>>>> On Tue, Jun 26, 2012 at 12:31 PM, Rob Vesse <rvesse@dotnetrdf.org>
>>>>>>>> wrote:
>>>>>>>
>>>>>>>>> You appear to be using Luke 3.5, which per the information on the
>>>>>>>>> Luke homepage (http://code.google.com/p/luke/) uses Lucene 3.5.
>>>>>>>>>
>>>>>>>>> Since Lucene.Net is currently on 2.9.4, I wouldn't be surprised to
>>>>>>>>> see different behavior between the API and executing in Luke.
>>>>>>>>>
>>>>>>>>> If you use a version of Luke which more closely aligns with the
>>>>>>>>> version of Lucene.Net (Luke 1.0.1 uses Lucene 3.0.1, which should be
>>>>>>>>> close enough since the 2.9.x releases were previews of the 3.0.x
>>>>>>>>> releases as I understood it), what behavior do you see?
>>>>>>>>>
>>>>>>>>> Hope this helps,
>>>>>>>>>
>>>>>>>>> Rob
>>>>>>>>>
>>>>>>>>> On 6/26/12 10:50 AM, "Rob Cecil" <rob.cecil@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> If I run a query against my index using QueryParser to query a
>>>>>>>>>> field:
>>>>>>>>>>
>>>>>>>>>>     var query = _parser.Parse("Id:BAUER*");
>>>>>>>>>>     var topDocs = searcher.Search(query, 10);
>>>>>>>>>>     Assert.AreEqual(count, topDocs.TotalHits);
>>>>>>>>>>
>>>>>>>>>> I get 0 for my TotalHits, yet in Luke, the same query phrase
>>>>>>>>>> yields 15 results. What am I doing wrong? I use the
>>>>>>>>>> StandardAnalyzer both to create the index and to query.
>>>>>>>>>>
>>>>>>>>>> The field is defined as:
>>>>>>>>>>
>>>>>>>>>> new Field("Id", myObject.Id, Field.Store.YES,
>>>>>>>>>> Field.Index.NOT_ANALYZED)
>>>>>>>>>>
>>>>>>>>>> and is a string field. The result set back from Luke looks like
>>>>>>>>>> (screencap):
>>>>>>>>>>
>>>>>>>>>> http://screencast.com/t/NooMK2Rf
>>>>>>>>>>
>>>>>>>>>> Thanks!
>>>>>>>>>>
