lucene-java-user mailing list archives

From Elmer van Chastelet <evanchaste...@gmail.com>
Subject Re: PhoneticFilterFactory's inject parameter
Date Wed, 25 Apr 2012 12:22:01 GMT
I keep replying to myself, so this is getting a bit confusing.
The problem still exists, and I don't understand why, or why it worked once.

I am seeing the same behavior again as described in my first mail:
- Inject parameter is set to true.
- The index has _no deleted documents_ and is optimized.
- The term 'compete' is in there.
- If I ask Luke to show all docs for term 'compete' it shows me the one 
and only document that represents this word. But...
- If I then run the query 'value:compete' in Luke, it says there 
are no results.

Here is the index I'm currently using. It contains various fields for 
the available phonetic filter encoders:
https://www.box.com/s/34212e82227e102f6734

Can somebody explain this behavior? What's the real use of the inject 
parameter of the PhoneticFilterFactory?

Thanks in advance.

-Elmer


On 04/25/2012 12:25 PM, Elmer van Chastelet wrote:
> Problem solved. Long story short: for some reason I had deleted 
> documents in the index and the non-deleted documents used the phonetic 
> filter with inject set to false.
>
> Works fine now :)
>
> On 04/23/2012 09:27 PM, Elmer van Chastelet wrote:
>> Hi all,
>>
>> (scroll to bottom for question)
>>
>> I was setting up a simple web app to play around with phonetic filters.
>> The idea is simple, I just create a document for each word in the 
>> English dictionary, each document containing a single search field 
>> holding the value after it is preprocessed using the following 
>> analyzer def (in our own dsl syntax, which gets transformed to java):
>>
>> analyzer soundslike{
>>     tokenizer = KeywordTokenizer
>>     tokenfilter = LowerCaseFilter
>>     tokenfilter = PhoneticFilter(encoder="DoubleMetaphone", inject="true")
>> }
>>
>> I can run the web app and I get results that indeed (in some way) 
>> sound like the original query term.
>>
>> But what confuses me is the ranking of the results, given that I 
>> set the inject param to true. If I search for the query term 
>> 'compete', the parsed query becomes '(value:KMPT value:compete)', so 
>> I expected the word 'compete' to rank higher than any other word 
>> in the list... but this wasn't the case.
>>
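A minimal sketch of the token stream I expect from this analyzer, with and without inject. It does not use Lucene: the class name InjectSketch, the method analyze and the one-entry lookup table are made up for illustration, and the only encoding in it ('compete' -> 'KMPT') is copied from the parsed query above. Passing unknown words through unchanged is a choice of this sketch, not a claim about the real filter.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class InjectSketch {
    // Hypothetical stand-in for the phonetic encoder (NOT DoubleMetaphone itself).
    // The single mapping is taken from the parsed query '(value:KMPT value:compete)'.
    static final Map<String, String> CODES =
            Collections.singletonMap("compete", "KMPT");

    // Mimics KeywordTokenizer + LowerCaseFilter + a phonetic filter:
    // the whole input is one lowercased token, then the code is added.
    public static List<String> analyze(String input, boolean inject) {
        String token = input.toLowerCase(Locale.ROOT);
        String code = CODES.get(token);
        List<String> terms = new ArrayList<String>();
        if (code == null) {
            // Sketch assumption: words without a known code pass through unchanged.
            terms.add(token);
            return terms;
        }
        terms.add(code);          // the phonetic code is always emitted
        if (inject) {
            terms.add(token);     // the original token is kept only when inject is true
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println("inject=true : " + analyze("compete", true));
        System.out.println("inject=false: " + analyze("compete", false));
    }
}
```

If documents are indexed with inject=false, only the code ends up in the field, so a query clause on the original term has nothing to match in those documents.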
>> Looking further at the explanation of the results, I saw that the term 
>> 'compete' from the parsed query is entirely absent, and only the 
>> phonetic encoding seems to affect the ranking:
>>
>>   * COMPETITOR
>>       o 4.368826 = (MATCH) sum of:
>>           + 4.368826 = (MATCH) weight(value:KMPT in 3174), product of:
>>               # 0.52838135 = queryWeight(value:KMPT), product of:
>>                   * 8.26832 = idf(docFreq=150, maxDocs=216555)
>>                   * 0.063904315 = queryNorm
>>               # 8.26832 = (MATCH) fieldWeight(value:KMPT in 3174),
>>                 product of:
>>                   * 1.0 = tf(termFreq(value:KMPT)=1)
>>                   * 8.26832 = idf(docFreq=150, maxDocs=216555)
>>                   * 1.0 = fieldNorm(field=value, doc=3174)
>>
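The numbers in this explanation can be reproduced by hand. The sketch below assumes the classic Lucene 3.x DefaultSimilarity formulas (tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1))); queryNorm and fieldNorm are simply copied from the output above. The class and method names are illustrative only.

```java
public class ExplainCheck {
    // Assumed DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    public static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // Assumed DefaultSimilarity tf: square root of the term frequency.
    public static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    // Score of a single term clause as broken down in the explanation:
    // queryWeight (idf * queryNorm) times fieldWeight (tf * idf * fieldNorm).
    public static double score(int termFreq, int docFreq, int maxDocs,
                               double queryNorm, double fieldNorm) {
        double idf = idf(docFreq, maxDocs);
        double queryWeight = idf * queryNorm;
        double fieldWeight = tf(termFreq) * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // Values copied from the explanation for value:KMPT in doc 3174.
        int docFreq = 150;
        int maxDocs = 216555;
        double queryNorm = 0.063904315;
        double fieldNorm = 1.0;

        System.out.printf("idf   = %.5f%n", idf(docFreq, maxDocs));
        System.out.printf("score = %.6f%n", score(1, docFreq, maxDocs, queryNorm, fieldNorm));
    }
}
```

The whole score comes from the single value:KMPT clause; no clause for value:compete contributes anything.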
>> The next thing I did was running our friend Luke. In Luke, I opened 
>> the documents tab, and started iterating over some terms for the 
>> field 'value' until I found 'compete'. When I hit 'Show All Docs', 
>> the search tab opens and it displays the one and only document 
>> holding this value (i.e. the document representing the word 
>> 'compete'). It shows the query: 'value:compete '. Then, when I hit 
>> the search button again (query is still 'value:compete '), it says 
>> that there are no results !?
>>
>> Probably, the 'Show All Docs' button does something different than 
>> performing a query using the search tab in Luke.
>>
>> Q: Can somebody explain why the injected original terms seem to get 
>> ignored at query time? Or could it be related to the name of the search 
>> field ('value'), or something else?
>>
>> We use Lucene 3.1 with Solr analyzers (via Hibernate Search 3.4.2).
>>
>> -Elmer
>>
>>
>

