lucene-java-user mailing list archives

From Elmer van Chastelet <evanchaste...@gmail.com>
Subject Re: PhoneticFilterFactory's inject parameter
Date Wed, 25 Apr 2012 10:25:06 GMT
Problem solved. Long story short: for some reason I had deleted 
documents in the index and the non-deleted documents used the phonetic 
filter with inject set to false.

Works fine now :)

On 04/23/2012 09:27 PM, Elmer van Chastelet wrote:
> Hi all,
>
> (scroll to bottom for question)
>
> I was setting up a simple web app to play around with phonetic filters.
> The idea is simple, I just create a document for each word in the 
> English dictionary, each document containing a single search field 
> holding the value after it is preprocessed using the following 
> analyzer def (in our own dsl syntax, which gets transformed to java):
>
> analyzer soundslike{
>     tokenizer = KeywordTokenizer
>     tokenfilter = LowerCaseFilter
>     tokenfilter = PhoneticFilter(encoder="DoubleMetaphone", inject="true")
> }
>
> I can run the web app and I get results that indeed (in some way) 
> sound like the original query term.
>
> But what confuses me is the ranking of the results, knowing that I set 
> the inject param to true. If I search for the query term 'compete', 
> the parsed query becomes '(value:KMPT value:compete)', and therefore I 
> expect the word 'compete' to be ranked higher in the list than any 
> other word... but this wasn't the case.
>
> Looking further at the explanation of results, I saw that the term 
> 'compete' in the parsed query is totally absent, and only the phonetic 
> encoding seems to affect the ranking:
>
> COMPETITOR
>   4.368826 = (MATCH) sum of:
>     4.368826 = (MATCH) weight(value:KMPT in 3174), product of:
>       0.52838135 = queryWeight(value:KMPT), product of:
>         8.26832 = idf(docFreq=150, maxDocs=216555)
>         0.063904315 = queryNorm
>       8.26832 = (MATCH) fieldWeight(value:KMPT in 3174), product of:
>         1.0 = tf(termFreq(value:KMPT)=1)
>         8.26832 = idf(docFreq=150, maxDocs=216555)
>         1.0 = fieldNorm(field=value, doc=3174)
>
> The next thing I did was running our friend Luke. In Luke, I opened 
> the documents tab, and started iterating over some terms for the field 
> 'value' until I found 'compete'. When I hit 'Show All Docs', the 
> search tab opens and it displays the one and only document holding 
> this value (i.e. the document representing the word 'compete'). It 
> shows the query: 'value:compete '. Then, when I hit the search button 
> again (query is still 'value:compete '), it says that there are no 
> results !?
>
> Probably, the 'Show All Docs' button does something different than 
> performing a query using the search tab in Luke.
>
> Q: Can somebody explain why the injected original terms seem to get 
> ignored at query time? Or may it be related to the name of the search 
> field ('value'), or something else?
>
> We use Lucene 3.1 with Solr analyzers (via Hibernate Search 3.4.2).
>
> -Elmer
>
>
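
The numbers in the quoted explain output can be re-derived by hand, which helps when checking which terms actually took part in scoring. The sketch below assumes the Lucene 3.x DefaultSimilarity formulas (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(termFreq), queryNorm = 1 / sqrt(sum of squared term weights)) and query boosts of 1.0. The class and method names are made up for illustration, and docFreq=0 for value:compete is an assumption chosen because it reproduces the reported queryNorm; it is not stated anywhere in the thread.

```java
// Sketch only: re-derives the explain output above under the assumed
// Lucene 3.x DefaultSimilarity formulas. Names here are illustrative.
class ExplainCheck {

    // idf(t) = 1 + ln(maxDocs / (docFreq + 1))
    static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // queryNorm = 1 / sqrt(sum of squared idf values), assuming all boosts are 1.0
    static double queryNorm(double... idfs) {
        double sumOfSquares = 0.0;
        for (double v : idfs) {
            sumOfSquares += v * v;
        }
        return 1.0 / Math.sqrt(sumOfSquares);
    }

    // Contribution of one matching term: queryWeight * fieldWeight
    static double termScore(double idf, double queryNorm, double tf, double fieldNorm) {
        double queryWeight = idf * queryNorm;      // 0.52838135 in the explain output
        double fieldWeight = tf * idf * fieldNorm; // 8.26832 in the explain output
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        int maxDocs = 216555;
        double idfKmpt = idf(150, maxDocs);   // explain output reports 8.26832
        // ASSUMPTION: the searcher saw no documents containing value:compete.
        double idfCompete = idf(0, maxDocs);
        double norm = queryNorm(idfKmpt, idfCompete); // explain output reports 0.063904315
        double tf = Math.sqrt(1);             // termFreq=1
        double score = termScore(idfKmpt, norm, tf, 1.0); // explain output reports 4.368826

        System.out.printf("idf(KMPT)=%.5f queryNorm=%.9f score=%.6f%n", idfKmpt, norm, score);
    }
}
```

Under these assumptions the reported queryNorm only comes out when value:compete is given a document frequency of 0; a document frequency of 1 would give roughly 0.066 instead. That is consistent with the original term being part of the query but matching nothing in the index the searcher was reading.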

