lucene-java-user mailing list archives

From Elmer van Chastelet <evanchaste...@gmail.com>
Subject PhoneticFilterFactory's inject parameter
Date Mon, 23 Apr 2012 19:27:39 GMT
Hi all,

(scroll to bottom for question)

I was setting up a simple web app to play around with phonetic filters.
The idea is simple: I create a document for each word in the 
English dictionary, each containing a single search field 
holding the value after it is preprocessed by the following analyzer 
definition (in our own DSL syntax, which gets transformed to Java):

analyzer soundslike{
     tokenizer = KeywordTokenizer
     tokenfilter = LowerCaseFilter
     tokenfilter = PhoneticFilter(encoder="DoubleMetaphone", inject="true")
}
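To make the inject behaviour concrete, here is a minimal Java sketch of the token stream this chain should produce for one dictionary word. It is a plain simulation, not Lucene code: the class SoundsLikeChain, its Token record and the pluggable encoder are hypothetical, the encoder is a stub rather than a real Double Metaphone implementation, and the token order (phonetic code first, original term stacked at position increment 0) reflects my reading of the Solr 3.x PhoneticFilter source, so check it against your version.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical simulation of the "soundslike" chain; this is NOT Lucene/Solr code.
final class SoundsLikeChain {

    /** A token as the indexer would see it: its text and its position increment. */
    record Token(String term, int positionIncrement) {}

    /**
     * Simulates KeywordTokenizer -> LowerCaseFilter -> PhoneticFilter(inject=true).
     * The encoder is a stand-in supplied by the caller; Double Metaphone is not reimplemented here.
     */
    static List<Token> analyze(String fieldValue, UnaryOperator<String> encoder) {
        List<Token> out = new ArrayList<>();
        // KeywordTokenizer: the whole field value becomes one token.
        // LowerCaseFilter: lowercase it (approximated with Locale.ROOT).
        String term = fieldValue.toLowerCase(Locale.ROOT);
        String phonetic = encoder.apply(term);
        if (phonetic == null || phonetic.isEmpty() || phonetic.equals(term)) {
            // No usable phonetic code: the original token passes through unchanged.
            out.add(new Token(term, 1));
            return out;
        }
        // inject=true (assumed order): phonetic code first, then the original
        // term at position increment 0, i.e. stacked on the same position.
        out.add(new Token(phonetic, 1));
        out.add(new Token(term, 0));
        return out;
    }
}
```

With a stub such as `t -> t.equals("compete") ? "KMPT" : t`, analyzing "Compete" yields KMPT (increment 1) followed by compete (increment 0), so both terms should end up in the index for that document.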

I can run the web app and I get results that indeed (in some way) sound 
like the original query term.

But what confuses me is the ranking of the results, given that I set 
the inject param to true. If I search for the query term 'compete', the 
parsed query becomes '(value:KMPT value:compete)', so I 
expect the word 'compete' to rank higher than any 
other word... but this wasn't the case.
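The expectation above can be sketched numerically. Assuming Lucene 3.x DefaultSimilarity, a BooleanQuery of SHOULD clauses sums the scores of the clauses a document matches and multiplies the sum by coord = matched clauses / total clauses, so a document matching both value:KMPT and value:compete should outscore one that matches only value:KMPT. The class BooleanScoreSketch and the clause scores used with it are hypothetical illustration values, not numbers from the index.

```java
// Hypothetical sketch of Lucene 3.x-style BooleanQuery scoring (DefaultSimilarity assumed):
// score = (sum of matching clause scores) * coord, coord = matched / total clauses.
final class BooleanScoreSketch {

    static double booleanScore(double[] matchingClauseScores, int totalClauses) {
        if (totalClauses <= 0) {
            throw new IllegalArgumentException("totalClauses must be positive");
        }
        double sum = 0.0;
        for (double s : matchingClauseScores) {
            sum += s; // each matching SHOULD clause contributes its own score
        }
        // coord rewards documents that match more of the query's clauses
        double coord = (double) matchingClauseScores.length / totalClauses;
        return sum * coord;
    }
}
```

For example, with made-up clause scores of 4.0 for value:KMPT and 6.0 for value:compete, a document matching both scores (4.0 + 6.0) * 2/2 = 10.0, while one matching only value:KMPT scores 4.0 * 1/2 = 2.0.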

Looking further at the explanation of the results, I saw that the term 
'compete' from the parsed query is totally absent, and only the phonetic 
encoding seems to affect the ranking:

  COMPETITOR
    4.368826 = (MATCH) sum of:
      4.368826 = (MATCH) weight(value:KMPT in 3174), product of:
        0.52838135 = queryWeight(value:KMPT), product of:
          8.26832 = idf(docFreq=150, maxDocs=216555)
          0.063904315 = queryNorm
        8.26832 = (MATCH) fieldWeight(value:KMPT in 3174), product of:
          1.0 = tf(termFreq(value:KMPT)=1)
          8.26832 = idf(docFreq=150, maxDocs=216555)
          1.0 = fieldNorm(field=value, doc=3174)
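As a sanity check, the numbers in this explanation can be recomputed by hand. The sketch below assumes the Lucene 3.x DefaultSimilarity formulas idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), a query boost of 1.0, and takes queryNorm and fieldNorm straight from the explanation; the class name ExplainArithmetic is my own.

```java
// Re-derives the explanation for value:KMPT in doc 3174.
// Assumes Lucene 3.x DefaultSimilarity formulas and a query boost of 1.0.
final class ExplainArithmetic {

    /** idf = 1 + ln(maxDocs / (docFreq + 1)) */
    static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log(maxDocs / (double) (docFreq + 1));
    }

    /** tf = sqrt(termFreq) */
    static double tf(int termFreq) {
        return Math.sqrt(termFreq);
    }

    /** weight(term in doc) = queryWeight * fieldWeight */
    static double termScore(int docFreq, int maxDocs, int termFreq,
                            double queryNorm, double fieldNorm) {
        double idf = idf(docFreq, maxDocs);
        double queryWeight = idf * queryNorm;                // idf * queryNorm (boost 1.0)
        double fieldWeight = tf(termFreq) * idf * fieldNorm; // tf * idf * fieldNorm
        return queryWeight * fieldWeight;
    }
}
```

Here `ExplainArithmetic.idf(150, 216555)` comes out at about 8.26832 and `ExplainArithmetic.termScore(150, 216555, 1, 0.063904315, 1.0)` at about 4.3688, matching the values in the explanation.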

The next thing I did was run our friend Luke. In Luke, I opened the 
Documents tab and started iterating over the terms for the field 
'value' until I found 'compete'. When I hit 'Show All Docs', the Search 
tab opens and displays the one and only document holding this value 
(i.e. the document representing the word 'compete'). It shows the query: 
'value:compete '. Then, when I hit the search button again (the query is 
still 'value:compete '), it says that there are no results!?

Presumably, the 'Show All Docs' button does something different from 
running a query in Luke's Search tab.

Q: Can somebody explain why the injected original terms seem to get 
ignored at query time? Or could it be related to the name of the search 
field ('value'), or to something else?

We use Lucene 3.1 with Solr analyzers (via Hibernate Search 3.4.2).

-Elmer
