lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: PhoneticFilterFactory 's inject parameter
Date Wed, 25 Apr 2012 12:53:27 GMT
You seem to be quietly going round in circles, by yourself!  I suggest
a small self-contained program/test case with a RAM index created from
scratch.  You can then experiment with inject on or off and if you
still can't figure it out, post the code and hopefully someone will be
able to help you make sense of it.

Make sure you tell us what version of Lucene you are using.  If it is
not the latest, it wouldn't hurt to try with the latest.
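
A minimal sketch of the kind of inject on/off experiment suggested above,
written in plain Java so it runs without Lucene on the classpath. It is a toy
model, not Lucene code: `toyEncode` is a hypothetical stand-in for a real
encoder such as DoubleMetaphone (so it yields CMPT rather than KMPT), and
`analyze`, `buildIndex` and `termQuery` are illustrative names. It assumes that
inject=true keeps the original token alongside the encoded one and that
inject=false replaces it, which is how the Solr documentation describes the
parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class InjectSketch {

    // Hypothetical stand-in for a phonetic encoder: upper-case the word and
    // drop every vowel after the first letter. This is NOT DoubleMetaphone.
    static String toyEncode(String word) {
        String upper = word.toUpperCase();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < upper.length(); i++) {
            char c = upper.charAt(i);
            if (i == 0 || "AEIOU".indexOf(c) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Models keyword tokenizer + lowercase + phonetic filter on one value.
    // Assumption: inject=true emits the original term plus the encoded term,
    // inject=false emits only the encoded term.
    static List<String> analyze(String value, boolean inject) {
        String lower = value.toLowerCase();
        List<String> terms = new ArrayList<>();
        if (inject) {
            terms.add(lower);
        }
        terms.add(toyEncode(lower));
        return terms;
    }

    // Tiny in-memory inverted index: term -> ids of documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<String> words, boolean inject) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int docId = 0; docId < words.size(); docId++) {
            for (String term : analyze(words.get(docId), inject)) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    // Exact term lookup with no query-time analysis.
    static Set<Integer> termQuery(Map<String, Set<Integer>> index, String term) {
        return index.getOrDefault(term, new TreeSet<>());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("compete", "competitor", "compote");
        for (boolean inject : new boolean[] {true, false}) {
            Map<String, Set<Integer>> index = buildIndex(words, inject);
            String encoded = toyEncode("compete");
            System.out.println("inject=" + inject
                + "  compete -> " + termQuery(index, "compete")
                + "  " + encoded + " -> " + termQuery(index, encoded));
        }
    }
}
```

In this model, documents indexed with inject off never contain the literal
term 'compete', so an exact term lookup comes back empty while the encoded
term still matches. A real test case would replace the map with a RAM index
and the actual filter chain, and compare the two settings in the same way.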


--
Ian.


On Wed, Apr 25, 2012 at 1:22 PM, Elmer van Chastelet
<evanchastelet@gmail.com> wrote:
> I keep replying to myself, and it is all getting a bit confusing.
> The problem still exists and I don't understand why, and why it worked once.
>
> I have the same behavior again as posted in my first mail:
> - Inject parameter is set to true.
> - The index has _no deleted documents_ and is optimized.
> - The term 'compete' is in there.
> - If I ask Luke to show all docs for term 'compete' it shows me the one and
> only document that represents this word. But...
> - If I perform the query 'value:compete' in Luke again, it says there are no
> results.
>
> Here is the index I'm currently using. It contains various fields for the
> available phonetic filter encoders:
> https://www.box.com/s/34212e82227e102f6734
>
> Can somebody explain this behavior? What's the real use of the inject
> parameter of the PhoneticFilterFactory?
>
> Thanks in advance.
>
> -Elmer
>
>
> On 04/25/2012 12:25 PM, Elmer van Chastelet wrote:
>>
>> Problem solved. Long story short: for some reason I had deleted documents
>> in the index and the non-deleted documents used the phonetic filter with
>> inject set to false.
>>
>> Works fine now :)
>>
>> On 04/23/2012 09:27 PM, Elmer van Chastelet wrote:
>>>
>>> Hi all,
>>>
>>> (scroll to bottom for question)
>>>
>>> I was setting up a simple web app to play around with phonetic filters.
>>> The idea is simple: I create a document for each word in the English
>>> dictionary, each document containing a single search field holding the value
>>> after it is preprocessed using the following analyzer definition (in our own
>>> DSL syntax, which gets transformed to Java):
>>>
>>> analyzer soundslike{
>>>    tokenizer = KeywordTokenizer
>>>    tokenfilter = LowerCaseFilter
>>>    tokenfilter = PhoneticFilter(encoder="DoubleMetaphone", inject="true")
>>> }
>>>
>>> I can run the web app and I get results that indeed (in some way) sound
>>> like the original query term.
>>>
>>> But what confuses me is the ranking of the results, knowing that I set
>>> the inject param to true. If I search for the query term 'compete', the
>>> parsed query becomes '(value:KMPT value:compete)', and therefore I expect
>>> the word 'compete' to be ranked higher in the list than any other word...
>>> but this wasn't the case.
>>>
>>> Looking further at the explanation of results, I saw that the term
>>> 'compete' from the parsed query is totally absent, and only the phonetic
>>> encoding seems to affect the ranking:
>>>
>>>  * COMPETITOR
>>>      o 4.368826 = (MATCH) sum of:
>>>          + 4.368826 = (MATCH) weight(value:KMPT in 3174), product of:
>>>              # 0.52838135 = queryWeight(value:KMPT), product of:
>>>                  * 8.26832 = idf(docFreq=150, maxDocs=216555)
>>>                  * 0.063904315 = queryNorm
>>>              # 8.26832 = (MATCH) fieldWeight(value:KMPT in 3174),
>>>                product of:
>>>                  * 1.0 = tf(termFreq(value:KMPT)=1)
>>>                  * 8.26832 = idf(docFreq=150, maxDocs=216555)
>>>                  * 1.0 = fieldNorm(field=value, doc=3174)
>>>
>>> The next thing I did was to run our friend Luke. In Luke, I opened the
>>> documents tab, and started iterating over some terms for the field 'value'
>>> until I found 'compete'. When I hit 'Show All Docs', the search tab opens
>>> and it displays the one and only document holding this value (i.e. the
>>> document representing the word 'compete'). It shows the query:
>>> 'value:compete '. Then, when I hit the search button again (query is still
>>> 'value:compete '), it says that there are no results!?
>>>
>>> Probably, the 'Show All Docs' button does something different than
>>> performing a query using the search tab in Luke.
>>>
>>> Q: Can somebody explain why the injected original terms seem to get
>>> ignored at query time? Or might it be related to the name of the search
>>> field ('value'), or something else?
>>>
>>> We use Lucene 3.1 with Solr analyzers (via Hibernate Search 3.4.2).
>>>
>>> -Elmer
>>>
>>>
>>
>
