lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: PhoneticFilterFactory 's inject parameter
Date Thu, 26 Apr 2012 12:51:39 GMT
There are useful tips in the FAQ,
http://wiki.apache.org/lucene-java/LuceneFAQ#Why_am_I_getting_no_hits_.2BAC8_incorrect_hits.3F.

I still think you should come up with small self-contained example code.


--
Ian.


On Wed, Apr 25, 2012 at 4:02 PM, Elmer van Chastelet
<evanchastelet@gmail.com> wrote:
> Thanks for your suggestion Ian, but I just found out that if I replace the
> KeywordTokenizer with a WhitespaceTokenizer, all seems to work fine.
>
> Just to test what happens, I created another field 'orig', using this
> analyzer:
> analyzer KeywordLowered{
>    tokenizer = KeywordTokenizer
>    tokenfilter = LowerCaseFilter
> }
>
> Guess what... exactly the same problem, also in Luke.
> It finds no documents for the query:
> orig:strange
> While the term 'strange' is in the index for the field 'orig'.
>
> Does anybody have a clue why documents are not matched when using the
> KeywordTokenizer? Remember that none of the queries or terms contain
> whitespace.
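
A minimal sketch of how the two tokenization styles differ, in case it helps narrow this down. It is plain Java, not Lucene code: the class TokenizerSketch and both helper methods are hypothetical stand-ins that only mimic "whole input as one token" versus "split on whitespace", each followed by lowercasing. The stray trailing space in the raw value is an assumption, suggested only by the trailing space in the query Luke displayed in the first mail ('value:compete '). Whether the indexed values really carry such a character cannot be told from the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java simulation; these helpers are hypothetical and are NOT Lucene classes.
class TokenizerSketch {

    // Keyword-style: the entire input becomes a single token, whitespace included.
    static List<String> keywordTokens(String input) {
        List<String> tokens = new ArrayList<>();
        tokens.add(input.toLowerCase(Locale.ROOT));
        return tokens;
    }

    // Whitespace-style: split on runs of whitespace and drop empty pieces.
    static List<String> whitespaceTokens(String input) {
        List<String> tokens = new ArrayList<>();
        for (String part : input.split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String raw = "Strange ";      // assumption: source value with a stray trailing space
        String queryTerm = "strange"; // what a user would type

        // Keyword-style keeps the space, so the exact term does not match.
        System.out.println(keywordTokens(raw).contains(queryTerm));    // false
        // Whitespace-style strips it, so the term matches.
        System.out.println(whitespaceTokens(raw).contains(queryTerm)); // true
    }
}
```

If something like this is going on, printing each indexed term between delimiters (or checking its length) would show it quickly.
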
>
>
> Thanks again.
> -Elmer
>
>
> On 04/25/2012 02:53 PM, Ian Lea wrote:
>>
>> You seem to be quietly going round in circles, by yourself!  I suggest
>> a small self-contained program/test case with a RAM index created from
>> scratch.  You can then experiment with inject on or off and if you
>> still can't figure it out, post the code and hopefully someone will be
>> able to help you make sense of it.
>>
>> Make sure you tell us what version of Lucene you are using.  If not
>> the latest, wouldn't hurt to try with the latest.
>>
>>
>> --
>> Ian.
>>
>>
>> On Wed, Apr 25, 2012 at 1:22 PM, Elmer van Chastelet
>> <evanchastelet@gmail.com>  wrote:
>>>
>>> I keep replying to myself, it all gets a bit confusing.
>>> The problem still exists and I don't understand why, and why it worked
>>> once.
>>>
>>> I have the same behavior again as posted in my first mail:
>>> - Inject parameter is set to true.
>>> - The index has _no deleted documents_ and is optimized.
>>> - The term 'compete' is in there.
>>> - If I ask Luke to show all docs for term 'compete' it shows me the one
>>> and only document that represents this word. But...
>>> - If I perform the query 'value:compete' in Luke again, it says there are
>>> no results.
>>>
>>> Here is the index I'm currently using. It contains various fields for the
>>> available phonetic filter encoders:
>>> https://www.box.com/s/34212e82227e102f6734
>>>
>>> Can somebody explain this behavior? What's the real use of the inject
>>> parameter of the PhoneticFilterFactory?
>>>
>>> Thanks in advance.
>>>
>>> -Elmer
>>>
>>>
>>> On 04/25/2012 12:25 PM, Elmer van Chastelet wrote:
>>>>
>>>> Problem solved. Long story short: for some reason I had deleted
>>>> documents in the index and the non-deleted documents used the phonetic
>>>> filter with inject set to false.
>>>>
>>>> Works fine now :)
>>>>
>>>> On 04/23/2012 09:27 PM, Elmer van Chastelet wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> (scroll to bottom for question)
>>>>>
>>>>> I was setting up a simple web app to play around with phonetic filters.
>>>>> The idea is simple: I create a document for each word in the English
>>>>> dictionary, each document containing a single search field holding the
>>>>> value after it is preprocessed using the following analyzer definition
>>>>> (in our own DSL syntax, which gets transformed to Java):
>>>>>
>>>>> analyzer soundslike{
>>>>>    tokenizer = KeywordTokenizer
>>>>>    tokenfilter = LowerCaseFilter
>>>>>    tokenfilter = PhoneticFilter(encoder="DoubleMetaphone", inject="true")
>>>>> }
>>>>>
>>>>> I can run the web app and I get results that indeed (in some way) sound
>>>>> like the original query term.
>>>>>
>>>>> But what confuses me is the ranking of the results, given that I set
>>>>> the inject param to true. If I search for the query term 'compete', the
>>>>> parsed query becomes '(value:KMPT value:compete)', and therefore I
>>>>> expect the word 'compete' to be ranked higher in the list than any
>>>>> other word... but this wasn't the case.
>>>>>
>>>>> Looking further at the explanation of results, I saw that the term
>>>>> 'compete' in the parsed query is totally absent, and only the phonetic
>>>>> encoding seems to affect the ranking:
>>>>>
>>>>>  * COMPETITOR
>>>>>      o 4.368826 = (MATCH) sum of:
>>>>>          + 4.368826 = (MATCH) weight(value:KMPT in 3174), product of:
>>>>>              # 0.52838135 = queryWeight(value:KMPT), product of:
>>>>>                  * 8.26832 = idf(docFreq=150, maxDocs=216555)
>>>>>                  * 0.063904315 = queryNorm
>>>>>              # 8.26832 = (MATCH) fieldWeight(value:KMPT in 3174),
>>>>>                product of:
>>>>>                  * 1.0 = tf(termFreq(value:KMPT)=1)
>>>>>                  * 8.26832 = idf(docFreq=150, maxDocs=216555)
>>>>>                  * 1.0 = fieldNorm(field=value, doc=3174)
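
The numbers in the explanation above can be reproduced by hand, which confirms that only the value:KMPT clause contributes to the score. Below is a small sketch, assuming the default similarity of Lucene 3.x, where idf = 1 + ln(maxDocs / (docFreq + 1)) and tf is the square root of the term frequency. The class name ScoreSketch and its methods are hypothetical and exist only for this illustration. Lucene does the arithmetic in float, so the last digits may differ slightly.

```java
// Hypothetical helper that re-derives the explain() output by hand.
class ScoreSketch {

    // idf as assumed for the Lucene 3.x default similarity.
    static double idf(int docFreq, int maxDocs) {
        return 1.0 + Math.log((double) maxDocs / (docFreq + 1));
    }

    // Score of one matching term: queryWeight * fieldWeight.
    static double score(double idf, double queryNorm, double tf, double fieldNorm) {
        double queryWeight = idf * queryNorm;      // 0.52838135 in the explanation
        double fieldWeight = tf * idf * fieldNorm; // 8.26832 in the explanation
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        double idf = idf(150, 216555);         // docFreq and maxDocs from the explanation
        double tf = Math.sqrt(1);              // termFreq = 1
        double queryNorm = 0.063904315;        // taken verbatim from the explanation
        double fieldNorm = 1.0;

        System.out.println("idf   = " + idf);  // roughly 8.268
        System.out.println("score = " + score(idf, queryNorm, tf, fieldNorm)); // roughly 4.3688
    }
}
```

Had the injected term value:compete matched a document as well, its weight would show up as a second entry under "sum of".
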
>>>>>
>>>>> The next thing I did was run our friend Luke. In Luke, I opened the
>>>>> documents tab and started iterating over the terms for the field
>>>>> 'value' until I found 'compete'. When I hit 'Show All Docs', the search
>>>>> tab opens and displays the one and only document holding this value
>>>>> (i.e. the document representing the word 'compete'). It shows the query:
>>>>> 'value:compete '. Then, when I hit the search button again (query is
>>>>> still 'value:compete '), it says that there are no results!?
>>>>>
>>>>> Probably, the 'Show All Docs' button does something different than
>>>>> performing a query using the search tab in Luke.
>>>>>
>>>>> Q: Can somebody explain why the injected original terms seem to get
>>>>> ignored at query time? Or might it be related to the name of the search
>>>>> field ('value'), or something else?
>>>>>
>>>>> We use Lucene 3.1 with Solr analyzers (via Hibernate Search 3.4.2).
>>>>>
>>>>> -Elmer
>>>>>
>>>>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
>
>


