lucene-solr-user mailing list archives

From Modassar Ather <modather1...@gmail.com>
Subject Re: Position increment in WordDelimiterFilter.
Date Mon, 18 Jan 2016 13:21:53 GMT
> Can you please send us tokens you get (and positions) when you analyze
> *WiFi device*

Tokens generated and their respective positions:

token     position
WiFi      1
Wi        1
WiFi      1
Fi        2
device    3
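
To illustrate why these positions matter for phrase queries, here is a
minimal Python sketch. It is not Lucene code: `build_positions` and
`phrase_matches` are hypothetical helpers, the slop rule is a simplified
model of sloppy phrase matching, and the query is assumed to consist of the
literal terms at consecutive positions (the query-side analysis is ignored).

```python
from itertools import product

# Token stream reported in this thread for "WiFi device": (term, position).
TOKENS = [("WiFi", 1), ("Wi", 1), ("WiFi", 1), ("Fi", 2), ("device", 3)]


def build_positions(tokens):
    """Map each term to the set of positions at which it was indexed."""
    positions = {}
    for term, pos in tokens:
        positions.setdefault(term, set()).add(pos)
    return positions


def phrase_matches(positions, phrase_terms, slop=0):
    """Simplified phrase check (assumption: not Lucene's actual scorer).

    Each candidate position is shifted back by the term's offset in the
    phrase. The phrase matches when the shifted positions lie within
    `slop` of each other; slop=0 means the terms must be adjacent.
    """
    candidates = [sorted(positions.get(term, ())) for term in phrase_terms]
    if any(not c for c in candidates):
        return False  # some term never occurs in the document
    for combo in product(*candidates):
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False


index = build_positions(TOKENS)
print(phrase_matches(index, ["WiFi", "device"]))          # False
print(phrase_matches(index, ["WiFi", "device"], slop=1))  # True
print(phrase_matches(index, ["Wi", "Fi"]))                # True
```

Under this model, WiFi sits at position 1 and device at position 3, so the
exact phrase needs a slop of 1 even though the source text has no gap.
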

Best,
Modassar

On Fri, Jan 15, 2016 at 6:25 PM, Emir Arnautovic <
emir.arnautovic@sematext.com> wrote:

> Can you please send us tokens you get (and positions) when you analyze
> *WiFi device*
>
> On 15.01.2016 13:15, Modassar Ather wrote:
>
>> Are you saying that WiFi Wi-Fi and Wi Fi should not match each other?
>> I am using WhitespaceTokenizer in my analysis chain, so "Wi Fi" becomes two
>> different tokens. Please refer to the examples in my previous mail about
>> the issues faced.
>> "Wi Fi" is two terms, which will match. But what happens if content
>> containing *WiFi device* is searched with *"WiFi device"*? It will not
>> match, because WordDelimiterFilter adds a position increment for WiFi.
>> "WiFi device"~1 will match, which is confusing: there is no gap in the
>> content, so why is a slop required?
>>
>> Why do you use WordDelimiterFilter? Can you give us few examples where it
>> is useful?
>> It is useful when a phrase like *lucene-search documentation* is indexed
>> with WordDelimiterFilter: it is split into two terms, lucene and search,
>> which helps retrieve the documents containing it for queries like
>> lucene documentation or search documentation.
>>
>> Best,
>> Modassar
>>
>> On Fri, Jan 15, 2016 at 2:14 PM, Emir Arnautovic <
>> emir.arnautovic@sematext.com> wrote:
>>
>>> Modassar,
>>> Are you saying that WiFi Wi-Fi and Wi Fi should not match each other? Why
>>> do you use WordDelimiterFilter? Can you give us few examples where it is
>>> useful?
>>>
>>> Thanks,
>>> Emir
>>>
>>>
>>> On 15.01.2016 05:13, Modassar Ather wrote:
>>>
>>>> Thanks for your responses.
>>>>
>>>> It seems to me that you don't want to split on numbers.
>>>> It is not only with numbers. Even if you analyze WiFi, it will create
>>>> 4 tokens, one of which will be at position 2. So basically the issue is
>>>> with the position increment, which causes a few of the queries to behave
>>>> unexpectedly.
>>>>
>>>> Which release of Solr are you using?
>>>> I am using Lucene/Solr-5.4.0.
>>>>
>>>> Best,
>>>> Modassar
>>>>
>>>> On Thu, Jan 14, 2016 at 9:44 PM, Jack Krupansky <
>>>> jack.krupansky@gmail.com> wrote:
>>>>
>>>>> Which release of Solr are you using? Last year (or so) there was a Lucene
>>>>> change that had the effect of keeping all terms for WDF at the same
>>>>> position. There was also some discussion about whether this was either a
>>>>> bug or a bug fix, but I don't recall any resolution.
>>>>>
>>>>> -- Jack Krupansky
>>>>>
>>>>> On Thu, Jan 14, 2016 at 4:15 AM, Modassar Ather <
>>>>> modather1981@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have the following definition for WordDelimiterFilter.
>>>>>>
>>>>>> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
>>>>>> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>>>>>> catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
>>>>>>
>>>>>> The analysis of 3d shows the following four tokens and their positions.
>>>>>>
>>>>>> token     position
>>>>>> 3d        1
>>>>>> 3         1
>>>>>> 3d        1
>>>>>> d         2
>>>>>>
>>>>>> Please help me understand why d is at position 2. Should it not also be
>>>>>> at position 1?
>>>>>> Is it a bug, and if not, is there any attribute I can use to restrict
>>>>>> the position increment?
>>>>>>
>>>>>> Thanks,
>>>>>> Modassar
>>>>>>
>>>>>>
>>>>>> --
>>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>>> Solr & Elasticsearch Support * http://sematext.com/
>>>
>>>
>>>
