lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: Trying to extend MappingCharFilter so that it only changes a token if the length of the token matches the length of singleMatch
Date Sat, 29 Jan 2011 11:09:21 GMT
On 29/01/2011 01:45, Koji Sekiguchi wrote:
> (11/01/25 2:14), Paul Taylor wrote:
>> On 22/01/2011 15:43, Koji Sekiguchi wrote:
>>> (11/01/20 22:19), Paul Taylor wrote:
>>>> Trying to extend MappingCharFilter so that it only changes a token if the length of the token
>>>> matches the length of singleMatch in NormalizeCharMap (currently the singleMatch just has to be
>>>> found in the token; I want it to match the whole token). Can this be done? It sounds simple enough, but
>>>> I cannot make any headway understanding the MappingCharFilter source code.
>>>>
>>>> thanks Paul
>>>
>>> Paul,
>>>
>>> Can you give us a concrete input/output (you wanted) with mapping table
>>> so that I can understand what you want?
>>>
>>> Thanks,
>>>
>>> Koji
>> Sure
>>
>> charConvertMap.add("!!!","ApostropheApostropheApostrophe");
>> charConvertMap.add("*** ***","StarStarStar");
>> charConvertMap.add("!","Apostrophe");
>>
>> Normally, punctuation gets removed during indexing and searching, which is what I want for good search
>> results, but when the token only contains specific punctuation strings I don't want to remove the
>> punctuation because it would make it impossible to match, so I convert it to a textual representation.
>>
>> As it stands, in the 3rd case '!' will be preserved wherever it is found, so to get a good match on
>> 'Wow!' you would have to search for 'Wow!'. But I want you to be able to search for 'Wow' and have it
>> return 'Wow!', which is the case if "!" isn't in the char convert map, but if you searched for '!' I
>> want it to return the token which is just '!', which is only the case if the value is added to the map.
>>
>> I need to do this because the text we are indexing and searching consists of short strings representing
>> music artist names (there is an artist called !!!).
>>
>> thanks Paul
>>
>>
> Hi Paul,
>
> Still I'm not sure I understand your issue correctly, but if you want:
>
> query="Wow!" result="Wow!"
> query="Wow"  result="Wow!"
> query="!"    result="Wow!"
> query="!!!"  result="!!!"
>
> does the following maps solve your problem?
> (I assume you use Whitespace-type-Tokenizer here)
>
> charConvertMap.add("!!!","ApostropheApostropheApostrophe");
> charConvertMap.add("!"," Apostrophe");  // there is a space in front of "!"
>
> Koji
No, your solution would convert all cases of apostrophe, which is not 
what I want. Also, the list of names I need to do this for is much larger 
than the two examples I give here, so you cannot rely on the order they 
are added in.

Is it possible to help me with the original question: how do I subclass 
MappingCharFilter so that it only changes complete matching tokens?

i.e. if my charConvertMap contained

charConvertMap.add("!!!","ApostropheApostropheApostrophe");

it would convert a token of '!!!' to 'ApostropheApostropheApostrophe', but 
a token of 'Hello!!!' would become 'Hello'.

Paul
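
The whole-token behaviour requested above can be sketched in plain Java. 
This is only an illustration under stated assumptions: it does not use or 
subclass MappingCharFilter, the names WholeTokenMapper, add and analyze are 
hypothetical, and the whitespace split and punctuation stripping are 
stand-ins for whatever the real analyzer does.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: maps a token only when the
// WHOLE token equals a key in the map.
class WholeTokenMapper {
    private final Map<String, String> wholeTokenMap = new LinkedHashMap<>();

    // Mirrors the shape of charConvertMap.add(match, replacement).
    void add(String match, String replacement) {
        wholeTokenMap.put(match, replacement);
    }

    List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        // Assumption: tokens are separated by whitespace.
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            String mapped = wholeTokenMap.get(token);
            if (mapped != null) {
                // Exact match on the complete token: keep it as text.
                out.add(mapped);
                continue;
            }
            // Assumption: stand-in for the analyzer's normal punctuation
            // removal; keeps only letters and digits.
            String stripped = token.replaceAll("[^\\p{L}\\p{N}]", "");
            if (!stripped.isEmpty()) {
                out.add(stripped);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        WholeTokenMapper mapper = new WholeTokenMapper();
        mapper.add("!!!", "ApostropheApostropheApostrophe");
        System.out.println(mapper.analyze("!!!"));      // [ApostropheApostropheApostrophe]
        System.out.println(mapper.analyze("Hello!!!")); // [Hello]
    }
}
```

Because the lookup is an exact match on the complete token, the order in 
which entries are added does not matter. Note that a CharFilter sees the raw 
character stream before tokenization, so a check like this may sit more 
naturally in a TokenFilter placed after the tokenizer; that is a suggestion, 
not something confirmed in this thread.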
