lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Koji Sekiguchi <k...@r.email.ne.jp>
Subject Re: Trying to extend MappingCharFilter so that it only changes a token if the length of the token matches the length of singleMatch
Date Sat, 29 Jan 2011 01:45:14 GMT
(11/01/25 2:14), Paul Taylor wrote:
> On 22/01/2011 15:43, Koji Sekiguchi wrote:
>> (11/01/20 22:19), Paul Taylor wrote:
>>> Trying to extend MappingCharFilter so that it only changes a token if the length
of the token
>>> matches the length of singleMatch in NormalizeCharMap (currently the singleMatch
just has to be
>>> found in the token I want ut to match the whole token). Can this be done it sounds
simple enough but
>>> I cannot make any headway understanding the MappingCharFilter source code
>>>
>>> thanks Paul
>>
>> Paul,
>>
>> Can you give us a concrete input/output (you wanted) with mapping table
>> so that I can understand what you want?
>>
>> Thanks,
>>
>> Koji
> Sure
>
> charConvertMap.add("!!!","ApostropheApostropheApostrophe");
> charConvertMap.add("*** ***","StarStarStar");
> charConvertMap.add("!","Apostrophe");
>
> Normally, punctuation gets removed during index and searching which is what I want for
good search
> results but when the token only contains specific punctuation strings I don't want to
remove the
> punctuation because it would make it impossible to match, so I convert it to a textual
representation.
>
> As it stands in the 3rd case '!' will be preserved wherever it is found, so to get a
good match on
> 'Wow!' you would have to search for 'Wow!. But I want you to be able to search for 'Wow'
and it
> return 'Wow!' which is the case if "!" isn't in the char convert map, but if you searched
for '!' I
> want it to return the token which is just '!' which is only the case if the value is
added to the map.
>
> I need to do this because the text we are indexing and searching are short strings representing
an
> music artist name (there is an artist called !!!)
>
> thanks Paul
>
>
Hi Paul,

Still I'm not sure I understand your issue correctly, but if you want:

query="Wow!" result="Wow!"
query="Wow"  result="Wow!"
query="!"    result="Wow!"
query="!!!"  result="!!!"

does the following maps solve your problem?
(I assume you use Whitespace-type-Tokenizer here)

charConvertMap.add("!!!","ApostropheApostropheApostrophe");
charConvertMap.add("!"," Apostrophe");  // there is a space in front of "!"

Koji
-- 
http://www.rondhuit.com/en/

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message