lucene-java-user mailing list archives

From "Igal @ getRailo.org" <i...@getrailo.org>
Subject Re: using CharFilter to inject a space
Date Sat, 03 Nov 2012 23:47:22 GMT
I considered it, and it's definitely an option.
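For illustration, here is a stdlib-only Java sketch of the single-mapping idea from Robert's suggestion (quoted below). Every "," becomes ", ", and, extending it to the original question, every ";" becomes "; ". The result is then split on whitespace roughly the way WhitespaceTokenizer would split it. The sketch works on a whole String rather than a streaming Reader. MappingSketch and its methods are hypothetical names: this is not Lucene's MappingCharFilter, its API, or how it is implemented internally.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical example class; not Lucene's MappingCharFilter.
public class MappingSketch {
    // Mapping table in the spirit of the suggestion: "," -> ", " (plus ";" -> "; ").
    static final Map<String, String> MAPPINGS = Map.of(",", ", ", ";", "; ");

    // Applies every mapping to the whole string. String.replace is a single,
    // non-recursive pass, and neither replacement introduces the other key,
    // so the unspecified iteration order of Map.of does not matter.
    static String applyMappings(String input) {
        String out = input;
        for (Map.Entry<String, String> e : MAPPINGS.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }

    // Splits on runs of whitespace, roughly what WhitespaceTokenizer does.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) return List.of();
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        String text = "I bought red apples,green pears,and yellow oranges";
        System.out.println(whitespaceTokens(applyMappings(text)));
        // prints: [I, bought, red, apples,, green, pears,, and, yellow, oranges]
    }
}
```

Because the mapping is unconditional, text that already contains ", " ends up with two spaces. A whitespace split collapses the run, so the resulting tokens are the same either way, and the commas stay attached to the preceding word.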

But I read in the book "Lucene in Action" that MappingCharFilter is
inefficient, and I'm not sure that I need it. If implementing my own
CharFilter involves a lot of coding, then I might resort to MappingCharFilter,
since I don't have large data sets to index at this time.
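If writing a custom filter, the core injection logic itself is small. Below is a minimal sketch, assuming plain java.io rather than Lucene. SpaceInjectingReader is a hypothetical FilterReader, not a Lucene CharFilter. It emits a space after each "," or ";" unless the next character is already whitespace or the end of input. A real Lucene CharFilter would also have to map token offsets back to the original text (in Lucene 4.x, by implementing CharFilter's correct(int) method) so that highlighting lands on the right characters. This sketch does not track offsets at all.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical example class: plain java.io, not a Lucene CharFilter.
public class SpaceInjectingReader extends FilterReader {
    private int pending = -1;           // real char held back while an injected space is emitted
    private boolean afterPunct = false; // true right after ',' or ';' was returned

    public SpaceInjectingReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c;
        if (pending != -1) {
            c = pending;            // return the char held back by the previous call
            pending = -1;
        } else {
            c = in.read();
            if (afterPunct) {
                afterPunct = false;
                if (c != -1 && !Character.isWhitespace(c)) {
                    pending = c;    // hold the real char back...
                    return ' ';     // ...and emit the injected space first
                }
            }
        }
        if (c == ',' || c == ';') {
            afterPunct = true;
        }
        return c;
    }

    // FilterReader's bulk read would bypass read(), so route it through read().
    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) break;
            cbuf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = 0;
        while (skipped < n && read() != -1) skipped++;
        return skipped;
    }

    @Override
    public boolean ready() throws IOException {
        return pending != -1 || in.ready();
    }

    @Override
    public boolean markSupported() {
        return false; // the held-back char makes mark/reset unsafe here
    }

    @Override
    public void mark(int readAheadLimit) throws IOException {
        throw new IOException("mark() not supported");
    }

    @Override
    public void reset() throws IOException {
        throw new IOException("reset() not supported");
    }

    // Convenience helper: run a whole string through the filter.
    static String filter(String text) {
        try (Reader r = new SpaceInjectingReader(new StringReader(text))) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[8];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) sb.append(buf, 0, n);
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(filter("I bought red apples,green pears;and yellow oranges, too"));
        // prints: I bought red apples, green pears; and yellow oranges, too
    }
}
```

Unlike the unconditional mapping, this version only injects a space when one is missing, so ", too" passes through unchanged.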

Thanks for your answer,


Igal


On 11/3/2012 4:42 PM, Robert Muir wrote:
> On Sat, Nov 3, 2012 at 7:35 PM, Igal @ getRailo.org <igal@getrailo.org> wrote:
>> Hi,
>>
>> I want to make sure that every comma (,) and semicolon (;) is followed by a
>> space prior to tokenizing.
>>
>> The idea is to then use a WhitespaceTokenizer, which will keep the commas but
>> still split the phrase in a case like:
>>
>>      "I bought red apples,green pears,and yellow oranges"
>>
>> I'm thinking of extending CharFilter to "inject" a space after the comma.
>> My questions are:
>>
>>      1) Does it make sense, or am I completely off here?
>>
>>      2) Are there any code examples of CharFilter implementations that
>> inject a char?
> Can't you just use something like MappingCharFilter with a single
> mapping of "," to ", " ?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

