lucene-solr-user mailing list archives

From Mahmoud Almokadem <prog.mahm...@gmail.com>
Subject Re: Arabic analyser
Date Mon, 09 Nov 2015 16:47:27 GMT
Thanks, Jack,

This is a good solution, but we have more combinations that I don't think can be handled as
synonyms, such as every term that starts with ‘عبد’ (‘Abd’) or ‘أبو’ (‘Abo’). When
using the Standard tokenizer on ‘أبو بكر’ (‘Abo Bakr’), it will be tokenized into ‘أبو’
and ‘بكر’, and the filters will be applied to each term separately.

Is there a tokenizer available that can tokenize ‘أبو *’ or ‘عبد *’ as a single term?

Thanks,
Mahmoud 
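One possible approach to the question above, not proposed anywhere in this thread, is to remove the space after a known prefix before tokenization, which is the kind of rewrite Solr's PatternReplaceCharFilterFactory can perform ahead of the tokenizer. The Python below is only a minimal sketch of that idea under stated assumptions: the prefix list, the regex, and the names `join_prefixes` and `tokenize` are hypothetical, and a plain whitespace split stands in for StandardTokenizer. For queries to match, the same rewrite would have to run at both index and query time.

```python
import re

# Hypothetical prefix list, taken from the message: 'عبد' (Abd) and 'أبو' (Abo).
PREFIXES = ("عبد", "أبو")

# A prefix that starts a word (not preceded by a word character), followed by
# whitespace and then another word. Only the whitespace is dropped.
_JOIN_RE = re.compile(
    r"(?<!\w)(" + "|".join(map(re.escape, PREFIXES)) + r")\s+(?=\w)"
)


def join_prefixes(text: str) -> str:
    """Glue a known prefix to the following word, e.g. 'عبد الله' -> 'عبدالله'."""
    return _JOIN_RE.sub(r"\1", text)


def tokenize(text: str) -> list[str]:
    """Stand-in tokenizer: apply the char-filter-like step, then split on whitespace."""
    return join_prefixes(text).split()


if __name__ == "__main__":
    print(tokenize("أبو بكر الصديق"))
```

A trade-off of this design: once 'أبو بكر' is indexed only as the single term 'أبوبكر', a search for 'بكر' on its own no longer matches it, whereas the synonym approach keeps the separate words as well.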


> On Nov 9, 2015, at 5:47 PM, Jack Krupansky <jack.krupansky@gmail.com> wrote:
> 
> Use an index-time (but not query time) synonym filter with a rule like:
> 
> Abd Allah,Abdallah
> 
> This will index the combined word in addition to the separate words.
> 
> -- Jack Krupansky
> 
> On Mon, Nov 9, 2015 at 4:48 AM, Mahmoud Almokadem <prog.mahmoud@gmail.com>
> wrote:
> 
>> Hello,
>> 
>> We are indexing Arabic content and are facing a problem tokenizing multi-term
>> phrases like 'عبد الله' ('Abd Allah'). Users will search for 'عبدالله'
>> ('Abdallah') without a space and need to get the results for 'عبد الله'
>> with a space. We are using StandardTokenizer.
>> 
>> 
>> Is there any configuration to handle this case?
>> 
>> Thank you,
>> Mahmoud
>> 
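Jack's index-time synonym rule (`Abd Allah,Abdallah`) can be modeled with a small Python sketch to show why a query for the joined form matches a document containing the spaced form. This is a simplified illustration, not Lucene's SynonymFilter: the `RULES` table, the Arabic entry added alongside the Latin one, and the helpers `index_tokens` and `matches` are hypothetical, and real multi-word synonym handling treats token positions in more detail.

```python
# Hypothetical rules mirroring "Abd Allah,Abdallah": a two-word phrase maps to
# an extra single-term variant. The Arabic entry is an assumed equivalent.
RULES = {
    ("abd", "allah"): "abdallah",
    ("عبد", "الله"): "عبدالله",
}


def index_tokens(text: str) -> list[tuple[int, str]]:
    """Index time: keep the original words and add synonym variants.

    Each token is a (position, term) pair. A combined term is stacked at the
    position of the first word of the phrase it replaces.
    """
    words = text.lower().split()
    out = [(i, w) for i, w in enumerate(words)]
    for i in range(len(words) - 1):
        combined = RULES.get((words[i], words[i + 1]))
        if combined:
            out.append((i, combined))
    return out


def matches(doc_text: str, query_term: str) -> bool:
    """Query time: no synonym expansion, just a single-term lookup."""
    q = query_term.lower()
    return any(term == q for _, term in index_tokens(doc_text))


if __name__ == "__main__":
    print(index_tokens("Abd Allah"))
    print(matches("Abd Allah", "Abdallah"))
```

Because the separate words are still indexed, a search for 'Abd Allah' keeps working, which is what "in addition to the separate words" means in Jack's reply.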

