lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@apache.org>
Subject Re: Help in Arabic Analysers with Lucene on Windows
Date Mon, 29 Dec 2008 14:16:00 GMT
What does the FIELD_BODY look like?  You search is apparently going  
against that Field, but you don't show how it is indexed.

Have you looked at your index in Luke yet?  http://www.getopt.org/luke?



On Dec 29, 2008, at 8:19 AM, Girish Naik wrote:

> Sorry for that,
>
> Here is how the Analyzer is Selected:
>  public static Analyzer getAnalyzerInstance(String localeKey) {
>     Analyzer analyzer = null;
>     if (localeKey == null || localeKey.trim().equals("")) {
>         localeKey = AppContext.getSetting("defaultLocale");
>         System.out.println("<><><>><><><><><Locale
key taken as  
> Default ");
>     } else {
>         // localeKey may be a csv of locales, in which case picj the  
> first
>         // one.
>         localeKey = StringUtils.split(localeKey, ",")[0].trim();
>         System.out.println("<><><>><><><><><Locale
key is trimmed");
>     }
>     System.out.println("<><><>><><><><><Locale
is " + localeKey);
>     String name = (String) _analyzerMap.get(localeKey);
>     System.out.println("<><><>><><><><><Name
from Locale is " + name);
>     if (name == null) {
>         analyzer = new StandardAnalyzer();
>     } else {
>         // if (name.equalsIgnoreCase("Arabic")) {
>         // analyzer = new ArabicAnalyzer();
>         // } else {
>         analyzer = new SnowballAnalyzer(name);
>         // }
>     }
>     return analyzer;
>     }
>
>
> While Indexing some are analyzed and some are not...
>  document.add(new Field(FIELD_DOCUMENT_CREATED_ON, LocaleUtils
>             .convert8859_6ToUTF8 
> (com.aurigalogic.activesite.field.Field
>                 .indexableDate(avsDoc.getCreatedOn())),
>             Field.Store.YES, Field.Index.NOT_ANALYZED));
> ...
> document.add(new Field(FIELD_CONTENT_TYPE, LocaleUtils
>             .convert8859_6ToUTF8(version.getDocument()
>                 .getContentDescriptor().getName()),
>             Field.Store.YES, Field.Index.ANALYZED));
> Currently the method LocaleUtils.convert8859_6ToUTF8 does nothing  
> but returns the parameter as is.
>
> While seraching the Query parser  etc.  are created like
> Analyzer analyzer = AnalyzerSelector.getAnalyzerInstance(locale);
> ...
> QueryParser qparser = new QueryParser(Constants.FIELD_BODY, analyzer);
> ...
>
>
> So while posting the form with a Arabic word does not fetch the  
> results. An English word does work though!!
>
> I would be more that helpful if anything else is required.
>
>
>
>
> Regards,
>
> Please do not print this email unless it is absolutely necessary.
> Girish Naik
> Development Lead
>
> Neev Information Technologies Pvt Ltd
> Bangalore, Karnataka India
>
> <banner-5b.png>	Mobile: 91 09740091638
> Email: girish.naik@gmail.com
> IM: girish.naik (Skype)
> http://www.linkedin.com/in/girishnaik
>
> <banner-2c.png>	<neev_logo.gif>
>
> Fools rush in where angels fear to tread
> See who we know in common	Want a signature like this?
> The information contained in this electronic message and any  
> attachments to this message are intended for the exclusive use of  
> the addressee(s) and may contain proprietary, confidential or  
> privileged information. If you are not the intended recipient, you  
> should not disseminate, distribute or copy this e-mail. Please  
> notify the sender immediately and destroy all copies of this message  
> and any attachments.
> WARNING: Computer viruses can be transmitted via email. The  
> recipient should check this email and any attachments for the  
> presence of viruses. The company accepts no liability for any damage  
> caused by any virus transmitted by this email.
>
> On 12/29/2008 6:16 PM, Grant Ingersoll wrote:
>>
>> Hi Girish,
>>
>> Can you provide some sample code and info about what isn't  
>> working?  All you have said so far is that the Arabic Analyzer  
>> doesn't work for you, but you have said nothing about how you are  
>> actually using it.  Are you getting exceptions?  Do the tokens not  
>> look right?  Are no results coming back?  Have you looked at your  
>> index in Luke?
>>
>> I'm going to take a wild stab in the dark and guess that you are  
>> not reading in the input in the right encoding.
>>
>> -Grant
>>
>> On Dec 29, 2008, at 7:19 AM, Girish Naik wrote:
>>
>>> Hi,
>>>     I am having a hard time in indexing the Arabic content and  
>>> searching the same via Lucene. I have also used a Arabic Analyzer  
>>> from the Lucene package but had no luck. I have also used a  
>>> snowball jar but it doesn't contain an Arabic stemmer. So i had  
>>> put the Lucene Arabic Stemmer in snowball jar (with  
>>> modifications  :-X ) but still have not got any luck so far.
>>>
>>>     Moreover when i dont use any stemmers/ analyzer the search  
>>> works perfectly on a Linux systems, but with Windows system the  
>>> search does not work at all =-O
>>>
>>> If anybody has any kind of solution or ideas, then please send it  
>>> across. I will be very happy to implement and test them.
>>>
>>> Thanks in Advance.
>>>
>>> -- 
>>>
>>>
>>> Regards,
>>>
>>> Please do not print this email unless it is absolutely necessary.
>>> Girish Naik
>>> Development Lead
>>>
>>> Neev Information Technologies Pvt Ltd
>>> Bangalore, Karnataka India
>>>
>>> <banner-5b.png>    Mobile: 91 09740091638
>>> Email: girish.naik@gmail.com
>>> IM: girish.naik (Skype)
>>> http://www.linkedin.com/in/girishnaik
>>>
>>> <banner-2c.png>    <neev_logo.gif>
>>>
>>> Fools rush in where angels fear to tread
>>> See who we know in common    Want a signature like this?
>>> The information contained in this electronic message and any  
>>> attachments to this message are intended for the exclusive use of  
>>> the addressee(s) and may contain proprietary, confidential or  
>>> privileged information. If you are not the intended recipient, you  
>>> should not disseminate, distribute or copy this e-mail. Please  
>>> notify the sender immediately and destroy all copies of this  
>>> message and any attachments.
>>> WARNING: Computer viruses can be transmitted via email. The  
>>> recipient should check this email and any attachments for the  
>>> presence of viruses. The company accepts no liability for any  
>>> damage caused by any virus transmitted by this email.
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>> --------------------------
>> Grant Ingersoll
>>
>> Lucene Helpful Hints:
>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ











Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message