lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Girish Naik <gir...@neevtech.com>
Subject Re: Help in Arabic Analysers with Lucene on Windows
Date Mon, 29 Dec 2008 13:19:14 GMT
Sorry for that,

Here is how the Analyzer is Selected:

  public static Analyzer getAnalyzerInstance(String localeKey) {
     Analyzer analyzer = null;
     if (localeKey == null || localeKey.trim().equals("")) {
         localeKey = AppContext.getSetting("defaultLocale");
         System.out.println("<><><>><><><><><Locale
key taken as Default ");
     } else {
         // localeKey may be a csv of locales, in which case picj the first
         // one.
         localeKey = StringUtils.split(localeKey, ",")[0].trim();
         System.out.println("<><><>><><><><><Locale
key is trimmed");
     }
     System.out.println("<><><>><><><><><Locale
is " + localeKey);
     String name = (String) _analyzerMap.get(localeKey);
     System.out.println("<><><>><><><><><Name
from Locale is " + name);
     if (name == null) {
         analyzer = new StandardAnalyzer();
     } else {
         // if (name.equalsIgnoreCase("Arabic")) {
         // analyzer = new ArabicAnalyzer();
         // } else {
         analyzer = new SnowballAnalyzer(name);
         // }
     }
     return analyzer;
     }



While Indexing some are analyzed and some are not...

  document.add(new Field(FIELD_DOCUMENT_CREATED_ON, LocaleUtils
             .convert8859_6ToUTF8(com.aurigalogic.activesite.field.Field
                 .indexableDate(avsDoc.getCreatedOn())),
             Field.Store.YES, Field.Index.NOT_ANALYZED));
...
document.add(new Field(FIELD_CONTENT_TYPE, LocaleUtils
             .convert8859_6ToUTF8(version.getDocument()
                 .getContentDescriptor().getName()),
             Field.Store.YES, Field.Index.ANALYZED));

Currently the method /LocaleUtils.convert8859_6ToUTF8/ does nothing but 
returns the parameter as is.

While seraching the Query parser  etc.  are created like

Analyzer analyzer = AnalyzerSelector.getAnalyzerInstance(locale);
...
QueryParser qparser = new QueryParser(Constants.FIELD_BODY, analyzer);
...



So while posting the form with a Arabic word does not fetch the results. 
An English word does work though!!

I would be more that helpful if anything else is required.



------------------------------------------------------------------------

Regards,

Please do not print this email unless it is absolutely necessary.
*Girish Naik*
Development Lead

*Neev Information Technologies Pvt Ltd* <http://www.neevtech.com>
Bangalore, Karnataka India

Mozilla Store <http://www.spreadfirefox.com/node&id=182416&t=260> 
*Mobile:* 91 09740091638
*Email:* girish.naik@gmail.com <mailto:girish.naik@gmail.com>
*IM:* girish.naik (Skype)
*http://www.linkedin.com/in/girishnaik*

Mozilla Store <http://www.spreadfirefox.com/node&id=182416&t=262> 	Join 
Neev Information Technologies Private Limited Group in LinkedIn 
<http://www.linkedin.com/e/gis/68693/571D4D044006>

Fools rush in where angels fear to tread

See who we know in common <http://www.linkedin.com/e/wwk/4759877/> 	Want 
a signature like this? <http://www.linkedin.com/e/sig/4759877/>


------------------------------------------------------------------------
The information contained in this electronic message and any attachments 
to this message are intended for the exclusive use of the addressee(s) 
and may contain proprietary, confidential or privileged information. If 
you are not the intended recipient, you should not disseminate, 
distribute or copy this e-mail. Please notify the sender immediately and 
destroy all copies of this message and any attachments.
*WARNING:* Computer viruses can be transmitted via email. The recipient 
should check this email and any attachments for the presence of viruses. 
The company accepts no liability for any damage caused by any virus 
transmitted by this email.
------------------------------------------------------------------------

On 12/29/2008 6:16 PM, Grant Ingersoll wrote:
> Hi Girish,
>
> Can you provide some sample code and info about what isn't working?  
> All you have said so far is that the Arabic Analyzer doesn't work for 
> you, but you have said nothing about how you are actually using it.  
> Are you getting exceptions?  Do the tokens not look right?  Are no 
> results coming back?  Have you looked at your index in Luke?
>
> I'm going to take a wild stab in the dark and guess that you are not 
> reading in the input in the right encoding.
>
> -Grant
>
> On Dec 29, 2008, at 7:19 AM, Girish Naik wrote:
>
>> Hi,
>>     I am having a hard time in indexing the Arabic content and 
>> searching the same via Lucene. I have also used a Arabic Analyzer 
>> from the Lucene package but had no luck. I have also used a snowball 
>> jar but it doesn't contain an Arabic stemmer. So i had put the Lucene 
>> Arabic Stemmer in snowball jar (with modifications  :-X ) but still 
>> have not got any luck so far.
>>
>>     Moreover when i dont use any stemmers/ analyzer the search works 
>> perfectly on a Linux systems, but with Windows system the search does 
>> not work at all =-O
>>
>> If anybody has any kind of solution or ideas, then please send it 
>> across. I will be very happy to implement and test them.
>>
>> Thanks in Advance.
>>
>> -- 
>>
>>
>> Regards,
>>
>> Please do not print this email unless it is absolutely necessary.
>> Girish Naik
>> Development Lead
>>
>> Neev Information Technologies Pvt Ltd
>> Bangalore, Karnataka India
>>
>> <banner-5b.png>    Mobile: 91 09740091638
>> Email: girish.naik@gmail.com
>> IM: girish.naik (Skype)
>> http://www.linkedin.com/in/girishnaik
>>
>> <banner-2c.png> <neev_logo.gif>
>>
>> Fools rush in where angels fear to tread
>> See who we know in common    Want a signature like this?
>> The information contained in this electronic message and any 
>> attachments to this message are intended for the exclusive use of the 
>> addressee(s) and may contain proprietary, confidential or privileged 
>> information. If you are not the intended recipient, you should not 
>> disseminate, distribute or copy this e-mail. Please notify the sender 
>> immediately and destroy all copies of this message and any attachments.
>> WARNING: Computer viruses can be transmitted via email. The recipient 
>> should check this email and any attachments for the presence of 
>> viruses. The company accepts no liability for any damage caused by 
>> any virus transmitted by this email.
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>
> --------------------------
> Grant Ingersoll
>
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ
>
>
>
>
>
>
>
>
>
>
>

Mime
View raw message