lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Andrey Grishin" <gris...@softline.kiev.ua>
Subject Re: problems with search on Russian content
Date Mon, 25 Nov 2002 13:15:27 GMT
I got the noghtly build from the CVS

When I am trying to use IndexWriter this way:
writer = new IndexWriter(indexDirectory, new
RussianAnalyzer("Cp1251".toCharArray()), true);
I got the following exception
----------------------------------------------------------------------------
-----------------------
java.lang.ArrayIndexOutOfBoundsException: 7
	at
org.apache.lucene.analysis.ru.RussianAnalyzer.makeStopWords(RussianAnalyzer.
java:521)
	at org.apache.lucene.analysis.ru.RussianAnalyzer.(RussianAnalyzer.java:473)
----------------------------------------------------------------------------
-----------------------


When I am trying to use it this way:
writer = new IndexWriter(indexDirectory, new
RussianAnalyzer("Cp1251".toCharArray(), new String[] {}), true);
I got the following exception
----------------------------------------------------------------------------
-----------------------
2002-11-25 15:09:09,044
[ua.kiev.softline.services.searcher.index.PublishingIndexerImpl]
INFO   - --Throwable in addArticle(): java.lang.ArrayIndexOut
OfBoundsException: 8
java.lang.ArrayIndexOutOfBoundsException: 8
        at
org.apache.lucene.analysis.ru.RussianStemmer.isVowel(RussianStemmer.java:991
)
        at
org.apache.lucene.analysis.ru.RussianStemmer.markPositions(RussianStemmer.ja
va:909)
        at
org.apache.lucene.analysis.ru.RussianStemmer.stem(RussianStemmer.java:1551)
        at
org.apache.lucene.analysis.ru.RussianStemFilter.next(RussianStemFilter.java:
189)
        at
org.apache.lucene.index.DocumentWriter.invertDocument(DocumentWriter.java:17
0)
        at
org.apache.lucene.index.DocumentWriter.addDocument(DocumentWriter.java:111)
        at
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:209)
        at
ua.kiev.softline.services.searcher.index.PublishingIndexerImpl.addArticle(Pu
blishingIndexerImpl.java:130)
----------------------------------------------------------------------------
-----------------------

When I commented line 575 in RussianAnalyzer.java
result = new RussianStemFilter(result, charset);
everything works fine - I can search (and find :)) russian words...

Am I doing something wrong?

Regards, Andrey



----- Original Message -----
From: "Otis Gospodnetic" <otis_gospodnetic@yahoo.com>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Thursday, November 21, 2002 5:35 PM
Subject: Re: problems with search on Russian content


> Look at CHANGES.txt document in CVS - there is some new stuff in
> org.apache.lucene.analysis.ru package that you will want to use.
> Get the Lucene from the nightly build...
>
> Otis
>
> --- Andrey Grishin <grishin@softline.kiev.ua> wrote:
> > Hi All,
> > I have a problems with searching on Russian content using lucene 1.2
> >
> > I indexed the content using Cp1251 charset
> > ------------
> > text = new String(text.getBytes("Cp1251"));
> > doc.add(Field.Text(CONTENT_FIELD,text));
> >
> > ------------
> > and I am searching using the same charset
> >
> > String txt = "áÎÄ";
> > txt = new String(txt.getBytes("Cp1251"));
> > PrefixQuery query = new PrefixQuery(new
> > Term(PortalHTMLDocument.CONTENT_FIELD, txt));
> > hits = searcher.search(query);
> >
> > or
> >
> > Analyzer analyzer = new StandardAnalyzer();
> > String txt = "áÎÄÒÅÊ";
> > txt = new String(txt.getBytes("Cp1251"));
> > Query query = QueryParser.parse(txt,
> > PortalHTMLDocument.CONTENT_FIELD, analyzer);
> >
> > hits = searcher.search(query);
> >
> >
> > and lucene can't find nothing.
> > Also I checked for the DecodeInterceptor in my server.xml - there
> > isn't any
> >
> > I tried UTF-8/16 - and got the same result.
> >
> > Also, if I list all index's content via iterating IndexReader - I can
> > see that my russian content is stored in index...
> > Can you please help me? Do you have any more ideas about what else
> > can be done here to fix this problem?
> >
> > I will appreciate any help.
> > Thanks, Andrey.
> >
> > P.S.
> > I am using lucene 1.2, tomcat 4.1.12, jdk 1.4.1 on Win2000 AS
>
>
> __________________________________________________
> Do you Yahoo!?
> Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> http://mailplus.yahoo.com
>
> --
> To unsubscribe, e-mail:
<mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
<mailto:lucene-user-help@jakarta.apache.org>
>


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message