Date: Tue, 24 Mar 2009 09:03:24 +0100
From: "René Zöpnek"
Message-ID: <20090324080324.237310@gmx.net>
Subject: Corrupt index (IndexOutOfBoundsException)
To: java-user@lucene.apache.org

Thanks for your answer, Mike.

Unfortunately I have no direct access to the server with the corrupt index, so I cannot change how the index is created there.

I've uploaded the index to http://drop.io/hlu53sl (9 MB).

Here is the code for creating the index:

public static void createIndex()
{
    log.info("create index");
    long start = System.currentTimeMillis();
    IndexWriter index = null;
    InitialContext ic = null;
    Connection connect = null;
    PreparedStatement query = null;
    ResultSet result = null;
    try{
        //create index
        index = new IndexWriter("/var/content/index", getAnalyzer(), true);
        //get content data
        ic = new InitialContext();
        javax.sql.DataSource source = (javax.sql.DataSource) ic.lookup("java:/ContentDS");
        connect = source.getConnection();
        query = connect.prepareStatement("SELECT DISTINCT C.* FROM TAB_CONTENT C,TAB_PROJCONTENT PC WHERE C.CONTENT_ID = PC.CONTENT_ID AND NOT C.STORAGE = 'CONTAINER'");
        result = query.executeQuery();
        while(result.next())
        {
            // map file info
            TabContentData data = TabContentMapper.getMapped(result);
            // index metadata
            try{
                indexMetadata(data, index);
            }catch(Exception e)
            {
                log.error("Failed to index "+data.getFileId()+" with id "+data.getContentId(),e);
            }
        }
        log.info("indexing done");
    }catch(Exception e)
    {
        log.error("create index failed",e);
    }
    finally
    {
        //clean up
        try{ result.close(); }catch(Exception e){};
        try{ query.close(); }catch(Exception e){};
        try{ connect.close(); }catch(Exception e){};
        try{ ic.close(); }catch(Exception e){};
        try{ index.optimize(); }catch(Exception e){};
        try{ index.close(); }catch(Exception e){};
    }
}

The indexMetadata(data, index) method just maps the column names and column contents of one content record into a Lucene document, which is then added to the index.

If you have any further questions, don't hesitate to ask, and thank you for your help.

Greetz!
René

Michael McCandless wrote:
>
> Something appears to be wrong with your _X.tii file (inside the compound file).
>
> Can you post the code that recreates this broken index?
>
> Since it appears to be repeatable, could you regenerate your index with compound file off, confirm the problem still happens, and then post the _X.tii file? I can try to look at it.
>
> Mike
>
> René Zöpnek wrote:
>
>> Hello,
>>
>> I'm using Lucene 2.3.2 and had no problems until now.
>>
>> But now I have a corrupt index. When searching, a java.lang.OutOfMemoryError is thrown. I wrote the following test program:
>>
>> private static void search(String index, String query) throws CorruptIndexException, IOException, ParseException
>> {
>>     IndexReader reader = IndexReader.open(index);
>>     //reader.setTermInfosIndexDivisor(10);
>>     Collection col = reader.getFieldNames(IndexReader.FieldOption.INDEXED);
>>     Iterator it = col.iterator();
>>     String[] fields = new String[col.size()];
>>     int i = 0;
>>     while(it.hasNext())
>>     {
>>         fields[i] = (String)it.next();
>>         System.out.println("field["+i+"]: "+fields[i]);
>>         i++;
>>     }
>>     Analyzer analyzer = new StandardAnalyzer();
>>     MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, analyzer);
>>     parser.setAllowLeadingWildcard(true);
>>     Query quer = parser.parse(query);
>>     System.out.println("Query: "+quer.toString());
>>     quer = quer.rewrite(reader);
>>     System.out.println("rewritten Query: "+quer.toString());
>>     reader.close();
>> }
>>
>> If reader.setTermInfosIndexDivisor() is commented out, the stack trace looks like this:
>>
>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.lucene.index.TermInfosReader.ensureIndexIsRead(TermInfosReader.java:155)
>>     at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:202)
>>     at org.apache.lucene.index.TermInfosReader.terms(TermInfosReader.java:277)
>>     at org.apache.lucene.index.SegmentReader.terms(SegmentReader.java:643)
>>     at org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:42)
>>     at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:385)
>>     at diplom.lucene.Index.search(Index.java:59)
>>     at diplom.lucene.Index.main(Index.java:28)
>>
>> The index is only 59 MB, so an OutOfMemoryError is surprising. With reader.setTermInfosIndexDivisor() set to 10, the stack trace looks like this:
>>
>> java.lang.IndexOutOfBoundsException: Index: 103, Size: 54
>>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>     at java.util.ArrayList.get(ArrayList.java:322)
>>     at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
>>     at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:249)
>>     at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:68)
>>     at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:123)
>>     at org.apache.lucene.index.TermInfosReader.ensureIndexIsRead(TermInfosReader.java:159)
>>     at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:202)
>>     at org.apache.lucene.index.TermInfosReader.terms(TermInfosReader.java:277)
>>     at org.apache.lucene.index.SegmentReader.terms(SegmentReader.java:643)
>>     at org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:42)
>>     at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:385)
>>     at diplom.lucene.Index.search(Index.java:59)
>>     at diplom.lucene.Index.main(Index.java:28)
>>
>> CheckIndex prints the following:
>>
>> Segments file=segments_1zrx5 numSegments=1 version=FORMAT_SHARED_DOC_STORE [Lucene 2.3]
>>   1 of 1: name=_5pa8 docCount=117378
>>     compound=true
>>     numFiles=1
>>     size (MB)=57,573
>>     no deletions
>>     test: open reader.........OK
>>     test: fields, norms.......OK [54 fields]
>>     test: terms, freq, prox...FAILED
>>     WARNING: would remove reference to this segment (-fix was not specified); full exception:
>> java.lang.IndexOutOfBoundsException: Index: 110, Size: 54
>>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>     at java.util.ArrayList.get(ArrayList.java:322)
>>     at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
>>     at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:249)
>>     at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:68)
>>     at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:123)
>>     at org.apache.lucene.index.CheckIndex.check(CheckIndex.java:182)
>>     at diplom.lucene.Index.check(Index.java:67)
>>     at diplom.lucene.Index.main(Index.java:28)
>>
>> WARNING: 1 broken segments detected
>> WARNING: 117378 documents would be lost if -fix were specified
>>
>> NOTE: would write new segments file [-fix was not specified]
>>
>> Index correct: false
>>
>> Recreating the index didn't solve the problem, and I have no idea how to fix it, so any help is greatly appreciated.
>>
>> Thanks in advance.
>> Rene
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
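
A note on the two IndexOutOfBoundsException traces in this thread: both go through FieldInfos.fieldName and end in ArrayList.get, and CheckIndex reports 54 fields, so it looks as if a field number (103, then 110) larger than the field list was read from the term data. The sketch below only illustrates that failure mode with plain JDK classes. FieldTable, tableWith and fieldNameChecked are hypothetical names made up for the illustration; they are not Lucene classes or methods, and the sketch says nothing about how the bad number got into the file.

```java
import java.util.ArrayList;
import java.util.List;

public class FieldLookupSketch {

    /** Hypothetical stand-in for a per-segment field table (NOT Lucene's FieldInfos). */
    static final class FieldTable {
        private final List<String> names = new ArrayList<>();

        void add(String name) {
            names.add(name);
        }

        int size() {
            return names.size();
        }

        /** Unchecked lookup: an out-of-range number surfaces as IndexOutOfBoundsException. */
        String fieldName(int fieldNumber) {
            return names.get(fieldNumber);
        }

        /** Checked lookup: reports an out-of-range number as a data problem instead. */
        String fieldNameChecked(int fieldNumber) {
            if (fieldNumber < 0 || fieldNumber >= names.size()) {
                throw new IllegalStateException("field number " + fieldNumber
                        + " is out of range; table has " + names.size() + " fields");
            }
            return names.get(fieldNumber);
        }
    }

    /** Builds a table with the given number of placeholder field names. */
    static FieldTable tableWith(int count) {
        FieldTable table = new FieldTable();
        for (int i = 0; i < count; i++) {
            table.add("field" + i);
        }
        return table;
    }

    public static void main(String[] args) {
        // 54 fields, as reported by CheckIndex in this thread.
        FieldTable table = tableWith(54);

        // 103 and 110 are the field numbers from the two stack traces.
        for (int fieldNumber : new int[] {103, 110}) {
            try {
                table.fieldName(fieldNumber);
            } catch (IndexOutOfBoundsException e) {
                System.out.println("unchecked lookup failed: " + e.getMessage());
            }
            try {
                table.fieldNameChecked(fieldNumber);
            } catch (IllegalStateException e) {
                System.out.println("checked lookup failed: " + e.getMessage());
            }
        }
    }
}
```

The unchecked lookup fails the same way the traces do, deep inside the list; the checked variant shows how such a value could be reported as bad data at the point where it is read.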