Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (nike.apache.org: domain of Zoeppi@gmx.de designates
 213.165.64.20 as permitted sender)
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="iso-8859-1"
Date: Tue, 24 Mar 2009 09:03:24 +0100
From: =?iso-8859-1?Q?=22Ren=E9_Z=F6pnek=22?= <Zoeppi@gmx.de>
Message-ID: <20090324080324.237310@gmx.net>
MIME-Version: 1.0
Subject: Corrupt index (IndexOutOfBoundsException)
To: java-user@lucene.apache.org

Thanks for your answer, Mike.

Unfortunately I have no direct access to the server with the corrupt index. So changing the creation process of the index is not possible. 

I've uploaded the index to http://drop.io/hlu53sl (9 MB).


Here is the code for creating the index:

public static void createIndex()
{
  log.info("create index");
  long start = System.currentTimeMillis();
  IndexWriter index = null;
  InitialContext ic = null;
  Connection connect = null;
  PreparedStatement query = null;
  ResultSet result = null;
  try{
	  //create index
	  index = new IndexWriter("/var/content/index", getAnalyzer(), true);
 
	  //get content data 
	  ic = new InitialContext();
	  javax.sql.DataSource source = (javax.sql.DataSource) ic.lookup("java:/ContentDS");
	  connect = source.getConnection();
	  query = connect.prepareStatement("SELECT DISTINCT C.* FROM TAB_CONTENT C,TAB_PROJCONTENT PC WHERE C.CONTENT_ID = PC.CONTENT_ID AND NOT C.STORAGE = 'CONTAINER'");
	  result = query.executeQuery();
		  
	  while(result.next())
	  {
		  // map file info
		  TabContentData data = TabContentMapper.getMapped(result);
		  // index metadata
		  try{
			  indexMetadata(data, index);
		  }catch(Exception e)
		  {
			  log.error("Failed to index "+data.getFileId()+" with id "+data.getContentId(),e);
		  }
	  }
	  log.info("indexing done");
 }catch(Exception e)
 {
	 log.error("create index failed",e);
 }
 finally
 {
	 //clean up
	 try{ result.close(); }catch(Exception e){};
	 try{ query.close(); }catch(Exception e){};
	 try{ connect.close(); }catch(Exception e){};
	 try{ ic.close(); }catch(Exception e){};			 
	 try{ index.optimize(); }catch(Exception e){};
	 try{ index.close(); }catch(Exception e){};
 }		  
}

The indexMetadata(data, index); method just maps the column names and the column contents of one content into a lucene document which is then added to the index.

 
If you have any further questions, don't hesitate to ask and thank you for your help.

Greetz!
Ren�


Michael McCandless schrieb:
>
> Something appears to be wrong with your _X.tii file (inside the compound file).
>
> Can you post the code that recreates this broken index?
>
> Since it appears to be repeatable, could you regenerate your index with compound file off, confirm the problem still happens, and then post the _X.tii file?  I can try to look at it.
>
> Mike
>
> Ren� Z�pnek wrote:
>
>> Hello,
>>
>> I'm using Lucene 2.3.2 and had no problems untill now.
>>
>> But now I got an corrupt index. When searching, a java.lang.OutOfMemoryError is thrown. I've wrote the following test program:
>>
>> private static void search(String index, String query) throws CorruptIndexException, IOException, ParseException
>> {
>>     IndexReader reader = IndexReader.open(index);
>>     //reader.setTermInfosIndexDivisor(10);
>>     Collection col = Reader.getFieldNames(IndexReader.FieldOption.INDEXED);
>>     Iterator it = col.iterator();
>>     String[] fields = new String[col.size()];
>>     int i = 0;
>>     while(it.hasNext())
>>     {
>>         fields[i] = (String)it.next();
>>         System.out.println("field["+i+"]: "+fields[i]);
>>         i++;           
>>     }
>>     Analyzer analyzer = new StandardAnalyzer();
>>     MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, analyzer);
>>     parser.setAllowLeadingWildcard(true);
>>     Query quer = parser.parse(query);
>>     System.out.println("Query: "+quer.toString());
>>     quer = quer.rewrite(reader);
>>     System.out.println("rewritten Query: "+quer.toString());
>>     reader.close();
>> }
>>
>> If reader.setTermInfosIndexDivisor() is commented out, the stacktrace looks like this:
>>
>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.lucene.index.TermInfosReader.ensureIndexIsRead(TermInfosReader.java:155)
>>     at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:202)
>>     at org.apache.lucene.index.TermInfosReader.terms(TermInfosReader.java:277)
>>     at org.apache.lucene.index.SegmentReader.terms(SegmentReader.java:643)
>>     at org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:42)
>>     at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:385)
>>     at diplom.lucene.Index.search(Index.java:59)
>>     at diplom.lucene.Index.main(Index.java:28)
>>
>> The index has a size of 59 MB, so it is weird to get an OutOfMemoryException. So with reader.setTermInfosIndexDivisor() set to 10, the stacktrace looks like:
>>
>> java.lang.IndexOutOfBoundsException: Index: 103, Size: 54
>>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>     at java.util.ArrayList.get(ArrayList.java:322)
>>     at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
>>     at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:249)
>>     at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:68)
>>     at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:123)
>>     at org.apache.lucene.index.TermInfosReader.ensureIndexIsRead(TermInfosReader.java:159)
>>     at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:202)
>>     at org.apache.lucene.index.TermInfosReader.terms(TermInfosReader.java:277)
>>     at org.apache.lucene.index.SegmentReader.terms(SegmentReader.java:643)
>>     at org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:42)
>>     at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:385)
>>     at diplom.lucene.Index.search(Index.java:59)
>>     at diplom.lucene.Index.main(Index.java:28)
>>
>>
>> CheckIndex prints the following:
>>
>> Segments file=segments_1zrx5 numSegments=1 version=FORMAT_SHARED_DOC_STORE [Lucene 2.3]
>>  1 of 1: name=_5pa8 docCount=117378
>>    compound=true
>>    numFiles=1
>>    size (MB)=57,573
>>    no deletions
>>    test: open reader.........OK
>>    test: fields, norms.......OK [54 fields]
>>    test: terms, freq, prox...FAILED
>>    WARNING: would remove reference to this segment (-fix was not specified); full exception:
>> java.lang.IndexOutOfBoundsException: Index: 110, Size: 54
>>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>     at java.util.ArrayList.get(ArrayList.java:322)
>>     at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
>>     at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:249)
>>     at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:68)
>>     at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:123)
>>     at org.apache.lucene.index.CheckIndex.check(CheckIndex.java:182)
>>     at diplom.lucene.Index.check(Index.java:67)
>>     at diplom.lucene.Index.main(Index.java:28)
>>
>> WARNING: 1 broken segments detected
>> WARNING: 117378 documents would be lost if -fix were specified
>>
>> NOTE: would write new segments file [-fix was not specified]
>>
>> Index correct: false
>>
>>
>>
>> Recreating the index didn't solve the problem. And I have no idea for solving it, so every help is greatly appreciated.
>>
>> Thanks in advance.
>> Rene
>> -- 
>> Aufgepasst: Sind Ihre Daten beim Online-Banking auch optimal gesch�tzt?
>> Jetzt absichern: https://homebanking.gmx.net/?mc=mail@footer.hb
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

-- 
Aufgepasst: Sind Ihre Daten beim Online-Banking auch optimal gesch�tzt?
Jetzt absichern: https://homebanking.gmx.net/?mc=mail@footer.hb

-- 
Psssst! Schon vom neuen GMX MultiMessenger geh�rt? Der kann`s mit allen: http://www.gmx.net/de/go/multimessenger01

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org