lucene-java-user mailing list archives

From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Corrupt index (IndexOutOfBoundsException)
Date Tue, 24 Mar 2009 11:39:39 GMT
Instead of ignoring the exceptions in your finally clause, can you log
them?  It could be something interesting is happening in there...
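
For illustration, a minimal sketch of what logging those cleanup failures
could look like. LoggedCleanup, Step and runLogged are hypothetical names,
not part of Lucene or of the posted code, and the sketch assumes
java.util.logging and Java 8 lambdas rather than the `log` object used in
createIndex():

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper (not from Lucene or the original post).
final class LoggedCleanup {
    private static final Logger log = Logger.getLogger(LoggedCleanup.class.getName());

    /** One cleanup step that may throw, e.g. result.close() or index.optimize(). */
    public interface Step {
        void run() throws Exception;
    }

    /**
     * Runs one cleanup step and logs any failure instead of discarding it.
     * Returns the caught exception, or null if the step succeeded.
     */
    public static Exception runLogged(String name, Step step) {
        try {
            step.run();
            return null;
        } catch (Exception e) {
            log.log(Level.SEVERE, "cleanup step failed: " + name, e);
            return e;
        }
    }

    public static void main(String[] args) {
        // Stand-in for result.close() on a resource that closes cleanly.
        runLogged("result.close", () -> { });
        // Stand-in for index.optimize() failing: the failure is logged and
        // the remaining cleanup steps still run.
        runLogged("index.optimize", () -> {
            throw new java.io.IOException("simulated failure");
        });
        runLogged("index.close", () -> { });
    }
}
```

Each step is still isolated from the others, as in the posted finally block,
but a failure in optimize() or close() now leaves a trace in the log. In real
code a null check before each step would avoid logging a NullPointerException
for resources that were never opened, and a variable that is reassigned after
its declaration would have to be copied into a final local before a lambda
could use it.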

I'll have a look at the index.

Mike

"René Zöpnek" <Zoeppi@gmx.de> wrote:
> Thanks for your answer, Mike.
>
> Unfortunately I have no direct access to the server with the corrupt index. So changing the creation process of the index is not possible.
>
> I've uploaded the index to http://drop.io/hlu53sl (9 MB).
>
>
>
> Here is the code for creating the index:
>
> public static void createIndex()
> {
>  log.info("create index");
>  long start = System.currentTimeMillis();
>  IndexWriter index = null;
>  InitialContext ic = null;
>  Connection connect = null;
>  PreparedStatement query = null;
>  ResultSet result = null;
>  try{
>          //create index
>          index = new IndexWriter("/var/content/index", getAnalyzer(), true);
>
>          //get content data
>          ic = new InitialContext();
>          javax.sql.DataSource source = (javax.sql.DataSource) ic.lookup("java:/ContentDS");
>          connect = source.getConnection();
>          query = connect.prepareStatement("SELECT DISTINCT C.* FROM TAB_CONTENT C,TAB_PROJCONTENT PC WHERE C.CONTENT_ID = PC.CONTENT_ID AND NOT C.STORAGE = 'CONTAINER'");
>          result = query.executeQuery();
>
>          while(result.next())
>          {
>                  // map file info
>                  TabContentData data = TabContentMapper.getMapped(result);
>                  // index metadata
>                  try{
>                          indexMetadata(data, index);
>                  }catch(Exception e)
>                  {
>                          log.error("Failed to index "+data.getFileId()+" with id "+data.getContentId(),e);
>                  }
>          }
>          log.info("indexing done");
>  }catch(Exception e)
>  {
>         log.error("create index failed",e);
>  }
>  finally
>  {
>         //clean up
>         try{ result.close(); }catch(Exception e){};
>         try{ query.close(); }catch(Exception e){};
>         try{ connect.close(); }catch(Exception e){};
>         try{ ic.close(); }catch(Exception e){};
>         try{ index.optimize(); }catch(Exception e){};
>         try{ index.close(); }catch(Exception e){};
>  }
> }
>
> The indexMetadata(data, index) method just maps the column names and column contents of one content record into a Lucene document, which is then added to the index.
>
>
> If you have any further questions, don't hesitate to ask and thank you for your help.
>
> Greetz!
> René
>
>
>
> Michael McCandless wrote:
>>
>> Something appears to be wrong with your _X.tii file (inside the compound file).
>>
>> Can you post the code that recreates this broken index?
>>
>> Since it appears to be repeatable, could you regenerate your index with compound file off, confirm the problem still happens, and then post the _X.tii file?  I can try to look at it.
>>
>> Mike
>>
>> René Zöpnek wrote:
>>
>>> Hello,
>>>
>>> I'm using Lucene 2.3.2 and have had no problems until now.
>>>
>>> But now I have a corrupt index. When searching, a java.lang.OutOfMemoryError is thrown. I wrote the following test program:
>>>
>>> private static void search(String index, String query) throws CorruptIndexException, IOException, ParseException
>>> {
>>>     IndexReader reader = IndexReader.open(index);
>>>     //reader.setTermInfosIndexDivisor(10);
>>>     Collection col = reader.getFieldNames(IndexReader.FieldOption.INDEXED);
>>>     Iterator it = col.iterator();
>>>     String[] fields = new String[col.size()];
>>>     int i = 0;
>>>     while(it.hasNext())
>>>     {
>>>         fields[i] = (String)it.next();
>>>         System.out.println("field["+i+"]: "+fields[i]);
>>>         i++;
>>>     }
>>>     Analyzer analyzer = new StandardAnalyzer();
>>>     MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, analyzer);
>>>     parser.setAllowLeadingWildcard(true);
>>>     Query quer = parser.parse(query);
>>>     System.out.println("Query: "+quer.toString());
>>>     quer = quer.rewrite(reader);
>>>     System.out.println("rewritten Query: "+quer.toString());
>>>     reader.close();
>>> }
>>>
>>> If reader.setTermInfosIndexDivisor() is commented out, the stack trace looks like this:
>>>
>>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>>     at org.apache.lucene.index.TermInfosReader.ensureIndexIsRead(TermInfosReader.java:155)
>>>     at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:202)
>>>     at org.apache.lucene.index.TermInfosReader.terms(TermInfosReader.java:277)
>>>     at org.apache.lucene.index.SegmentReader.terms(SegmentReader.java:643)
>>>     at org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:42)
>>>     at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:385)
>>>     at diplom.lucene.Index.search(Index.java:59)
>>>     at diplom.lucene.Index.main(Index.java:28)
>>>
>>> The index has a size of 59 MB, so it is weird to get an OutOfMemoryError. With reader.setTermInfosIndexDivisor() set to 10, the stack trace looks like this:
>>>
>>> java.lang.IndexOutOfBoundsException: Index: 103, Size: 54
>>>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>>     at java.util.ArrayList.get(ArrayList.java:322)
>>>     at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
>>>     at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:249)
>>>     at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:68)
>>>     at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:123)
>>>     at org.apache.lucene.index.TermInfosReader.ensureIndexIsRead(TermInfosReader.java:159)
>>>     at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:202)
>>>     at org.apache.lucene.index.TermInfosReader.terms(TermInfosReader.java:277)
>>>     at org.apache.lucene.index.SegmentReader.terms(SegmentReader.java:643)
>>>     at org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:42)
>>>     at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:385)
>>>     at diplom.lucene.Index.search(Index.java:59)
>>>     at diplom.lucene.Index.main(Index.java:28)
>>>
>>>
>>> CheckIndex prints the following:
>>>
>>> Segments file=segments_1zrx5 numSegments=1 version=FORMAT_SHARED_DOC_STORE [Lucene 2.3]
>>>  1 of 1: name=_5pa8 docCount=117378
>>>    compound=true
>>>    numFiles=1
>>>    size (MB)=57,573
>>>    no deletions
>>>    test: open reader.........OK
>>>    test: fields, norms.......OK [54 fields]
>>>    test: terms, freq, prox...FAILED
>>>    WARNING: would remove reference to this segment (-fix was not specified); full exception:
>>> java.lang.IndexOutOfBoundsException: Index: 110, Size: 54
>>>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>>     at java.util.ArrayList.get(ArrayList.java:322)
>>>     at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
>>>     at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:249)
>>>     at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:68)
>>>     at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:123)
>>>     at org.apache.lucene.index.CheckIndex.check(CheckIndex.java:182)
>>>     at diplom.lucene.Index.check(Index.java:67)
>>>     at diplom.lucene.Index.main(Index.java:28)
>>>
>>> WARNING: 1 broken segments detected
>>> WARNING: 117378 documents would be lost if -fix were specified
>>>
>>> NOTE: would write new segments file [-fix was not specified]
>>>
>>> Index correct: false
>>>
>>>
>>>
>>> Recreating the index didn't solve the problem, and I have no idea how to solve it, so any help is greatly appreciated.
>>>
>>> Thanks in advance.
>>> Rene
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>
>


