lucene-dev mailing list archives

From "Steven Bethard (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2420) "fdx size mismatch" overflow causes RuntimeException
Date Thu, 29 Apr 2010 22:55:54 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862463#action_12862463 ]

Steven Bethard commented on LUCENE-2420:
----------------------------------------

I finally found the documentation saying that the maximum number of documents is ~274 billion:
  http://lucene.apache.org/java/3_0_1/fileformats.html

Google queries that failed to find this:
  lucene index maximum documents
  lucene document limit
  lucene max docs

Maybe a bullet could be added to the FAQ (which does turn up for most of these queries)?
  http://wiki.apache.org/lucene-java/LuceneFAQ

As far as the exception goes, regardless of the transaction semantics, I really don't think
the code works correctly after numeric overflow. Once SegmentWriteState.numDocsInStore is
negative, I would expect code like StoredFieldsWriter.flush to fail:

  synchronized public void flush(SegmentWriteState state) throws IOException {
    if (state.numDocsInStore > 0) {
      ...

Perhaps I'm wrong, but it seems like this is going to do the wrong thing when SegmentWriteState.numDocsInStore
is negative. If I'm not wrong, then it seems sensible to me to raise an exception on numeric
overflow.
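
To illustrate the point, here is a minimal sketch (not Lucene code) of an int counter that wraps silently next to one that fails fast. The class DocCounterSketch and its methods are hypothetical names invented for this example; only the `> 0` test mirrors the flush snippet quoted above.

```java
/** Hypothetical document counter; a sketch, not part of Lucene. */
final class DocCounterSketch {
    private int numDocs;

    DocCounterSketch(int start) {
        numDocs = start;
    }

    /** Unchecked increment: wraps to a negative value past Integer.MAX_VALUE. */
    int incrementUnchecked() {
        return ++numDocs;
    }

    /** Checked increment: throws as soon as the limit would be exceeded. */
    int incrementChecked() {
        if (numDocs == Integer.MAX_VALUE) {
            throw new IllegalStateException(
                "document count would exceed Integer.MAX_VALUE (" + Integer.MAX_VALUE + ")");
        }
        return ++numDocs;
    }

    public static void main(String[] args) {
        // Start at the limit so a single increment shows the overflow.
        int wrapped = new DocCounterSketch(Integer.MAX_VALUE).incrementUnchecked();
        // A guard like "if (numDocsInStore > 0)" is false for the wrapped value,
        // so the guarded body would be skipped.
        System.out.println("wrapped = " + wrapped + ", wrapped > 0 is " + (wrapped > 0));

        try {
            new DocCounterSketch(Integer.MAX_VALUE).incrementChecked();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the checked variant the failure shows up at the first document past the limit rather than at close time.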

> "fdx size mismatch" overflow causes RuntimeException
> ----------------------------------------------------
>
>                 Key: LUCENE-2420
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2420
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 3.0.1
>         Environment: CentOS 5.4
>            Reporter: Steven Bethard
>
> I just saw the following error:
> java.lang.RuntimeException: after flush: fdx size mismatch: -512764976 docs vs 30257618564 length in bytes of _0.fdx file exists?=true
>         at org.apache.lucene.index.StoredFieldsWriter.closeDocStore(StoredFieldsWriter.java:97)
>         at org.apache.lucene.index.DocFieldProcessor.closeDocStore(DocFieldProcessor.java:51)
>         at org.apache.lucene.index.DocumentsWriter.closeDocStore(DocumentsWriter.java:371)
>         at org.apache.lucene.index.IndexWriter.flushDocStores(IndexWriter.java:1724)
>         at org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:3565)
>         at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3491)
>         at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3482)
>         at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1658)
>         at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1621)
>         at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1585)
> Note the negative SegmentWriteState.numDocsInStore. I assume this is because Lucene has
> a limit of 2^31 - 1 = 2147483647 (Integer.MAX_VALUE) documents per index, though I couldn't find
> this documented clearly anywhere. It would have been nice to get this error earlier, back
> when I exceeded the limit, rather than now, after a bunch of indexing that was apparently
> doomed to fail.
> Hence, two suggestions:
> * State clearly somewhere that the maximum number of documents in a Lucene index is
>   Integer.MAX_VALUE (2^31 - 1).
> * Throw an exception when an IndexWriter first exceeds this number rather than only on
>   close.
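
The two numbers in the exception message are consistent with a plain 32-bit wraparound. The sketch below works through the arithmetic, assuming the .fdx file consists of a 4-byte header followed by one 8-byte pointer per document; that layout, and the names FdxMismatchSketch and docsFromFdxLength, are assumptions made for illustration rather than something stated in this report.

```java
/** Hypothetical helper that relates an .fdx file length to a document count. */
final class FdxMismatchSketch {
    static final long HEADER_BYTES = 4L;   // assumed size of the format header
    static final long BYTES_PER_DOC = 8L;  // assumed: one long pointer per document

    /** Number of documents implied by the file length under the assumed layout. */
    static long docsFromFdxLength(long fileLength) {
        return (fileLength - HEADER_BYTES) / BYTES_PER_DOC;
    }

    public static void main(String[] args) {
        long actualDocs = docsFromFdxLength(30257618564L);
        // Narrowing to int keeps only the low 32 bits, which is the same value
        // an int counter holds after being incremented that many times.
        int wrapped = (int) actualDocs;
        System.out.println(actualDocs); // 3782202320
        System.out.println(wrapped);    // -512764976
    }
}
```

Under that assumption the file holds 3782202320 entries, and 3782202320 - 2^32 = -512764976, which matches the negative count in the message.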

