lucene-solr-user mailing list archives

From "Jack Krupansky" <j...@basetechnology.com>
Subject Re: Too many documents Exception
Date Mon, 12 May 2014 20:27:22 GMT
One of the hard-core Lucene guys is going to have to help you out. Or you 
may have to write some custom code to fix the index for any such shard. If 
you have deleted any documents, it may be sufficient to simply optimize the 
index.

-- Jack Krupansky
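
Whether an optimize is likely to be enough can be estimated from the counts in the original report. A minimal sketch, assuming `num_docs` and `deleted_docs` are the "Num Docs" and "Deleted Docs" figures for the shard; the function name and the sample figures are hypothetical. The reader counts live plus deleted documents, and an optimize drops the deleted ones, so only the live count has to fit afterwards.

```python
LUCENE_MAX_DOCS = 2**31 - 1  # Java Integer.MAX_VALUE, 2147483647


def optimize_would_help(num_docs, deleted_docs):
    """Return True if the index is over the limit now but would fit
    once deleted documents are merged away."""
    max_doc = num_docs + deleted_docs        # what the reader counts today
    over_now = max_doc > LUCENE_MAX_DOCS
    fits_after = num_docs <= LUCENE_MAX_DOCS  # only live docs remain
    return over_now and fits_after


if __name__ == '__main__':
    # Hypothetical shard: 2.0 billion live documents, 0.2 billion deleted.
    print(optimize_would_help(2000000000, 200000000))
    # Hypothetical shard: 2.2 billion live documents, nothing deleted.
    print(optimize_would_help(2200000000, 0))
```

If the second case applies, no amount of merging brings the shard back under the limit, which is where the custom code mentioned above would come in.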

-----Original Message----- 
From: yamazaki
Sent: Wednesday, May 7, 2014 8:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Too many documents Exception

Thanks, Jack.

Is there a setting that suppresses this exception?

For example,
<maxMergeDocs>2147483647</maxMergeDocs> ?


When this exception occurs, the index can no longer be read.
With SolrCloud, that means part of the data becomes unreadable.

shard1: document count exceeds 2^31-1
shard2: document count does not exceed 2^31-1

shard1 goes down, and its index is effectively dead.

-- yamazaki
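
From the Lucene documentation, `maxMergeDocs` appears to bound only the size of segments that the merge policy will merge, not the total number of documents in a shard, so it is unlikely to help here. The practical lever is the number of shards. A rough sketch of that arithmetic, assuming documents are spread evenly across shards; `min_shards` and the `headroom` parameter are hypothetical names chosen for illustration.

```python
LUCENE_MAX_DOCS = 2**31 - 1  # per-shard hard limit (Java Integer.MAX_VALUE)


def min_shards(total_docs, headroom=0.5):
    """Smallest shard count that keeps every shard at or below
    headroom * LUCENE_MAX_DOCS, assuming an even distribution."""
    if total_docs < 0:
        raise ValueError('total_docs must not be negative')
    if not 0 < headroom <= 1:
        raise ValueError('headroom must be in (0, 1]')
    per_shard_cap = int(LUCENE_MAX_DOCS * headroom)
    # Ceiling division without floating point; at least one shard.
    return max(1, -(-total_docs // per_shard_cap))


if __name__ == '__main__':
    # The 2.2 billion documents from the test script in this thread.
    print(min_shards(22 * 10**8))
    print(min_shards(22 * 10**8, headroom=1.0))
```

Leaving headroom matters because deleted and overwritten documents keep counting against the limit until their segments are merged.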


2014-05-07 11:01 GMT+09:00 Jack Krupansky <jack@basetechnology.com>:
> Lucene only supports 2^31-1 documents in an index, so Solr can only
> support 2^31-1 documents in a single shard.
>
> I think it's a bug that Lucene doesn't throw an exception when more than
> that number of documents have been inserted. Instead, you get this error
> when Solr tries to read such an overstuffed index.
>
> -- Jack Krupansky
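
Because the failure only shows up at read time, a loader can check for it before sending each batch. A minimal sketch, assuming the client can obtain the shard's current `maxDoc`, for example from index statistics such as those reported by the Luke request handler. The dictionary shape, `ShardFullError`, and `check_capacity` are assumptions made for illustration, not part of Solr or pysolr.

```python
LUCENE_MAX_DOCS = 2**31 - 1  # Java Integer.MAX_VALUE


class ShardFullError(Exception):
    """Raised when a batch could push a shard past the Lucene limit."""


def check_capacity(index_stats, batch_size, limit=LUCENE_MAX_DOCS):
    """Return the capacity left after the batch, or raise ShardFullError.

    Conservatively assumes every document in the batch takes a new slot,
    since overwritten documents still count until segments are merged.
    """
    max_doc = int(index_stats['maxDoc'])
    remaining = limit - max_doc - batch_size
    if remaining < 0:
        raise ShardFullError(
            'maxDoc {} + batch {} would exceed {}'.format(
                max_doc, batch_size, limit))
    return remaining


if __name__ == '__main__':
    stats = {'numDocs': 90, 'maxDoc': 100}  # made-up statistics
    print(check_capacity(stats, 10, limit=200))
```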
>
> -----Original Message----- From: [Tech Fun]山崎
> Sent: Tuesday, May 6, 2014 8:54 PM
> To: solr-user@lucene.apache.org
> Subject: Too many documents Exception
>
>
> Hello everybody,
>
> With Solr 4.3.1 (and 4.7.1), once Num Docs + Deleted Docs exceeds
> 2147483647 (Integer.MAX_VALUE), the core fails to load with:
> Caused by: java.lang.IllegalArgumentException: Too many documents,
> composite IndexReaders cannot exceed 2147483647
>
> It looks similar to an earlier, unresolved thread on this list:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201307.mbox/browser
>
> How can I fix this?
> Is this the expected behavior of Solr?
>
>
> Log:
>
> ERROR org.apache.solr.core.CoreContainer  – Unable to create core:
> collection1
> org.apache.solr.common.SolrException: Error opening new searcher
>    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
>    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
>    at
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
>    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
>    at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
>    at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>    at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>    at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>    at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>    at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.solr.common.SolrException: Error opening new 
> searcher
>    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1438)
>    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1550)
>    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:796)
>    ... 13 more
> Caused by: org.apache.solr.common.SolrException: Error opening Reader
>    at
> org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172)
>    at
> org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:183)
>    at
> org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:179)
>    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1414)
>    ... 15 more
> Caused by: java.lang.IllegalArgumentException: Too many documents,
> composite IndexReaders cannot exceed 2147483647
>    at
> org.apache.lucene.index.BaseCompositeReader.<init>(BaseCompositeReader.java:77)
>    at
> org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:368)
>    at
> org.apache.lucene.index.StandardDirectoryReader.<init>(StandardDirectoryReader.java:42)
>    at
> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:71)
>    at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
>    at
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
>    at 
> org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88)
>    at
> org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
>    at
> org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:169)
>    ... 18 more
> ERROR org.apache.solr.core.CoreContainer  –
> null:org.apache.solr.common.SolrException: Unable to create core:
> collection1
>    at
> org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1450)
>    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:993)
>    at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
>    at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>    at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>    at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>    at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>    at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.solr.common.SolrException: Error opening new 
> searcher
>    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
>    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
>    at
> org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
>    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
>    ... 10 more
> Caused by: org.apache.solr.common.SolrException: Error opening new 
> searcher
>    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1438)
>    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1550)
>    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:796)
>    ... 13 more
> Caused by: org.apache.solr.common.SolrException: Error opening Reader
>    at
> org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172)
>    at
> org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:183)
>    at
> org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:179)
>    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1414)
>    ... 15 more
> Caused by: java.lang.IllegalArgumentException: Too many documents,
> composite IndexReaders cannot exceed 2147483647
>    at
> org.apache.lucene.index.BaseCompositeReader.<init>(BaseCompositeReader.java:77)
>    at
> org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:368)
>    at
> org.apache.lucene.index.StandardDirectoryReader.<init>(StandardDirectoryReader.java:42)
>    at
> org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:71)
>    at
> org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
>    at
> org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
>    at 
> org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88)
>    at
> org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
>    at
> org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:169)
>    ... 18 more
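
The innermost cause in the trace comes from the composite reader adding up the `maxDoc` of every segment when the index is opened. A rough Python model of that check, not Lucene's actual code; the segment sizes are made-up values and `composite_max_doc` is a hypothetical name.

```python
INTEGER_MAX_VALUE = 2**31 - 1  # 2147483647


def composite_max_doc(segment_max_docs):
    """Sum per-segment maxDoc values, failing once the running total
    no longer fits in a signed 32-bit integer."""
    total = 0
    for seg_max_doc in segment_max_docs:
        total += seg_max_doc  # each value includes deleted documents
        if total > INTEGER_MAX_VALUE:
            raise ValueError(
                'Too many documents, composite IndexReaders cannot '
                'exceed {}'.format(INTEGER_MAX_VALUE))
    return total


if __name__ == '__main__':
    print(composite_max_doc([10**9, 10**9]))  # still fits
    try:
        composite_max_doc([10**9, 10**9, 2 * 10**8])  # 2.2 billion
    except ValueError as exc:
        print(exc)
```

Each segment can be valid on its own; it is only the sum across segments that overflows, which fits the report that writes succeed and the error appears when the reader is opened.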
>
>
> Sample solrconfig.xml:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <config>
>  <luceneMatchVersion>LUCENE_43</luceneMatchVersion>
>
>  <lib dir="/opt/solr/dist" regex="solr-cell-\d.*\.jar" />
>  <lib dir="/opt/solr/contrib/extraction/lib" regex=".*\.jar" />
>
>  <lib dir="/opt/solr/dist" regex="solr-clustering-\d.*\.jar" />
>  <lib dir="/opt/solr/contrib/clustering/lib" regex=".*\.jar" />
>
>  <lib dir="/opt/solr/dist" regex="solr-langid-\d.*\.jar" />
>  <lib dir="/opt/solr/contrib/langid/lib" regex=".*\.jar" />
>
>  <lib dir="/opt/solr/dist" regex="solr-velocity-\d.*\.jar" />
>  <lib dir="/opt/solr/contrib/velocity/lib" regex=".*\.jar" />
>
>  <dataDir>${solr.data.dir:}</dataDir>
>
>  <directoryFactory name="DirectoryFactory"
>                    class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>
>
>  <codecFactory class="solr.SchemaCodecFactory"/>
>
>  <indexConfig>
>    <ramBufferSizeMB>256</ramBufferSizeMB>
>    <lockType>${solr.lock.type:native}</lockType>
>  </indexConfig>
>
>  <jmx />
>
>  <updateHandler class="solr.DirectUpdateHandler2">
>    <updateLog>
>      <str name="dir">${solr.ulog.dir:}</str>
>    </updateLog>
>    <autoCommit>
>      <maxDocs>10000</maxDocs>
>      <maxTime>60000</maxTime>
>      <openSearcher>false</openSearcher>
>    </autoCommit>
>    <autoSoftCommit>
>      <maxDocs>10</maxDocs>
>      <maxTime>1000</maxTime>
>    </autoSoftCommit>
>  </updateHandler>
>
>  <query>
>    <maxBooleanClauses>1024</maxBooleanClauses>
>    <filterCache class="solr.FastLRUCache"
>                 size="16384"
>                 initialSize="4096"
>                 autowarmCount="1024"/>
>    <queryResultCache class="solr.FastLRUCache"
>                     size="16384"
>                     initialSize="4096"
>                     autowarmCount="1024"/>
>    <documentCache class="solr.FastLRUCache"
>                   size="16384"
>                   initialSize="4096"
>                   autowarmCount="1024"/>
>    <enableLazyFieldLoading>true</enableLazyFieldLoading>
>    <queryResultWindowSize>20</queryResultWindowSize>
>    <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
>    <useColdSearcher>false</useColdSearcher>
>    <maxWarmingSearchers>2</maxWarmingSearchers>
>  </query>
>
>  <requestDispatcher handleSelect="false" >
>    <requestParsers enableRemoteStreaming="true"
>                    multipartUploadLimitInKB="2048000"
>                    formdataUploadLimitInKB="2048"/>
>    <httpCaching never304="true" />
>  </requestDispatcher>
>
>  <requestHandler name="/select" class="solr.SearchHandler">
>    <lst name="defaults">
>       <str name="echoParams">explicit</str>
>       <int name="rows">10</int>
>       <str name="df">text</str>
>    </lst>
>  </requestHandler>
>
>  <requestHandler name="/update" class="solr.UpdateRequestHandler">
>  </requestHandler>
>
>  <requestHandler name="/update/json"
>                  class="solr.JsonUpdateRequestHandler">
>    <lst name="defaults">
>      <str name="stream.contentType">application/json</str>
>    </lst>
>  </requestHandler>
>
>  <requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
>
>  <requestHandler name="/admin/ping" class="solr.PingRequestHandler">
>    <lst name="invariants">
>      <str name="q">solrpingquery</str>
>    </lst>
>    <lst name="defaults">
>      <str name="echoParams">all</str>
>    </lst>
>  </requestHandler>
>
>  <queryResponseWriter name="json" class="solr.JSONResponseWriter">
>    <str name="content-type">text/plain; charset=UTF-8</str>
>  </queryResponseWriter>
> </config>
>
>
> Sample schema.xml:
>
> <?xml version="1.0" encoding="UTF-8" ?>
> <schema name="twitter" version="1.5">
>  <!-- types -->
>  <types>
>    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
>    <fieldType name="long" class="solr.TrieLongField"
> precisionStep="0" positionIncrementGap="0"/>
>    <fieldType name="tlong" class="solr.TrieLongField"
> precisionStep="8" positionIncrementGap="0"/>
>    <fieldType name="tdate" class="solr.TrieDateField"
> precisionStep="6" positionIncrementGap="0"/>
>    <fieldType name="text_cjk" class="solr.TextField"
> positionIncrementGap="100">
>      <analyzer>
>        <charFilter class="solr.MappingCharFilterFactory"/>
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.CJKWidthFilterFactory"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.CJKBigramFilterFactory" outputUnigrams="true"/>
>      </analyzer>
>    </fieldType>
>  </types>
>
>  <!-- fields -->
>  <fields>
>    <field name="key" type="string" indexed="true" stored="true"
> required="true" />
>    <field name="status_id" type="tlong" indexed="true" stored="true"
> required="true"/>
>    <field name="text" type="text_cjk" indexed="true" stored="true"
> required="true"/>
>    <field name="from_user_id_str" type="string" indexed="true"
> stored="true" required="true"/>
>    <field name="created_at" type="tdate" indexed="true" stored="true"
> required="true"/>
>    <field name="_version_" type="long" indexed="true" stored="true"
> multiValued="false"/>
>  </fields>
>  <uniqueKey>key</uniqueKey>
>  <defaultSearchField>text</defaultSearchField>
>  <solrQueryParser defaultOperator="AND"/>
> </schema>
>
>
>
> Sample Python script used to add the data:
>
> #!/usr/bin/env python3
> # -*- coding: utf-8 -*-
> import datetime
> # uses https://github.com/toastdriven/pysolr
> from pysolr import Solr
>
>
> def main():
>    s_time = datetime.datetime.utcnow()
>    print('start.: ({})'.format(s_time))
>
>    solr = Solr('http://localhost:8983/solr/collection1', timeout=60)
>
>    docs = []
>
>    # 2.2 billion documents: more than Java's Integer.MAX_VALUE (2147483647)
>    max_range = 22 * (10 ** 8)
>    for x in range(1, max_range):
>        docs.append(
>            {
>                'key': '{}'.format(x),
>                'status_id': x,
>                'text': '{} 番目の記事'.format(x),  # "article number x"
>                'from_user_id_str': '1',
>                'created_at': '2014-05-01T20:06:53Z',
>            }
>        )
>
>        # send and commit one batch every 10,000 documents
>        if x % (10 ** 4) == 0:
>            solr.add(docs)
>            solr.commit()
>            docs = []
>
>            e_time = datetime.datetime.utcnow()
>
>            print('{} end.: ({})'.format(x, e_time - s_time))
>
>    # flush the final partial batch, if any
>    if docs:
>        solr.add(docs)
>        solr.commit()
>
>    e_time = datetime.datetime.utcnow()
>
>    print('end.: ({})'.format(e_time - s_time))
>
> if __name__ == '__main__':
>    main()
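
The batching in the script above (collect documents, send every 10,000, then flush the remainder) can be separated from the Solr calls. A small sketch of that idea; `batches` is a hypothetical helper, not part of pysolr.

```python
def batches(items, size):
    """Yield lists of at most `size` items, including a final partial one."""
    if size < 1:
        raise ValueError('size must be at least 1')
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover items that did not fill a whole batch
        yield batch


if __name__ == '__main__':
    for chunk in batches(range(1, 8), 3):
        print(chunk)
```

In the loader, each yielded list would be passed to `solr.add(...)` followed by `solr.commit()`, so the loop body no longer has to track the modulo condition or remember the final flush.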



-- 
----
山崎 一大, Tech Fun Co., Ltd.
mailto:yamazaki@techfun.jp
Nomura Fudosan Higashi-Ueno Bldg. 3F, 1-7-15 Higashi-Ueno, Taito-ku, Tokyo 110-0015, Japan
TEL: 03-5816-0331 (main)  FAX: 03-5816-0332
Company web: http://techfun.co.jp/
Training web: http://techfun.jp/

