lucene-solr-user mailing list archives

From William Bell <billnb...@gmail.com>
Subject Re: How to use batchSize in DataImportHandler to throttle updates in a batch-mode
Date Sun, 01 Dec 2013 08:39:04 GMT
Well, I think your issue is batchSize: batchSize="1" should be batchSize="-1".
I also recommend setting readOnly="true" on the dataSource.
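For reference, here is a sketch of your data-config.xml with both changes applied. As I understand DIH's JdbcDataSource, batchSize is handed to the JDBC driver as the fetch size, and -1 is translated to Integer.MIN_VALUE, which is what makes the MySQL driver stream rows one at a time instead of buffering the whole result set. Only the dataSource attributes change; the URL, credentials, and entity are copied from your config.

```xml
<dataConfig>
  <!-- Sketch: only batchSize and readOnly differ from the original config.
       batchSize="-1" asks the MySQL driver to stream rows;
       readOnly="true" opens the connection in read-only mode. -->
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/solrTest" user="test" password="test123"
              batchSize="-1" readOnly="true" />
  <document name="stanboldata">
    <entity name="stanbolrequest" query="SELECT * FROM documents">
      <field column="id" name="id" />
      <field column="content" name="content" />
      <field column="title" name="title" />
    </entity>
  </document>
</dataConfig>
```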


On Tue, Nov 26, 2013 at 1:50 AM, Dileepa Jayakody <dileepajayakody@gmail.com> wrote:

> Hi All,
>
> I have a requirement to import a large amount of data from a MySQL database
> and index the documents (about 1000 documents).
> During the indexing process I need to do special processing on a field by
> sending enhancement requests to an external Apache Stanbol server.
> I have configured the DataImportHandler in solrconfig.xml to use the
> StanbolContentProcessor in the update chain, as below:
>
> <updateRequestProcessorChain name="stanbolInterceptor">
>   <processor class="com.solr.stanbol.processor.StanbolContentProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> <requestHandler name="/dataimport" class="solr.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">data-config.xml</str>
>     <str name="update.chain">stanbolInterceptor</str>
>   </lst>
> </requestHandler>
>
> My sample data-config.xml is below:
>
> <dataConfig>
>   <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
>               url="jdbc:mysql://localhost:3306/solrTest" user="test" password="test123"
>               batchSize="1" />
>   <document name="stanboldata">
>     <entity name="stanbolrequest" query="SELECT * FROM documents">
>       <field column="id" name="id" />
>       <field column="content" name="content" />
>       <field column="title" name="title" />
>     </entity>
>   </document>
> </dataConfig>
>
> When I run a large import of about 1000 documents, my Stanbol server
> goes down, I suspect because of the heavy load from the stanbolInterceptor
> chain above.
> I would like to throttle the data import in batches, so that Stanbol only
> has to handle a manageable number of requests at a time.
> Is this achievable using the batchSize parameter of the dataSource element
> in data-config.xml?
> Can anyone suggest ways to throttle the data import load in Solr?
>
> Thanks,
> Dileepa
>



-- 
Bill Bell
billnbell@gmail.com
cell 720-256-8076
