lucene-solr-user mailing list archives

From Erik Hatcher <erik.hatc...@gmail.com>
Subject Re: Importing large datasets
Date Wed, 02 Jun 2010 16:37:53 GMT
One thing that might help indexing speed - create a *single* SQL query  
to grab all the data you need without using DIH's sub-entities, at  
least the non-cached ones.
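
To illustrate why this can help, here is a minimal sketch in Python rather
than DIH itself. It assumes two hypothetical tables, item and item_price,
that are reachable through one connection (an in-memory SQLite database
stands in for the real one). A non-cached sub-entity behaves roughly like
the first function, issuing one extra query per parent row, while a single
JOIN returns the same documents in one round trip.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE item_price (item_id INTEGER, price REAL);
        INSERT INTO item VALUES (1, 'lamp'), (2, 'chair'), (3, 'desk');
        INSERT INTO item_price VALUES (1, 19.5), (2, 45.0);
        """
    )
    return conn


def fetch_with_sub_queries(conn):
    """Roughly what a non-cached sub-entity does: one extra query per parent row."""
    docs = []
    queries = 1  # the parent query
    parents = conn.execute("SELECT id, name FROM item ORDER BY id").fetchall()
    for item_id, name in parents:
        row = conn.execute(
            "SELECT price FROM item_price WHERE item_id = ?", (item_id,)
        ).fetchone()
        queries += 1  # one child query for this parent row
        docs.append({"id": item_id, "name": name, "price": row[0] if row else None})
    return docs, queries


def fetch_with_single_join(conn):
    """The suggested alternative: one query that joins everything up front."""
    sql = """
        SELECT i.id, i.name, p.price
        FROM item i
        LEFT JOIN item_price p ON p.item_id = i.id
        ORDER BY i.id
    """
    docs = [
        {"id": item_id, "name": name, "price": price}
        for item_id, name, price in conn.execute(sql)
    ]
    return docs, 1


if __name__ == "__main__":
    conn = build_db()
    docs_a, n_a = fetch_with_sub_queries(conn)
    docs_b, n_b = fetch_with_single_join(conn)
    print("same documents:", docs_a == docs_b)
    print("queries issued:", n_a, "vs", n_b)
```

With N parent rows the first approach issues N + 1 queries, so at millions
of rows the per-query overhead tends to dominate. The JOIN only works when
the tables live in the same database, which is one reason a sub-entity on a
separate datasource may still need its own handling.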

	Erik

On Jun 2, 2010, at 12:21 PM, Blargy wrote:

>
>
> As a data point, I routinely see clients index 5M items on normal hardware
> in approx. 1 hour (give or take 30 minutes).
>
> Also wanted to add that our main entity (item) consists of 5 sub-entities
> (i.e., joins). 2 of those 5 are fairly small, so I am using
> CachedSqlEntityProcessor for them, but the other 3 (which include
> item_description) are normal.
>
> All the entities except item_description connect to datasource1. They
> currently point to one physical machine, although we do have a pool of 3 DBs
> that could be used if it helps. The other entity, item_description, uses
> datasource2, which has a pool of 2 DBs that could potentially be used. Not
> sure if that would help or not.
>
> I might as well add that the item description will have indexed, stored and
> term vectors set to true.
> -- 
> View this message in context: http://lucene.472066.n3.nabble.com/Importing-large-datasets-tp863447p865219.html
> Sent from the Solr - User mailing list archive at Nabble.com.

