lucene-solr-user mailing list archives

From Shawn Heisey
Subject Re: Getting error while excuting full import
Date Wed, 19 Apr 2017 12:01:06 GMT
On 4/18/2017 11:21 PM, ankur.168 wrote:
> I thought DIH does parallel db request for all the entities defined in a
> document.

I do not know anything about that.  It *could* be possible for all the
sub-entities just below another entity to run in parallel, but I've got
no idea whether this is the case.  At the top level, there is only one
thread handling documents, one at a time.  Of that I am sure.

> I do believe that DIH is easier to use, which is why I am trying to find
> a way to use it in my current system. But as I explained above, I have
> many sub-entities, and each returns a list of responses that is joined
> into the parent. For more than 2 lakh (200,000) documents, the full
> import is taking forever.
> What I am looking for is a way to speed up my full import using DIH
> only. To achieve this I tried to split the document in 2 and run the
> full imports in parallel, but with this approach the latest import
> overwrites the data indexed by the other, since the unique key
> (property_id) is the same for both documents.

The way to achieve top speed with DIH is to *not* define nested
entities.  Only define one entity with a single SELECT statement.  Let
the database handle all the JOIN work.  In my DIH config, I do "SELECT *
FROM X WHERE Y" ... X is a view defined on the database server that
handles all the JOINs, and Y is a fairly detailed conditional.
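
As a rough illustration of the single-entity idea, here is a minimal
sketch that uses Python's built-in sqlite3 module as a stand-in for the
real database.  The table names (property, amenity), the view name
(property_flat), and the use of GROUP_CONCAT to fold child rows into one
delimited column are all assumptions made for the example, not taken from
any real schema.  The only point is that the view does the JOIN, so the
import query returns one row per document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE property (property_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE amenity  (property_id INTEGER, name TEXT);

    INSERT INTO property VALUES (1, 'Pune'), (2, 'Delhi');
    INSERT INTO amenity  VALUES (1, 'gym'), (1, 'pool'), (2, 'parking');

    -- Hypothetical view: the database does the JOIN and folds the child
    -- rows into one delimited column, so each property is a single row.
    CREATE VIEW property_flat AS
    SELECT p.property_id, p.city, GROUP_CONCAT(a.name, '|') AS amenities
    FROM property p
    LEFT JOIN amenity a ON a.property_id = p.property_id
    GROUP BY p.property_id, p.city;
""")

# This is the shape of the one query a single DIH entity would run.
rows = conn.execute(
    "SELECT * FROM property_flat WHERE city IS NOT NULL ORDER BY property_id"
).fetchall()
for row in rows:
    print(row)
```

On the Solr side, a delimited column like this could be split back into
a multiValued field, for example with the splitBy attribute of DIH's
RegexTransformer.  Whether that fits depends on the data, since the
delimiter must never appear inside a value.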

> One way I could think of is to keep the documents in different cores,
> which will maintain separate index files, and merge the search results
> from both cores when searching the indexed data. But is this a good
> approach?

In order to do a sharded query, the uniqueKey field would need to be
unique across all cores.  My index is sharded manually, and each shard
does a separate import when fully rebuilding the index.  The sharding
algorithm is coded into the SQL statement.
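
One simple way to code a sharding rule into the SQL is a modulus on the
numeric key.  The sketch below is only an assumption about what such a
rule might look like, not the algorithm described above: NUM_SHARDS, the
property table, and the shard_ids helper are all hypothetical, and
sqlite3 again stands in for the real database.  Each core would run the
same query with its own shard number, so every property_id lands in
exactly one core and the uniqueKey stays unique across cores.

```python
import sqlite3

NUM_SHARDS = 2  # assumed number of cores

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE property (property_id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO property VALUES (?)", [(i,) for i in range(1, 11)]
)

def shard_ids(shard):
    """Run the import query for one shard and return the ids it selects."""
    sql = (
        "SELECT property_id FROM property "
        "WHERE property_id % ? = ? ORDER BY property_id"
    )
    return [r[0] for r in conn.execute(sql, (NUM_SHARDS, shard))]

# One list of ids per shard; the lists must not overlap.
shards = [shard_ids(s) for s in range(NUM_SHARDS)]
for s, ids in enumerate(shards):
    print(f"shard {s}: {ids}")
```

Because the shards are disjoint, the imports can run at the same time
without one overwriting documents indexed by another.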

