lucene-solr-user mailing list archives

From Mikhail Khludnev
Subject Re: cache implementation?
Date Fri, 24 Jul 2015 19:58:34 GMT
On Fri, Jul 24, 2015 at 1:06 AM, Shawn Heisey wrote:

> On 7/23/2015 10:55 AM, cbuxbaum wrote:
> > Say we have 1000000 party records.  Then the child SQL will be run 1000000
> > times (once for each party record).  Isn't there a way to just run the child
> > SQL on all of the party records at once with a join, using a GROUP BY and
> > ORDER BY on the PARTY_ID?  Then the results from that query could easily be
> > placed in SOLR according to the primary key (party_id).  Is there some part
> > of the Data Import Handler that operates that way?
>
> Using well-crafted SQL JOIN is almost always going to be better for
> dataimport than nested entities.  The heavy lifting is done by the
> database server, using code that's extremely well-optimized for that
> kind of lifting.  Doing what you describe with a parent entity and one
> nested entity (that is not cached) will result in 1000001 total SQL
> queries.  A million SQL queries, no matter how fast each one is, will be
> slow.
>
> If you can do everything in a single SQL query with JOIN, then Solr will
> make exactly one SQL query to the server for a full-import.
>
> For my own dataimport, I use a view that was defined on the mysql server
> by the dbadmin.  The view does all the JOINs we require.
>
> Solr's dataimport handler doesn't have any intelligence to do the join
> locally.  It would be cool if it did, but somebody would have to write
> the code to teach it how.  Because the DB server itself can already do
> JOINs, and it can do them VERY well, there's really no reason to teach
> it to Solr.
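
To make the query count above concrete, here is a rough sketch in Python, with
an in-memory sqlite3 database standing in for the real server. The party and
address tables, their columns, and the function names are made up for
illustration; only PARTY_ID comes from the thread. It mimics the access
pattern, not DIH's actual code.

```python
import sqlite3


def build_db(n_parties):
    """Create a toy database with one address row per party (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE party (party_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE address (party_id INTEGER, city TEXT)")
    conn.executemany(
        "INSERT INTO party VALUES (?, ?)",
        [(i, f"party{i}") for i in range(n_parties)],
    )
    conn.executemany(
        "INSERT INTO address VALUES (?, ?)",
        [(i, f"city{i}") for i in range(n_parties)],
    )
    return conn


def nested_import(conn):
    """Uncached nested-entity pattern: one parent query plus one child query per parent row."""
    queries = 1
    docs = {}
    parents = conn.execute("SELECT party_id, name FROM party").fetchall()
    for party_id, name in parents:
        rows = conn.execute(
            "SELECT city FROM address WHERE party_id = ?", (party_id,)
        ).fetchall()
        queries += 1
        docs[party_id] = {"name": name, "cities": [r[0] for r in rows]}
    return docs, queries


def joined_import(conn):
    """Single-query pattern: the database does the JOIN, the client only groups rows."""
    sql = (
        "SELECT p.party_id, p.name, a.city "
        "FROM party p LEFT JOIN address a ON a.party_id = p.party_id "
        "ORDER BY p.party_id"
    )
    docs = {}
    for party_id, name, city in conn.execute(sql):
        doc = docs.setdefault(party_id, {"name": name, "cities": []})
        if city is not None:
            doc["cities"].append(city)
    return docs, 1
```

Both functions build the same documents; the nested version issues one query
per party plus one for the parent, which is where the 1000001 figure comes
from for a million parties.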

fwiw, DIH now has a join="zipper" attribute that can be specified on a child
entity; it enables a classic ETL external merge join.
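
For anyone unfamiliar with the term, a merge join walks two result sets in
lockstep, so each query runs only once. As far as I understand it, this only
works when both the parent and the child rows arrive sorted by the join key.
Below is a minimal Python sketch of the general technique; the zipper_join
name and the dict-shaped rows are assumptions for illustration, not DIH's
actual implementation.

```python
def zipper_join(parents, children, parent_key, child_key):
    """Merge join over two iterables that are already sorted by their join keys.

    Yields (parent, matching_children) pairs while reading each input once.
    """
    child_iter = iter(children)
    child = next(child_iter, None)
    for parent in parents:
        pk = parent[parent_key]
        # Skip orphan children whose key sorts before the current parent.
        while child is not None and child[child_key] < pk:
            child = next(child_iter, None)
        # Collect every child that shares the current parent's key.
        matched = []
        while child is not None and child[child_key] == pk:
            matched.append(child)
            child = next(child_iter, None)
        yield parent, matched


# Usage with rows as they might come back from two ORDER BY queries.
parents = [{"party_id": 1}, {"party_id": 2}, {"party_id": 3}]
children = [
    {"party_id": 1, "city": "a"},
    {"party_id": 1, "city": "b"},
    {"party_id": 3, "city": "c"},
]
joined = list(zipper_join(parents, children, "party_id", "party_id"))
```

Because neither side is held in memory as a whole, this suits large imports
where caching the child entity is not practical.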

> Thanks,
> Shawn

Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

