lucene-solr-user mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: Using DIH to import 10 million records
Date Mon, 05 Mar 2012 06:33:29 GMT
On Mon, Mar 5, 2012 at 5:56 AM, Lance Norskog <goksron@gmail.com> wrote:

> You can run the DIH with multiple threads feeding from the same query.
>
FWIW,
https://issues.apache.org/jira/browse/SOLR-3011
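
For reference, the multithreaded import Lance mentions was configured through the `threads` attribute on the root entity in data-config.xml (supported in the DIH of the Solr 3.x era; it was removed in later releases). A minimal sketch — the datasource, entity name, and query below are illustrative assumptions, not taken from this thread:

```xml
<!-- data-config.xml: names and query are hypothetical examples only -->
<dataConfig>
  <dataSource name="db" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/records"
              user="solr" password="***"/>
  <document>
    <!-- threads="4" ran this entity with four importer threads
         feeding from the same query in 3.x DIH -->
    <entity name="record" dataSource="db" threads="4"
            query="SELECT id, title, body FROM records">
      <field column="id"    name="id"/>
      <field column="title" name="title"/>
      <field column="body"  name="body"/>
    </entity>
  </document>
</dataConfig>
```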


> Depends also on the size of the document: large documents may index
> faster if they have their own threads. This may then interact with the
> new NRT multi-commit code.
>
> On Sun, Mar 4, 2012 at 5:19 PM, Shawn Heisey <solr@elyograg.org> wrote:
> > On 3/4/2012 3:31 AM, Sphene Software wrote:
> >>
> >> Folks,
> >>
> >> I am planning to use DIH for an index of size 10 million records.
> >>
> >> I would like to know the following:
> >> - Can DIH scale to an index of this size?
> >> - If DIH is a bottleneck, what is the specific issue and how can it be
> >> addressed?
> >
> >
> > My entire index is about 67 million documents.  There are seven shards
> > in total; six of them hold over 11 million documents each.  I can do a
> > full dataimport (from MySQL) of those six shards simultaneously in less
> > than three hours.  The seventh shard has fewer than 500,000 documents
> > and builds after the others during a full rebuild.  Full rebuilds are
> > rare; we mostly do them at schema change time.
> >
> > I use SolrJ for updates; my experience with it so far suggests that
> > doing the full import with my SolrJ code would take significantly
> > longer than three hours.
> >
> > Thanks,
> > Shawn
> >
>
>
>
> --
> Lance Norskog
> goksron@gmail.com
>
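The parallel per-shard rebuild Shawn describes amounts to kicking off `full-import` on each shard's DIH handler at once; DIH runs the import in the background, so the requests return immediately. A command-line sketch — the host and core names here are hypothetical, not from this thread:

```shell
# Hypothetical host/core names -- adjust for your deployment.
# command=full-import returns at once; the import runs asynchronously.
for core in shard1 shard2 shard3 shard4 shard5 shard6; do
  curl -s "http://localhost:8983/solr/${core}/dataimport?command=full-import" &
done
wait

# Check progress on one shard:
curl -s "http://localhost:8983/solr/shard1/dataimport?command=status"
```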



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

<http://www.griddynamics.com>
 <mkhludnev@griddynamics.com>
