lucene-solr-user mailing list archives

From Gora Mohanty <>
Subject Re: how can i use solrj binary format for indexing?
Date Mon, 18 Oct 2010 13:21:13 GMT
On Mon, Oct 18, 2010 at 5:26 PM, Jason, Kim <> wrote:
> Hi, Gora
> I haven't yet tried indexing a huge number of XML files through curl or pure
> Java (like post.jar).
> Is indexing through XML really fast?
> How many files did you index? And how did you do it (using curl or pure Java)?

We did it through curl. There were some 3.5 million XML files, about
40GB of data in total, and some 60 fields in the Solr schema, with minor
tokenising and some facets. We used five Solr instances, with five cores
on each instance. From what I recall, indexing took 6h, though we might
well have been limited by the read speed of the slow network drive that
held the data. If you split the indexing across cores in this way, you
may need to merge the data from the various cores afterwards, a task
which took us about 1.5h.
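
For illustration only, here is a minimal sketch of how XML files could be
spread round-robin over five instances with five cores each and posted to
each core's XML update handler, written in Python rather than curl. The
host, ports, core names, and all function names are assumptions made for
the example, not the setup described above. Posting is sequential here to
keep the sketch short; in practice one would likely run several posts in
parallel.

```python
import itertools
import urllib.request
from pathlib import Path

# Hypothetical layout: five Solr instances with five cores each.
# Host, ports, and core names are invented for this example.
INSTANCES = [f"http://localhost:{8983 + i}/solr" for i in range(5)]
CORES = [f"core{j}" for j in range(5)]


def update_urls(instances=INSTANCES, cores=CORES):
    """Return one XML update-handler URL per (instance, core) pair."""
    return [f"{base}/{core}/update" for base in instances for core in cores]


def assign_round_robin(paths, urls):
    """Pair each file with a target URL, cycling through the URLs in order."""
    return list(zip(paths, itertools.cycle(urls)))


def post_xml(url, body):
    """POST raw XML bytes to Solr, roughly what
    `curl <url> -H 'Content-type: text/xml' --data-binary @file` does."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "text/xml; charset=utf-8"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_directory(xml_dir):
    """Send each *.xml file in xml_dir to its assigned core, then commit on every core."""
    urls = update_urls()
    files = sorted(Path(xml_dir).glob("*.xml"))
    for path, url in assign_round_robin(files, urls):
        post_xml(url, path.read_bytes())
    for url in urls:
        post_xml(url, b"<commit/>")
```

Calling index_directory("/path/to/xml") would need running Solr instances
at the assumed URLs. The merge step is not shown; Solr's CoreAdmin
mergeindexes action is one way to combine the per-core indexes, though the
message does not say which method was used.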

