lucene-solr-user mailing list archives

From Gora Mohanty <g...@mimirtech.com>
Subject Re: how can i use solrj binary format for indexing?
Date Mon, 18 Oct 2010 13:21:13 GMT
On Mon, Oct 18, 2010 at 5:26 PM, Jason, Kim <hialooha@gmail.com> wrote:
>
> Hi, Gora
> I haven't yet tried indexing a huge number of XML files through curl or pure
> Java (like post.jar).
> Is indexing through XML really fast?
> How many files did you index, and how did you do it (using curl or pure Java)?
[...]

We did it through curl. There were some 3.5 million XML files and some
60 fields in the Solr schema, with minor tokenising but some facets:
about 40GB of data in total. We used five Solr instances, with five cores
on each instance. From what I recall, it took 6h, though we might well
have been limited by the read speed of the slow network drive that held
the data. If done this way, one might need to merge the data from the
various cores afterwards, a task which took us about 1.5h.
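
A minimal Python sketch of the setup described above, assuming a layout of 5 instances x 5 cores. The host names (solr1..solr5), port, core names (core0..core4) and all helper functions are hypothetical. Only the per-core /update handler taking XML with Content-Type text/xml, the <commit/> message, and the CoreAdmin "mergeindexes" action with "core" and "indexDir" parameters are standard Solr features. The indexDir values must be paths on the target instance's own disk, so indexes built on other machines would first have to be copied there, and a commit on the target core may be needed after merging. The throughput helper just redoes the arithmetic: 3.5 million files and 40GB in 6h is roughly 160 documents/s and just under 2 MB/s, which would be consistent with a slow network drive being the bottleneck.

```python
import itertools
import urllib.parse
import urllib.request
from pathlib import Path

# Hypothetical topology: 5 Solr instances x 5 cores = 25 update endpoints.
# Host names, port and core names are illustrative assumptions.
HOSTS = [f"http://solr{i}:8983/solr" for i in range(1, 6)]
CORES = [f"core{j}" for j in range(5)]
UPDATE_URLS = [f"{host}/{core}/update" for host in HOSTS for core in CORES]


def assign_files(paths, urls=UPDATE_URLS):
    """Spread XML files round-robin over the update endpoints."""
    plan = {url: [] for url in urls}
    for path, url in zip(paths, itertools.cycle(urls)):
        plan[url].append(path)
    return plan


def post_xml(url, body):
    """POST one Solr XML update message, roughly what
    curl URL -H 'Content-type:text/xml; charset=utf-8' --data-binary @file
    does. Returns the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def index_all(plan):
    """Send every file to its assigned core, then commit each core once."""
    for url, paths in plan.items():
        for path in paths:
            post_xml(url, Path(path).read_bytes())
        post_xml(url, b"<commit/>")


def merge_url(solr_base, target_core, index_dirs):
    """Build a CoreAdmin 'mergeindexes' URL. Each indexDir must be a path
    on the target instance's own filesystem."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    params += [("indexDir", d) for d in index_dirs]
    return f"{solr_base}/admin/cores?" + urllib.parse.urlencode(params)


def throughput(num_docs, gigabytes, hours):
    """Average documents/s and MB/s over a run (1 GB = 1024 MB)."""
    seconds = hours * 3600
    return num_docs / seconds, gigabytes * 1024 / seconds


if __name__ == "__main__":
    docs_s, mb_s = throughput(3_500_000, 40, 6)
    print(f"{docs_s:.0f} docs/s, {mb_s:.2f} MB/s")
```

Round-robin assignment keeps the cores roughly equal in size, and posting one file per request mirrors the curl approach; batching several documents per request or running one worker per core would probably matter more for speed than the choice of client.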

Regards,
Gora
