lucene-solr-user mailing list archives

From "Jason, Kim" <>
Subject Re: how can i use solrj binary format for indexing?
Date Fri, 22 Oct 2010 05:36:14 GMT

Hi Gora, I really appreciate it.
Your reply was a great help to me. :)
I hope everything is fine with you.


Gora Mohanty-3 wrote:
> On Mon, Oct 18, 2010 at 8:22 PM, Jason, Kim <> wrote:
> Sorry for the delay in replying. Was caught up in various things
> this week.
>> Thank you for your reply, Gora.
>> But I still have several questions.
>> Did you use separate indexes?
>> If so, you indexed 0.7 million XML files per instance
>> and then merged them. Is that right?
> Yes, that is correct. We sharded the data by user ID, so that each of
> the 25 cores held approximately 0.7 million out of the 3.5 million
> records. We could have used the sharded indices directly for search,
> but at least for now have decided to go with a single, merged index.
>> Please let me know how to work multiple instances and cores in your case.
> [...]
> * Multi-core Solr setup is quite easy, via configuration in solr.xml:
> . The configuration, i.e.,
>   schema, solrconfig.xml, etc. need to be replicated across the
>   cores.
> * Decide which XML files you will post to which core, and do the
>   POST with curl, as usual. You might need to write a little script
>   to do this.
> * After indexing on the cores is done, make sure to do a commit
>   on each.
> * Merge the sharded indexes (if desired) as described here:
> . One thing to
>   watch out for here is disk space. When merging with Lucene
>   IndexMergeTool, we found that a rough rule of thumb was that
>   intermediate steps in the merge would require about twice as
>   much space as the total size of the indexes to be merged. I.e.,
>   if one is merging 40GB of data in sharded indexes, one should
>   have at least 120GB available (the 40GB of indexes plus about
>   80GB of working space).
> Regards,
> Gora
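
For anyone following the steps quoted above, here is a minimal Python
sketch of the "little script" idea: pick a core for each document, POST
the XML to that core's update handler, and commit on every core at the
end. It is only an illustration under assumptions. The core names
(core0, core1, ...), the base URL, and the CRC32-based routing are
hypothetical and not taken from Gora's setup; the thread only says the
data was sharded by user ID across 25 cores.

```python
import urllib.request
import zlib

NUM_CORES = 25                           # number of cores mentioned in the thread
BASE_URL = "http://localhost:8983/solr"  # assumption: local Solr, default port


def core_for_user(user_id: str, num_cores: int = NUM_CORES) -> int:
    """Map a user ID to a core number with a stable hash (assumed scheme)."""
    return zlib.crc32(user_id.encode("utf-8")) % num_cores


def update_url(core_index: int, base_url: str = BASE_URL) -> str:
    """Build the update URL for a core named core<N> (hypothetical naming)."""
    return f"{base_url}/core{core_index}/update"


def post_xml(url: str, xml_bytes: bytes) -> int:
    """POST an XML update message, as curl would, and return the HTTP status."""
    request = urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def index_file(user_id: str, path: str) -> int:
    """Send one XML file to the core chosen for this user ID."""
    with open(path, "rb") as handle:
        return post_xml(update_url(core_for_user(user_id)), handle.read())


def commit_all(num_cores: int = NUM_CORES) -> None:
    """Issue a commit on every core once indexing is finished."""
    for core_index in range(num_cores):
        post_xml(update_url(core_index), b"<commit/>")
```

Usage would be one index_file(user_id, path) call per XML file, followed
by a single commit_all(). Any stable hash works for the routing; the
only requirement is that the same user ID always lands on the same core.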

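The disk-space rule of thumb for the merge can be written out as a small
calculation. This sketch assumes one reading of the numbers in the
message: working space of about twice the total shard size, needed in
addition to the shards themselves. The function name and the
working_factor parameter are made up for illustration.

```python
def merge_space_gb(shard_sizes_gb, working_factor=2.0):
    """Estimate disk needs for merging shards (assumed reading of the rule).

    source  = total size of the sharded indexes
    working = extra space for intermediate merge steps (factor * source)
    total   = source + working
    """
    source = sum(shard_sizes_gb)
    working = working_factor * source
    return {"source": source, "working": working, "total": source + working}


# Four shards of 10GB each, i.e. 40GB of sharded indexes in total.
estimate = merge_space_gb([10, 10, 10, 10])
print(estimate)
```

With 40GB of shards this gives 80GB of working space and 120GB overall,
which matches the figures quoted in the message. Treat it as a rough
planning number and check the actual free space before starting a merge.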