lucene-solr-user mailing list archives

From "engy.ali" <omeshm...@hotmail.com>
Subject RE: Solr index - Size and indexing speed
Date Sat, 29 Aug 2009 11:09:43 GMT

Hi, 

Thanks for your reply.

I will work on your suggestion for using only one solr instance.

I tried to merge the 15 indexes again, and I found out that the new merged
index (without optimization) was about 351 GB, but when I optimized it, the
size went back up to 411 GB. Why?

I thought that optimization would decrease the index size, or at least leave
it the same as before optimization.



Funtick wrote:
> 
> Hi,
> 
> Can you try a single SOLR instance with plenty of RAM (so that you can set
> ramBufferSizeMB=8192, for instance) and mergeFactor=10? A single SOLR
> instance is fast enough (> 100 Tomcat client threads; configurable). I
> usually prefer a single instance for a single "writable" box with a heavy
> RAM allocation and good I/O.
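The suggestion above is one writable instance fed by many concurrent clients, rather than 15 separate instances whose indexes are merged afterwards. A minimal Java sketch of that shape, assuming a hypothetical `SingleIndexer` class that only counts documents (it is not a Solr or SolrJ class; a real client would send update requests over HTTP to the one instance): a fixed pool of client threads all feeds the same index.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SingleWriterFeed {

    // Hypothetical stand-in for one writable Solr instance. It only counts
    // documents; a real client would send update requests to the instance.
    static final class SingleIndexer {
        private final AtomicInteger added = new AtomicInteger();

        void add(String docId) {
            added.incrementAndGet(); // safe to call from many client threads at once
        }

        int count() {
            return added.get();
        }
    }

    // Sends totalDocs documents to ONE indexer from clientThreads parallel clients.
    static int feed(int totalDocs, int clientThreads) {
        SingleIndexer indexer = new SingleIndexer();
        ExecutorService clients = Executors.newFixedThreadPool(clientThreads);
        for (int i = 0; i < totalDocs; i++) {
            final String id = "doc-" + i;
            clients.submit(() -> indexer.add(id));
        }
        clients.shutdown(); // accept no new work; let queued tasks finish
        try {
            clients.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return indexer.count();
    }

    public static void main(String[] args) {
        System.out.println(feed(1000, 16)); // prints 1000
    }
}
```

The point of the design is that concurrency lives on the client side while there is only one index on disk, so no merge of 15 separate indexes is needed afterwards.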
> 
> Merging 15 indexes and ending up with a 4-times-larger index could happen,
> for instance, because of differences between the SOLR schema and your
> Lucene setup; make sure the schema is the same (check it with Luke, for
> instance). SOLR 1.4 has some powerful new features, such as a
> document->term cache (the uninverted index) (Yonik), term vectors,
> stored=true, copyField, etc.
> 
> Do not commit every 100 documents; commit once at the end...
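The difference between the two commit strategies can be sketched in Java, assuming a hypothetical `CountingIndexer` stub that only counts calls (it is not a Solr or SolrJ class). In a real Solr instance each commit flushes pending documents and opens a new searcher, so committing every 100 documents pays that cost many times over instead of once.

```java
import java.util.ArrayList;
import java.util.List;

public class CommitStrategy {

    // Hypothetical stub: counts add() and commit() calls instead of talking to Solr.
    static final class CountingIndexer {
        int adds = 0;
        int commits = 0;

        void add(String doc) { adds++; }

        void commit() { commits++; }
    }

    // commitEvery > 0: commit after every commitEvery documents (plus a final one
    // for any remainder); commitEvery <= 0: commit once, after the last document.
    static CountingIndexer index(List<String> docs, int commitEvery) {
        CountingIndexer idx = new CountingIndexer();
        for (int i = 0; i < docs.size(); i++) {
            idx.add(docs.get(i));
            if (commitEvery > 0 && (i + 1) % commitEvery == 0) {
                idx.commit();
            }
        }
        // the first operand short-circuits, so there is no division by zero
        boolean pending = commitEvery <= 0 || docs.size() % commitEvery != 0;
        if (pending && !docs.isEmpty()) {
            idx.commit(); // make any uncommitted documents visible
        }
        return idx;
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docs.add("object-" + i);
        }
        System.out.println("commit every 100: " + index(docs, 100).commits); // 10
        System.out.println("commit at end:    " + index(docs, 0).commits);   // 1
    }
}
```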
> 
> 
> 
> -----Original Message-----
> From: engy.ali [mailto:omeshmesh@hotmail.com] 
> Sent: August-25-09 3:31 PM
> To: solr-user@lucene.apache.org
> Subject: Solr index - Size and indexing speed
> 
> 
>  Summary
> ===============
> 
> I had about 120,000 objects with a total size of 71.2 GB; those objects are
> already indexed using Lucene. The index size is about 111 GB.
> 
> I tried to use a Solr 1.4 nightly build to index the same collection. I
> divided the collection across three servers; each server had 5 Solr
> instances (not Solr cores) up and running.
> 
> After the collection had been indexed, I merged the 15 indexes.
> 
> Problems
> ==============
> 
> 1. The new merged index size is about 411 GB (i.e., about 4 times larger
> than the old index built with Lucene).
> 
> I tried indexing only one object using Lucene and the same object using
> Solr to verify the size, and the result was that the new index is about
> twice the size of the old index.
> 
> Do you have any idea what might be the reason?
> 
> 
> 2. The indexing speed is slow: 100 objects on a single Solr instance were
> indexed in 1 hour, so I estimated that 1000 objects on a single instance
> could be done in 10 hours, but that was not the case; the indexing time
> exceeded the estimate by about 12 hours.
> 
> Might that be related to the growth of the index? If not, what might be
> the reason?
> 
> Note: I commit every 100 objects and optimize at the end of the whole
> operation. I also changed the mergeFactor from 10 to 15.
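One reason a linear extrapolation from 100 objects can underestimate the time for 1000 is that, as the index grows, already-written segments are rewritten by merges. The Java sketch below is a deliberately simplified model of a log-style merge policy, not Lucene's actual merge code: it assumes every flush produces a segment of `flushDocs` documents and that `mergeFactor` equal-sized segments are always merged into one. It counts the document writes (flushes plus merge rewrites) for a whole run. Whether this accounts for the extra 12 hours here is not established; it only shows that total write work grows faster than the document count.

```java
public class MergeCostModel {

    // Total document writes (initial flushes + every merge rewrite) under a
    // simplified log merge policy: each flush creates a level-0 segment of
    // flushDocs documents, and mergeFactor segments on one level are merged
    // into a single segment on the next level up.
    static long totalWrites(long totalDocs, long flushDocs, int mergeFactor) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be >= 2");
        }
        int[] segmentsAtLevel = new int[64];   // segments waiting at each level
        long writes = 0;
        long flushes = totalDocs / flushDocs;  // a partial final flush is ignored
        for (long f = 0; f < flushes; f++) {
            writes += flushDocs;               // write the newly flushed segment
            segmentsAtLevel[0]++;
            int level = 0;
            long segmentDocs = flushDocs;
            // cascade: a merge on one level may complete a group on the next
            while (segmentsAtLevel[level] == mergeFactor) {
                segmentsAtLevel[level] = 0;
                segmentDocs *= mergeFactor;    // merged segment holds mergeFactor segments
                writes += segmentDocs;         // every document in them is rewritten once
                level++;
                segmentsAtLevel[level]++;
            }
        }
        return writes;
    }

    public static void main(String[] args) {
        long small = totalWrites(100, 10, 10);   // 200 writes
        long large = totalWrites(1000, 10, 10);  // 3000 writes
        System.out.println(small + " -> " + large + " writes: "
                + (double) large / small + "x the work for 10x the documents");
    }
}
```

In this model, 10 times the documents costs 15 times the writes. Raising mergeFactor to 15 lowers the merge work during indexing (1900 writes instead of 3000 for the 1000-document run) but leaves more segments behind, so the final optimize has more to merge.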
> 
> 
> 3. I googled and found out that Solr uses an inverted index, but I want to
> know the internal structure of the Solr index. For example, if I have a
> word and its stems, how will they be stored in the index?
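A toy Java illustration of the idea behind question 3, assuming a hypothetical crude suffix stripper in place of a real stemmer (Solr would apply whatever stemming filter the field type's analyzer declares in the schema). Each document is tokenized, each token is reduced to its stem, and the index maps each stem (term) to the sorted set of document IDs containing it, i.e., its postings list. In this sketch only the stem is stored as a term, so "indexed" and "indexing" end up under the same entry; a real Lucene index also records term frequencies, positions and other data left out here.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

public class ToyInvertedIndex {

    // term (after stemming) -> sorted document IDs containing it (its postings list)
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    // Hypothetical, deliberately crude stemmer: strips one common English suffix.
    static String stem(String token) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (token.length() > suffix.length() + 2 && token.endsWith(suffix)) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Analysis at index time: lowercase, split on non-word characters, stem.
    void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(stem(token), t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Queries go through the same analysis, so any form of a word finds its stem.
    SortedSet<Integer> search(String word) {
        return postings.getOrDefault(stem(word.toLowerCase()), new TreeSet<>());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument(1, "Indexing documents");
        index.addDocument(2, "The index was indexed");
        System.out.println(index.postings);          // {document=[1], index=[1, 2], the=[2], was=[2]}
        System.out.println(index.search("indexed")); // [1, 2]
    }
}
```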
> 
> Thanks, 
> Engy
> -- 
> View this message in context:
> http://www.nabble.com/Solr-index---Size-and-indexing-speed-tp25140702p25140702.html
> Sent from the Solr - User mailing list archive at Nabble.com.

-- 
View this message in context: http://www.nabble.com/Solr-index---Size-and-indexing-speed-tp25140702p25201981.html
Sent from the Solr - User mailing list archive at Nabble.com.

