lucene-solr-user mailing list archives

From Naveen Gupta <nkgiit...@gmail.com>
Subject Re: indexing taking very long time
Date Fri, 05 Aug 2011 18:35:19 GMT
Hi Erick,

We are using Solr version 3.0.

We are indexing the data over REST, using curl calls from a C program to the
Solr server.

We batch 15,000 docs into a single XML document, POST it to the update
handler with curl, and then call commit.
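A minimal sketch of that batching approach (field names are hypothetical;
the update format is Solr 3.x XML, and the curl calls are shown as comments
since they assume a running Solr instance at localhost:8983):

```shell
# Build one <add> document containing many <doc> elements (two here,
# standing in for the 15,000-doc batches described above; the "id" and
# "body" field names are assumptions, not the actual schema).
cat > batch.xml <<'EOF'
<add>
  <doc>
    <field name="id">thread-1-msg-1</field>
    <field name="body">first message text</field>
  </doc>
  <doc>
    <field name="id">thread-1-msg-2</field>
    <field name="body">second message text</field>
  </doc>
</add>
EOF

# POST the whole batch once, then commit once per batch rather than per
# document (uncomment to run against a live Solr 3.x instance):
# curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary @batch.xml
# curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary '<commit/>'
```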

For each client we create a new connection (a PHP script uses exec() to
start a new C process for every user) and hit the Solr server.

We are using the default solrconfig.xml, with only a few field changes in
schema.xml.

The max JVM heap allocation is 512 MB (the Linux box itself also has only
512 MB of RAM).
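For reference, the heap cap is set when the JVM hosting Solr is started; a
sketch (the -Xms/-Xmx flags are standard JVM options, while the start
command itself is an assumption based on the example Jetty distribution):

```shell
# Standard JVM heap flags; 512m matches the setup described above.
# Giving Solr more heap (if the box allows) is usually the first step
# before raising the RAM buffer or merge factor.
JAVA_OPTS="-Xms256m -Xmx512m"

# Hypothetical start command for the example Jetty distribution:
# java $JAVA_OPTS -jar start.jar
echo "$JAVA_OPTS"
```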

Initially I increased the merge factor to 50 and the RAM buffer size to 50
MB, but I had to reduce them because we were getting
java.lang.OutOfMemoryError: Java heap space
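Those two knobs live in solrconfig.xml; a sketch with the values tried
above (Solr 3.x settings; the values must fit within the available heap):

```xml
<!-- solrconfig.xml (Solr 3.x), inside <indexDefaults> or <mainIndex> -->
<mergeFactor>50</mergeFactor>
<ramBufferSizeMB>50</ramBufferSizeMB>
```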

It is taking 3 minutes to index 15,000 docs (a client can have 100,000
docs, and we have many clients). We also run search queries against this
index from other clients in parallel.

That 3 minutes is measured from the time curl was called until the response
came back.

When we commit, CPU usage goes up to 25% (not all the cores, just a few of
them). The machine has 4 cores in total.

Can you please advise where to start from a tuning perspective?

A blog I was reading claimed it should take about 40 seconds to index
100,000 docs (with 10-12 fields defined); I forgot the link. They talked
about increasing the merge factor.

Thanks
Naveen

On Thu, Aug 4, 2011 at 7:05 AM, Erick Erickson <erickerickson@gmail.com> wrote:

> What version of Solr are you using? If it's a recent version, then
> optimizing is not that essential; you can do it during off hours, perhaps
> nightly or weekly.
>
> As far as indexing speed, have you profiled your application to see whether
> it's Solr or your indexing process that's the bottleneck? A quick check
> would be to monitor the CPU utilization on the server and see if it's high.
>
> As far as multithreading, one option is to simply have multiple clients
> indexing simultaneously. But you haven't indicated how the indexing is
> being
> done. Are you using DIH? SolrJ? Streaming documents to Solr? You have to
> provide those kinds of details to get meaningful help.
>
> Best
> Erick
> On Aug 2, 2011 8:06 AM, "Naveen Gupta" <nkgiitkgp@gmail.com> wrote:
> > Hi
> >
> > We have a requirement where we are indexing all the messages of a
> > thread; a thread may have an attachment too. We are adding them to Solr
> > for indexing and searching, in order to apply a few business rules.
> >
> > For a user, we can have many threads (around 100k), and each thread may
> > have 10-20 messages.
> >
> > Now what we are finding is that it is taking 30 minutes to index the
> > entire set of threads.
> >
> > When we run optimize, indexing is faster afterwards.
> >
> > The question here is: how frequently should this optimize be called,
> > and when?
> >
> > Please note that we are following a batched commit strategy (commit is
> > called after every 10k threads); we are not calling commit after every
> > doc.
> >
> > Secondly, how can we use multithreading from the Solr perspective, in
> > order to improve JVM and other resource utilization?
> >
> >
> > Thanks
> > Naveen
>
