lucene-solr-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Realtime search and facets with very frequent commits
Date Thu, 11 Feb 2010 21:19:52 GMT
Janne,

The answers to your last 2 questions are both yes.  I've seen that done a few times and it
works.  I don't have the answer to the always-hot cache question.


Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



----- Original Message ----
> From: Janne Majaranta <janne.majaranta@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Thu, February 11, 2010 12:35:20 PM
> Subject: Realtime search and facets with very frequent commits
> 
> Hello,
> 
> I have a log-search-like application that requires indexed log events to be
> searchable within a minute
> and that uses facets and the StatsComponent.
> 
> Some stats:
> - The log events are indexed every 10 seconds with a "commitWithin" of 60
> seconds.
> - 1M events / day (~75% are updates to previous events).
> - Faceting over 14 fields (strings). Usually the top 5 values by document
> count, but facets are requested for all 14 fields at the same time.
> - Heavy use of StatsComponent ( stats over facets of ~36M documents ).
> 
> 
> The application is running a single Solr instance. All updates and queries
> are sent to the same instance.
> Faceting and the StatsComponent are both amazingly fast with that amount of
> documents *when* the caches are warm.
> 
> The problem I'm now facing is that keeping the caches warm is too heavy
> compared to the frequency of updates.
> It takes over 60 seconds to warm up the caches to the level where facets and
> stats are returned in milliseconds.
> 
> I have tested putting a second solr instance on the same server and sending
> the updates to that new instance.
> Warming up the new small instance is very fast while the large instance has
> very hot caches.
> 
> I also put a third (empty) solr instance on the same server which passes the
> queries to the two instances with the
> "shards" parameters. This is mainly because the client app really doesn't
> have to know anything about the shards.
> 
> The setup was easy to configure and responses are back in milliseconds and
> the updates are visible in seconds.
> That is, responses in milliseconds over 40M documents and an update frequency
> of 15 seconds on a single physical server.
> The (lab) server has 16 GB of RAM and it is running win2k3.
> 
> Also, I found that with the sharded setup I only need half the
> memory for the large instance.
> When indexing to the large instance, the memory usage climbs very quickly to
> the maximum allocated heap size and never goes down.
> 
> My first question is: is there a magic switch in Solr to get that kind of
> update frequency while keeping the caches hot?
> Or is it just impossible to achieve facet counts and queries in milliseconds
> while updating the index every minute?
> 
> The second question is: the setup with an empty Solr as a "coordinating"
> instance, a large Solr instance with hot caches and a small Solr instance
> with immediate updates,
> all on the same physical server, does it sound like a durable solution
> (until the small instance gets big) or is it just braindead?
> 
> And the third question is, would it be a good idea to merge the small and
> the large index periodically so that a fresh and empty small instance would
> be available
> after the merge ?
> 
> Any ideas ?
> 
> Best Regards,
> 
> Janne Majaranta
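
The coordinating-instance setup in the quoted message relies on Solr's
"shards" request parameter: the empty instance receives the query and fans
it out to the large and the small instance. A minimal sketch of how such a
request URL could be assembled is below. The ports, the "/solr" paths, the
query string and the facet field names are assumptions made for
illustration; none of them come from the original message.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_sharded_query_url(coordinator, shards, query, facet_fields, facet_limit=5):
    """Build a distributed /select URL for a coordinating Solr instance.

    coordinator  -- base URL of the (empty) instance that receives the query
    shards       -- list of "host:port/path" entries, written without "http://"
    facet_fields -- fields to facet on; one facet.field parameter is sent per field
    """
    params = [
        ("q", query),
        ("shards", ",".join(shards)),  # comma-separated list of shard addresses
        ("facet", "true"),
        ("facet.limit", facet_limit),  # top-N values per facet field
    ]
    params += [("facet.field", name) for name in facet_fields]
    return coordinator.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical layout: coordinator on 8983, large index on 8984, small on 8985.
    url = build_sharded_query_url(
        "http://localhost:8983/solr",
        ["localhost:8984/solr", "localhost:8985/solr"],
        "level:ERROR",
        ["host", "level"],
    )
    print(url)
    print(parse_qs(urlsplit(url).query)["shards"])
```

The client only ever talks to the coordinator, so the shard list can change
(for example when the small instance is replaced) without touching the client.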

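The "commitWithin" of 60 seconds mentioned in the quoted message is set as an
attribute on the XML <add> update message, expressed in milliseconds. The
sketch below only builds such a message; posting it to the small instance's
update handler is left out. The field names "id" and "level" are hypothetical
and would have to match the real schema.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs, commit_within_ms=60000):
    """Return a Solr XML <add> message asking for a commit within the given time.

    docs -- list of dicts mapping field name to value (hypothetical schema)
    """
    add = ET.Element("add", {"commitWithin": str(commit_within_ms)})
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", {"name": name})
            field.text = str(value)
    # encoding="unicode" makes tostring return a str instead of bytes
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_add_message([{"id": "evt-1", "level": "ERROR"}]))
```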

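For the periodic merge asked about in the third question, one possible route
is the CoreAdmin "mergeindexes" action, which merges one or more on-disk
index directories into a target core. This is a sketch under assumptions: it
presumes a multi-core setup with the CoreAdmin handler enabled, and the core
name "large" and the directory path are hypothetical. As far as I know, the
source index should not be receiving writes during the merge, and the target
core needs a commit afterwards before the merged documents become visible.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_merge_url(solr_base, target_core, index_dirs):
    """Build a CoreAdmin URL that merges the given index directories into target_core."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    # One indexDir parameter per source index directory.
    params += [("indexDir", path) for path in index_dirs]
    return solr_base.rstrip("/") + "/admin/cores?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical core name and path, for illustration only.
    url = build_merge_url(
        "http://localhost:8983/solr", "large", ["/data/solr/small/data/index"]
    )
    print(url)
    print(parse_qs(urlsplit(url).query))
```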