lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject RE: Out of memory
Date Fri, 16 Sep 2011 18:48:58 GMT

: Actually I am storing twitter streaming data into the core, so the rate of
: indexing is about 12 tweets (docs)/second. The same solr contains 3 other cores
	...
: .         At any given time I don't need data older than the past 15 days, unless
: someone queries for it explicitly. How can this be achieved?

so you are adding 12 docs a second, and you need to keep all docs forever, 
in case someone asks for a specific doc, but otherwise you typically only 
need to search docs from the past 15 days.

if your index is going to grow w/o bounds at this rate forever then it 
doesn't matter what tricks you try, or how you tune things -- you are 
always going to run out of resources unless you adopt some sort of 
distributed approach.

off the cuff, i would suggest indexing all of the docs for a single "day" 
in one shard, and making most of your searches be a distributed request 
against the most recent 15 shards.
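
a rough sketch of what building that request could look like, in python. the 
host, the "tweets_YYYYMMDD" core-per-day naming and the helper names are all 
made up for illustration; the only real Solr piece is the "shards" request 
param, which takes a comma separated list of host:port/path entries.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def daily_shards(end_day, days=15, host="localhost:8983", prefix="tweets_"):
    """Return a value for Solr's `shards` param covering the last `days` days.

    Assumes (hypothetically) one core per day named <prefix>YYYYMMDD,
    all served from the same host. The newest day comes first.
    """
    shards = []
    for offset in range(days):
        day = end_day - timedelta(days=offset)
        shards.append(f"{host}/solr/{prefix}{day:%Y%m%d}")
    return ",".join(shards)


def search_url(query, end_day, days=15):
    """Build a distributed select URL, sent to the newest shard."""
    shards = daily_shards(end_day, days)
    newest = shards.split(",")[0]
    params = urlencode({"q": query, "shards": shards})
    return f"http://{newest}/select?{params}"
```

something like search_url("text:solr", date.today()) would then give you the 
URL for the default "last 15 days" search, and you only ever add one new 
shard per day.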

you didn't say how people "query for it explicitly" when looking for older 
docs -- if it's by date then when a user asks for a specific date range 
you can just query those shards explicitly, if it's by some unique id then 
you'll want to cache in your application the min/max id of the docs in 
each shard (easy enough to determine by looping over them all and doing a 
stats query)
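
a minimal sketch of the id case, again in python with made up helper names. 
it assumes the unique key is a numeric field called "id" and that you ask 
for wt=json; the stats / stats.field params are the StatsComponent ones. 
the min/max values may come back as floating point, so for very large ids 
treat the cached bounds as approximate.

```python
from urllib.parse import urlencode


def stats_url(shard, field="id"):
    """URL for a StatsComponent request against one shard (no docs returned)."""
    params = urlencode({
        "q": "*:*",
        "rows": 0,
        "stats": "true",
        "stats.field": field,
        "wt": "json",
    })
    return f"http://{shard}/select?{params}"


def id_range(stats_response, field="id"):
    """Pull (min, max) for `field` out of a parsed JSON stats response."""
    field_stats = stats_response["stats"]["stats_fields"][field]
    return int(field_stats["min"]), int(field_stats["max"])


def shards_for_id(doc_id, ranges):
    """Return every shard whose cached [min, max] range could hold doc_id.

    `ranges` maps shard -> (min_id, max_id), built once by looping over
    all shards. More than one shard can match if the ranges overlap.
    """
    return [shard for shard, (lo, hi) in ranges.items() if lo <= doc_id <= hi]
```

the application fills the ranges dict once per shard (old shards never 
change, so only the current day needs refreshing), then joins the result of 
shards_for_id() with commas and uses it as the shards param.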


-Hoss
