lucene-solr-user mailing list archives

From Peter S <pete...@hotmail.com>
Subject RE: Dynamic Solr indexing
Date Tue, 02 Mar 2010 00:18:24 GMT

Hi Jan,

 

Thanks very much for your message. SolrCloud sounds very cool indeed...

 

So, from the Wiki, am I right in understanding that the only 'external' component is ZooKeeper,
and everything else is pure Solr (i.e. replication, distributed queries, etc. all go over Solr
HTTP, as opposed to something like Hadoop IPC)? If so, this makes it a nice tight package, keeping
external dependencies to a minimum. Is SolrCloud 'ready for primetime' production use at present?

 

Apologies for all the questions - is SolrCloud slated for inclusion in Solr 1.5?

 

Many thanks!

Peter

 


 
> Subject: Re: Dynamic Solr indexing
> From: jan.asf@cominvent.com
> Date: Tue, 2 Mar 2010 00:48:50 +0100
> To: solr-user@lucene.apache.org
> 
> Hi,
> 
> In the current version you need to handle the cluster layout yourself, on both the indexing and search side, i.e. route documents to shards as you please, and know which shards to search.
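
A minimal sketch of the "know what shards to search" part, assuming the client keeps its own shard list. It relies on Solr's `shards` request parameter for distributed search; the host names, core names, and the `build_search_url` helper are hypothetical and only serve as illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical shard list; in the setup described above the client
# has to maintain this itself and update it as cores are added.
KNOWN_SHARDS = [
    "localhost:8983/solr/core0",
    "localhost:8983/solr/core1",
]


def build_search_url(front_url, shards, query):
    """Return a select URL that fans the query out to every listed shard."""
    params = {"q": query, "shards": ",".join(shards)}
    return f"{front_url}/select?{urlencode(params)}"


url = build_search_url("http://localhost:8983/solr/core0", KNOWN_SHARDS, "title:solr")

# Decode the query string again to show what the receiving core would see.
sent = parse_qs(urlsplit(url).query)
print(sent["shards"][0])
```

The core that receives the request merges the per-shard results, so the caller gets one aggregated response, but only for the shards it listed.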
> 
> We are trying to make this easier in http://wiki.apache.org/solr/SolrCloud - have a look at it. The idea is that there is a component that knows the layout of the search cluster, and we can then use it to decide which shards to index to and search. If we build a component that automatically routes documents to shards, your use case could be implemented as one particular routing strategy, i.e. move to the next shard when the current one is "full" - ideal for news-type indexes.
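
The "move to the next shard when the current one is full" strategy could be sketched roughly as below. This is not SolrCloud code: `Shard`, `RolloverRouter`, the `coreN` naming, and the byte-count threshold are all assumptions made for illustration, and actually creating a core is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Shard:
    name: str
    size_bytes: int = 0


@dataclass
class RolloverRouter:
    """Send documents to the newest shard; open a new one when it is 'full'."""
    max_bytes: int
    shards: List[Shard] = field(default_factory=list)

    def _new_shard(self) -> Shard:
        # A real deployment would create and register a new core here.
        shard = Shard(name=f"core{len(self.shards)}")
        self.shards.append(shard)
        return shard

    def route(self, doc_bytes: int) -> str:
        current = self.shards[-1] if self.shards else self._new_shard()
        # Roll over only if the shard already holds data, so a single
        # oversized document cannot trigger an endless chain of empty shards.
        if current.size_bytes > 0 and current.size_bytes + doc_bytes > self.max_bytes:
            current = self._new_shard()
        current.size_bytes += doc_bytes
        return current.name


router = RolloverRouter(max_bytes=100)
targets = [router.route(40) for _ in range(5)]
print(targets)  # ['core0', 'core0', 'core1', 'core1', 'core2']
```

The indexing process only ever calls `route`, so it never needs to know that the target core changed; the list in `router.shards` is what a search front end would use to know which shards exist.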
> 
> --
> Jan Høydahl - search architect
> Cominvent AS - www.cominvent.com
> 
> On 1. mars 2010, at 18.58, Peter S wrote:
> 
> > 
> > Hi,
> > 
> > 
> > 
> > I wonder if anyone could shed some insight on a dynamic indexing question...?
> > 
> > 
> > 
> > The basic requirement is this:
> > 
> > 
> > 
> > Indexing:
> > 
> > A process writes to an index, and when it reaches a certain size (say, 1GB), a new index (core) is 'automatically' created/deployed (i.e. the process doesn't know about it) and further indexing now goes into the new core. When that one reaches its threshold size, a new index is deployed, and so on.
> > 
> > The process that is writing to the indices doesn't actually know that it is writing
to different cores.
> > 
> > 
> > 
> > Searching:
> > 
> > When a search is directed at the above index, the actual search is a distributed shard search across all the shards that have been deployed. Again, the searcher process doesn't know this, but gets back the aggregated results, as if it had specified all the shards in the request URL; as these are changing dynamically, it of course can't know what they all are at any given time.
> > 
> > 
> > 
> > This requirement sounds to me perhaps like a Katta thing. I've had a look at SOLR-1395, and there are questions on Lucid that sound similar (e.g. http://www.lucidimagination.com/search/document/4b3d00055413536d/solr_katta_integration#4b3d00055413536d), so I guess (hope) I'm not the only one with this requirement.
> > 
> > 
> > 
> > I couldn't find anything in either Katta or SOLR-1395 that fit both the writing
and searching requirement, but I could easily have missed it.
> > 
> > 
> > 
> > Is Katta/SOLR-1395 the way to go to achieve this? Would such a solution be 'production-ready'? Has anyone deployed this type of thing in a production environment?
> > 
> > 
> > 
> > Any insight/advice would be greatly appreciated.
> > 
> > 
> > 
> > Thanks!
> > 
> > Peter
> > 
> > 
> > 
> > 
> > 
> 
 		 	   		  