lucene-solr-user mailing list archives

From Per Steffensen <st...@designware.dk>
Subject Re: Dynamic collections in SolrCloud for log indexing
Date Mon, 24 Dec 2012 09:30:15 GMT
I believe it is a misunderstanding to use custom routing (or sharding, as 
Erick calls it) for this kind of stuff. Custom routing is nice if you 
want to control which slice/shard under a collection a specific document 
goes to - mainly to ensure that two (or more) documents are indexed on 
the same slice/shard, but also just to control which slice/shard a 
specific document is indexed on. Knowing/controlling this kind of stuff 
can be used for a lot of nice purposes. But you don't want to move 
slices/shards around among collections or delete/add slices from/to a 
collection - unless it's for elasticity reasons.
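
Just to make it concrete, here is a minimal SolrJ sketch of that kind of 
routing - assuming the composite-id scheme from the custom-sharding work 
Erick mentions (SOLR-2592, which I believe lands in 4.1). The ZooKeeper 
address, collection name and field names are all made up:

import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class RoutingSketch {
  public static void main(String[] args) throws Exception {
    // SolrJ 4.x cloud-aware client; ZooKeeper address is hypothetical
    CloudSolrServer server = new CloudSolrServer("localhost:2181");
    server.setDefaultCollection("logs"); // hypothetical collection name

    // With composite ids, everything sharing the "user123!" prefix hashes
    // to the same slice/shard, so these two documents end up co-located.
    SolrInputDocument doc1 = new SolrInputDocument();
    doc1.addField("id", "user123!event-1");
    doc1.addField("message_txt", "first event");

    SolrInputDocument doc2 = new SolrInputDocument();
    doc2.addField("id", "user123!event-2");
    doc2.addField("message_txt", "second event");

    server.add(doc1);
    server.add(doc2);
    server.commit();
    server.shutdown();
  }
}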

I think you should fill a collection every week/month and just keep 
those collections as they are. Instead of ending up with one big 
"historic" collection containing many slices/shards/cores (one for each 
historic week/month), you will end up with many historic collections 
(one for each historic week/month). To search historic data you will 
have to cross-search those historic collections, but that is no problem 
at all. If SolrCloud is built the way it is supposed to be built (and I 
believe it is), it shouldn't require more resources, or be harder in any 
way, to cross-search X slices across many collections than to 
cross-search X slices under the same collection.
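
To sketch what that could look like with SolrJ (all the names here are 
made up; the generic request against /admin/collections is one way to 
drive the Collections API from SolrJ 4.x):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

public class MonthlyCollectionsSketch {
  public static void main(String[] args) throws Exception {
    CloudSolrServer server = new CloudSolrServer("localhost:2181"); // hypothetical ZK

    // 1) At the start of each month, create a fresh collection via the
    //    Collections API, driven here through a generic SolrJ request.
    ModifiableSolrParams create = new ModifiableSolrParams();
    create.set("action", "CREATE");
    create.set("name", "logs_2013_01"); // hypothetical naming scheme
    create.set("numShards", 2);
    QueryRequest createReq = new QueryRequest(create);
    createReq.setPath("/admin/collections");
    server.request(createReq);

    // 2) To search historic data, fan one query out across several
    //    collections at once with the "collection" parameter.
    server.setDefaultCollection("logs_2013_01");
    SolrQuery q = new SolrQuery("level_s:ERROR");
    q.set("collection", "logs_2012_11,logs_2012_12,logs_2013_01");
    QueryResponse rsp = server.query(q);
    System.out.println("hits: " + rsp.getResults().getNumFound());

    server.shutdown();
  }
}

(And if old data eventually has to go, the Collections API's DELETE 
action removes a whole month's collection in one go - which I think also 
answers Erick's question about dropping off old data.)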

Besides that, see my answer to the topic "Will SolrCloud always slice by 
ID hash?" from a few days back.

Regards, Per Steffensen

On 12/24/12 1:07 AM, Erick Erickson wrote:
> I think this is one of the primary use-cases for custom sharding. Solr 4.0
> doesn't really lend itself to this scenario, but I _believe_ that the patch
> for custom sharding has been committed...
>
> That said, I'm not quite sure how you drop off the old shard if you don't
> need to keep old data. I'd guess it's possible, but haven't implemented
> anything like that myself.
>
> FWIW,
> Erick
>
>
> On Fri, Dec 21, 2012 at 12:17 PM, Upayavira <uv@odoko.co.uk> wrote:
>
>> I'm working on a system for indexing logs. We're probably looking at
>> filling one core every month.
>>
>> We'll maintain a short term index containing the last 7 days - that one
>> is easy to handle.
>>
>> For the longer term stuff, we'd like to maintain a collection that will
>> query across all the historic data, but that means every month we need
>> to add another core to an existing collection, which, as I understand
>> it, is not possible in 4.0.
>>
>> How do people handle this sort of situation where you have rolling new
>> content arriving? I'm sure I've heard people using SolrCloud for this
>> sort of thing.
>>
>> Given it is logs, distributed IDF has no real bearing.
>>
>> Upayavira
>>

