incubator-cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: possible feature request RP vs. OPP
Date Fri, 09 Sep 2011 16:08:23 GMT
On Fri, Sep 9, 2011 at 10:34 AM, Dean Hiller <dean@alvazan.com> wrote:

> I saw this quote in the pdf.....
>
> "For large indexes with common terms this too much data! Queries with >
> 100k hits"
>
> 1. What would be considered large?  In most of my experience, we have the
> typical size of a RDBMS index but just have many many many more indexes as
> the size of the index is just dependent on our largest partition based on
> how we partition the data.
>
> 2. Does solandra have a lucene api underlying implementation?  Our
> preference is to use lucene's api and the underlying implementation could be
> lucene, lucandra or solandra.
>
> 3. Why not just use an 8-bit or 16-bit key as the prefix instead of an sha,
> with the rest of the key being unique, since the user would have to choose a
> unique key to begin with?  After all, the hash only has to be bigger than the
> max number of nodes, and 2^16 is quite large.
>
> thanks,
> Dean
>
>
> On Thu, Sep 8, 2011 at 4:10 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>
>>
>>
>> On Thu, Sep 8, 2011 at 5:12 PM, Dean Hiller <dean@alvazan.com> wrote:
>>
>>> I was wondering something.  Since I can take OPP and I can create a layer
>>> that for certain column families, I hash the key so that some column
>>> families are just like RP but on top of OPP and some of my other column
>>> families are then on OPP directly so I could use lucandra, why not make RP
>>> deprecated and instead allow users to create OPP by column family or RP
>>> where RP == doing the hash of the key on my behalf and prefixing my key with
>>> that hashcode and stripping it back off when I read it in again.
>>>
>>> ie. why have RP when you could do RP per column family with the above
>>> reasoning on top of OPP and have the best of both worlds?????
>>>
>>> ie. I think of having some column families random and then some column
>>> families ordered so I could range query or use lucandra on top of those
>>> ones.
>>>
>>> thoughts?  I was just curious.
>>> thanks,
>>> Dean
>>>
>>>
>> You can use ByteOrderedPartitioner and hash the data yourself. However, that
>> makes every row key 128 bits larger, as the key has to be:
>>
>> md5+originalkey
>>
>>
>> http://www.datastax.com/wp-content/uploads/2011/07/Scaling_Solr_with_Cassandra-CassandraSF2011.pdf
>>
>> Solandra now uses a 'modified' RandomPartitioner.
>>
>
>
I am not quite sure that using 8 bits is good enough. It will shard your data
across a small number of nodes effectively; however, I can imagine the
SSTables will be "clumpy" because an 8-bit prefix leaves only 256 distinct
prefix values to sort on. It seems like a Birthday problem
(http://en.wikipedia.org/wiki/Birthday_problem) to me. (I could be wrong)
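
To make the md5+originalkey idea concrete, here is a rough sketch in Python of
what hashing the key yourself could look like. The function names
(prefix_key, strip_prefix, distinct_prefixes) and the prefix_bytes parameter
are made up for illustration and are not part of Cassandra or any client
library. With prefix_bytes=16 the full 128-bit MD5 is kept; 1 or 2 gives the
8-bit or 16-bit prefix discussed above.

```python
import hashlib


def prefix_key(original_key: bytes, prefix_bytes: int = 16) -> bytes:
    """Build a stored key of the form hash-prefix + original key.

    prefix_bytes=16 keeps the whole MD5 digest (128 bits);
    smaller values keep only the leading bytes of the digest.
    """
    if not 1 <= prefix_bytes <= 16:
        raise ValueError("prefix_bytes must be between 1 and 16")
    digest = hashlib.md5(original_key).digest()
    return digest[:prefix_bytes] + original_key


def strip_prefix(stored_key: bytes, prefix_bytes: int = 16) -> bytes:
    """Recover the original key by dropping the hash prefix on read."""
    return stored_key[prefix_bytes:]


def distinct_prefixes(keys, prefix_bytes: int) -> int:
    """Count how many different prefixes a set of keys produces."""
    return len({prefix_key(k, prefix_bytes)[:prefix_bytes] for k in keys})


if __name__ == "__main__":
    # Hypothetical sample keys, only to compare prefix widths.
    keys = [f"user{i}".encode() for i in range(10000)]
    for width in (1, 2, 16):
        print(width, "byte prefix:", distinct_prefixes(keys, width), "distinct")
```

A 1-byte prefix can never produce more than 256 distinct values no matter how
many keys you store, so all rows sharing a prefix sort next to each other by
their original key. That is the clumping concern with a short prefix.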
