incubator-cassandra-user mailing list archives

From Dean Hiller <d...@alvazan.com>
Subject Re: possible feature request RP vs. OPP
Date Fri, 09 Sep 2011 14:34:00 GMT
I saw this quote in the pdf.....

"For large indexes with common terms this too much data! Queries with > 100k
hits"

1. What would be considered large?  In most of my experience, our indexes are
the typical size of an RDBMS index; we just have many more of them, since the
size of each index depends only on our largest partition, which in turn
depends on how we partition the data.

2. Does Solandra have an underlying implementation of the Lucene API?  Our
preference is to use Lucene's API, and the underlying implementation could be
Lucene, Lucandra, or Solandra.

3. Why not just use an 8-bit or 16-bit value as the key prefix instead of a
full SHA hash?  The rest of the key is already unique, since the user has to
choose a unique key to begin with.  After all, the prefix only has to cover
more values than the max number of nodes, and 2^16 is quite large.
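
To make point 3 concrete, here is a minimal sketch comparing the two key layouts. It assumes MD5 as the hash (following the md5+originalkey layout quoted further down in this thread); the function names are hypothetical and are not part of Cassandra, Lucandra, or Solandra.

```python
import hashlib


def md5_prefixed_key(key: bytes) -> bytes:
    """Full 128-bit MD5 digest followed by the original key (md5+originalkey)."""
    return hashlib.md5(key).digest() + key


def short_prefixed_key(key: bytes, prefix_bytes: int = 2) -> bytes:
    """Only the first prefix_bytes of the digest, followed by the original key.

    Assumption: the original key is already unique, so the prefix only has to
    spread rows across the ring; it does not need to be collision-free.
    """
    return hashlib.md5(key).digest()[:prefix_bytes] + key


def strip_prefix(stored_key: bytes, prefix_bytes: int) -> bytes:
    """Recover the original key when reading a row back."""
    return stored_key[prefix_bytes:]


key = b"user:42"
full = md5_prefixed_key(key)
short = short_prefixed_key(key)

print(len(full) - len(key))    # 16 bytes of overhead per row key
print(len(short) - len(key))   # 2 bytes of overhead per row key
print(strip_prefix(short, 2) == key)  # True
```

With a 2-byte prefix there are 65,536 possible prefix values. Two different keys may share a prefix, but the stored keys still differ because the original key follows the prefix.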

thanks,
Dean


On Thu, Sep 8, 2011 at 4:10 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

>
>
> On Thu, Sep 8, 2011 at 5:12 PM, Dean Hiller <dean@alvazan.com> wrote:
>
>> I was wondering something.  With OPP, I can create a layer that hashes the
>> key for certain column families, so that those column families behave just
>> like RP but on top of OPP, while my other column families sit on OPP
>> directly so I could use Lucandra.  So why not deprecate RP and instead let
>> users choose OPP or RP per column family, where RP == hashing the key on my
>> behalf, prefixing my key with that hashcode, and stripping it back off when
>> I read it in again?
>>
>> i.e., why have RP when you could do RP per column family with the above
>> reasoning on top of OPP and have the best of both worlds?
>>
>> i.e., I think of having some column families random and then some column
>> families ordered so I could range query or use Lucandra on top of those
>> ones.
>>
>> thoughts?  I was just curious.
>> thanks,
>> Dean
>>
>>
> You can use ByteOrderedPartitioner and hash the data yourself. However, that
> makes every row key 128 bits larger, as the key has to be:
>
> md5+originalkey
>
>
> http://www.datastax.com/wp-content/uploads/2011/07/Scaling_Solr_with_Cassandra-CassandraSF2011.pdf
>
> Solandra now uses a 'modified' RandomPartitioner.
>
