incubator-cassandra-user mailing list archives

From Dean Hiller <d...@alvazan.com>
Subject Re: possible feature request RP vs. OPP
Date Fri, 09 Sep 2011 19:11:44 GMT
Wouldn't that be ignoring the fact that it is just a "prefix" and there is
still the unique key after that prefix ;)? So yes, it may be just as clumpy
as using OPP, but only within a node, which I don't really see as a big deal
at that point. Or am I missing something? Maybe the default implementation
would use 3 bytes so everyone would be happy. My main point is that I think
Cassandra could use OPP underneath, like HBase, and then expose an RP or OPP
selection at column family creation time. That would be nice, so that I don't
have to write the code myself (and no one else has to write it
themselves).
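
To make the idea concrete, here is a minimal sketch of the kind of layer I mean, written in Python for brevity. The names `add_prefix`, `strip_prefix` and `PREFIX_BYTES` are made up for illustration and are not part of Cassandra or any client library. It assumes MD5 as the hash and keeps only the first few bytes of the digest as the prefix; passing 16 keeps the whole digest, which is the md5+originalkey layout quoted below.

```python
import hashlib

# Hypothetical default: a 2-byte (16-bit) prefix, as floated in this thread.
PREFIX_BYTES = 2


def add_prefix(key: bytes, prefix_bytes: int = PREFIX_BYTES) -> bytes:
    """Return the key as it would be stored: hash prefix followed by the original key."""
    digest = hashlib.md5(key).digest()  # 16 bytes (128 bits)
    return digest[:prefix_bytes] + key


def strip_prefix(stored_key: bytes, prefix_bytes: int = PREFIX_BYTES) -> bytes:
    """Recover the original key by dropping the fixed-length prefix."""
    return stored_key[prefix_bytes:]


if __name__ == "__main__":
    stored = add_prefix(b"user:42")
    print(len(stored) - len(b"user:42"))  # size overhead in bytes
    print(strip_prefix(stored))
```

Because the prefix has a fixed length, stripping it on read needs no lookup, and the original key after the prefix keeps stored keys unique even when two keys hash to the same prefix.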

Any info on #1 and #2???

thanks,
Dean

On Fri, Sep 9, 2011 at 10:08 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

>
>
> On Fri, Sep 9, 2011 at 10:34 AM, Dean Hiller <dean@alvazan.com> wrote:
>
>> I saw this quote in the pdf.....
>>
>> "For large indexes with common terms this too much data! Queries with >
>> 100k hits"
>>
>> 1. What would be considered large?  In my experience, each of our indexes
>> is about the size of a typical RDBMS index; we just have many, many more
>> of them, since the size of an index depends only on our largest partition,
>> which in turn depends on how we partition the data.
>>
>> 2. Does Solandra provide an implementation of the Lucene API?  Our
>> preference is to code against Lucene's API, so that the underlying
>> implementation could be Lucene, Lucandra or Solandra.
>>
>> 3. Why not just use an 8-bit or 16-bit hash as the prefix instead of a full
>> SHA? The rest of the key is unique anyway, as the user would have to choose
>> a unique key to begin with.  After all, the number of possible prefix values
>> only has to be bigger than the max number of nodes, and 2^16 is quite large.
>>
>> thanks,
>> Dean
>>
>>
>> On Thu, Sep 8, 2011 at 4:10 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
>>
>>>
>>>
>>> On Thu, Sep 8, 2011 at 5:12 PM, Dean Hiller <dean@alvazan.com> wrote:
>>>
>>>> I was wondering something.  On top of OPP I can build a layer that
>>>> hashes the key for certain column families, so that those column families
>>>> behave just like RP, while my other column families sit on OPP directly
>>>> so I can use Lucandra.  So why not deprecate RP and instead let users
>>>> choose OPP or RP per column family, where RP == hashing the key on my
>>>> behalf, prefixing my key with that hashcode, and stripping it back off
>>>> when I read it in again?
>>>>
>>>> i.e. why have RP when you could do RP per column family on top of OPP,
>>>> with the above reasoning, and have the best of both worlds?
>>>>
>>>> i.e. I am thinking of having some column families random and some column
>>>> families ordered, so I could range query or use Lucandra on top of those
>>>> ones.
>>>>
>>>> thoughts?  I was just curious.
>>>> thanks,
>>>> Dean
>>>>
>>>>
>>> You can use ByteOrderedPartitioner and hash the data yourself. However,
>>> that makes every row key 128 bits larger, as the key has to be:
>>>
>>> md5+originalkey
>>>
>>>
>>> http://www.datastax.com/wp-content/uploads/2011/07/Scaling_Solr_with_Cassandra-CassandraSF2011.pdf
>>>
>>> Solandra now uses a 'modified' RandomPartitioner.
>>>
>>
>>
> I am not quite sure that using 8 bits is good enough. It will shard your data
> across a small number of nodes effectively; however, I can imagine the
> SSTables will be "clumpy" because you reduce your sorting. It seems like a
> http://en.wikipedia.org/wiki/Birthday_problem to me. (I could be wrong)
>
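
The birthday-problem remark above can be put into rough numbers. The sketch below is illustrative only: the function name `shared_prefix_probability` is made up, and it assumes prefixes are uniformly distributed, as they would be if taken from a good hash. It computes the chance that at least two of `num_keys` keys land on the same prefix for a given prefix width.

```python
def shared_prefix_probability(num_keys: int, prefix_bits: int) -> float:
    """Probability that at least two keys share a prefix, assuming uniform prefixes."""
    buckets = 2 ** prefix_bits
    if num_keys > buckets:
        # Pigeonhole: more keys than prefix values, so some prefix must repeat.
        return 1.0
    p_all_distinct = 1.0
    for i in range(num_keys):
        # The (i+1)-th key must avoid the i prefixes already taken.
        p_all_distinct *= (buckets - i) / buckets
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for bits in (8, 16, 24):
        print(bits, shared_prefix_probability(1000, bits))
```

With any realistic row count, an 8-bit or 16-bit prefix is certain to be shared by many keys. Rows that share a prefix then fall back to the ordering of the original key, which is the clumping under discussion; whether that matters in practice is the open question in this thread.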
