cassandra-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: ByteOrdered partitioner when using sha-1 as partition key
Date Sat, 11 Feb 2017 20:20:33 GMT
On Sat, Feb 11, 2017 at 1:47 PM, Micha <micha-1@fantasymail.de> wrote:

> I think I was not clear enough...
>
> I have *one* table for which the row data contains (among other values)
> a sha-1 sum. There are no collisions.  I thought computing a murmur hash
> for a sha-1 sum is just wasted time, as the murmur hash doesn't make the
> data more random than it already is.   So it's just one table where this
> matters.
>
>
>  Michael
>
>
> Am 11.02.2017 um 16:54 schrieb Jonathan Haddad:
> > The odds of only using a sha1 as your partition key for every table you
> > ever create are low. You will regret BOP until the end of time.
> > On Sat, Feb 11, 2017 at 5:53 AM Edward Capriolo <edlinuxguru@gmail.com
> > <mailto:edlinuxguru@gmail.com>> wrote:
> >
> >     Probably best to avoid BOP even if you are already hashing keys
> >     yourself. What do you do when checksums collide? It is possible,
> >     right?
> >
> >     On Saturday, February 11, 2017, Micha <micha-1@fantasymail.de
> >     <mailto:micha-1@fantasymail.de>> wrote:
> >
> >         Hi,
> >
> >         my table has a sha-1 sum as partition key. Would in this case the
> >         ByteOrdered partitioner be a better choice than the
> >         Murmur3partitioner,
> >         since the keys are quite random?
> >
> >
> >         cheers,
> >          Michael
> >
> >
> >
> >     --
> >     Sorry this was sent from mobile. Will do less grammar and spell
> >     check than usual.
> >
>

The problem with using BOP is that the partitioner is not set at the
table or keyspace level; it is set cluster-wide. So if you have two tables
with different key distributions, there is no way to balance both of them.
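
To make the balancing problem concrete, here is a rough sketch in Python
rather than anything Cassandra actually runs. It assumes a hypothetical
4-node ring whose byte-ordered token ranges are split evenly on the first
key byte; the helper names (owner_byte_ordered, distribution) and the
"user:NNNNNN" key format are made up for illustration.

```python
import hashlib
from collections import Counter

NODES = 4  # hypothetical cluster size, chosen only for illustration


def owner_byte_ordered(key: bytes, nodes: int = NODES) -> int:
    """Return the node index owning a key when the ring is split evenly
    by the first byte of the key (a simplification of byte ordering)."""
    return key[0] * nodes // 256


def distribution(keys):
    """Count how many keys land on each node."""
    counts = Counter(owner_byte_ordered(k) for k in keys)
    return [counts.get(n, 0) for n in range(NODES)]


# Table 1: partition keys are raw SHA-1 digests, so the bytes are well mixed.
sha1_keys = [hashlib.sha1(str(i).encode()).digest() for i in range(10_000)]

# Table 2: partition keys are plain text sharing a common prefix.
text_keys = [f"user:{i:06d}".encode() for i in range(10_000)]

sha1_dist = distribution(sha1_keys)
text_dist = distribution(text_keys)

print("sha1 table:", sha1_dist)  # roughly even across the four nodes
print("text table:", text_dist)  # every key lands on the same node
```

Under these assumptions the SHA-1 table spreads out fine, but every key of
the second table starts with the same byte and ends up on one node. Moving
tokens to fix the second table would unbalance the first, because both
tables share the same ring.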

I would almost consider BOP quasi-supported at this point:

http://stackoverflow.com/questions/27939234/cassandra-byteorderedpartitioner

"No, seriously, you're doing it wrong"

I have thought about this often. If you really need BOP, for example
you're generating a web index and you want to co-locate data for the same
domain so you can scan it, Cassandra is a bad fit. I'm not convinced that a
secondary index or materialized view fills the need. HBase seems a more
logical choice (to me), since the data is logically ordered by key and
regions are split as they grow.
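
As a sketch of what "co-locate data for the same domain so you can scan
it" looks like in a key-ordered store, the Python below models the store as
a sorted list of row keys. The reversed-host key layout and the helper
names (row_key, scan_prefix) are assumptions for illustration, not
something from this thread, and this is not HBase client code.

```python
from bisect import bisect_left


def row_key(host: str, path: str) -> str:
    """Build a row key with the host labels reversed, so that all pages of
    one domain (and its subdomains) sort next to each other."""
    reversed_host = ".".join(reversed(host.split(".")))
    return f"{reversed_host}{path}"


def scan_prefix(sorted_keys, prefix):
    """Range scan: seek to the first key >= prefix, then read until the
    prefix no longer matches. Requires sorted_keys to be sorted."""
    start = bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        result.append(key)
    return result


# Hypothetical crawl results as (host, path) pairs.
pages = [
    ("www.example.com", "/a"),
    ("www.other.org", "/"),
    ("blog.example.com", "/post"),
    ("example.com", "/"),
]

# A key-ordered store keeps rows sorted; a sorted list stands in for it.
rows = sorted(row_key(host, path) for host, path in pages)

example_rows = scan_prefix(rows, "com.example")
print(example_rows)
```

With ordered keys the scan is one seek plus a sequential read. Under a
hashing partitioner the same rows would be scattered across the ring, which
is why the use case pulls toward an ordered store.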
