incubator-cassandra-user mailing list archives

From: Dave Viner <davevi...@pobox.com>
Subject: Re: Cold boot performance problems
Date: Sat, 09 Oct 2010 00:31:28 GMT
Has anyone found solid step-by-step docs on how to raid0 the ephemeral disks
in ec2 for use by Cassandra?
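
The closest I have is a rough sketch of my own (untested, and the
device names below are just my guess for an XL instance store), so a
pointer to a vetted write-up would be very welcome:

  # check what the ephemeral devices are actually named first, e.g. with fdisk -l
  mdadm --create /dev/md0 --level=0 --raid-devices=4 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde
  mkfs.xfs /dev/md0
  mkdir -p /var/lib/cassandra
  mount /dev/md0 /var/lib/cassandra

Is that roughly what people are doing, or is there more to it?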

On Fri, Oct 8, 2010 at 12:11 PM, Jason Horman <jhorman@gmail.com> wrote:

> We are currently using EBS with 4 volumes striped with LVM. Wow, we
> didn't realize you could RAID the ephemeral disks. I thought the
> prevailing opinion for Cassandra, though, was that the ephemeral
> disks were dangerous. We have lost a few machines over the past year,
> but replicas hopefully prevent real trouble.
>
> How about sharding strategies? Is it worth investigating sharding out
> via multiple keyspaces? Would order-preserving partitioning help
> group indexes better for users?
>
> On Fri, Oct 8, 2010 at 1:53 PM, Jonathan Ellis <jbellis@gmail.com> wrote:
> > Two things that can help:
> >
> > In 0.6.5, enable the dynamic snitch with
> >
> > -Dcassandra.dynamic_snitch_enabled=true
> > -Dcassandra.dynamic_snitch=cassandra.dynamic_snitch_enabled
> >
> > which, if you are doing a rolling restart, will let other nodes route
> > around the slow node (at CL.ONE) until it has warmed up (by the read
> > repairs happening in the background).
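> >
> > (One way to pass those flags, assuming the stock 0.6 startup scripts,
> > is to append them to the JVM_OPTS line in bin/cassandra.in.sh:
> >
> >   JVM_OPTS="$JVM_OPTS -Dcassandra.dynamic_snitch_enabled=true"
> >
> > Adjust to however you actually launch the daemon.)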
> >
> > In 0.6.6, we've added save/load of the Cassandra caches:
> > https://issues.apache.org/jira/browse/CASSANDRA-1417
> >
> > Finally: we recommend using raid0 ephemeral disks on EC2 with L or XL
> > instance sizes for better i/o performance.  (Corey Hulen has some
> > numbers at http://www.coreyhulen.org/?p=326.)
> >
> > On Fri, Oct 8, 2010 at 12:36 PM, Jason Horman <jhorman@gmail.com> wrote:
> >> We are experiencing very slow performance on Amazon EC2 after a
> >> cold boot: 10-20 tps. After the cache is primed things are much
> >> better, but it would be nice if users who aren't in cache didn't
> >> experience such slow performance. Before dumping a bunch of config
> >> I just had some general questions.
> >>
> >> We are using uuid keys, 40m of them, and the random partitioner.
> >> The typical access pattern is reading 200-300 keys in a single web
> >> request. Are uuid keys going to be painful because they are so
> >> random? Should we be using less random keys, maybe with a shard
> >> prefix (01-80), and make sure that our tokens group user data
> >> together on the cluster (via the order preserving partitioner)?
> >> Would the order preserving partitioner be a better option in the
> >> sense that it would group a single user's data onto a single set of
> >> machines (if we added a prefix to the uuid)?
> >> Is there any benefit to doing sharding of our own via keyspaces:
> >> 01-80 keyspaces to split up the data files? (We already have 80
> >> mysql shards we are migrating from, so doing this wouldn't be
> >> terrible implementation-wise.)
> >> Should a goal be to get the data/index files as small as possible?
> >> Is there a size at which they become problematic? (Amazon EC2/EBS,
> >> fyi) If so, what's the best way to split them up?
> >>
> >> Via more servers
> >> Via more cassandra instances on the same server
> >> Via manual sharding by keyspace
> >> Via manual sharding by columnfamily
> >>
> >> Thanks,
> >> --
> >> -jason horman
> >>
> >>
> >
> >
> >
> > --
> > Jonathan Ellis
> > Project Chair, Apache Cassandra
> > co-founder of Riptano, the source for professional Cassandra support
> > http://riptano.com
> >
>
>
>
> --
> -jason
>
