incubator-cassandra-user mailing list archives

From alex kamil <alex.ka...@gmail.com>
Subject Re: Sparse vs dense index
Date Wed, 17 Mar 2010 14:22:38 GMT
yep, I'll probably try both

I don't think anything out there can beat an in-memory DB in
terms of bulk throughput (e.g.
http://cs.nyu.edu/cs/faculty/shasha/papers/sigmodpap.pdf), but we'll see how
far we can get with open source tools and a combination of persistent
storage and caching/pre-fetching. When you have half a terabyte of RAM on a
20-node cluster, there must be a way to utilize it.

I'm also evaluating HBase, so hopefully I will have some benchmark results to
show.
If anyone on this list is using Cassandra or HBase for time series indexing, I'd
be happy to hear about it and share our findings.

Thanks
Alex
http://www.linkedin.com/in/alexkamil


On Wed, Mar 17, 2010 at 9:52 AM, Jonathan Ellis <jbellis@gmail.com> wrote:

> I guess if you are going to read the full 5MB at once then that makes
> more sense.
>
> But if you are going to slice it or access parts by column name then
> the other does.
>
> On Tue, Mar 16, 2010 at 12:15 PM, alex kamil <alex.kamil@gmail.com> wrote:
> > which index structure would fit Cassandra more naturally and perform better:
> > 1) a sparse index where each row has 100 columns, each containing a
> > 5MB data block (under a single column family)
> > or
> > 2) a dense index where each row contains 100 columns, each holding a single
> > 6-byte value (under a single column family)
> >
> > - assuming the total data size is 30-50TB, 500GB appends per day
> > -  the data is time series (output from a multichannel EEG sensor)
> > the key performance metric for us is read throughput (random reads/sec,
> > range queries, sequential scans)
> >
> > Thanks
> > Alex
> >
>
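
For a rough sense of scale, the figures quoted above can be turned into row counts with a little arithmetic. This is only a back-of-envelope sketch: it assumes decimal units (1 MB = 10**6 bytes), counts value bytes only (no column names, timestamps, or replication), and reads option 2 as one 6-byte value per column. The helper names are made up for illustration.

```python
# Back-of-envelope sizing for the two layouts in the quoted question.
# Assumptions (not from the thread): decimal units, value bytes only,
# no replication or per-column overhead, one 6-byte value per dense column.

MB = 10**6
GB = 10**9
TB = 10**12

COLUMNS_PER_ROW = 100  # both layouts use 100 columns per row


def row_bytes(value_bytes):
    """Payload of one row: 100 columns times the size of each value."""
    return COLUMNS_PER_ROW * value_bytes


def rows_needed(total_bytes, value_bytes):
    """Whole rows needed to hold total_bytes of payload (rounded down)."""
    return total_bytes // row_bytes(value_bytes)


sparse_row = row_bytes(5 * MB)  # option 1: 5MB block per column
dense_row = row_bytes(6)        # option 2: 6-byte value per column

for total in (30 * TB, 50 * TB):
    print(f"{total // TB} TB total:",
          f"sparse rows = {rows_needed(total, 5 * MB):,},",
          f"dense rows = {rows_needed(total, 6):,}")

# Daily appends of 500GB expressed as new rows per day.
print(f"rows/day: sparse = {rows_needed(500 * GB, 5 * MB):,},",
      f"dense = {rows_needed(500 * GB, 6):,}")

# Half a terabyte of RAM spread evenly over 20 nodes.
ram_per_node = 500 * GB // 20
print(f"RAM per node = {ram_per_node // GB} GB")
```

Under these assumptions a sparse row is about 500MB, so reading one means pulling the whole row unless it is sliced by column name, which matches the distinction made in the reply above. The dense layout trades that for tens of billions of very small rows.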
