incubator-cassandra-dev mailing list archives

From Zhu Han
Subject Re: RFC: Cassandra Virtual Nodes
Date Fri, 23 Mar 2012 02:04:17 GMT
On Fri, Mar 23, 2012 at 6:54 AM, Peter Schuller wrote:

> > You would have to iterate through all sstables on the system to repair
> > one vnode, yes: but building the tree for just one range of the data
> > means that huge portions of the sstables files can be skipped. It should
> > scale down linearly as the number of vnodes increases (ie, with 100
> > vnodes, it will take 1/100th the time to repair one vnode).

With size-tiered compaction, the index of every SSTable would still have
to be scanned. Am I missing anything here?
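
To make the two points concrete, here is a minimal sketch, not Cassandra
code. It assumes each sstable index is a sorted list of integer tokens held
in memory, and the names rows_in_range and repair_scan are hypothetical. It
shows that repairing one range reads only the matching rows, while the index
of every sstable is still consulted.

```python
from bisect import bisect_right

def rows_in_range(index_tokens, lo, hi):
    """Return (start, end) positions of entries with lo < token <= hi.

    index_tokens is a sorted list of tokens standing in for one sstable's
    index (an assumption made for this sketch).
    """
    start = bisect_right(index_tokens, lo)
    end = bisect_right(index_tokens, hi)
    return start, end

def repair_scan(sstables, lo, hi):
    """Count how many indices are consulted and how many rows fall in range."""
    touched = 0
    rows = 0
    for index_tokens in sstables:
        touched += 1  # every sstable index is consulted, whatever the range
        start, end = rows_in_range(index_tokens, lo, hi)
        rows += end - start  # only these rows would feed the tree for the range
    return touched, rows

# Four sstables of 250 tokens each, interleaved over the token space 0..999,
# as size-tiered compaction gives no per-range separation between files.
sstables = [list(range(i, 1000, 4)) for i in range(4)]

# One range covering 1/100th of the token space: (0, 10].
touched, rows = repair_scan(sstables, 0, 10)
```

In this toy setup all four indices are touched, but only 10 of the 1000 rows
are read, which matches the "1/100th" estimate for the data while leaving the
per-sstable index cost in place.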

> The story is less good for "nodetool cleanup" however, which still has
> to truck over the entire dataset.
> (The partitions/buckets in my crush-inspired scheme addresses this by
> allowing that each ring segment, in vnode terminology, be stored
> separately in the file system.)

But the number of files can become a big problem if there are hundreds of
vnodes and millions of sstables on the same physical node.
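
A rough back-of-the-envelope sketch of how the file count multiplies when
each ring segment is stored separately. All numbers below are illustrative
assumptions, not measurements, and file_count is a hypothetical helper.

```python
def file_count(vnodes, sstables_per_vnode, components_per_sstable):
    """Files on disk if every vnode keeps its own set of sstables."""
    return vnodes * sstables_per_vnode * components_per_sstable

# Assumed values: 256 vnodes, 50 sstables per vnode, and 4 component files
# per sstable (for example data, index, filter, and statistics files).
total = file_count(vnodes=256, sstables_per_vnode=50, components_per_sstable=4)
```

With these assumed values the node would hold 51,200 files, and the count
grows linearly with each of the three factors.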

We need a way to pin sstable inodes in memory. Otherwise, the average
number of disk I/Os needed to access a row in an sstable could be five or
more.

> --
> / Peter Schuller (@scode,
