cassandra-commits mailing list archives

From "mck (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-6091) Better Vnode support in hadoop/pig
Date Wed, 11 Feb 2015 22:18:13 GMT


mck commented on CASSANDRA-6091:

The approach in the patch is to allow multiple token ranges per split.
We do this with our custom input formats, and it is very effective because it means splitSize
is honoured.

Handling multiple token ranges per split requires, for example, the code change found in CqlRecordReader,
whereby the reader must iterate over both rows and token ranges.
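
As a rough illustration of iterating over both rows and token ranges, here is a minimal sketch. It is not the code from the patch: the class MultiRangeRowIterator and its fetchRows function (a stand-in for whatever query the reader runs per token range) are hypothetical names made up for this example.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Hypothetical sketch: walks the rows of every token range held by one split. */
class MultiRangeRowIterator<R, T> implements Iterator<T> {
    private final Iterator<R> ranges;
    // Assumption: stand-in for the per-range query a real record reader would run.
    private final Function<R, Iterator<T>> fetchRows;
    private Iterator<T> currentRows = Collections.emptyIterator();

    MultiRangeRowIterator(List<R> tokenRanges, Function<R, Iterator<T>> fetchRows) {
        this.ranges = tokenRanges.iterator();
        this.fetchRows = fetchRows;
    }

    @Override
    public boolean hasNext() {
        // When the current range is exhausted, move on to the next range.
        // Ranges that return no rows are skipped.
        while (!currentRows.hasNext() && ranges.hasNext()) {
            currentRows = fetchRows.apply(ranges.next());
        }
        return currentRows.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return currentRows.next();
    }
}
```

The outer iterator advances over token ranges and the inner one over rows, so the caller still sees a single stream of rows per split.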

The grouping of token ranges by common location sets, so that splits again honour the splitSize,
happens in AbstractColumnFamilyInputFormat.collectSplits(..).
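
The grouping idea can be sketched as follows. This is only an illustration under stated assumptions, not the patch itself: the TokenRange and Split classes, the per-range row estimate, and the greedy packing rule are all hypothetical and chosen for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SplitGrouping {
    /** Hypothetical token range: bounds, estimated row count, and replica hosts. */
    static final class TokenRange {
        final String start;
        final String end;
        final long rows;
        final Set<String> hosts; // must not be mutated after construction

        TokenRange(String start, String end, long rows, Set<String> hosts) {
            this.start = start;
            this.end = end;
            this.rows = rows;
            this.hosts = hosts;
        }
    }

    /** Hypothetical split holding several token ranges that share one location set. */
    static final class Split {
        final Set<String> hosts;
        final List<TokenRange> ranges = new ArrayList<>();
        long rows;

        Split(Set<String> hosts) {
            this.hosts = hosts;
        }
    }

    static List<Split> collectSplits(List<TokenRange> ranges, long splitSize) {
        // Bucket ranges by replica set. Set equality ignores element order,
        // so {A, B} and {B, A} land in the same bucket.
        Map<Set<String>, List<TokenRange>> byHosts = new LinkedHashMap<>();
        for (TokenRange r : ranges) {
            byHosts.computeIfAbsent(r.hosts, k -> new ArrayList<>()).add(r);
        }

        List<Split> splits = new ArrayList<>();
        for (Map.Entry<Set<String>, List<TokenRange>> e : byHosts.entrySet()) {
            Split current = new Split(e.getKey());
            for (TokenRange r : e.getValue()) {
                // Close the current split if adding this range would exceed splitSize.
                if (!current.ranges.isEmpty() && current.rows + r.rows > splitSize) {
                    splits.add(current);
                    current = new Split(e.getKey());
                }
                current.ranges.add(r);
                current.rows += r.rows;
            }
            if (!current.ranges.isEmpty()) {
                splits.add(current);
            }
        }
        return splits;
    }
}
```

Because ranges are bucketed by location set rather than by position on the ring, the ranges packed into one split need not be adjacent. A single range larger than splitSize still gets a split of its own in this sketch.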

Token ranges do not need to be adjacent.
Everything in this patch is done client-side.

> Better Vnode support in hadoop/pig
> ----------------------------------
>                 Key: CASSANDRA-6091
>                 URL:
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Alex Liu
>            Assignee: Alex Liu
> CASSANDRA-6084 shows there are some issues when running hadoop/pig jobs if vnodes are enabled. Also, the hadoop performance of vnode-enabled nodes is bad for there are so many
> The idea is to combine vnode splits into big pseudo splits so it works as if vnodes were disabled for hadoop/pig jobs.

This message was sent by Atlassian JIRA
