incubator-cassandra-dev mailing list archives

From Brandon Williams <dri...@gmail.com>
Subject Re: Implementing an input format that splits according to column size
Date Mon, 12 Sep 2011 22:44:03 GMT
On Mon, Sep 12, 2011 at 1:54 PM, Tharindu Mathew <mccloud35@gmail.com> wrote:
> Thanks Brandon for the clarification.
>
> I'd like to support a use case where an index is built in a row in a CF.

If you're just _building_ the row, the current state of things will
work just fine.  The trouble starts when you need to read it via
hadoop.

> So, as a starting point for a query, a known row with a large number of
> columns will have to be selected. The split to the hadoop nodes should start
> at that level, i.e. within that row.
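A minimal sketch of what splitting one wide row "at that level" could mean: divide the row's column names into contiguous slices and give each slice to one Hadoop task. This assumes the column names are already known and sorted in the column family's comparator order; `ColumnSliceSplitter`, `Slice` and `split` are hypothetical names for illustration, not part of any Cassandra or Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: divide one wide row's sorted column names into
// contiguous slices so each Hadoop task reads only part of the row.
// Not a Cassandra or Hadoop API; all names here are illustrative.
public class ColumnSliceSplitter {

    /** An inclusive [startColumn, endColumn] range within a single row. */
    public static final class Slice {
        public final String startColumn;
        public final String endColumn;

        Slice(String startColumn, String endColumn) {
            this.startColumn = startColumn;
            this.endColumn = endColumn;
        }

        @Override
        public String toString() {
            return "[" + startColumn + ", " + endColumn + "]";
        }
    }

    /**
     * Splits sorted column names into slices of at most columnsPerSplit
     * columns each. Assumes the names are already in comparator order.
     */
    public static List<Slice> split(List<String> sortedColumns, int columnsPerSplit) {
        if (columnsPerSplit <= 0) {
            throw new IllegalArgumentException("columnsPerSplit must be positive");
        }
        List<Slice> slices = new ArrayList<>();
        for (int i = 0; i < sortedColumns.size(); i += columnsPerSplit) {
            // Last index of this slice, clamped to the end of the row.
            int last = Math.min(i + columnsPerSplit, sortedColumns.size()) - 1;
            slices.add(new Slice(sortedColumns.get(i), sortedColumns.get(last)));
        }
        return slices;
    }

    public static void main(String[] args) {
        List<String> columns = List.of("a", "b", "c", "d", "e", "f", "g");
        // Prints [[a, c], [d, f], [g, g]]
        System.out.println(split(columns, 3));
    }
}
```

In a real job each slice's start/end pair would become the bounds of a column slice query issued by one task; how those bounds are discovered without first reading the whole row is left open here.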

The other problem here is that if you want 10 nodes to operate on the row
and have RF=3, you're losing locality for 7 of the nodes, since only 3
of them hold a replica of that row.  If the task is heavily CPU-bound
this is probably ok; otherwise it may be better to use only those 3
nodes, since they will have a local replica.
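The locality arithmetic can be made concrete with a small sketch. It assumes one task per node and that tasks are placed on replica nodes first, so at most RF tasks can read the row locally; `LocalityEstimate.remoteTasks` is a hypothetical helper, not a Hadoop or Cassandra API.

```java
// Hypothetical helper: how many of N tasks reading one row must read it
// over the network when the row has RF replicas. Assumes one task per
// node and that replica nodes are used first.
public class LocalityEstimate {

    public static int remoteTasks(int tasks, int replicationFactor) {
        if (tasks < 0 || replicationFactor < 1) {
            throw new IllegalArgumentException("invalid arguments");
        }
        // At most one local task per replica node.
        int localTasks = Math.min(tasks, replicationFactor);
        return tasks - localTasks;
    }

    public static void main(String[] args) {
        // 10 tasks, RF=3: 3 can be local, 7 must read remotely.
        System.out.println(remoteTasks(10, 3)); // prints 7
        // 3 tasks, RF=3: every task can run next to a replica.
        System.out.println(remoteTasks(3, 3));  // prints 0
    }
}
```

This matches the trade-off above: adding tasks beyond RF only adds remote readers, which pays off mainly when the work is CPU-bound rather than I/O-bound.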

> Is this a common use case?

I'm not entirely sure what it is you want to do yet, but maybe I
answered it above.

-Brandon
