incubator-cassandra-dev mailing list archives

From Brandon Williams <dri...@gmail.com>
Subject Re: Implementing a input format that splits according to column size
Date Mon, 12 Sep 2011 13:31:56 GMT
On Mon, Sep 12, 2011 at 12:35 AM, Tharindu Mathew <mccloud35@gmail.com> wrote:
> Hi,
>
> I plan to do $subject and contribute.
>
> Right now, the hadoop integration splits according to the number of rows in
> a slice predicate. This doesn't scale if a row has a large number of
> columns.
>
> I'd like to know from the cassandra-devs how feasible this is.

It's feasible, but not entirely easy.  Essentially you need to page
through the row since you can't know how large it is beforehand.  IIRC
though, this breaks the current input format contract, since an entire
row is expected to be returned.

-Brandon
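
As a rough illustration of the paging idea in the reply above, here is a minimal sketch in Python. It does not use the Cassandra or Hadoop APIs: `get_slice` is a hypothetical in-memory stand-in for a column slice query, the row is modeled as a plain dict, and `page_columns` and `page_size` are names made up for this example. It assumes slice queries are inclusive of the start column, so each page after the first re-reads the last column already seen and discards it.

```python
from bisect import bisect_left


def get_slice(row, start, count):
    """Hypothetical stand-in for a column slice query (not a real client call).

    Returns up to `count` (name, value) pairs whose name is >= `start`,
    in sorted column order. `start=None` means "from the first column".
    """
    names = sorted(row)
    i = 0 if start is None else bisect_left(names, start)
    return [(n, row[n]) for n in names[i:i + count]]


def page_columns(row, page_size):
    """Yield pages of columns from one row without knowing its width up front."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    start = None
    while True:
        # After the first page, ask for one extra column because the slice
        # is inclusive and will return the last column we already yielded.
        count = page_size if start is None else page_size + 1
        page = get_slice(row, start, count)
        if start is not None:
            page = page[1:]  # drop the column returned by the previous page
        if not page:
            return
        yield page
        start = page[-1][0]  # resume from the last column name seen


if __name__ == "__main__":
    wide_row = {f"c{i}": i for i in range(5)}
    for page in page_columns(wide_row, 2):
        print([name for name, _ in page])
```

Each yielded page is only a fragment of the row, which is where the contract problem comes from: a record reader built this way would hand the mapper partial rows rather than whole ones.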
