cassandra-user mailing list archives

From Benoit Perroud <>
Subject Re: SSTableSimpleUnsortedWriter take long time when inserting big rows
Date Fri, 02 Sep 2011 09:17:08 GMT
Thanks for your answer.

2011/9/2 Sylvain Lebresne <>:
> On Fri, Sep 2, 2011 at 10:29 AM, Benoit Perroud <> wrote:
>> Hi All,
>> I started using SSTableSimpleUnsortedWriter to load data, and my data
>> has only a few rows but a lot of column names in each row.
>> I call SSTableSimpleUnsortedWriter.newRow every 10'000 columns inserted.
>> But the time taken to insert columns increases as the column
>> family grows. The problem appears because every time we call
>> newRow, all the columns of the previous CF are added to the new CF.
> If I understand correctly, each row has way more than 10 000 columns, but
> you call newRow every 10 000 columns, right?

Yes. I call newRow every 10 000 columns to be sure to flush as soon as possible.

> Note that you have the possibility to decrease the frequency of the calls to
> newRow.
> But anyway, I agree that the code shouldn't suck like that.
>> Attached is a small patch that checks which of the two CFs is smaller and
>> adds the smaller CF to the bigger one.
>> Should I open a bug for that?
> Please do. I'm actually thinking of a slightly different fix: we should not have
> to add all the previous columns to the new column family, we should just
> directly reuse the previous column family when adding the new column.
> But the JIRA ticket will be a better place to discuss this.

Opened :
Let's discuss there.
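
To illustrate why the choice of merge direction matters, here is a minimal sketch. It is not the attached patch and does not use Cassandra's classes: a plain TreeMap stands in for a column family, and the names ColumnMergeSketch and simulate are hypothetical. It assumes column names never collide between batches, so no timestamp reconciliation is modelled. It only counts how many columns get copied when the accumulated columns are always re-added to the new batch, versus when the smaller map is added to the bigger one.

```java
import java.util.TreeMap;

// Hypothetical stand-in for the writer's buffering; not Cassandra's API.
// A TreeMap<String, String> plays the role of a column family (sorted columns).
class ColumnMergeSketch {

    /**
     * Inserts totalColumns columns into one row in batches of batchSize,
     * merging after each batch (the analogue of calling newRow).
     * Returns the number of columns copied during the merges.
     */
    static long simulate(int totalColumns, int batchSize, boolean alwaysCopyPrevious) {
        long copies = 0;
        TreeMap<String, String> accumulated = new TreeMap<>();

        for (int start = 0; start < totalColumns; start += batchSize) {
            // Build the new batch of columns.
            TreeMap<String, String> batch = new TreeMap<>();
            int end = Math.min(start + batchSize, totalColumns);
            for (int i = start; i < end; i++) {
                batch.put("col" + i, "value");
            }

            if (alwaysCopyPrevious) {
                // Behaviour described in the thread: everything accumulated
                // so far is added to the new column family.
                copies += accumulated.size();
                batch.putAll(accumulated);
                accumulated = batch;
            } else {
                // Idea of the patch: add the smaller map to the bigger one.
                TreeMap<String, String> small =
                        batch.size() <= accumulated.size() ? batch : accumulated;
                TreeMap<String, String> large = (small == batch) ? accumulated : batch;
                copies += small.size();
                large.putAll(small);
                accumulated = large;
            }
        }
        return copies;
    }

    public static void main(String[] args) {
        System.out.println("always copy previous: " + simulate(100, 10, true));
        System.out.println("smaller into bigger:  " + simulate(100, 10, false));
    }
}
```

With 100 columns in batches of 10, the first strategy copies 0 + 10 + ... + 90 = 450 columns, while the second copies 90. Reusing the previous column family directly, as Sylvain suggests, would avoid the copy altogether.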

Thanks !


> --
> Sylvain
