cassandra-user mailing list archives

From Sylvain Lebresne <sylv...@datastax.com>
Subject Re: SSTableSimpleUnsortedWriter take long time when inserting big rows
Date Fri, 02 Sep 2011 09:01:23 GMT
On Fri, Sep 2, 2011 at 10:29 AM, Benoit Perroud <benoit@noisette.ch> wrote:
> Hi All,
>
> I started using SSTableSimpleUnsortedWriter to load data, and my data
> has few rows but a lot of column names in each row.
>
> I call SSTableSimpleUnsortedWriter.newRow every 10'000 columns inserted.
>
> But the time taken to insert columns increases as the column
> family grows. The problem appears because every time we call
> newRow, all the columns of the previous CF are added to the new CF.

If I understand correctly, each row has way more than 10 000 columns, but
you call newRow every 10 000 columns, right?

Note that, as a workaround, you can decrease the frequency of the calls to
newRow.
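
To put a rough number on why calling newRow less often helps, here is a
back-of-the-envelope model in Java. It assumes the behavior described in this
thread (each newRow on the same key re-adds every previously buffered column
to the new column family) and that the total column count is a multiple of the
batch size. NewRowCostModel and copiedColumns are made-up names for
illustration, not part of Cassandra.

```java
// Hypothetical cost model, not Cassandra code.
final class NewRowCostModel {

    // Returns how many column copies happen in total when a single row of
    // totalColumns columns is written in batches of batchSize columns,
    // assuming each batch boundary re-adds all previously buffered columns.
    static long copiedColumns(long totalColumns, long batchSize) {
        long batches = totalColumns / batchSize; // assumed to divide evenly
        // Batch j (1-based) copies (j - 1) * batchSize earlier columns, so the
        // total is batchSize * (0 + 1 + ... + (batches - 1)).
        return batchSize * (batches * (batches - 1) / 2);
    }

    public static void main(String[] args) {
        System.out.println(copiedColumns(1_000_000L, 10_000L));
        System.out.println(copiedColumns(1_000_000L, 100_000L));
    }
}
```

Under these assumptions, a row of 1 000 000 columns costs 49 500 000 copies
with a newRow every 10 000 columns, against 4 500 000 with a newRow every
100 000 columns. The cost stays quadratic in the number of calls, so this only
reduces the problem.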

But anyway, I agree that the code shouldn't suck like that.

> Attached is a small patch that checks which CF is the smallest, and adds
> the smallest CF to the biggest one.
>
> Should I open a bug for that?

Please do. I'm actually thinking of a slightly different fix: we should not have
to add all the previous columns to the new column family. Instead, we should
directly reuse the previous column family when adding the new column.
But the JIRA ticket will be a better place to discuss this.
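
To make the difference concrete, here is a minimal Java sketch that compares
the two approaches on a toy buffer. It is not the Cassandra code: a TreeMap
stands in for a column family, a HashMap for the writer's per-key buffer, and
every name (ReuseColumnFamilySketch, copyPreviousIntoNew, reuseExisting) is
invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: TreeMap<Integer, Integer> plays the role of a column family.
final class ReuseColumnFamilySketch {

    // Models the current behavior as described in the thread: each batch
    // starts a fresh map, then every previously buffered column is copied in.
    // Returns {columns copied, final column count}.
    static long[] copyPreviousIntoNew(int batches, int batchSize) {
        Map<String, TreeMap<Integer, Integer>> buffer = new HashMap<>();
        long copied = 0;
        int next = 0;
        for (int b = 0; b < batches; b++) {
            TreeMap<Integer, Integer> current = new TreeMap<>();
            for (int i = 0; i < batchSize; i++) {
                current.put(next, next);
                next++;
            }
            TreeMap<Integer, Integer> previous = buffer.put("row", current);
            if (previous != null) {
                current.putAll(previous); // the expensive re-add
                copied += previous.size();
            }
        }
        return new long[] {copied, buffer.get("row").size()};
    }

    // Models the proposed fix: look up the map already buffered for the key
    // and keep adding to it, so nothing is ever copied.
    static long[] reuseExisting(int batches, int batchSize) {
        Map<String, TreeMap<Integer, Integer>> buffer = new HashMap<>();
        int next = 0;
        for (int b = 0; b < batches; b++) {
            TreeMap<Integer, Integer> current =
                    buffer.computeIfAbsent("row", k -> new TreeMap<>());
            for (int i = 0; i < batchSize; i++) {
                current.put(next, next);
                next++;
            }
        }
        return new long[] {0L, buffer.get("row").size()};
    }

    public static void main(String[] args) {
        System.out.println(copyPreviousIntoNew(100, 100)[0]);
        System.out.println(reuseExisting(100, 100)[0]);
    }
}
```

Both methods end up with the same 10 000 columns for 100 batches of 100, but
the first one performs 495 000 copies along the way and the second none. The
real fix may well look different; the ticket is where that gets decided.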

--
Sylvain
