incubator-cassandra-user mailing list archives

From: Drew Kutcharian <d...@venarc.com>
Subject: Re: Cassandra Compression and Wide Rows
Date: Tue, 19 Mar 2013 02:58:48 GMT
Edward/Sylvain,

I also came across this post on DataStax's blog:

> When to use compression
> Compression is best suited for ColumnFamilies where there are many rows, with each row having the same columns, or at least many columns in common. For example, a ColumnFamily containing user data such as username, email, etc., would be a good candidate for compression. The more similar the data across rows, the greater the compression ratio will be, and the larger the gain in read performance.
> Compression is not as good a fit for ColumnFamilies where each row has a different set of columns, or where there are just a few very wide rows. Dynamic column families such as this will not yield good compression ratios.

http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
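
For reference, compression is configured per table, so the blog's "user data" case would look roughly like the CQL3 sketch below (the table, columns, compressor and the 64 KB chunk size are illustrative, not taken from the post):

    CREATE TABLE users (
        user_id   uuid PRIMARY KEY,
        username  text,
        email     text
    ) WITH compression = { 'sstable_compression' : 'SnappyCompressor',
                           'chunk_length_kb'     : 64 };
    -- every row carries the same few columns, which is the case the post
    -- says compresses (and reads) well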

@Sylvain, does this still apply on more recent versions of C*?


-- Drew



On Mar 18, 2013, at 7:16 PM, Edward Capriolo <edlinuxguru@gmail.com> wrote:

> I feel this has come up before. I believe the compression is block based, so just because no two column names are the same does not mean the compression will not be effective. Possibly in their case the compression was not effective.
> 
> On Mon, Mar 18, 2013 at 9:08 PM, Drew Kutcharian <drew@venarc.com> wrote:
> That's what I originally thought but the OOYALA presentation from C*2012 got me confused. Do you guys know what's going on here?
> 
> The video: http://www.youtube.com/watch?v=r2nGBUuvVmc&feature=player_detailpage#t=790s
> The slides: Slide 22 @ http://www.datastax.com/wp-content/uploads/2012/08/C2012-Hastur-NoahGibbs.pdf
> 
> -- Drew
> 
> 
> On Mar 18, 2013, at 6:14 AM, Edward Capriolo <edlinuxguru@gmail.com> wrote:
> 
>> 
>> Imho it is probably more efficient for wide rows. When you decompress 8k blocks to get at a 200 byte row you create overhead, particularly in the young gen.
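
An aside on tuning this: the chunk size that gets decompressed on each read is configurable per table, so tables with small rows can use a smaller chunk. A rough CQL3 sketch, with the table name and the 4 KB value chosen purely for illustration:

    ALTER TABLE events
      WITH compression = { 'sstable_compression' : 'SnappyCompressor',
                           'chunk_length_kb'     : 4 };
    -- chunk_length_kb must be a power of 2; existing SSTables keep their old
    -- chunk size until they are rewritten (compaction or upgradesstables)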
>> On Monday, March 18, 2013, Sylvain Lebresne <sylvain@datastax.com> wrote:
>> > The way compression is implemented, it is oblivious to the CF being wide-row or narrow-row. There is nothing intrinsically less efficient in the compression for wide-rows.
>> > --
>> > Sylvain
>> >
>> > On Fri, Mar 15, 2013 at 11:53 PM, Drew Kutcharian <drew@venarc.com> wrote:
>> >>
>> >> Hey Guys,
>> >>
>> >> I remember reading somewhere that C* compression is not very effective when most of the CFs are in wide-row format, and some folks turn the compression off and use disk-level compression as a workaround. Considering that wide rows with composites are "first class citizens" in CQL3, is this still the case? Have there been any improvements on this?
>> >>
>> >> Thanks,
>> >>
>> >> Drew
>> >
> 
> 

