cassandra-user mailing list archives

From Kevin Burton <bur...@spinn3r.com>
Subject Re: vcdiff/bmdiff , cassandra , and the ordered partitioner…
Date Fri, 30 May 2014 05:06:33 GMT
The general idea is that for HTML content, you want content from the same
domain to be adjacent on disk.  That way the long runs of duplicated HTML
template shared by those pages get compressed REALLY well.
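
A rough way to see the adjacency effect, as a sketch only: the snippet below
uses zlib from the Python standard library as a stand-in compressor (it is
not bmdiff or snappy, and nothing here touches Cassandra). The page data is
synthetic, and the names make_pages, compressed_size, and the
domainN.example keys are made up for illustration. Each fake domain gets its
own 8 KB template; the same pages are then compressed once grouped by domain
and once interleaved across domains. zlib only looks back 32 KB, so when
same-domain pages are far apart it cannot see the repeated template.

```python
import random
import zlib


def make_pages(num_domains=8, pages_per_domain=8, template_size=8192, seed=0):
    """Build synthetic (domain, page_no, content) rows.

    Every page of a domain shares that domain's template and adds a small
    unique body. All data is random filler, not real HTML.
    """
    rng = random.Random(seed)
    alphabet = "abcdefghijklmnopqrstuvwxyz <>/="
    pages = []
    for d in range(num_domains):
        template = "".join(rng.choices(alphabet, k=template_size))
        for p in range(pages_per_domain):
            body = "".join(rng.choices(alphabet, k=200))
            content = (template + body).encode("ascii")
            pages.append((f"domain{d}.example", p, content))
    return pages


def compressed_size(pages):
    """Compress the pages in the given on-disk order as one stream."""
    blob = b"".join(content for _, _, content in pages)
    return len(zlib.compress(blob, 9))


pages = make_pages()

# Same-domain pages adjacent (what sorting by a domain-prefixed key gives).
grouped = sorted(pages, key=lambda row: (row[0], row[1]))
# Same-domain pages spread out: about 64 KB apart, beyond zlib's 32 KB window.
interleaved = sorted(pages, key=lambda row: (row[1], row[0]))

grouped_size = compressed_size(grouped)
interleaved_size = compressed_size(interleaved)

print("grouped by domain:", grouped_size, "bytes")
print("interleaved:      ", interleaved_size, "bytes")
```

With these settings the grouped layout should come out several times
smaller. A long-range matcher such as bmdiff is aimed at the interleaved
case too, but keeping same-domain rows adjacent helps any windowed
compressor.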

I think in our situation we would see exceptional compression.

If we get closer to this I'll just implement snappy+bmdiff...


On Thu, May 29, 2014 at 12:34 PM, Robert Coli <rcoli@eventbrite.com> wrote:

> On Sat, May 17, 2014 at 10:25 PM, Kevin Burton <burton@spinn3r.com> wrote:
>
>> "compression" … sure.. but bmdiff? Not that I can find.  BMDiff is an
>> algorithm that in some situations could result in 100000x compression due
>> to the way it's able to find long common runs.  This is a pathological
>> case though.  But if you were to copy the US constitution into itself
>> … 100000x… bmdiff could ideally get a 100000x compression rate.
>>
>> not all compression algorithms are identical.
>>
>
> The compression classes are pluggable. Exploratory patches are always
> welcome! :D
>
> Not sure I understand why you consider Byte Ordered Partitioner relevant,
> isn't what matters for compressibility generally the uniformity of data
> within rows in the SSTable, not the uniformity of their row keys?
>
> =Rob
>



-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
War is peace. Freedom is slavery. Ignorance is strength. Corporations are
people.
