The only thing I can think of is that the upgradesstables option follows a
slightly different path to the bulk uploader when it comes to generating the
sstables that have been flushed to disk?
Seems unlikely; they both run through the same classes, which determine the compression strategy from the column family's configuration.
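For reference, in 1.1 per-CF compression comes from the schema, so both paths should pick up the same settings when writing new SSTables. Something like the following cassandra-cli statement (keyspace/CF names are hypothetical) is where that configuration lives:

```
[default@ks1] update column family cf1
    with compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};
```

If that's set on the target cluster, anything that writes SSTables for cf1 should compress them the same way.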

However, prior to running the "upgradesstables" command, the total size of
all the SSTables was 27GB, and afterwards it's 12GB.
Do you have some of the log messages written when upgradesstables ran? They will be from compaction and come in pairs you can correlate to the same thread: one about which files are being compacted and another about how big the new file is.
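A rough sketch of pulling those pairs out of the system log. The sample lines below only mimic the shape of the 1.1 compaction messages; the paths, timestamps and sizes are made up for illustration, and the real log lives somewhere like /var/log/cassandra/system.log:

```shell
# Illustrative sample: two made-up lines in roughly the Cassandra 1.1
# compaction log format (hypothetical paths, timestamps and sizes).
cat > /tmp/compaction_sample.log <<'EOF'
 INFO [CompactionExecutor:3] 2012-07-03 10:15:01,123 CompactionTask.java (line 115) Compacting [SSTableReader(path='/var/lib/cassandra/data/ks1/cf1-hc-10-Data.db')]
 INFO [CompactionExecutor:3] 2012-07-03 10:15:42,456 CompactionTask.java (line 218) Compacted to [/var/lib/cassandra/data/ks1/cf1-hd-11-Data.db].  28,450,112 to 12,104,772 (~42% of original) bytes
EOF

# "Compacting ..." names the input files; "Compacted to ..." gives the new
# file and its size. The [CompactionExecutor:N] thread tag is what lets you
# match a "Compacting" line with its "Compacted" line.
grep -E '\[CompactionExecutor:[0-9]+\].*Compact(ing|ed)' /tmp/compaction_sample.log
```

Comparing the input and output sizes in those pairs should show which CFs shrank and by how much.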

Do you have secondary indexes defined on the target CF?
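If you're not sure, a schema dump from cassandra-cli will list any column metadata with an index_type set (keyspace name here is hypothetical):

```
[default@unknown] show schema ks1;
```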

If you can reproduce it (or at least explain it pretty well), it's probably time to open a ticket at https://issues.apache.org/jira/browse/CASSANDRA

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton

On 3/07/2012, at 12:32 AM, jmodha wrote:

Just to clarify: the data that we're loading SSTables from (v1.0.3) doesn't
have compression enabled on any of the CFs.

So in theory the compression should occur on the receiving end (v1.1.1) as
we're going from uncompressed data to compressed data.

So I'm not sure the bug you mention explains the behaviour we're seeing
here.

The only thing I can think of is that the upgradesstables option follows a
slightly different path to the bulk uploader when it comes to generating the
sstables that have been flushed to disk?

--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/BulkLoading-SSTables-and-compression-tp7580849p7580938.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.