cassandra-commits mailing list archives

From "Stu Hood (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CASSANDRA-674) New SSTable Format
Date Fri, 24 Jun 2011 03:37:47 GMT

     [ https://issues.apache.org/jira/browse/CASSANDRA-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stu Hood updated CASSANDRA-674:
-------------------------------

    Attachment: 674-v3.tgz

I've finished rebasing 2319 onto 674 (to gain back the wide-row random access performance
we lost here by removing the row index). Nothing has changed in 674 (still v3), but 2319 was
simplified considerably by not having to deal with Lazily vs Pre CompactedRows.

I'm also attaching some performance numbers for a wide-row time-series use case. The workload was:
* 10000 rows
* 250 million columns (randomly across the 10000 rows)
* 99% writes (appends), 1% reads (from tail of row)
* LongType column names, CounterColumnType column values
* Custom YCSB workload using a monotonically increasing long for both column name and value: with N rows, it increases by N on average between consecutive columns in a row
* (8G Linux cgroup) - (6G JVM heap) ~= (2G of Linux page cache)
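
One way to read the column-name behaviour in the workload above is as a single shared counter with rows chosen at random for each append. The sketch below is only an illustration of that reading, not the actual YCSB workload code: the function names, the uniform row selection, and the scaled-down sizes are all assumptions.

```python
import random


def simulate_appends(num_rows, num_ops, seed=0):
    """Append num_ops columns to randomly chosen rows.

    Assumption: one global, monotonically increasing counter supplies
    the column name (and value) for every append, and the target row
    is picked uniformly at random.
    """
    rng = random.Random(seed)
    counter = 0
    names_by_row = {row: [] for row in range(num_rows)}
    for _ in range(num_ops):
        row = rng.randrange(num_rows)
        names_by_row[row].append(counter)  # always lands at the tail of the row
        counter += 1
    return names_by_row


def mean_gap(names_by_row):
    """Average difference between consecutive column names within a row."""
    total_gap = 0
    gap_count = 0
    for names in names_by_row.values():
        if len(names) > 1:
            total_gap += names[-1] - names[0]
            gap_count += len(names) - 1
    return total_gap / gap_count


if __name__ == "__main__":
    rows = simulate_appends(num_rows=100, num_ops=100_000)
    print(f"mean gap between consecutive column names: {mean_gap(rows):.1f}")
```

Under these assumptions every append goes to the tail of its row, and the mean gap between consecutive names in a row comes out close to the number of rows.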

Result summary:
|| build || disk volume (bytes) || bytes per column || runtime (ms) || throughput (ops/s) || 50th percentile read (ms) || 99th percentile read (ms) ||
| trunk | 16,716,432,189 | 66.8 | 8620356 | 29001 | 2.444 | 154 |
| trunk gz 6 * | 2,747,319,000 | 10.98 | | | | |
| 674+2319 | 2,375,027,696 | 9.5 | 7939503 | 31488 | 9.161 | 20 |
\* "trunk gz 6" is the size of the trunk result's data directory after compressing it at GZIP level 6
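
The derived columns in the summary can be re-checked from the raw figures. The sketch below assumes 250 million total operations and treats the runtime figures as milliseconds, since that is the only reading under which they agree with the reported throughput; the dictionary layout and function names are illustrative only.

```python
TOTAL_COLUMNS = 250_000_000  # from the workload description

# Raw figures copied from the result summary; runtime assumed to be in ms.
RESULTS = {
    "trunk": {"disk_bytes": 16_716_432_189, "runtime_ms": 8_620_356},
    "trunk gz 6": {"disk_bytes": 2_747_319_000, "runtime_ms": None},
    "674+2319": {"disk_bytes": 2_375_027_696, "runtime_ms": 7_939_503},
}


def bytes_per_column(disk_bytes, columns=TOTAL_COLUMNS):
    """On-disk bytes divided by the number of columns written."""
    return disk_bytes / columns


def throughput_ops_per_s(runtime_ms, ops=TOTAL_COLUMNS):
    """Operations per second, assuming ops total operations over runtime_ms."""
    return ops / (runtime_ms / 1000.0)


def size_ratio(baseline="trunk", candidate="674+2319"):
    """How many times smaller the candidate's disk volume is than the baseline's."""
    return RESULTS[baseline]["disk_bytes"] / RESULTS[candidate]["disk_bytes"]


if __name__ == "__main__":
    for build, row in RESULTS.items():
        line = f"{build}: {bytes_per_column(row['disk_bytes']):.2f} bytes/column"
        if row["runtime_ms"] is not None:
            line += f", {throughput_ops_per_s(row['runtime_ms']):.0f} ops/s"
        print(line)
    print(f"trunk / 674+2319 disk volume: {size_ratio():.2f}x")
```

Under these assumptions the recomputed throughput rounds to the 29001 and 31488 ops/s in the table, and the 674+2319 data directory is roughly seven times smaller than the uncompressed trunk result.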

I would love to work with someone in the community to review this branch: I feel comfortable
that the remaining issues could be worked out after commit, but I'm willing to do anything
it takes before merge. Internally, we're working to deploy this branch within the next few
weeks.

> New SSTable Format
> ------------------
>
>                 Key: CASSANDRA-674
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-674
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Stu Hood
>             Fix For: 1.0
>
>         Attachments: 674-v1.diff, 674-v2.tgz, 674-v3.tgz, perf-674-v1.txt, perf-trunk-2f3d2c0e4845faf62e33c191d152cb1b3fa62806.txt
>
>
> Various tickets exist due to limitations in the SSTable file format, including #16, #47 and #328. Attached is a proposed design/implementation of a new file format for SSTables that addresses a few of these limitations.
> This v2 implementation is not ready for serious use: see comments for remaining issues. It is roughly the format described here: http://wiki.apache.org/cassandra/FileFormatDesignDoc


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
