cassandra-commits mailing list archives

From "Pavel Yaskevich (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-3623) use MMapedBuffer in CompressedSegmentedFile.getSegment
Date Sun, 25 Dec 2011 15:44:30 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175841#comment-13175841 ]

Pavel Yaskevich commented on CASSANDRA-3623:
--------------------------------------------

bq. Meanwhile, your claim here is that the snappy library is taking more CPU because we give it a DirectBB?

First of all, I don't claim that it takes more CPU; I claim that it takes longer to decompress
data compared to normal reads. Second, I don't think it's a problem with the direct BB itself
(btw, there is no way you can pass a non-direct buffer) but rather with mmap'ed I/O in that
case.
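
To make the comparison concrete, here is a minimal sketch of the two I/O paths in question, using plain JDK APIs; the class and method names are invented for illustration and this is not Cassandra's actual reader code:

{code}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Illustrative only; names are invented for this sketch.
public class IoPathSketch
{
    // "Normal" path: one explicit read() syscall copies the compressed
    // chunk out of the page cache into a heap array.
    static byte[] chunkViaSyscall(RandomAccessFile file, long offset, int length) throws IOException
    {
        byte[] compressed = new byte[length];
        file.seek(offset);
        file.readFully(compressed);
        return compressed;
    }

    // mmap path: the chunk is a slice of a mapped (direct) buffer. No
    // up-front copy, but page faults occur wherever the decompressor
    // touches the buffer, and the JVM cannot control page-in/eviction.
    static ByteBuffer chunkViaMmap(FileChannel channel, long offset, int length) throws IOException
    {
        MappedByteBuffer region = channel.map(FileChannel.MapMode.READ_ONLY, offset, length);
        return region;
    }
}
{code}

The point is that any slowness would come from the page-fault behavior of the second path, not from the direct buffer as such.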

bq. Can you please confirm you tried v2 and it gives worse performance than trunk, and that it is Linux (v1 doesn't give better performance gains whereas v2 does)?

Yes, I tried v2, and it wasn't easy: first of all it wasn't rebased, and then I figured out
that I needed to apply CASSANDRA-3611 and change the call to FBUtilities.newCRC32() to "new CRC32()"
for it to compile. After that I added "disk_access_mode: mmap" to conf/cassandra.yaml,
used stress "./bin/stress -n 300000 -S 512 -I SnappyCompressor" to insert test data
(which doesn't fit into the page cache), and tried to read with "./bin/stress -n 300000 -I SnappyCompressor
-o read", but got the following exceptions:

{code}
java.lang.RuntimeException: java.lang.UnsupportedOperationException
	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1283)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.UnsupportedOperationException
	at org.apache.cassandra.io.compress.CompressedMappedFileDataInput.mark(CompressedMappedFileDataInput.java:212)
	at org.apache.cassandra.db.columniterator.SimpleSliceReader.<init>(SimpleSliceReader.java:62)
	at org.apache.cassandra.db.columniterator.SSTableSliceIterator.createReader(SSTableSliceIterator.java:90)
	at org.apache.cassandra.db.columniterator.SSTableSliceIterator.<init>(SSTableSliceIterator.java:66)
	at org.apache.cassandra.db.filter.SliceQueryFilter.getSSTableColumnIterator(SliceQueryFilter.java:66)
	at org.apache.cassandra.db.filter.QueryFilter.getSSTableColumnIterator(QueryFilter.java:78)
	at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:232)
	at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1283)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1169)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1136)
	at org.apache.cassandra.db.Table.getRow(Table.java:375)
	at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
	at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:800)
	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1279)
	... 3 more
{code}

and 

{code}
java.lang.RuntimeException: java.lang.UnsupportedOperationException
	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1283)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.UnsupportedOperationException
	at org.apache.cassandra.io.compress.CompressedMappedFileDataInput.reset(CompressedMappedFileDataInput.java:207)
	at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:78)
	at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:40)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
	at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:107)
	at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:145)
	at org.apache.cassandra.utils.MergeIterator$ManyToOne.<init>(MergeIterator.java:88)
	at org.apache.cassandra.utils.MergeIterator.get(MergeIterator.java:47)
	at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:137)
	at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:246)
	at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62)
	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1283)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1169)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1136)
	at org.apache.cassandra.db.Table.getRow(Table.java:375)
	at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:69)
	at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:800)
	at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:1279)
	... 3 more
{code}

After I managed to implement the mark()/reset() methods, I got the following results: current trunk
takes 67 sec and your patch 101 sec to run reads on 300000 rows. I tested everything on a
server without any network interference, so my results seem freer from side
effects than yours. I'm still not convinced that mmap'ed I/O is better for compressed data
than syscalls, and I know it has side effects that we can't control from Java (mentioned
above), so I'm waiting for convincing results, or we should close this ticket...
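
For reference, a minimal sketch of one way mark()/reset() can work for a chunked, mapped input is below; the class, fields, and helper are invented for illustration and this is not the actual CompressedMappedFileDataInput code:

{code}
// Hypothetical sketch, not the patch's actual implementation: a mark is
// a saved (chunk index, offset-in-chunk) pair, and reset() re-positions,
// re-decompressing the marked chunk if necessary.
public class MappedChunkInputSketch
{
    private int currentChunkIndex;   // which compressed chunk is decoded
    private int positionInChunk;     // read position inside that chunk

    public static final class Mark
    {
        final int chunkIndex;
        final int position;

        Mark(int chunkIndex, int position)
        {
            this.chunkIndex = chunkIndex;
            this.position = position;
        }
    }

    public Mark mark()
    {
        return new Mark(currentChunkIndex, positionInChunk);
    }

    public void reset(Mark mark)
    {
        // if we have moved to another chunk, bring the marked one back
        if (mark.chunkIndex != currentChunkIndex)
            decompressChunk(mark.chunkIndex);
        currentChunkIndex = mark.chunkIndex;
        positionInChunk = mark.position;
    }

    private void decompressChunk(int chunkIndex)
    {
        // elided: map the compressed chunk and snappy-decompress it
    }
}
{code}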
                
> use MMapedBuffer in CompressedSegmentedFile.getSegment
> ------------------------------------------------------
>
>                 Key: CASSANDRA-3623
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3623
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1
>            Reporter: Vijay
>            Assignee: Vijay
>              Labels: compression
>             Fix For: 1.1
>
>         Attachments: 0001-MMaped-Compression-segmented-file-v2.patch, 0001-MMaped-Compression-segmented-file.patch, 0002-tests-for-MMaped-Compression-segmented-file-v2.patch
>
>
> CompressedSegmentedFile.getSegment seems to open a new file and doesn't seem to use the MMap, hence higher CPU on the nodes and higher latencies on reads.
> This ticket is to implement the TODO mentioned in CompressedRandomAccessReader
> // TODO refactor this to separate concept of "buffer to avoid lots of read() syscalls" and "compression buffer"
> but I think a separate class for the buffer will be better.
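
As a rough illustration of the separation that TODO describes, a reader could keep the two buffers distinct; the class and field names below are invented for this sketch and are not a proposed patch:

{code}
import java.io.IOException;
import java.io.RandomAccessFile;

import org.xerial.snappy.Snappy;

// Hypothetical sketch of the TODO's separation: one buffer that batches
// read() syscalls, and a distinct buffer for the decompressed chunk.
public class SplitBufferSketch
{
    private final RandomAccessFile file;
    private final byte[] readBuffer;         // avoids lots of read() syscalls
    private final byte[] compressionBuffer;  // holds one decompressed chunk

    public SplitBufferSketch(RandomAccessFile file, int maxCompressedLength, int chunkLength)
    {
        this.file = file;
        this.readBuffer = new byte[maxCompressedLength];
        this.compressionBuffer = new byte[chunkLength];
    }

    // One syscall to fetch the compressed chunk, then snappy-decompress
    // it into the separate compression buffer; returns uncompressed size.
    public int readChunk(long offset, int compressedLength) throws IOException
    {
        file.seek(offset);
        file.readFully(readBuffer, 0, compressedLength);
        return Snappy.uncompress(readBuffer, 0, compressedLength, compressionBuffer, 0);
    }
}
{code}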

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
