cassandra-commits mailing list archives

From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-10358) Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter
Date Tue, 03 Nov 2015 11:48:27 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-10358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14987144#comment-14987144
] 

Sylvain Lebresne commented on CASSANDRA-10358:
----------------------------------------------

Now that I read this more carefully, I'm actually not all that sure I understand what you're
trying to do here.

You're trying to create sstables with no overlap in token ranges, so that does mean you're
using a _sorted_ {{CQLSSTableWriter}}, right? Otherwise, how could we ever ensure no token
overlap when we have no way to know in which order partitions will be passed to the writer
(we'd have to buffer everything in memory)? And if you use a sorted writer, then you
shouldn't care about CASSANDRA-7360 since it only affects unsorted writers.

So I was actually too quick in calling your point #2 above a bug; it's not. I don't think
there is a practical way for an unsorted writer to generate sstables with non-overlapping
token ranges (and CASSANDRA-7360 is only a very minor part of the problem).
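
To make the buffering argument concrete, here is a toy model. It is not Cassandra code: the
class {{TokenRangeOverlapSketch}}, its methods, and the token values are all hypothetical, and
the "flush every N partitions" rule is a simplifying assumption standing in for a bounded
in-memory buffer. It only shows that partitions arriving in arbitrary order tend to produce
flushed files with overlapping token ranges, while the same partitions in token order do not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy illustration only; none of these names exist in Cassandra.
class TokenRangeOverlapSketch {

    /** Token range [min, max] covered by one simulated sstable. */
    static final class Range {
        final long min;
        final long max;
        Range(long min, long max) { this.min = min; this.max = max; }
    }

    /**
     * Simulates a bounded buffer: tokens are taken in arrival order and a new
     * "sstable" is flushed every bufferSize tokens. Only the min and max token
     * of each flush are kept, since that is all an overlap check needs.
     */
    static List<Range> flush(long[] tokens, int bufferSize) {
        List<Range> sstables = new ArrayList<>();
        for (int start = 0; start < tokens.length; start += bufferSize) {
            int end = Math.min(start + bufferSize, tokens.length);
            long min = Long.MAX_VALUE;
            long max = Long.MIN_VALUE;
            for (int i = start; i < end; i++) {
                min = Math.min(min, tokens[i]);
                max = Math.max(max, tokens[i]);
            }
            sstables.add(new Range(min, max));
        }
        return sstables;
    }

    /** Returns true if any two simulated sstables cover intersecting token ranges. */
    static boolean anyOverlap(List<Range> sstables) {
        for (int i = 0; i < sstables.size(); i++) {
            for (int j = i + 1; j < sstables.size(); j++) {
                Range a = sstables.get(i);
                Range b = sstables.get(j);
                if (a.min <= b.max && b.min <= a.max) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] arrival = {50, 10, 90, 30, 70, 20}; // arbitrary arrival order
        long[] sorted = arrival.clone();
        Arrays.sort(sorted);                        // same tokens, token order

        System.out.println("unsorted input overlaps: " + anyOverlap(flush(arrival, 2)));
        System.out.println("sorted input overlaps:   " + anyOverlap(flush(sorted, 2)));
    }
}
```

With a buffer of two, the arbitrary order yields ranges [10, 50], [30, 90] and [20, 70], which
overlap; the sorted order yields [10, 20], [30, 50] and [70, 90], which do not. Avoiding
overlap for unsorted input in this model would require a buffer large enough to hold every
partition before the first flush.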

The intent behind {{CQLSSTableWriter}} is that the generated sstables should be loaded through
{{sstableloader}}, which implies both that overlapping sstables are not a problem (even if you
use LCS, the sstables will start at level 0, which can have overlaps) and that you can generate
sstables in parallel without needing to tweak the filename: the sstables will be renamed so
they don't conflict once loaded into the node.

Overall, it seems what you're trying to achieve is not something {{CQLSSTableWriter}} was
designed for. We're happy to make that design evolve if that's sensible, but I think we'd
need more clarity on what your exact use case is (and, in particular, why using
{{sstableloader}} is not good enough).


> Allow CQLSSTableWriter.Builder to use custom AbstractSSTableSimpleWriter 
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10358
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10358
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Andre Turgeon
>            Priority: Minor
>         Attachments: SSTableWriterCreationStrategy.patch, patch.txt
>
>
> I've created a patch for your consideration.
> This change to CQLSSTableWriter allows for a custom AbstractSSTableSimpleWriter to be
> specified.
> I needed this for a bulkload process I wrote. I believe the change would be beneficial
> for other people as well.
> Below are the reasons I needed a custom implementation of AbstractSSTableSimpleWriter:
> 1) The available implementations of AbstractSSTableSimpleWriter do not provide a way
> to specify the filename (or rather revision) of the sstable. I needed to control the name
> because my bulkload process writes sstables in parallel (on multiple machines) and I wish to
> avoid name collisions.
> 2) I discovered a problem with SSTableSimpleUnsortedWriter where it creates invalid
> level-compaction-style sstables; it allows a partition to span 2 sstables, which violates
> the "no overlap of token ranges" constraint of level compaction.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
