cassandra-commits mailing list archives

From "T Jake Luciani (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable
Date Fri, 28 Jan 2011 20:22:47 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988208#action_12988208
] 

T Jake Luciani edited comment on CASSANDRA-1278 at 1/28/11 3:22 PM:
--------------------------------------------------------------------

I understand why you did what you did here, but the concept of taking thrift-encoded data
over the streaming port and then creating another set of thrift objects to build row
mutations feels, well, bulky :)

It seems like there should be a way to refine what you've done so data goes from
client -> memtable more quickly.

If you took the column and supercolumn serializers and streamed the delimited byte
arrays, you could build up a CSLM<ByteBuffer,ByteBuffer> and call SSTableWriter.append
once it's "full".

You could then kick off secondary index rebuilding in the background.
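A minimal sketch of the buffering idea above: accumulate serialized columns in a
ConcurrentSkipListMap (CSLM) keyed by row key, and drain it in sorted order once a
"full" threshold is hit. The Flusher interface and the entry-count threshold are
assumptions for illustration; the real SSTableWriter.append takes a DecoratedKey and
column family, not raw byte buffers.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ConcurrentSkipListMap;

public class BulkBuffer {
    // Hypothetical stand-in for SSTableWriter.append
    interface Flusher {
        void append(ByteBuffer key, ByteBuffer serializedColumns);
    }

    private final ConcurrentSkipListMap<ByteBuffer, ByteBuffer> buffer =
            new ConcurrentSkipListMap<>();
    private final int maxEntries;   // "full" threshold (assumed: entry count)
    private final Flusher flusher;
    private int flushCount = 0;

    BulkBuffer(int maxEntries, Flusher flusher) {
        this.maxEntries = maxEntries;
        this.flusher = flusher;
    }

    // Called as delimited byte arrays arrive off the stream.
    void add(ByteBuffer key, ByteBuffer serializedColumns) {
        buffer.put(key, serializedColumns);
        if (buffer.size() >= maxEntries)
            flush();
    }

    // Drain the CSLM in sorted key order, as SSTables require.
    synchronized void flush() {
        for (var e : buffer.entrySet())
            flusher.append(e.getKey(), e.getValue());
        buffer.clear();
        flushCount++;
    }

    int flushes() { return flushCount; }

    static ByteBuffer bb(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        var appended = new java.util.ArrayList<String>();
        BulkBuffer buf = new BulkBuffer(2, (k, v) ->
                appended.add(new String(k.array(), StandardCharsets.UTF_8)));
        buf.add(bb("row2"), bb("cols2"));
        buf.add(bb("row1"), bb("cols1"));   // hits threshold -> flush in key order
        buf.add(bb("row3"), bb("cols3"));
        buf.flush();                        // final flush
        System.out.println(appended);
        System.out.println(buf.flushes());
    }
}
```

Because the CSLM keeps keys sorted, each flush naturally emits rows in the order an
SSTable expects, and the secondary-index rebuild could then run against the flushed
tables in the background.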



      was (Author: tjake):
    I understand why you did what you did here, but the concept of taking thrift encoded data
over gossip then creating into thrift objects to create row mutations feels, well, bulky :)

It seems like there would be a way to refine what you've done to go from client -> memtable
more quickly.

if you took the column and supercolumn serializers and streamed the delimited byte arrays
you would build up a CSLM<ByteBuffer,ByteBuffer> and call SSTableWriter.append once
it's "full"

You could then kick off secondary index rebuilding in the background.


  
> Make bulk loading into Cassandra less crappy, more pluggable
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-1278
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Matthew F. Dennis
>             Fix For: 0.7.2
>
>         Attachments: 1278-cassandra-0.7.txt
>
>   Original Estimate: 40h
>          Time Spent: 40.67h
>  Remaining Estimate: 0h
>
> Currently, bulk loading into Cassandra is a black art.  People are either directed to
just do it responsibly with thrift or a higher-level client, or they have to explore the
contrib/bmt example - http://wiki.apache.org/cassandra/BinaryMemtable  That contrib module
requires delving into the code to find out how it works and then applying it to the given
problem.  Using either method, the user also needs to keep in mind that overloading the
cluster is possible - which will hopefully be addressed in CASSANDRA-685.
> This improvement would be to create a contrib module or set of documents dealing with
bulk loading.  Perhaps it could include code in the Core to make it more pluggable for external
clients of different types.
> This is simply something that many who are new to Cassandra need to do: bulk load
their data into Cassandra.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

