cassandra-commits mailing list archives

From "Brandon Williams (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable
Date Thu, 19 May 2011 20:46:48 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036447#comment-13036447 ]

Brandon Williams commented on CASSANDRA-1278:
---------------------------------------------

+1 on more visibility.  It'd be nice if it printed the filename and the time it took for each
file, since just having the percentages reset is a bit confusing.  Also, this should respect
SS.RING_DELAY instead of arbitrarily choosing an amount of time to wait for gossip.

I loaded 10M rows with stress.java defaults, then bulkloaded them from one machine to another
in 75s (accounting for gossip delay).  This totaled 3.1G of data, so about 44MB/s.  By comparison,
it took about 15 minutes to load with stress.
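
The arithmetic behind those figures can be sketched as below. This is only an illustration using the rounded numbers quoted above; the function names are made up for the example, and the exact elapsed time and gossip-delay adjustment behind the ~44MB/s figure are not given here. With the rounded inputs the rate comes out a little over 40MB/s (the exact value depends on whether a G is taken as 1000 or 1024 MB), which is in the same range as the quoted figure.

```python
# Rough throughput and speedup check from the rounded figures in the comment.
# Function names are hypothetical, chosen only for this illustration.

def throughput_mb_per_s(size_gb: float, seconds: float, mb_per_gb: int = 1024) -> float:
    """Average transfer rate in MB/s for size_gb gigabytes moved in `seconds`."""
    return size_gb * mb_per_gb / seconds


def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster the new run is than the baseline."""
    return baseline_seconds / new_seconds


data_gb = 3.1             # total data, as quoted
bulk_seconds = 75         # bulk load time, as quoted
stress_seconds = 15 * 60  # "about 15 minutes" with stress

rate = throughput_mb_per_s(data_gb, bulk_seconds)
ratio = speedup(stress_seconds, bulk_seconds)

print(f"bulk load rate: ~{rate:.0f} MB/s")
print(f"speedup over stress: ~{ratio:.0f}x")
```

On these numbers the bulk load is roughly an order of magnitude faster than loading the same rows through stress.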

> Make bulk loading into Cassandra less crappy, more pluggable
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-1278
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Sylvain Lebresne
>             Fix For: 0.8.1
>
>         Attachments: 0001-Add-bulk-loader-utility.patch, 1278-cassandra-0.7-v2.txt, 1278-cassandra-0.7.1.txt, 1278-cassandra-0.7.txt
>
>   Original Estimate: 40h
>          Time Spent: 40h 40m
>  Remaining Estimate: 0h
>
> Currently, bulk loading into Cassandra is a black art.  People are either told to just
> do it responsibly with Thrift or a higher-level client, or they have to explore the
> contrib/bmt example (http://wiki.apache.org/cassandra/BinaryMemtable).  That contrib
> module requires delving into the code to find out how it works and then applying it to
> the given problem.  With either method, the user also needs to keep in mind that
> overloading the cluster is possible, which will hopefully be addressed in CASSANDRA-685.
> This improvement would be to create a contrib module or set of documents dealing with
> bulk loading.  Perhaps it could include code in the core to make it more pluggable for
> external clients of different types.
> Bulk loading data is something that many people who are new to Cassandra need to do.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
