Date: Fri, 29 Apr 2011 21:57:03 +0000 (UTC)
From: "Matthew F. Dennis (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Updated] (CASSANDRA-1278) Make bulk loading into Cassandra less crappy, more pluggable

     [ https://issues.apache.org/jira/browse/CASSANDRA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew F. Dennis updated CASSANDRA-1278:
-----------------------------------------
    Attachment: 1278-cassandra-0.7-v2.txt

{quote}Did you accidentally remove the buffering for non-streaming connections in IncomingTCPConnection?{quote}

It was intentional as previously only the streaming was buffered (at 4k) but the bulk of the work uses the socket channel; only the size and header are read from input and the header uses readFully. It adds an extra call when constructing the stream because of the size but avoids copying the data into the buffer (in the BufferedStream) and then into the byte array. We could lower some of those calls by reading both the magic and the header int at the same time into a ByteBuffer and then viewing it as an IntBuffer but I don't think that buys you anything as it only happens on a new connection. It also avoids bugs where something has been read from the socket into the buffer and then the socket channel is used later even though the buffer may not have been fully drained.

{quote}What kind of failures are supported during a load?{quote}

On the server side all failures result in the same behaviour: close socket, delete temp files. On the client side if flushing of a BareMemtable to the server fails the proxy will log it and continue running. In both cases any data that was being loaded via the proxy needs to be reloaded.

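For reference, the alternative mentioned above (reading the magic and the header int together into a ByteBuffer and viewing it as an IntBuffer) could look roughly like the sketch below. It is not code from the attached patch: the class name PreambleReader, the method readPreamble, and the assumption that the preamble is exactly two big-endian 4-byte ints are all hypothetical and chosen only for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical helper, not part of the patch.
final class PreambleReader {
    /**
     * Reads two 4-byte ints (assumed here to be the magic and the header int)
     * straight from the channel, so no intermediate buffered stream is involved.
     */
    static int[] readPreamble(ReadableByteChannel channel) throws IOException {
        // ByteBuffer is big-endian by default, matching DataInput.readInt().
        ByteBuffer buf = ByteBuffer.allocate(8);
        // A single read may return fewer than 8 bytes, so loop until full.
        while (buf.hasRemaining()) {
            if (channel.read(buf) < 0) {
                throw new EOFException("connection closed before the preamble was read");
            }
        }
        buf.flip();
        IntBuffer ints = buf.asIntBuffer();
        return new int[] { ints.get(0), ints.get(1) };
    }
}
```

Because everything is read from the channel itself, there is no second buffer that could still hold undrained bytes when the channel is used later, which is the class of bug described above.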

{quote}What's the proper behaviour for flush_proxy when some hosts fail?{quote}

Log failed flushes and continue running; any data that was being loaded via the proxy needs to be reloaded.

{quote}Could we avoid coding in knowledge of the file format in the inner loop of IncomingLoaderStreamReader? I would much, much prefer that non-file-format-specific framing be added, and it would have the added benefit of not requiring as many system calls (4 per row vs 2 per frame){quote}

We could construct something that buffers up X amount of data, frames the data being sent, and changes the inner loop to decompose that, but it's extra complexity, code and overhead. If we buffer it on the other side we consume more memory for a longer period of time (thus giving it a better chance that it needs to be promoted and/or compacted), adding to the already problematic GC pressure. If we don't buffer the rows we end up framing every row, which is additional data, and still doing 2 out of the 4 transfers we do now on data of the same size (since the frames wouldn't be any bigger). BTW, 2 or 4 transfers in this situation doesn't affect the performance; the network latency and the CPU cost of compaction and index building dwarf any gains to be made here. The current approach has the added benefit that debugging is easy because it's clear where the key and row boundaries are.

{quote}What is the benefit of using an independent protocol for "Loader" streams?{quote}

If you're comparing to the streams we use for repair and similar, they require that table names and byte ranges be known up front. While a proxy could just generate a random name, it doesn't know the sizes because it doesn't have an SSTable on disk (or buffered in memory). There is also no way for a node to request a retry from a proxy if the stream fails, because the proxy won't have the data and in general is probably firewalled off from C*-to-proxy connections.
And even if we did, we'd still have a bunch of small sessions because the proxy doesn't know when a client is going to stop sending data to it. In the most general sense it could be a constant thing; a client may just continually pump an RSS feed or stock ticks or something into it.

tl;dr simplicity and code reduction.

{quote}Again, awesome.{quote}

thanks

> Make bulk loading into Cassandra less crappy, more pluggable
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-1278
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1278
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jeremy Hanna
>            Assignee: Matthew F. Dennis
>             Fix For: 0.8.1
>
>         Attachments: 1278-cassandra-0.7-v2.txt, 1278-cassandra-0.7.1.txt, 1278-cassandra-0.7.txt
>
>   Original Estimate: 40h
>          Time Spent: 40h 40m
>  Remaining Estimate: 0h
>
> Currently bulk loading into Cassandra is a black art. People are either directed to just do it responsibly with thrift or a higher level client, or they have to explore the contrib/bmt example - http://wiki.apache.org/cassandra/BinaryMemtable That contrib module requires delving into the code to find out how it works and then applying it to the given problem. Using either method, the user also needs to keep in mind that overloading the cluster is possible - which will hopefully be addressed in CASSANDRA-685
> This improvement would be to create a contrib module or set of documents dealing with bulk loading. Perhaps it could include code in the Core to make it more pluggable for external clients of different types.
> It is just that this is something that many that are new to Cassandra need to do - bulk load their data into Cassandra.