Subject: Re: hadoop consistency level
From: Jeremy Hanna
Date: Thu, 18 Oct 2012 16:31:15 -0500
Message-Id: <326A0EF1-1584-4355-92F4-D3C94CAE5AEA@gmail.com>
To: user@cassandra.apache.org

On Oct 18, 2012, at 3:52 PM, Andrey Ilinykh wrote:

> On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman wrote:
>> Not sure I understand your question (if there is one..)
>>
>> You are more than welcome to do CL ONE and assuming you have hadoop nodes
>> in the right places on your ring things could work out very nicely. If you
>> need to guarantee that you have all the data in your job then you'll need
>> to use QUORUM.
>>
>> If you don't specify a CL in your job config it will default to ONE (at
>> least that's what my read of the ConfigHelper source for 1.1.6 shows)
>>
> I have two questions.
> 1. I can benefit from data locality (and Hadoop) only with CL ONE. Is
> it correct?

Yes and at QUORUM it's quasi local. The job tracker finds out where a
range is and sends a task to a replica with the data (local). In the
case of CL.QUORUM (see the Read Path section of
http://wiki.apache.org/cassandra/ArchitectureInternals), it will do an
actual read of the data on the node closest (local). Then it will get a
digest from other nodes to verify that they have the same data.
So in the case of RF=3 and QUORUM, it will read the data on the local node
where the task is running and will check the next closest replica for a
digest to verify that it is consistent. Information is sent across the
wire and there is the latency of that, but it's not the data that's sent.

> 2. With CL QUORUM cassandra reads data from all replicas. In this case
> Hadoop doesn't give me any benefits. Application running outside the
> cluster has the same performance. Is it correct?

CL QUORUM does not read data from all replicas. Applications running
outside the cluster have to copy the data from the cluster, a much more
copy/network intensive operation than using CL.QUORUM with the built-in
Hadoop support.

>
> Thank you,
> Andrey
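
To make the RF=3 / QUORUM case in the reply concrete, here is a rough toy model in Python. It is only a sketch and not Cassandra code: the names quorum, digest and quorum_read are made up for illustration, SHA-256 is just a stand-in hash, and "bytes over the wire" counts only the digest responses. It assumes the first replica in the list is the local node where the task runs.

```python
import hashlib
from typing import List, Tuple


def quorum(replication_factor: int) -> int:
    """Replicas that must answer a QUORUM read: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def digest(value: bytes) -> str:
    """Stand-in for the digest a replica returns instead of the full data."""
    return hashlib.sha256(value).hexdigest()


def quorum_read(replicas: List[bytes], replication_factor: int) -> Tuple[bytes, bool, int]:
    """Toy QUORUM read.

    replicas[0] is the local node; the rest are ordered by proximity.
    Returns (data, consistent, bytes_over_wire).
    """
    needed = quorum(replication_factor)
    data = replicas[0]                    # full data read on the local replica
    local_digest = digest(data)
    # Only digests come back from the other replicas needed for quorum.
    remote_digests = [digest(r) for r in replicas[1:needed]]
    consistent = all(d == local_digest for d in remote_digests)
    bytes_over_wire = sum(len(d) for d in remote_digests)
    return data, consistent, bytes_over_wire


if __name__ == "__main__":
    row = b"x" * 10_000
    data, consistent, wire = quorum_read([row, row, row], replication_factor=3)
    print(quorum(3), consistent, wire, len(data))
```

With RF=3 the model contacts two replicas: one full local read plus one digest from the next closest replica, so the network carries a short digest rather than the 10,000-byte value. A mismatching digest only flips the consistent flag here; what a real cluster does next is outside this sketch.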