Date: Thu, 18 Oct 2012 15:42:57 -0700
Subject: Re: hadoop consistency level
From: Andrey Ilinykh
To: user@cassandra.apache.org

On Thu, Oct 18, 2012 at 2:31 PM, Jeremy Hanna wrote:
>
> On Oct 18, 2012, at 3:52 PM, Andrey Ilinykh wrote:
>
>> On Thu, Oct 18, 2012 at 1:34 PM, Michael Kjellman
>> wrote:
>>> Not sure I understand your question (if there is one...)
>>>
>>> You are more than welcome to do CL ONE, and assuming you have Hadoop nodes
>>> in the right places on your ring, things could work out very nicely. If you
>>> need to guarantee that you have all the data in your job, then you'll need
>>> to use QUORUM.
>>>
>>> If you don't specify a CL in your job config, it will default to ONE (at
>>> least that's what my read of the ConfigHelper source for 1.1.6 shows).
>>>
>> I have two questions.
>> 1. I can benefit from data locality (and Hadoop) only with CL ONE. Is
>> that correct?
>
> Yes, and at QUORUM it's quasi-local. The job tracker finds out where a range
> is and sends a task to a replica with the data (local). In the case of
> CL.QUORUM (see the Read Path section of
> http://wiki.apache.org/cassandra/ArchitectureInternals), it will do an actual
> read of the data on the closest node (local). Then it will get a digest from
> other nodes to verify that they have the same data. So in the case of RF=3
> and QUORUM, it will read the data on the local node where the task is running
> and will check the next closest replica for a digest to verify that it is
> consistent. Information is sent across the wire and there is latency for
> that, but it's not the data that's sent.
>
>> 2. With CL QUORUM, Cassandra reads data from all replicas. In this case
>> Hadoop doesn't give me any benefits. An application running outside the
>> cluster has the same performance. Is that correct?
>
> CL QUORUM does not read data from all replicas. Applications running outside
> the cluster have to copy the data from the cluster, a much more copy- and
> network-intensive operation than using CL.QUORUM with the built-in Hadoop
> support.
>
Thank you very much, guys! I have a much clearer picture now.

Andrey
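The read path Jeremy describes can be modeled roughly as below. This is a simplified Python sketch, not Cassandra's actual code: the names `quorum`, `block_for`, `plan_read` and `ReadPlan` are hypothetical, and it assumes the replica list already contains all RF replicas sorted by proximity (as a snitch would order them). It uses QUORUM = floor(RF/2) + 1 and ignores read repair, retries, failures, and what happens on a digest mismatch.

```python
from __future__ import annotations

from dataclasses import dataclass


def quorum(replication_factor: int) -> int:
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def block_for(consistency_level: str, replication_factor: int) -> int:
    """Replicas the coordinator waits for (only ONE and QUORUM are modeled)."""
    if consistency_level == "ONE":
        return 1
    if consistency_level == "QUORUM":
        return quorum(replication_factor)
    raise ValueError(f"unsupported consistency level: {consistency_level}")


@dataclass
class ReadPlan:
    data_from: str           # replica asked for the full data
    digests_from: list[str]  # replicas asked only for a digest (hash) of their data


def plan_read(replicas_by_proximity: list[str], consistency_level: str) -> ReadPlan:
    """Hypothetical model: full data from the closest replica, digests from the
    next closest ones until enough replicas have been contacted."""
    if not replicas_by_proximity:
        raise ValueError("at least one replica is required")
    # Assumption: the list holds every replica, so its length is the RF.
    needed = block_for(consistency_level, len(replicas_by_proximity))
    closest, *others = replicas_by_proximity
    return ReadPlan(data_from=closest, digests_from=others[: needed - 1])
```

For RF=3, `plan_read(["local", "rack2", "dc2"], "QUORUM")` asks `local` (the node running the Hadoop task) for the data and `rack2` only for a digest, which matches the point above that the data itself does not cross the wire. At ONE, no digests are requested at all.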