Date: Mon, 21 May 2012 19:31:28 +0200
Subject: Is mapper called per row when used with Cassandra
From: murat migdisoglu
To: common-user@hadoop.apache.org

Hi,

I'm quite new to Hadoop and am trying to understand how task splitting works when used with Cassandra's ColumnFamilyInputFormat. I have a very basic scenario: Cassandra holds the sessionId and BSON data that contains the username. I want to go through all rows and dump a row to a file when its username matches a certain criterion. I do not need any Reducer or Combiner for now.

After writing the following very simple Hadoop job, I see from the logs that my mapper function is called once per row. Is that normal? If so, such a search over a big dataset would take hours, if not days. I guess I need a better understanding of how exactly a job is split into tasks.

    @Override
    public void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
            throws IOException, InterruptedException {
        String rowkey = ByteBufferUtil.string(key);
        String ip = context.getConfiguration().get(IP);

        IColumn column = columns.get(sourceColumn);
        if (column == null)
            return;

        ByteBuffer byteBuffer = column.value();
        // Keep an independent view of the raw bytes; fromBson may advance the buffer position.
        ByteBuffer bb2 = byteBuffer.duplicate();
        DataConvertor convertor = fromBson(byteBuffer, DataConvertor.class);
        String username = convertor.getUsername();
        BytesWritable value = new BytesWritable();

        // NOTE: cip is not declared in this snippet; presumably it is a field of the mapper class.
        if (username != null && username.equals(cip)) {
            byte[] arr = convertToByteArray(bb2);
            value.set(arr, 0, arr.length);
            Text tkey = new Text(rowkey);
            context.write(tkey, value);
        } else {
            log.info("ip not match [" + ip + "]");
        }
    }

Thanks in advance

Kind Regards
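
For reference on the question above: in Hadoop's MapReduce model, map() is invoked once per input record, while the unit of parallelism is the input split, and each split becomes one map task that loops over its own records. The following is a minimal sketch of that relationship in plain Java. It does not use the Hadoop or Cassandra APIs; Row, toSplits, runMapTask and the split size of 4 are hypothetical names and values chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    /** Hypothetical stand-in for one Cassandra row: a row key plus the decoded username. */
    static final class Row {
        final String key;
        final String username;

        Row(String key, String username) {
            this.key = key;
            this.username = username;
        }
    }

    /** Cut the input into splits of at most splitSize rows; each split is handled by one map task. */
    static List<List<Row>> toSplits(List<Row> rows, int splitSize) {
        List<List<Row>> splits = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += splitSize) {
            int end = Math.min(i + splitSize, rows.size());
            splits.add(new ArrayList<>(rows.subList(i, end)));
        }
        return splits;
    }

    /** One map task: iterate over the records of its split and call map() once per record. */
    static int runMapTask(List<Row> split, String wanted, List<String> out) {
        int calls = 0;
        for (Row row : split) {
            calls++;
            map(row, wanted, out);
        }
        return calls;
    }

    /** Per-record filter, analogous to the mapper in the post: emit the key on a username match. */
    static void map(Row row, String wanted, List<String> out) {
        if (wanted.equals(row.username)) {
            out.add(row.key);
        }
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(new Row("session-" + i, i % 3 == 0 ? "alice" : "bob"));
        }

        List<List<Row>> splits = toSplits(rows, 4);
        List<String> out = new ArrayList<>();
        int calls = 0;
        // On a real cluster the tasks run in parallel on different nodes; here they run in sequence.
        for (List<Row> split : splits) {
            calls += runMapTask(split, "alice", out);
        }

        // prints "3 tasks, 10 map() calls, matches: [session-0, session-3, session-6, session-9]"
        System.out.println(splits.size() + " tasks, " + calls + " map() calls, matches: " + out);
    }
}
```

In this sketch the number of map() calls always equals the number of rows, whereas the number of tasks depends only on the split size, so seeing one log line per row does not by itself indicate one task per row.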