Date: Fri, 9 Nov 2012 16:59:10 +0100
Subject: scan filtering column familly return wrong cell
From: Damien Hardy
To: user

Hello,

I am a bit confused here...

I am trying to run an M/R job to import data into the HBase table 'Consultation'.
Running on CDH4.1.2.

The map function emits context.write(ImmutableBytesWritable, KeyValue).

Job configuration summary:

    job.setOutputFormatClass(TableOutputFormat.class);
    job.setInputFormatClass(DataDrivenDBInputFormat.class);
    job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, "Consultation");
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(KeyValue.class);

The reduce class is:

    static class ImportReducer
            extends TableReducer<ImmutableBytesWritable, KeyValue, ImmutableBytesWritable> {
        @Override
        public void reduce(ImmutableBytesWritable row, Iterable<KeyValue> kvs, Context context)
                throws java.io.IOException, InterruptedException {
            Put p = new Put(row.copyBytes());
            int i = 0;
            byte[] rk = null;
            for (KeyValue kv : kvs) {
                p.add(kv);
                // Count the cells whose column family is CF_VISITED
                if (Bytes.compareTo(CF_VISITED, 0, CF_VISITED.length,
                        kv.getBuffer(), kv.getFamilyOffset(), kv.getFamilyLength()) == 0) {
                    i++;
                }
            }
            // Store the number of visited cells in the counter column
            p.add(CF_COUNTER, QA_COUNTER, Bytes.toBytes(i));
            context.write(new ImmutableBytesWritable(row), p);
        }
    }

hbase(main):038:0> scan 'Consultation', {COLUMNS => 'visiting_tl', LIMIT => 10}
ROW  COLUMN+CELL
 00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15
   column=visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7, timestamp=1266998781000, value=\x00\x00\x00\x00
 001316263fc8b454bbd86dff1587a347-\x00>t\x05
   column=visited_tl:\x7F\xFF\xFE\xD7\x0F\xB8u_\x00\x08\xE1\xA0, timestamp=1275341540000, value=\x00\x00\x00\x00
 001497e68d7c71a3cd281860484fa6be-\x00/\x0E^
   column=visited_tl:\x7F\xFF\xFE\xD8\x06\x9B\xB0\xB7\x00(3S, timestamp=1271199453000, value=\x00\x00\x00\x00
 001845aac2462a1c24b36eb90ab698cf-\x00\x04\x1E\xF5
   column=visited_tl:\x7F\xFF\xFE\xD6\xA8\xB9-\xEF\x002Po, timestamp=1277069546000, value=\x00\x00\x00\x01
 0019cec2c1f38c42b1c540ef7708c6a9-\x00;\xE0\x97
   column=visited_tl:\x7F\xFF\xFE\xD8\xF9\xC7\x0C_\x00\x02?., timestamp=1267119748000, value=\x00\x00\x00\x00
 001de6b92754b0ef44ee10bf2bdfe3c3-\x00%\x1AV
   column=visited_tl:\x7F\xFF\xFE\xD6\xE4H\x99\xC7\x00\x0F\x7F9, timestamp=1276070291000, value=\x00\x00\x00\x01
 00217f082f96eb12108c139b99a3ccb7-\x00\x02w\x08
   column=visited_tl:\x7F\xFF\xFE\xD8\xEB\x1B\x95\xEF\x00\x0A7\x19, timestamp=1267365866000, value=\x00\x00\x00\x00
 0021cbfd559f56dd298e4b4fee7626a9-\x00r\xBF\xFA
   column=visited_tl:\x7F\xFF\xFE\xD6\xA1\x0B-\x0F\x00\x03\xBC\x8B, timestamp=1277198390000, value=\x00\x00\x00\x02
 00266c02a60f9a6efb5d24317e6032a0-\x00\x0E]+
   column=visited_tl:\x7F\xFF\xFE\xD6\xBC\x0D\xD1\x7F\x00/ q, timestamp=1276745232000, value=\x00\x00\x00\x01
 0026dbbd6562da5b79f1b09e94e3b973-\x00C[\x93
   column=visited_tl:\x7F\xFF\xFE\xD7\xB0\xFA\xB7/\x00\x02~\x09, timestamp=1272636066000, value=\x00\x00\x00\x01
10 row(s) in 2.1130 seconds

hbase(main):036:0> get 'Consultation', "00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15"
COLUMN  CELL
 visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7  timestamp=1266998781000, value=\x00\x00\x00\x00
 visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7  timestamp=1266998781000, value=\x00\x00\x00\x00
 visits_count:_counter  timestamp=1352475456545, value=\x00\x00\x02\xA1
3 row(s) in 0.3260 seconds

hbase(main):037:0> get 'Consultation', "00070db1aa26d1906a078a1e03f788cb-\x00\x13\x80\x15", 'visiting_tl:'
COLUMN  CELL
 visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7  timestamp=1266998781000, value=\x00\x00\x00\x00
1 row(s) in 0.1650 seconds

So I have 3 problems:

* The table keeps only 1 VERSION: how can I get the cell visited_tl:\x7F\xFF\xFE\xD9\x00\xFC\xDB\xB7\x001\xC5\xA7 twice for a single row?
* When I explicitly query for CF 'visiting_tl:', I get a 'visited_tl:' cell ... WTF?
* The counter is (int)673 ... where are my 673 visited_tl cells? (673 is the correct value according to my source.)

Cheers,

-- 
Damien HARDY
IT Infrastructure Architect
Viadeo - 30 rue de la Victoire - 75009 Paris - France
T : +33 1 80 48 39 73 - F : +33 1 42 93 22 56
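
For anyone checking the numbers in the third question: the shell prints the visits_count:_counter value as \x00\x00\x02\xA1, which reads as 673 when treated as a 4-byte big-endian integer, the layout that Bytes.toBytes(int) writes. The sketch below is only an illustration of that arithmetic. It uses java.nio.ByteBuffer as a stand-in for the HBase Bytes utility so that it runs without HBase on the classpath, and the class and method names (CounterDecode, decodeInt) are hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of the job above: decodes the raw cell
// value that the HBase shell prints as \x00\x00\x02\xA1.
final class CounterDecode {

    // Interpret 4 bytes as a big-endian int. ByteBuffer is big-endian by
    // default, which matches the byte order used by Bytes.toBytes(int).
    static int decodeInt(byte[] raw) {
        return ByteBuffer.wrap(raw).getInt();
    }

    public static void main(String[] args) {
        byte[] counter = {0x00, 0x00, 0x02, (byte) 0xA1};
        // 0x02 * 256 + 0xA1 = 512 + 161
        System.out.println(decodeInt(counter));
    }
}
```

So the reducer did count 673 KeyValues for the row, even though the get shows only two visited_tl cells.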