Date: Fri, 28 Nov 2008 11:31:48 +0800
From: "=?BIG5?B?wfm+QaZz?="
To: hbase-user@hadoop.apache.org
Subject: data duplicate?

Hi,

I revised the sample code "Bulk Import" written by Allen Day to upload a flat data file to an HBase table. My table schema is designed as: . The table description shown by the hbase shell is as follows:

{NAME => 'ATCGeo', IS_ROOT => 'false', IS_META => 'false', FAMILIES => [
  {NAME => 'photo_id', BLOOMFILTER => 'false', VERSIONS => '30000', COMPRESSION => 'NONE', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'true', BLOCKCACHE => 'true'},
  {NAME => 'trail_id', BLOOMFILTER => 'false', VERSIONS => '30000', COMPRESSION => 'NONE', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}

Some of the data turned out to be duplicated: the same content stored under different timestamps.
For example, when I run:

get '', '',{COLUMN=>'col1',VERSION=>30000}

the results are:

timestamp=3090896685592411, value=/media/streetimage/processed/streettester/2008_08_07_12_26_21_C/2265.jpg
timestamp=3090896682597411, value=/media/streetimage/processed/streettester/2008_08_07_12_26_21_C/2264.jpg
timestamp=3090731558521386, value=/media/streetimage/processed/streettester/2008_08_07_12_26_21_C/2265.jpg
timestamp=3090731556503386, value=/media/streetimage/processed/streettester/2008_08_07_12_26_21_C/2264.jpg

I am sure that the data in the original file is unique. Could anyone tell me what the possible reasons are?

Would appreciate any help!

Chu