Date: Thu, 21 Feb 2013 09:59:28 +0530
Subject: Re: split table data into two or more tables
From: ramkrishna vasudevan
To: user@hbase.apache.org

See Import.java in the package org.apache.hadoop.hbase.mapreduce. It comes
along with the source code.

Have you tried using the SingleColumnValueFilter? One thing to note: if you
search the entire table, all the regions still have to be scanned, but with
this filter only the rows that satisfy the specified condition are returned.
And since you are going with MapReduce, the mapper tasks run in parallel on
the regions.

Regards
Ram

On Thu, Feb 21, 2013 at 7:57 AM, wrote:

> Hello,
>
> I see 0.94.5 has already been released, so I wondered how I can solve the
> issue that we have. In more detail, we have a table with billions of
> records. Most of the mapreduce jobs that we run select from this table the
> records that have a family mk with a given value. For example,
>
> get 'mytable', 'row1', 'mk'
> COLUMN          CELL
>  mk:_genmrk_    timestamp=1360869679003, value=1360869340-1376304115
>  mk:_updmrk_    timestamp=1360869376272, value=1360869340-1376304115
>  mk:dist
>
> The map of a mapreduce job goes over all records and checks whether
> _genmrk_ is equal to the given value.
> So, my question is: is it possible to select all records with
> mk:_genmrk_ = myvalue and feed them to the map of a mapreduce job
> instead of iterating over all records?
>
> Thanks in advance.
> Alex.
>
> -----Original Message-----
> From: Ted Yu
> To: user
> Sent: Fri, Feb 8, 2013 6:23 pm
> Subject: Re: split table data into two or more tables
>
> See the following javadoc in Scan.java:
>
>  * To only retrieve columns within a specific range of version timestamps,
>  * execute {@link #setTimeRange(long, long) setTimeRange}.
>
> You can search for the above method in unit tests.
>
> In your use case, is family f the only family?
> If not, take a look at HBASE-5416, which is coming in 0.94.5;
> family f would be the essential column family.
>
> Cheers
>
> On Fri, Feb 8, 2013 at 5:47 PM, wrote:
>
> > Hi,
> >
> > Thanks for the suggestions. How can a time-range scan be implemented in
> > Java code? Is there any sample code or a tutorial?
> > Also, is it possible to select by the value of a column? Let's say I know
> > that records have family f and column m, and new records have m=5. I need
> > to instruct hbase to send only these records to the mapper of the mapred
> > jobs.
> >
> > Thanks.
> > Alex.
> >
> > -----Original Message-----
> > From: Ted Yu
> > To: user
> > Sent: Fri, Feb 8, 2013 11:05 am
> > Subject: Re: split table data into two or more tables
> >
> > bq. in a cluster of 2 nodes +1 master
> > I assume you're limited by hardware in that regard.
> >
> > bq. job selects these new records
> > Have you used a time-range scan?
> >
> > Cheers
> >
> > On Fri, Feb 8, 2013 at 10:59 AM, wrote:
> >
> > > Hi,
> > >
> > > The rationale is that I have a mapred job that constantly adds new
> > > records to an hbase table.
> > > The next mapred job selects these new records, but it must iterate over
> > > all records and check whether each one is a candidate for selection.
> > > Since there are too many old records, iterating through them in a
> > > cluster of 2 nodes +1 master takes about 2 days. So I thought that
> > > splitting them into two tables would reduce this time, and as soon as
> > > I figure out that there are no more new records left in one of the new
> > > tables, I will not run the mapred job on it.
> > >
> > > Currently, we have 7 regions including ROOT and META.
> > >
> > > Thanks.
> > > Alex.
> > >
> > > -----Original Message-----
> > > From: Ted Yu
> > > To: user
> > > Sent: Fri, Feb 8, 2013 10:40 am
> > > Subject: Re: split table data into two or more tables
> > >
> > > May I ask the rationale behind this?
> > > Were you aiming for higher write throughput?
> > >
> > > Please also tell us how many regions you have in the current table.
> > >
> > > Thanks
> > >
> > > BTW please consider upgrading to 0.94.4
> > >
> > > On Fri, Feb 8, 2013 at 10:36 AM, wrote:
> > >
> > > > Hello,
> > > >
> > > > I wondered if there is a way of splitting data from one table into
> > > > two or more tables in hbase with identical schemas, i.e. if table A
> > > > has 100M records, put 50M into table B, 50M into table C, and delete
> > > > table A.
> > > > Currently, I use hbase-0.92.1 and hadoop-1.4.0
> > > >
> > > > Thanks.
> > > > Alex.
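
For reference, the SingleColumnValueFilter suggestion in this thread could be
wired into a MapReduce job roughly as follows. This is a minimal sketch, not
code from any of the posters: it assumes the 0.94-era client API and the
Hadoop 1.x Job constructor, the class names GenmrkScanJob and MarkedRowMapper
are hypothetical, and the caching value of 500 is an arbitrary placeholder.
The table, family and qualifier names are the ones from Alex's example. As
noted in the thread, the region servers still read every region; the filter
only limits which rows are sent to the mappers.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

// Hypothetical job class; only the table/family/qualifier names come from the thread.
class GenmrkScanJob {

  // Hypothetical mapper: it is only called for rows that passed the server-side filter.
  static class MarkedRowMapper extends TableMapper<ImmutableBytesWritable, NullWritable> {
    @Override
    protected void map(ImmutableBytesWritable rowKey, Result row, Context context)
        throws IOException, InterruptedException {
      // Real per-row work would go here; this sketch just emits the row key.
      context.write(rowKey, NullWritable.get());
    }
  }

  // Builds a Scan that returns only rows where mk:_genmrk_ equals the given marker.
  static Scan buildScan(String marker) {
    SingleColumnValueFilter filter = new SingleColumnValueFilter(
        Bytes.toBytes("mk"), Bytes.toBytes("_genmrk_"),
        CompareOp.EQUAL, Bytes.toBytes(marker));
    // Without this, rows that have no mk:_genmrk_ cell at all would also be returned.
    filter.setFilterIfMissing(true);

    Scan scan = new Scan();
    // If the scan is narrowed with addFamily/addColumn, mk:_genmrk_ must stay included.
    scan.setFilter(filter);
    scan.setCaching(500);        // assumed value: rows fetched per RPC
    scan.setCacheBlocks(false);  // avoid filling the block cache from a full-table job
    // Optionally narrow by write time too: scan.setTimeRange(minStamp, maxStamp);
    return scan;
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "select-by-genmrk");
    job.setJarByClass(GenmrkScanJob.class);
    TableMapReduceUtil.initTableMapperJob("mytable", buildScan(args[0]),
        MarkedRowMapper.class, ImmutableBytesWritable.class, NullWritable.class, job);
    job.setOutputFormatClass(NullOutputFormat.class);  // sketch discards mapper output
    job.setNumReduceTasks(0);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The job would be started with the marker value as its only argument, for
example 1360869340-1376304115 from the shell output quoted above.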