Date: Mon, 19 Jul 2010 11:16:32 -0700 (PDT)
From: Stuart Smith
Subject: RE: Run MR job when my data stays in hbase?
To: user@hbase.apache.org

Hello,

You can ignore this if you're already rock solid on writing M/R jobs, but just in case you're as new to this as I am:

Be careful to have all your dependencies lined up in the jar that holds your M/R job. If you're using Eclipse, this means selecting "Extract required libraries into generated jar" when you export it.

Without this you get strange "Map class not found" errors, similar to the ones you get when you forget to make your map class static or forget to call setJarByClass() on your job.

All the examples I saw that used the *new api* were a little more complicated than needed.
A stripped-down example with the new API:

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

// Output types must match the Text.class / IntWritable.class
// passed to initTableMapperJob() in main() below.
public static class Mapper extends TableMapper<Text, IntWritable>
{
    @Override
    public void map( ImmutableBytesWritable key, Result value, Context context )
    throws IOException, InterruptedException
    {
        // Don't forget to make sure to load this as UTF-8
        String sha256 = new String( key.get(), "UTF-8" );
        // just calling value.value() will NOT give you what you want
        byte[] valueBuffer = value.getValue( Bytes.toBytes(/*family*/), Bytes.toBytes(/*qualifier*/) );
        /** Do stuff **/
        context.write( [some text], [some int] );
    }
}

// Input types match the mapper output. The output key type is assumed
// to be Text here, to match the /*some string*/ key written below.
public static class Reduce extends TableReducer<Text, IntWritable, Text>
{
    @Override
    public void reduce( Text key, Iterable<IntWritable> values, Context context )
    throws IOException, InterruptedException
    {
        // e.g. total up the ints emitted by the mappers for this key
        int count = 0;
        for ( IntWritable v : values )
        {
            count += v.get();
        }
        /** output of a reduce job needs to be a [something],Put object pair */
        Put outputRow = new Put( Bytes.toBytes("row key") );
        outputRow.add( Bytes.toBytes(/*output family*/), Bytes.toBytes(/*output qualifier*/), Bytes.toBytes(count) );
        context.write( /*some string*/, outputRow );
    }
}

public static void main(String[] argv) throws Exception
{
    Job validateJob = new Job( configuration, /*job name*/ );
    // don't forget this!
    validateJob.setJarByClass(/*main class*/.class);

    // don't add anything, and it will scan everything (according to docs)
    Scan scan = new Scan();
    scan.addColumn( Bytes.toBytes(/*input family*/), Bytes.toBytes(/*input qualifier*/) );

    TableMapReduceUtil.initTableMapperJob(/*input tablename*/, scan, Mapper.class, Text.class, IntWritable.class, validateJob);
    TableMapReduceUtil.initTableReducerJob(/*output table name*/, Reduce.class, validateJob);

    validateJob.waitForCompletion(true);
}

But look at the examples! I just thought some simple highlights might help.
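
On the "make your map class static" point: as far as I understand it, Hadoop creates the mapper and reducer objects through reflection using a no-argument constructor, and a non-static inner class does not have one, because its constructor quietly takes the enclosing instance as a parameter. Here is a small plain-Java sketch of just that effect, with no Hadoop or HBase involved. StaticMapperDemo, StaticMapper, InnerMapper and hasNoArgConstructor are made-up names for illustration only.

```java
import java.lang.reflect.Constructor;

public class StaticMapperDemo {
    // Hypothetical stand-ins for a mapper; neither is a Hadoop or HBase class.
    public static class StaticMapper { }  // nested: no hidden reference to the outer class
    public class InnerMapper { }          // inner: constructor needs a StaticMapperDemo instance

    /** Returns true if cls can be instantiated reflectively with no arguments. */
    static boolean hasNoArgConstructor(Class<?> cls) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor();
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            // NoSuchMethodException lands here for the inner class
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("static nested: " + hasNoArgConstructor(StaticMapper.class));
        System.out.println("inner:         " + hasNoArgConstructor(InnerMapper.class));
    }
}
```

Running main should report true for the static nested class and false for the inner one, which is why the Mapper and Reduce classes above are declared static.
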
Don't forget that you can issue Put()s from your map() tasks, if you already have the data you need assembled (just open a connection in the map constructor):

    super();
    this.hbaseConfiguration = new HBaseConfiguration();
    this.hbaseConfiguration.set("hbase.master", "ubuntu-namenode:60000");
    this.fileMetadataTable = new HTable( hbaseConfiguration, /*tableName*/ );

and issue the Put() in your map() method. This can take the load off your reduce() tasks, which may speed things up a bit.

Caveat emptor:
I just started on all this stuff. ;)

Hope it helps.

Take care,
  -stu


--- On Mon, 7/19/10, Hegner, Travis wrote:

> From: Hegner, Travis
> Subject: RE: Run MR job when my data stays in hbase?
> To: "user@hbase.apache.org"
> Date: Monday, July 19, 2010, 11:55 AM
>
> Also make sure that the $HBASE_HOME/hbase-.jar,
> $HBASE_HOME/lib/zookeeper-.jar, and the $HBASE_HOME/conf/
> are all on the classpath in your
> $HADOOP_HOME/conf/hadoop-env.sh file. That configuration
> must be cluster wide.
>
> With that, your map and reduce tasks can access zookeeper
> and hbase objects. You can then use the TableInputFormat
> with TableOutputFormat, or you can use TableInputFormat, and
> your reduce tasks can write data directly back into HBase.
> Your problem, and your dataset, will dictate which of
> those methods is more efficient.
>
> Travis Hegner
> http://www.travishegner.com/
>
> -----Original Message-----
> From: Andrey Stepachev [mailto:octo47@gmail.com]
> Sent: Monday, July 19, 2010 9:28 AM
> To: user@hbase.apache.org
> Subject: Re: Run MR job when my data stays in hbase?
>
> 2010/7/19 elton sky:
>
> > My question is if I wanna run the background process as a MR job, can I get
> > data from hbase, rather than hdfs, with hadoop?
> > How do I do that?
> > I'd appreciate it if anyone can provide some simple example code.
>
> Look at the org.apache.hadoop.hbase.mapreduce package in the hbase
> sources, and as a real example:
> org.apache.hadoop.hbase.mapreduce.RowCounter