Message-ID: <1373582697.20402.YahooMailNeo@web120906.mail.ne1.yahoo.com>
Date: Thu, 11 Jul 2013 15:44:57 -0700 (PDT)
From: "S. Zhou"
To: "user@hbase.apache.org"
Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files

Thanks very much for the help, Ted & Azuryy. I wrote a very simple MR program that takes an HBase table as input and writes its output to an HDFS file. Unfortunately, I ran into the following error:

java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.hbase.io.ImmutableBytesWritable

I am running on pseudo-distributed Hadoop (1.2.0) and pseudo-distributed HBase (0.95.1-hadoop1).

Here is the complete source code. Interestingly, if I comment out the MultipleInputs line "MultipleInputs.addInputPath(job, inputPath1, TextInputFormat.class, TableMap.class);", the MR job runs fine.

public class MixMR {

    public static class TableMap extends TableMapper<Text, Text> {
        public static final byte[] CF = "cf".getBytes();
        public static final byte[] ATTR1 = "c1".getBytes();

        public void map(ImmutableBytesWritable row, Result value, Context context) throws IOException, InterruptedException {

            String key = Bytes.toString(row.get());
            String val = new String(value.getValue(CF, ATTR1));

            context.write(new Text(key), new Text(val));
        }
    }


    public static class Reduce extends Reducer<Object, Text, Text, Text> {
        public void reduce(Object key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String ks = key.toString();
            for (Text val : values) {
                context.write(new Text(ks), val);
            }

        }
    }

    public static void main(String[] args) throws Exception {
        Path inputPath1 = new Path(args[0]);
        Path outputPath = new Path(args[1]);

        String tableName1 = "test";

        Configuration config = HBaseConfiguration.create();
        Job job = new Job(config, "ExampleRead");
        job.setJarByClass(MixMR.class);     // class that contains mapper

        Scan scan = new Scan();
        scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
        scan.setCacheBlocks(false);  // don't set to true for MR jobs
        scan.addFamily(Bytes.toBytes("cf"));

        TableMapReduceUtil.initTableMapperJob(
                tableName1,        // input HBase table name
                scan,              // Scan instance to control CF and attribute selection
                TableMap.class,    // mapper
                Text.class,        // mapper output key
                Text.class,        // mapper output value
                job);
        job.setReducerClass(Reduce.class);    // reducer class
        job.setOutputFormatClass(TextOutputFormat.class);

        // inputPath1 here has no effect for HBase table
        MultipleInputs.addInputPath(job, inputPath1, TextInputFormat.class, TableMap.class);

        FileOutputFormat.setOutputPath(job, outputPath);

        job.waitForCompletion(true);
    }
}


________________________________
From: Ted Yu
To: user@hbase.apache.org; S. Zhou
Sent: Wednesday, July 10, 2013 11:21 AM
Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files


    conf.set(TableInputFormat.SCAN, convertScanToString(scan));

is called by initTableMapperJob().

Looking at the source would make it clear for you.

Cheers

On Wed, Jul 10, 2013 at 10:55 AM, S. Zhou wrote:

> Thanks Ted. I will try that. But at this time I am not sure how to call "conf.set()" after calling "initTableMapperJob()"?
> The approach suggested by Azuryy is "conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(new Scan()));"
>
>
> ________________________________
> From: Ted Yu
> To: user@hbase.apache.org; S. Zhou
> Sent: Wednesday, July 10, 2013 10:21 AM
> Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files
>
>
> Can you utilize initTableMapperJob() (which calls TableMapReduceUtil.convertScanToString() underneath)?
>
> On Wed, Jul 10, 2013 at 10:15 AM, S. Zhou wrote:
>
> > Hi Azuryy, I am testing the way you suggested. Now I am facing a compilation error for the following statement:
> > conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(new Scan()));
> >
> > The error is: "method convertScanToString is not visible in TableMapReduceUtil". Could you help? It blocks me.
> >
> > BTW, I am using the HBase-server jar file version 0.95.1-hadoop1. I tried other versions as well, like 0.94.9, and got the same error.
> >
> > Thanks!
> >
> >
> > ________________________________
> > From: Azuryy Yu
> > To: user@hbase.apache.org
> > Sent: Wednesday, July 3, 2013 6:02 PM
> > Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files
> >
> >
> > Hi,
> > 1) An MR job cannot take input from two different clusters.
> > 2) If your data is located in the same cluster, then:
> >
> >     conf.set(TableInputFormat.SCAN, TableMapReduceUtil.convertScanToString(new Scan()));
> >     conf.set(TableInputFormat.INPUT_TABLE, tableName);
> >
> >     MultipleInputs.addInputPath(conf, new Path(input_on_hdfs), TextInputFormat.class, MapperForHdfs.class);
> >     MultipleInputs.addInputPath(conf, new Path(input_on_hbase), TableInputFormat.class, MapperForHBase.class);
> >
> > but new Path(input_on_hbase) can be any path; its value is not meaningful.
> >
> > Please refer to org.apache.hadoop.hbase.mapreduce.IndexBuilder, under $HBASE_HOME/src/example, for how to read a table in an MR job.
> >
> >
> > On Thu, Jul 4, 2013 at 5:19 AM, Michael Segel wrote:
> >
> > > You may want to pull your data from HBase first in a separate map-only job and then use its output along with the other HDFS input.
> > > There is a significant disparity between reads from HDFS and reads from HBase.
> > >
> > >
> > > On Jul 3, 2013, at 10:34 AM, S. Zhou wrote:
> > >
> > > > Azuryy, I am looking at the MultipleInputs doc, but I could not figure out how to add an HBase table as a Path to the input. Do you have some sample code? Thanks!
> > > >
> > > >
> > > > ________________________________
> > > > From: Azuryy Yu
> > > > To: user@hbase.apache.org; S. Zhou
> > > > Sent: Tuesday, July 2, 2013 10:06 PM
> > > > Subject: Re: MapReduce job with mixed data sources: HBase table and HDFS files
> > > >
> > > >
> > > > Hi,
> > > >
> > > > Use MultipleInputs, which can solve your problem.
> > > >
> > > >
> > > > On Wed, Jul 3, 2013 at 12:34 PM, S. Zhou wrote:
> > > >
> > > >> Hi there,
> > > >>
> > > >> I know how to create a MapReduce job with only an HBase data source or only HDFS files as the data source. Now I need to create a MapReduce job with mixed data sources, that is, this MR job needs to read data from both HBase and HDFS files. Is it possible? If yes, could you share some sample code?
> > > >>
> > > >> Thanks!
> > > >> Senqiang
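
A note on the ClassCastException reported at the top of this message: the MultipleInputs line pairs TextInputFormat, whose record keys are LongWritable file offsets, with TableMap, whose map() expects ImmutableBytesWritable row keys. That mismatch is the likely source of the error, and it would explain why the job runs once that line is commented out. The sketch below reproduces the same kind of failure in plain Java. It is only an illustration under stated assumptions: LongKey, RowKey, Mapper, TableMap, TextMap and run are hypothetical stand-ins written for this example, not the Hadoop or HBase classes.

```java
import java.nio.charset.StandardCharsets;

public class InputFormatMismatchDemo {

    // Hypothetical stand-ins for the two key types; NOT the real Hadoop/HBase classes.
    static final class LongKey {            // plays the role of LongWritable
        final long offset;
        LongKey(long offset) { this.offset = offset; }
    }

    static final class RowKey {             // plays the role of ImmutableBytesWritable
        final byte[] row;
        RowKey(byte[] row) { this.row = row; }
    }

    // Simplified mapper base class: only the key type is generic here.
    abstract static class Mapper<K> {
        abstract String map(K key, String value);
    }

    // Written for row keys only, like TableMap in the message above.
    static final class TableMap extends Mapper<RowKey> {
        @Override
        String map(RowKey key, String value) {
            return new String(key.row, StandardCharsets.UTF_8) + "\t" + value;
        }
    }

    // Written for file-offset keys, as a mapper for text input would be.
    static final class TextMap extends Mapper<LongKey> {
        @Override
        String map(LongKey key, String value) {
            return key.offset + "\t" + value;
        }
    }

    // A framework holds the mapper as a raw type, so the compiler cannot check
    // that the key matches; the cast happens inside the mapper at runtime.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static String run(Mapper mapper, Object key, String value) {
        return mapper.map(key, value);
    }

    // Returns true if feeding this key to this mapper throws ClassCastException.
    static boolean failsWithClassCast(Mapper<?> mapper, Object key) {
        try {
            run(mapper, key, "v");
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        RowKey row = new RowKey("row1".getBytes(StandardCharsets.UTF_8));
        LongKey offset = new LongKey(0L);

        // Matching pairs work; the mismatched pair fails only at runtime.
        System.out.println("TableMap with row key fails:    " + failsWithClassCast(new TableMap(), row));
        System.out.println("TextMap with offset key fails:  " + failsWithClassCast(new TextMap(), offset));
        System.out.println("TableMap with offset key fails: " + failsWithClassCast(new TableMap(), offset));
    }
}
```

In the real job, the corresponding pairing is the one Azuryy suggested earlier in the thread: TableInputFormat.class together with the HBase mapper, and TextInputFormat.class together with a separate mapper written for text input.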