hbase-user mailing list archives

From Stuti Awasthi <stutiawas...@hcl.com>
Subject RE: MR - Input from Hbase output to HDFS
Date Tue, 15 Nov 2011 04:59:34 GMT
Sure Doug,
Thanks

-----Original Message-----
From: Doug Meil [mailto:doug.meil@explorysmedical.com] 
Sent: Monday, November 14, 2011 9:08 PM
To: user@hbase.apache.org
Subject: Re: MR - Input from Hbase output to HDFS


Glad you worked through that and everything is working.  I will add an example of MR HBase-to-HDFS
in the book.
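A minimal driver along those lines, assembled from the snippets in this thread, might look like the following. This is a sketch, not a tested program: it assumes the Hadoop 0.20.x "new" API, and the ReadWriteMapper class, "users" table, and hdfs://master:54310 address quoted later in the thread; the output path name is chosen here for illustration.

```java
// Sketch of a MapReduce job that reads from an HBase table and writes
// plain text to HDFS (Hadoop 0.20.x / HBase 0.90.x, new mapreduce API).
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ReadWriteDriver {
  public static void main(String[] args) throws Exception {
    Configuration config = HBaseConfiguration.create();
    config.set("hbase.zookeeper.quorum", "master");
    config.set("hbase.zookeeper.property.clientPort", "2181");

    Job job = new Job(config, "Hbase_Read_Write");
    job.setJarByClass(ReadWriteDriver.class);

    Scan scan = new Scan();
    scan.setCaching(500);       // larger scanner caching for MR scans
    scan.setCacheBlocks(false); // don't pollute the region server block cache

    // ReadWriteMapper is the user's mapper class from this thread.
    TableMapReduceUtil.initTableMapperJob("users", scan,
        ReadWriteMapper.class, Text.class, IntWritable.class, job);

    job.setOutputFormatClass(TextOutputFormat.class);
    // Fully qualify the output path so it lands in HDFS even when the
    // submitting client's default filesystem (e.g. Eclipse on Windows
    // with LocalJobRunner) is the local one.
    FileOutputFormat.setOutputPath(job,
        new Path("hdfs://master:54310/MR/output"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Run it with the Hadoop and HBase jars on the classpath; waitForCompletion(true) blocks until the job finishes and prints progress.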





On 11/14/11 1:24 AM, "Stuti Awasthi" <stutiawasthi@hcl.com> wrote:

>Hi,
>I think the issue is with the filesystem configuration: the config is 
>an HBaseConfiguration, so the default filesystem being picked up is not 
>HDFS. When I modified my output directory path to an absolute HDFS path:
>FileOutputFormat.setOutputPath(job, new 
>Path("hdfs://master:54310/MR/stuti3"));
>
>the MR job runs successfully and I am able to see the stuti3 directory 
>inside HDFS at the desired path.
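The behavior makes sense once you see how scheme-less paths resolve. A standalone, JDK-only illustration (no Hadoop involved) of why a bare "/stuti2" went to the local filesystem under LocalJobRunner, while the fully-qualified URI is unambiguous:

```java
import java.net.URI;

public class OutputPathResolution {
    public static void main(String[] args) {
        // A path with no scheme is resolved against whatever the default
        // filesystem happens to be; for a job launched from Eclipse with
        // LocalJobRunner, that default is the local file:/// filesystem.
        URI localDefault = URI.create("file:///").resolve("/stuti2");

        // A fully-qualified URI carries its own scheme and authority, so
        // the output goes to HDFS regardless of the client's default.
        URI qualified = URI.create("hdfs://master:54310/MR/stuti3");

        System.out.println(localDefault);            // file:///stuti2
        System.out.println(qualified.getScheme());   // hdfs
        System.out.println(qualified.getAuthority()); // master:54310
    }
}
```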
>
>
>-----Original Message-----
>From: Stuti Awasthi
>Sent: Monday, November 14, 2011 11:40 AM
>To: user@hbase.apache.org
>Subject: RE: MR - Input from Hbase output to HDFS
>
>Hi Joey,
>Thanks for pointing this out. After importing "FileOutputFormat" as you 
>suggested, I am able to run the MR job from Eclipse (Windows); the only 
>problem is that I am not able to see the output directory this code is 
>creating. HDFS and HBase are on a Linux machine.
>
>Code :
>		Configuration config = HBaseConfiguration.create();
>		config.set("hbase.zookeeper.quorum", "master");
>		config.set("hbase.zookeeper.property.clientPort", "2181");
>			
>		Job job = new Job(config, "Hbase_Read_Write");
>		job.setJarByClass(ReadWriteDriver.class);
>		Scan scan = new Scan();
>		scan.setCaching(500);
>		scan.setCacheBlocks(false);
>		TableMapReduceUtil.initTableMapperJob("users", scan, ReadWriteMapper.class, Text.class, IntWritable.class, job);
>		job.setOutputFormatClass(TextOutputFormat.class);
>		FileOutputFormat.setOutputPath(job, new Path("/stuti2"));
>
>After executing this code, the MR job runs successfully, but when I 
>look in HDFS no "/stuti2" directory has been created. I also looked in 
>the local filesystem of the Linux machine as well as the Windows 
>machine, but could not find the output folder anywhere.
>	
>Eclipse console Output :
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
>environment:java.version=1.6.0_27
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
>environment:java.vendor=Sun Microsystems Inc.
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
>environment:java.home=C:\Program Files\Java\jdk1.6.0_27\jre
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
>environment:java.class.path=D:\workspace\Hbase\MRHbaseReadWrite\bin;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-cli-1.2.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-httpclient-3.0.1.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\commons-logging-1.0.4.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\hadoop-0.20.2-core.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\hbase-0.90.3.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\log4j-1.2.15.jar;D:\workspace\Hbase\MRHbaseReadWrite\lib\zookeeper-3.3.2.jar
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
>environment:java.library.path=C:\Program Files\Java\jdk1.6.0_27\jre\bin;C:\Windows\Sun\Java\bin;C:\Windows\system32;C:\Windows;C:/Program Files/Java/jre6/bin/client;C:/Program Files/Java/jre6/bin;C:/Program Files/Java/jre6/lib/i386;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Program Files\Java\jdk1.6.0_27;C:\Program Files\TortoiseSVN\bin;C:\cygwin\bin;D:\apache-maven-3.0.3\bin;D:\eclipse;;.
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
>environment:java.io.tmpdir=C:\Users\STUTIA~1\AppData\Local\Temp\
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
>environment:java.compiler=<NA>
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
>environment:os.name=Windows 7
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
>environment:os.arch=x86
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client
>environment:os.version=6.1
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
>environment:user.name=stutiawasthi
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
>environment:user.home=C:\Users\stutiawasthi
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Client 
>environment:user.dir=D:\workspace\Hbase\MRHbaseReadWrite
>11/11/14 11:21:45 INFO zookeeper.ZooKeeper: Initiating client 
>connection,
>connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Opening socket connection 
>to server master/10.33.64.235:2181
>11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Socket connection 
>established to master/10.33.64.235:2181, initiating session
>11/11/14 11:21:45 INFO zookeeper.ClientCnxn: Session establishment 
>complete on server master/10.33.64.235:2181, sessionid = 
>0x33879243de00ec, negotiated timeout = 180000
>11/11/14 11:21:46 INFO mapred.JobClient: Running job: job_local_0001
>11/11/14 11:21:46 INFO zookeeper.ZooKeeper: Initiating client 
>connection,
>connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Opening socket connection 
>to server master/10.33.64.235:2181
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Socket connection 
>established to master/10.33.64.235:2181, initiating session
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Session establishment 
>complete on server master/10.33.64.235:2181, sessionid = 
>0x33879243de00ed, negotiated timeout = 180000
>11/11/14 11:21:46 INFO zookeeper.ZooKeeper: Initiating client 
>connection,
>connectString=master:2181 sessionTimeout=180000 watcher=hconnection
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Opening socket connection 
>to server master/10.33.64.235:2181
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Socket connection 
>established to master/10.33.64.235:2181, initiating session
>11/11/14 11:21:46 INFO zookeeper.ClientCnxn: Session establishment 
>complete on server master/10.33.64.235:2181, sessionid = 
>0x33879243de00ee, negotiated timeout = 180000
>11/11/14 11:21:46 INFO mapred.MapTask: io.sort.mb = 100
>11/11/14 11:21:46 INFO mapred.MapTask: data buffer = 79691776/99614720
>11/11/14 11:21:46 INFO mapred.MapTask: record buffer = 262144/327680 
>...............................................
>11/11/14 11:21:46 INFO mapred.MapTask: Finished spill 0
>11/11/14 11:21:46 INFO mapred.TaskRunner:
>Task:attempt_local_0001_m_000000_0 is done. And is in the process of 
>commiting
>11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>11/11/14 11:21:46 INFO mapred.TaskRunner: Task 
>'attempt_local_0001_m_000000_0' done.
>11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>11/11/14 11:21:46 INFO mapred.Merger: Merging 1 sorted segments
>11/11/14 11:21:46 INFO mapred.Merger: Down to the last merge-pass, with 
>1 segments left of total size: 103 bytes
>11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>11/11/14 11:21:46 INFO mapred.TaskRunner:
>Task:attempt_local_0001_r_000000_0 is done. And is in the process of 
>commiting
>11/11/14 11:21:46 INFO mapred.LocalJobRunner:
>11/11/14 11:21:46 INFO mapred.TaskRunner: Task
>attempt_local_0001_r_000000_0 is allowed to commit now
>11/11/14 11:21:46 INFO output.FileOutputCommitter: Saved output of task 
>'attempt_local_0001_r_000000_0' to /stuti2
>11/11/14 11:21:46 INFO mapred.LocalJobRunner: reduce > reduce
>11/11/14 11:21:46 INFO mapred.TaskRunner: Task 
>'attempt_local_0001_r_000000_0' done.
>11/11/14 11:21:47 INFO mapred.JobClient:  map 100% reduce 100%
>11/11/14 11:21:47 INFO mapred.JobClient: Job complete: job_local_0001
>11/11/14 11:21:47 INFO mapred.JobClient: Counters: 12
>11/11/14 11:21:47 INFO mapred.JobClient:   FileSystemCounters
>11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_READ=40923
>11/11/14 11:21:47 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=82343
>11/11/14 11:21:47 INFO mapred.JobClient:   Map-Reduce Framework
>11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input groups=5
>11/11/14 11:21:47 INFO mapred.JobClient:     Combine output records=0
>11/11/14 11:21:47 INFO mapred.JobClient:     Map input records=5
>11/11/14 11:21:47 INFO mapred.JobClient:     Reduce shuffle bytes=0
>11/11/14 11:21:47 INFO mapred.JobClient:     Reduce output records=5
>11/11/14 11:21:47 INFO mapred.JobClient:     Spilled Records=10
>11/11/14 11:21:47 INFO mapred.JobClient:     Map output bytes=91
>11/11/14 11:21:47 INFO mapred.JobClient:     Combine input records=0
>11/11/14 11:21:47 INFO mapred.JobClient:     Map output records=5
>11/11/14 11:21:47 INFO mapred.JobClient:     Reduce input records=5
>
>
>Please suggest.
>
>-----Original Message-----
>From: Joey Echeverria [mailto:joey@cloudera.com]
>Sent: Friday, November 11, 2011 10:38 PM
>To: user@hbase.apache.org
>Subject: Re: MR - Input from Hbase output to HDFS
>
>There are two APIs (old and new), and you appear to be mixing them.
>TableMapReduceUtil only works with the new API. The solution is to 
>import the new version of FileOutputFormat which takes a Job:
>
>
>import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
>
>-Joey
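To spell out the two APIs Joey mentions (signatures as of Hadoop 0.20.x; the old-API form is shown only for contrast):

```java
// Old API (org.apache.hadoop.mapred): takes a JobConf.
//   org.apache.hadoop.mapred.FileOutputFormat.setOutputPath(JobConf conf, Path out)
//
// New API (org.apache.hadoop.mapreduce): takes a Job, which is what
// TableMapReduceUtil.initTableMapperJob configures.
//   org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.setOutputPath(Job job, Path out)
```

Importing the wrong FileOutputFormat is why the compiler complained that setOutputPath wanted a JobConf.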
>
>On Fri, Nov 11, 2011 at 12:55 AM, Stuti Awasthi <stutiawasthi@hcl.com>
>wrote:
>> The method "setOutputPath(JobConf, Path)" takes a JobConf as a 
>>parameter, not the Job object.
>> At least this is the error I'm getting while compiling against the 
>>Hadoop 0.20.2 jar in Eclipse.
>>
>> FileOutputFormat.setOutputPath(conf, new Path("/output"));
>>
>> -----Original Message-----
>> From: Prashant Sharma [mailto:prashant.iiith@gmail.com]
>> Sent: Friday, November 11, 2011 11:20 AM
>> To: user@hbase.apache.org
>> Subject: Re: MR - Input from Hbase output to HDFS
>>
>> Hi Stuti,
>> I was wondering why you are not using the job object to set the 
>>output path like this.
>>
>> FileOutputFormat.setOutputPath(job, new Path("outputReadWrite") );
>>
>>
>> thanks
>>
>> On Fri, Nov 11, 2011 at 10:43 AM, Stuti Awasthi
>><stutiawasthi@hcl.com>wrote:
>>
>>> Hi Andrei,
>>> Well, I am a bit confused. When I use JobConf and associate it with 
>>>JobClient to run the job, I get the error that the "Input directory 
>>>is not set".
>>> Since I want my input to be taken from the HBase table, which I have 
>>>already configured with "TableMapReduceUtil.initTableMapperJob", I 
>>>don't want to set the input directory via JobConf.
>>> How do I mix these two so that I can get input from HBase and write 
>>>output to HDFS?
>>>
>>> Thanks
>>>
>>> -----Original Message-----
>>> From: Andrei Cojocaru [mailto:majormax@gmail.com]
>>> Sent: Thursday, November 10, 2011 7:09 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: MR - Input from Hbase output to HDFS
>>>
>>> Stuti,
>>>
>>> I don't see you associating JobConf with Job anywhere.
>>> -Andrei
>>>
>>>
>>
>
>
>
>--
>Joseph Echeverria
>Cloudera, Inc.
>443.305.9434
>


