hadoop-mapreduce-user mailing list archives

From Ondrej Holecek <ond...@holecek.eu>
Subject Re: mapreduce streaming with hbase as a source
Date Sat, 19 Feb 2011 16:03:31 GMT
I don't think you understood me correctly.

I get this line:

72 6f 77 31     keyvalues={row1/family1:a/1298037737154/Put/vlen=1,
row1/family1:b/1298037744658/Put/vlen=1, row1/family1:c/1298037748020/Put/vlen=1}

I know "72 6f 77 31" is the key and the rest is value, let's call it
mapreduce-value. In this mapreduce-value there is
"row1/family1:a/1298037737154/Put/vlen=1" that is hbase-row name, hbase-column
name and hbase-timestamp.  But I expect also hbase-value.

So my question is what to do to make TableInputFormat to send also this hbase-value.
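
My guess is that streaming simply calls toString() on the Result objects that
TableInputFormat hands out, and Result.toString() prints only the cell
coordinates (vlen=1 is just the value length), never the values themselves. If
that is right, one workaround might be a small wrapper input format that
renders each cell, value included, into a Text before streaming sees it. Below
is a rough, untested sketch against the old org.apache.hadoop.hbase.mapred
API; the class name TextTableInputFormat and the output layout are my own
invention:

import java.io.IOException;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapred.TableInputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobConfigurable;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

/**
 * Streaming prints record values with toString(), so hand it a Text
 * that already contains the cell values. Untested sketch.
 */
public class TextTableInputFormat
    implements InputFormat<ImmutableBytesWritable, Text>, JobConfigurable {

  // Delegate scanning and split computation to the stock input format.
  private final TableInputFormat delegate = new TableInputFormat();

  public void configure(JobConf job) {
    delegate.configure(job);  // still reads hbase.mapred.tablecolumns
  }

  public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
    return delegate.getSplits(job, numSplits);
  }

  public RecordReader<ImmutableBytesWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    final RecordReader<ImmutableBytesWritable, Result> inner =
        delegate.getRecordReader(split, job, reporter);
    return new RecordReader<ImmutableBytesWritable, Text>() {
      public ImmutableBytesWritable createKey() { return inner.createKey(); }
      public Text createValue() { return new Text(); }
      public long getPos() throws IOException { return inner.getPos(); }
      public float getProgress() throws IOException { return inner.getProgress(); }
      public void close() throws IOException { inner.close(); }

      public boolean next(ImmutableBytesWritable key, Text value)
          throws IOException {
        Result result = inner.createValue();
        if (!inner.next(key, result)) {
          return false;
        }
        // Render every cell as family:qualifier/timestamp=value.
        StringBuilder buf = new StringBuilder();
        for (KeyValue kv : result.raw()) {  // one KeyValue per cell
          if (buf.length() > 0) {
            buf.append(' ');
          }
          buf.append(Bytes.toString(kv.getFamily())).append(':')
             .append(Bytes.toString(kv.getQualifier())).append('/')
             .append(kv.getTimestamp()).append('=')
             .append(Bytes.toString(kv.getValue()));  // the hbase-value
        }
        value.set(buf.toString());
        return true;
      }
    };
  }
}

The jar would then go on the job classpath (e.g. via -libjars) and the job
would be started with -inputformat TextTableInputFormat instead of the stock
class. Again, just a sketch; I haven't run it.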


Ondrej


On 02/19/11 16:41, ShengChang Gu wrote:
> By default, the prefix of a line up to the first tab character is the key,
> and the rest of the line (excluding the tab character) is the value. If
> there is no tab character in the line, then the entire line is considered
> the key and the value is null. However, this can be customized. Use:
>  
> -D stream.map.output.field.separator=.
> -D stream.num.map.output.key.fields=4
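>
> With those two options, a map output line like a.b.c.d.e is split into the
> key a.b.c.d and the value e.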
> 
2011/2/19 Ondrej Holecek <ondrej@holecek.eu>
> 
>     Thank you, I've spent a lot of time debugging but didn't notice
>     this typo :(
> 
>     Now it works, but I don't understand one thing: On stdin I get this:
> 
>     72 6f 77 31     keyvalues={row1/family1:a/1298037737154/Put/vlen=1,
>     row1/family1:b/1298037744658/Put/vlen=1,
>     row1/family1:c/1298037748020/Put/vlen=1}
>     72 6f 77 32     keyvalues={row2/family1:a/1298037755440/Put/vlen=2,
>     row2/family1:b/1298037758241/Put/vlen=2,
>     row2/family1:c/1298037761198/Put/vlen=2}
>     72 6f 77 33     keyvalues={row3/family1:a/1298037767127/Put/vlen=3,
>     row3/family1:b/1298037770111/Put/vlen=3,
>     row3/family1:c/1298037774954/Put/vlen=3}
> 
>     I see everything there except the value. What should I do to get the
>     value on stdin too?
> 
>     Ondrej
> 
>     On 02/18/11 20:01, Jean-Daniel Cryans wrote:
>     > You have a typo: it's hbase.mapred.tablecolumns, not hbase.mapred.tablecolumn.
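>     >
>     > With the property name corrected, the command from below becomes:
>     >
>     > hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2+737.jar \
>     >     -D hbase.mapred.tablecolumns=family1: \
>     >     -input table1 -output /mtestout45 \
>     >     -mapper test-map -numReduceTasks 1 -reducer test-reduce \
>     >     -inputformat org.apache.hadoop.hbase.mapred.TableInputFormat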
>     >
>     > J-D
>     >
>     > On Fri, Feb 18, 2011 at 6:05 AM, Ondrej Holecek <ondrej@holecek.eu> wrote:
>     >> Hello,
>     >>
>     >> I'm testing hadoop and hbase. I can run mapreduce streaming or pipes
>     >> jobs against text files on hadoop, but I have a problem when I try to
>     >> run the same job against an hbase table.
>     >>
>     >> The table looks like this:
>     >> hbase(main):015:0> scan 'table1'
>     >> ROW       COLUMN+CELL
>     >>  row1     column=family1:a, timestamp=1298037737154, value=1
>     >>  row1     column=family1:b, timestamp=1298037744658, value=2
>     >>  row1     column=family1:c, timestamp=1298037748020, value=3
>     >>  row2     column=family1:a, timestamp=1298037755440, value=11
>     >>  row2     column=family1:b, timestamp=1298037758241, value=22
>     >>  row2     column=family1:c, timestamp=1298037761198, value=33
>     >>  row3     column=family1:a, timestamp=1298037767127, value=111
>     >>  row3     column=family1:b, timestamp=1298037770111, value=222
>     >>  row3     column=family1:c, timestamp=1298037774954, value=333
>     >> 3 row(s) in 0.0240 seconds
>     >>
>     >>
>     >> And the command I use, with the exception I get:
>     >>
>     >> # hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming-0.20.2+737.jar \
>     >>     -D hbase.mapred.tablecolumn=family1: \
>     >>     -input table1 -output /mtestout45 \
>     >>     -mapper test-map -numReduceTasks 1 -reducer test-reduce \
>     >>     -inputformat org.apache.hadoop.hbase.mapred.TableInputFormat
>     >>
>     >> packageJobJar: [/var/lib/hadoop/cache/root/hadoop-unjar8960137205806573426/] []
>     >> /tmp/streamjob8218197708173702571.jar tmpDir=null
>     >> 11/02/18 14:45:48 INFO mapred.JobClient: Cleaning up the staging area
>     >> hdfs://oho-nnm.dev.chservices.cz/var/lib/hadoop/cache/mapred/mapred/staging/root/.staging/job_201102151449_0035
>     >> Exception in thread "main" java.lang.RuntimeException: Error in configuring object
>     >>        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     >>        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     >>        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     >>        at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:597)
>     >>        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:926)
>     >>        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:918)
>     >>        at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
>     >>        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:834)
>     >>        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:793)
>     >>        at java.security.AccessController.doPrivileged(Native Method)
>     >>        at javax.security.auth.Subject.doAs(Subject.java:396)
>     >>        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
>     >>        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:793)
>     >>        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:767)
>     >>        at org.apache.hadoop.streaming.StreamJob.submitAndMonitorJob(StreamJob.java:922)
>     >>        at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:123)
>     >>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     >>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     >>        at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
>     >>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     >>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     >>        at java.lang.reflect.Method.invoke(Method.java:597)
>     >>        at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>     >> Caused by: java.lang.reflect.InvocationTargetException
>     >>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     >>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     >>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     >>        at java.lang.reflect.Method.invoke(Method.java:597)
>     >>        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>     >>        ... 23 more
>     >> Caused by: java.lang.NullPointerException
>     >>        at org.apache.hadoop.hbase.mapred.TableInputFormat.configure(TableInputFormat.java:51)
>     >>        ... 28 more
>     >>
>     >>
>     >> Can anyone tell me what I am doing wrong?
>     >>
>     >> Regards,
>     >> Ondrej
>     >>
> 
> 
> 
> 
> -- 
> 阿昌

