hbase-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: writing to multiple hbase tables in a mapreduce job
Date Tue, 26 Aug 2014 18:53:20 GMT
You don't need to initialize the tables.

You just need to specify MultiTableOutputFormat as the output format
class.

Something like this:
job.setOutputFormatClass(MultiTableOutputFormat.class);


If you look at the code for MultiTableOutputFormat, it opens the table
(an HTable handle) on the fly and stores it in an internal map when you
call context.write.
When context.write is called:

    @Override
    public void write(ImmutableBytesWritable tableName, Writable action) throws IOException {
      HTable table = getTable(tableName);



This calls getTable(), shown below, which opens the table on the fly and
stores it in the internal map:



    HTable getTable(ImmutableBytesWritable tableName) throws IOException {
      if (!tables.containsKey(tableName)) {
        LOG.debug("Opening HTable \"" + Bytes.toString(tableName.get()) + "\" for writing");
        HTable table = new HTable(conf, tableName.get());
        table.setAutoFlush(false);
        tables.put(tableName, table);
      }
      return tables.get(tableName);
    }

    @Override
    public void close(TaskAttemptContext context) throws IOException {
      for (HTable table : tables.values()) {
        table.flushCommits();
      }
    }
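
To make the pattern in getTable() and close() easier to see in isolation, here is a minimal sketch that models it with plain Java collections and no HBase dependency. BufferedTable, LazyTableWriter and LazyTableWriterDemo are hypothetical names invented for this illustration; BufferedTable only imitates an HTable with autoFlush turned off, and the body of write() here is an assumption, since only its first line is quoted above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an HTable with autoFlush disabled:
// writes are buffered until flushCommits() is called.
// (Simplification: a real HTable may also flush when its write buffer fills.)
class BufferedTable {
    final String name;
    final List<String> pending = new ArrayList<>();   // buffered, not yet sent
    final List<String> committed = new ArrayList<>(); // flushed writes

    BufferedTable(String name) {
        this.name = name;
    }

    void put(String row) {
        pending.add(row);
    }

    void flushCommits() {
        committed.addAll(pending);
        pending.clear();
    }
}

// Models the record writer: one handle per table name, opened on first use.
class LazyTableWriter {
    private final Map<String, BufferedTable> tables = new HashMap<>();

    // Same shape as getTable() in the listing: open and cache on a miss.
    BufferedTable getTable(String tableName) {
        if (!tables.containsKey(tableName)) {
            tables.put(tableName, new BufferedTable(tableName));
        }
        return tables.get(tableName);
    }

    // Assumed behaviour: route the write to the handle for that table.
    void write(String tableName, String row) {
        getTable(tableName).put(row);
    }

    // Same shape as close() in the listing: flush every cached handle.
    void close() {
        for (BufferedTable table : tables.values()) {
            table.flushCommits();
        }
    }

    int openTableCount() {
        return tables.size();
    }
}

class LazyTableWriterDemo {
    public static void main(String[] args) {
        LazyTableWriter writer = new LazyTableWriter();
        writer.write("i1", "row-1");
        writer.write("i2", "row-2");
        writer.write("i1", "row-3"); // reuses the cached handle for i1
        System.out.println("handles opened: " + writer.openTableCount());
        writer.close();
        System.out.println("i1 committed: " + writer.getTable("i1").committed);
        System.out.println("i2 committed: " + writer.getTable("i2").committed);
    }
}
```

The point of the sketch is that the caller never registers a table up front: the key passed to write() decides which handle is used, and nothing is flushed for any table until close() runs.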


In fact, I would suggest going through the code for the whole class here:

http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.92.1/org/apache/hadoop/hbase/mapreduce/MultiTableOutputFormat.java#MultiTableOutputFormat.MultiTableRecordWriter.getTable%28org.apache.hadoop.hbase.io.ImmutableBytesWritable%29



This is different from the TableOutputFormat approach, where you do need to
initialize the table by using the Util class (TableMapReduceUtil).



Regards,

Shahab



On Tue, Aug 26, 2014 at 2:29 PM, yeshwanth kumar <yeshwanth43@gmail.com>
wrote:

> hi ted,
>
> i need to process the data in table i1, and then i need to write the
> results to tables i1 and i2
> so input for the mapper in my mapreduce job is from hbase table, i1
> whereas in WALPlayer input is HLogInputFormat,
>
> if i remove the statement as you said and specify  the inputformat
> as TableInputFormat it is throwing "No table was provided " Exception
> if i specify the input table as in the statements
>
> TableMapReduceUtil.initTableMapperJob(otherArgs[0], scan,
> EntitySearcherMapper.class, ImmutableBytesWritable.class, Put.class,
> job);//otherArgs[0]=i1
>
> mapper is not considering other table,
> any suggestions to resolve  this issue,
>
> thanks,
> yeshwanth
>
>
>
>
> On Tue, Aug 26, 2014 at 10:39 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > Please take a look at WALPlayer.java in hbase where you can find an
> > example of how MultiTableOutputFormat is used.
> >
> > Cheers
> >
> >
> > On Tue, Aug 26, 2014 at 10:04 AM, yeshwanth kumar <yeshwanth43@gmail.com>
> > wrote:
> >
> > > hi ted,
> > >
> > > how can we initialise the mapper if i comment out those lines
> > >
> > >
> > >
> > > On Tue, Aug 26, 2014 at 10:08 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > TableMapReduceUtil.initTableMapperJob(otherArgs[0], scan,
> > > > EntitySearcherMapper.class, ImmutableBytesWritable.class, Put.class,
> > > > job);//otherArgs[0]=i1
> > > >
> > > > You're initializing with table 'i1'
> > > > Please remove the above call and try again.
> > > >
> > > > Cheers
> > > >
> > > >
> > > > On Tue, Aug 26, 2014 at 9:18 AM, yeshwanth kumar <yeshwanth43@gmail.com>
> > > > wrote:
> > > >
> > > > > hi i am running  HBase 0.94.20  on Hadoop 2.2.0
> > > > >
> > > > > i am using MultiTableOutputFormat,
> > > > > for writing processed output to two different tables in hbase.
> > > > >
> > > > > here's the code snippet
> > > > >
> > > > > private ImmutableBytesWritable tab_cr = new ImmutableBytesWritable(
> > > > >     Bytes.toBytes("i1"));
> > > > > private ImmutableBytesWritable tab_cvs = new ImmutableBytesWritable(
> > > > >     Bytes.toBytes("i2"));
> > > > >
> > > > > @Override
> > > > > public void map(ImmutableBytesWritable row, final Result value,
> > > > > final Context context) throws IOException, InterruptedException {
> > > > >
> > > > > -----------------------------------------
> > > > > Put pcvs = new Put(entry.getKey().getBytes());
> > > > > pcvs.add("cf".getBytes(),"type".getBytes(),column.getBytes());
> > > > > Put put = new Put(value.getRow());
> > > > > put.add("Entity".getBytes(), "json".getBytes(),
> > > > > entry.getValue().getBytes());
> > > > > context.write(tab_cr, put); // table i1
> > > > > context.write(tab_cvs, pcvs); // table i2
> > > > >
> > > > > }
> > > > >
> > > > > job.setJarByClass(EntitySearcherMR.class);
> > > > > job.setMapperClass(EntitySearcherMapper.class);
> > > > > job.setOutputFormatClass(MultiTableOutputFormat.class);
> > > > > Scan scan = new Scan();
> > > > > scan.setCacheBlocks(false);
> > > > > TableMapReduceUtil.initTableMapperJob(otherArgs[0], scan,
> > > > > EntitySearcherMapper.class, ImmutableBytesWritable.class, Put.class,
> > > > > job); // otherArgs[0]=i1
> > > > > TableMapReduceUtil.initTableReducerJob(otherArgs[0], null, job);
> > > > > job.setNumReduceTasks(0);
> > > > >
> > > > > mapreduce job fails by saying nosuchcolumnfamily "cf" exception, in
> > > > > table i1
> > > > > i am writing data to two different columnfamilies one in each table,
> > > > > cf belongs to table i2.
> > > > > does the columnfamilies should present in both tables??
> > > > > is there anything i am missing
> > > > > can someone point me in the right direction
> > > > >
> > > > > thanks,
> > > > > yeshwanth.
> > > > >
> > > >
> > >
> >
>
