hbase-dev mailing list archives

From "Lars George (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-1626) Allow emitting Deletes out of new TableReducer
Date Wed, 08 Jul 2009 13:37:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-1626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12728695#action_12728695 ]

Lars George commented on HBASE-1626:

There seem to be two issues buried here. One is being able to "generalize" the class that
is handed into the reduce phase. The other is how to access a table. For the latter (correct
me if I am wrong, Doğacan) you seem to have grabbed the wrong end of the stick. Instead of
extending TableReducer and making use of a table in the IdentityTableReducer, you leave that
as is and simply add a custom TableReducer that creates the table in the "setup()" method,
does the puts etc. in the "reduce()" call, and closes/flushes in the "cleanup()" method.
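
To make the setup()/reduce()/cleanup() lifecycle concrete, here is a minimal sketch. It does not use the real HBase classes: SketchPut, SketchTable and CustomReducerSketch are hypothetical stand-ins invented for illustration, and the buffering behaviour of SketchTable is an assumption, not a description of HTable.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for an HBase Put; NOT the real org.apache.hadoop.hbase.client.Put.
final class SketchPut {
    final String row;
    final String value;

    SketchPut(String row, String value) {
        this.row = row;
        this.value = value;
    }
}

// Hypothetical in-memory table that buffers writes until flushed (assumption).
final class SketchTable {
    private final List<SketchPut> buffer = new ArrayList<SketchPut>();
    final List<String> written = new ArrayList<String>();
    boolean closed = false;

    void put(SketchPut p) {
        buffer.add(p);
    }

    void flush() {
        for (SketchPut p : buffer) {
            written.add(p.row + "=" + p.value);
        }
        buffer.clear();
    }

    void close() {
        flush();
        closed = true;
    }
}

// Mirrors the lifecycle described above: open in setup(), write in reduce(),
// flush and close in cleanup().
final class CustomReducerSketch {
    SketchTable table;

    void setup() {
        table = new SketchTable();
    }

    void reduce(String key, Iterable<String> values) {
        for (String v : values) {
            table.put(new SketchPut(key, v));
        }
    }

    void cleanup() {
        table.close();
    }
}
```

In this sketch nothing reaches the "written" list until cleanup() runs, which is why the close/flush step in cleanup() matters if the table buffers its writes.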

In other words, you do not need to do anything but create a simple job that uses IdentityTableReducer
together with TableOutputFormat, which takes care of the table.put(). Unless I am missing
something, that is pretty much what you are doing. Use the TableMapReduceUtil class
to set up the job, including the name of the table etc.

The crucial part is abstracting the type of the class the reducer actually receives, so instead
of assuming a Put, it should accept a Delete as well if possible. I think Stack has that down 100%
in his patch. So with his patch, together with the above classes, you should be fine.

Question for Stack:
+      if (value instanceof Put) this.table.put(new Put((Put)value));
+      else if (value instanceof Delete) this.table.delete(new Delete((Delete)value));

Why do that and not

+      if (value instanceof Put) this.table.put((Put) value);
+      else if (value instanceof Delete) this.table.delete((Delete) value);

Just wondering if there is a reason to create a new object. Are they cached in the framework,
so that holding the object reference would let them be modified before they are written? They are
already written to an intermediate output during the map/reduce crossover, so they are already copies.
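
One situation where the copy would matter, offered as an assumption and not as a statement about Stack's patch: the framework hands the reducer the same value instance on every iteration, and the sink keeps references in a buffer and does not write immediately. The sketch below uses hypothetical stand-ins (ReusedValue, BufferingSink) to show the difference; none of these are HBase or Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable value that a framework might reuse between iterations.
final class ReusedValue {
    String row;

    ReusedValue(String row) {
        this.row = row;
    }

    // Copy constructor, analogous in spirit to new Put((Put) value).
    ReusedValue(ReusedValue other) {
        this.row = other.row;
    }
}

// Hypothetical sink that holds references until flush() is called.
final class BufferingSink {
    private final List<ReusedValue> buffer = new ArrayList<ReusedValue>();

    void add(ReusedValue v) {
        buffer.add(v);
    }

    List<String> flush() {
        List<String> rows = new ArrayList<String>();
        for (ReusedValue v : buffer) {
            rows.add(v.row);
        }
        buffer.clear();
        return rows;
    }
}

final class CopyVersusReferenceSketch {
    // Feeds three rows through one shared instance, copying or not.
    static List<String> run(boolean copy) {
        BufferingSink sink = new BufferingSink();
        ReusedValue shared = new ReusedValue("");
        for (String row : new String[] {"row1", "row2", "row3"}) {
            shared.row = row; // the "framework" overwrites the same instance
            sink.add(copy ? new ReusedValue(shared) : shared);
        }
        return sink.flush();
    }
}
```

Under these assumptions, run(true) flushes row1, row2, row3, while run(false) flushes row3 three times, because every buffered reference points at the one shared instance. If neither assumption holds (values are fresh objects, or the write happens immediately), the copy is unnecessary.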

> Allow emitting Deletes out of new TableReducer
> ----------------------------------------------
>                 Key: HBASE-1626
>                 URL: https://issues.apache.org/jira/browse/HBASE-1626
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Lars George
>             Fix For: 0.20.0
>         Attachments: deletes.patch, table-reduce.patch
> Doğacan Güney (nutch) wants to emit Delete from TableReduce.  Currently we only do

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
