hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: Writing MR-Job: Something like OracleReducer, JDBCReducer ...
Date Fri, 16 Sep 2011 18:43:11 GMT

Sonal,

You do realize that HBase is a "database", right? ;-)

So again, why do you need a reducer?  ;-)

Using your example...
"Again, there will be many cases where one may want a reducer, say trying to count the occurrence
of words in a particular column."

You can do this one of two ways...
1) Dynamic counters in Hadoop.
2) Use a temp table and increment the value in a column that holds the word count.
 (Either a fat row, where the rowkey is doc_id and each column is a word, or a rowkey of doc_id|word.)
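Both options can be sketched without any reducer. The stdlib-only Java below is an illustration under stated assumptions, not code from this thread. The `TreeMap`s stand in for Hadoop's job counters (in a real mapper, `context.getCounter("words", word).increment(1)`) and for an HBase temp table (where a real job would call `incrementColumnValue(row, family, qualifier, 1L)` on the table). The class name, the `words` counter group and the `count` qualifier are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: word counting without a reducer. TreeMaps stand in for
// Hadoop counters (option 1) and for an HBase temp table (option 2).
class WordCountWithoutReducer {

    // Lower-cases the text and splits it on runs of non-word characters.
    private static String[] words(String text) {
        return text.toLowerCase().split("\\W+");
    }

    // Option 1: one dynamic counter per word in the (hypothetical) group
    // "words". Hadoop sums counters from all map tasks, so no reducer is needed.
    static Map<String, Long> countWithCounters(Iterable<String> columnValues) {
        Map<String, Long> counters = new TreeMap<>();
        for (String text : columnValues) {
            for (String word : words(text)) {
                if (word.isEmpty()) continue; // leading punctuation yields ""
                // Real mapper: context.getCounter("words", word).increment(1);
                counters.merge("words:" + word, 1L, Long::sum);
            }
        }
        return counters;
    }

    // Option 2: increment one cell per word in a temp table. Cells are keyed
    // "row/qualifier". Fat row: row = docId, qualifier = word.
    // Tall row: row = docId|word, qualifier = "count" (hypothetical name).
    // Real job: table.incrementColumnValue(row, family, qualifier, 1L);
    static void incrementTempTable(Map<String, Long> table, String docId,
                                   String text, boolean fatRow) {
        for (String word : words(text)) {
            if (word.isEmpty()) continue;
            String cell = fatRow ? docId + "/" + word
                                 : docId + "|" + word + "/count";
            table.merge(cell, 1L, Long::sum);
        }
    }
}
```

One design caveat worth keeping in mind: Hadoop caps the number of counters a job may create (the limit is configurable), so option 1 suits a small, bounded set of distinct words better than an open vocabulary, where the temp-table approach scales further.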

I'm sorry, but if you go through all of your examples of why you would want to use a reducer,
you end up finding that writing to an HBase table would be faster than a reduce job.
(Again, we haven't done an exhaustive search, but in all of the HBase jobs we've run, no
reducers were necessary.)

The point I'm trying to make is that you want to avoid using a reducer whenever possible, and
if you think about your problem, you can probably come up with a solution that avoids the
reducer...
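As a rough illustration of the map-only alternative, the sketch below chains per-record stages the way a driver might run a series of map-only jobs in serial. In Hadoop itself a map-only job is declared with `job.setNumReduceTasks(0)`, so map output goes straight to the OutputFormat with no shuffle or sort. The `MapOnlyPipeline` class here is a hypothetical in-memory stand-in for that idea, not a Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Sketch only: a chain of map stages with no reduce step. Like a Hadoop
// mapper, each stage sees one record at a time and may emit zero or more
// records; nothing is ever grouped by key.
class MapOnlyPipeline<T> {
    private final List<Function<T, List<T>>> stages = new ArrayList<>();

    // Appends a stage; stages run in the order they are added.
    MapOnlyPipeline<T> then(Function<T, List<T>> stage) {
        stages.add(stage);
        return this;
    }

    // Feeds every record through each stage in turn, like running
    // map-only jobs one after another.
    List<T> run(List<T> records) {
        List<T> current = records;
        for (Function<T, List<T>> stage : stages) {
            List<T> next = new ArrayList<>();
            for (T record : current) {
                next.addAll(stage.apply(record));
            }
            current = next;
        }
        return current;
    }
}
```

For example, a first stage can split each record into tokens and a second can filter or transform them. Because no stage needs to see more than one record at a time, nothing has to be shuffled and sorted by key, which is the expensive work a reducer would otherwise force.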


HTH

-Mike
PS. I haven't looked at *all* of the potential use cases of HBase, which is why I don't want
to say you'll never need a reducer. I will say that, based on what we've done at my client's
site, we try very hard to avoid reducers.
[Note, I'm sure I'm going to get hammered on this when I head to NY in Nov. :-)   ]
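The schema-discovery job from the earlier message quoted below (scan the table, count each column name with dynamic counters, no reducer) might look roughly like this. It is a stdlib-only sketch: each row is modeled as a map from a hypothetical `family:qualifier` string to its cell value, whereas a real `TableMapper` would walk the cells of each scanned `Result` and call `context.getCounter("columns", column).increment(1)`.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch only: infer which columns a table actually uses by counting how
// many rows contain each column name. The TreeMap stands in for the job's
// dynamic counters; no reducer is involved.
class ColumnCensus {
    static Map<String, Long> countColumns(Iterable<Map<String, byte[]>> rows) {
        Map<String, Long> counters = new TreeMap<>();
        for (Map<String, byte[]> row : rows) {
            for (String column : row.keySet()) {
                // Real mapper: context.getCounter("columns", column).increment(1);
                counters.merge(column, 1L, Long::sum);
            }
        }
        return counters;
    }
}
```

The resulting counts show both which columns exist and how sparsely each is populated, which is usually what one wants to know about a loosely structured HBase table.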

> Date: Fri, 16 Sep 2011 23:00:49 +0530
> Subject: Re: Writing MR-Job: Something like OracleReducer, JDBCReducer ...
> From: sonalgoyal4@gmail.com
> To: user@hbase.apache.org
> 
> Hi Michael,
> 
> Yes, thanks, I understand the fact that reducers can be expensive with all
> the shuffling and the sorting, and you may not need them always. At the same
> time, there are many cases where reducers are useful, like secondary
> sorting. In many cases, one can have multiple map phases and not have a
> reduce phase at all. Again, there will be many cases where one may want a
> reducer, say trying to count the occurrence of words in a particular column.
> 
> 
> With this thought chain, I do not feel ready to say that when dealing with
> HBase, I really don't want to use a reducer. Please correct me if I am
> wrong.
> 
> Thanks again.
> 
> Best Regards,
> Sonal
> Crux: Reporting for HBase <https://github.com/sonalgoyal/crux>
> Nube Technologies <http://www.nubetech.co>
> 
> <http://in.linkedin.com/in/sonalgoyal>
> 
> 
> 
> 
> 
> On Fri, Sep 16, 2011 at 10:35 PM, Michael Segel <michael_segel@hotmail.com> wrote:
> 
> >
> > Sonal,
> >
> > Just because you have an m/r job doesn't mean that you need to reduce
> > anything. You can have a job that contains only a mapper.
> > Or your job runner can run a series of map jobs in serial.
> >
> > Most, if not all, of the map/reduce jobs where we pull data from HBase don't
> > require a reducer.
> >
> > To give you a simple example... if I want to determine the table schema
> > where I am storing some sort of structured data,
> > I just write an m/r job which opens the table and scans it, counting the
> > occurrence of each column name via dynamic counters.
> >
> > There is no need for a reducer.
> >
> > Does that help?
> >
> >
> > > Date: Fri, 16 Sep 2011 21:41:01 +0530
> > > Subject: Re: Writing MR-Job: Something like OracleReducer, JDBCReducer ...
> > > From: sonalgoyal4@gmail.com
> > > To: user@hbase.apache.org
> > >
> > > Michael,
> > >
> > > Sorry, can you please help me understand what you mean when you say that
> > > when dealing with HBase, you really don't want to use a reducer? Here, HBase
> > > is being used as the input to the MR job.
> > >
> > > Thanks
> > > Sonal
> > >
> > >
> > > On Fri, Sep 16, 2011 at 2:35 PM, Michel Segel <michael_segel@hotmail.com> wrote:
> > >
> > > > I think you need to get a little bit more information.
> > > > Reducers are expensive.
> > > > When Thomas says that he is aggregating data, what exactly does he mean?
> > > > When dealing with HBase, you really don't want to use a reducer.
> > > >
> > > > You may want to run two map jobs and it could be that just dumping the
> > > > output via jdbc makes the most sense.
> > > >
> > > > We are starting to see a lot of questions where the OP isn't providing
> > > > enough information, so the recommendation could be wrong...
> > > >
> > > >
> > > > Sent from a remote device. Please excuse any typos...
> > > >
> > > > Mike Segel
> > > >
> > > > On Sep 16, 2011, at 2:22 AM, Sonal Goyal <sonalgoyal4@gmail.com> wrote:
> > > >
> > > > > There is a DBOutputFormat class in the org.apache.hadoop.mapreduce.lib.db
> > > > > package; you could use that. Or you could write to HDFS and then use
> > > > > something like HIHO[1] to export to the db. I have been working extensively
> > > > > in this area; you can write to me directly if you need any help.
> > > > >
> > > > > 1. https://github.com/sonalgoyal/hiho
> > > > >
> > > > > Best Regards,
> > > > > Sonal
> > > > >
> > > > > On Fri, Sep 16, 2011 at 10:55 AM, Steinmaurer Thomas <Thomas.Steinmaurer@scch.at> wrote:
> > > > >
> > > > > >> Hello,
> > > > > >>
> > > > > >> I am writing an MR job to process HBase data and store the aggregated
> > > > > >> data in Oracle. How would you do that in an MR job?
> > > > > >>
> > > > > >> Currently, for test purposes, we write the result into an HBase table
> > > > > >> again by using a TableReducer. Is there something like an OracleReducer,
> > > > > >> RelationalReducer, JDBCReducer or whatever? Or should one simply use
> > > > > >> plain JDBC code in the reduce step?
> > > > > >>
> > > > > >> Thanks!
> > > > > >>
> > > > > >> Thomas
> > > >
> >
> >
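For Thomas's original question, one way to follow the advice in this thread is to skip the reducer and write the aggregated values over plain JDBC, for example from the mapper's `cleanup()` method in a map-only job. The sketch below assumes a hypothetical table `AGG_RESULTS(AGG_KEY, AGG_TOTAL)` and uses only standard `java.sql` batching (`addBatch`/`executeBatch`); the `DBOutputFormat` class Sonal mentions is the built-in alternative. Connection setup, the Oracle driver and transaction handling are left out.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

// Sketch only: writes (key, total) pairs to a relational table over plain
// JDBC. AGG_RESULTS(AGG_KEY, AGG_TOTAL) is a hypothetical table.
class JdbcAggregateWriter {
    static final String INSERT_SQL =
            "INSERT INTO AGG_RESULTS (AGG_KEY, AGG_TOTAL) VALUES (?, ?)";

    // Queues one INSERT per entry and sends them in batches of batchSize
    // to save round trips. Returns the number of rows queued.
    static int write(Connection conn, Map<String, Long> totals, int batchSize)
            throws SQLException {
        int pending = 0;
        int written = 0;
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (Map.Entry<String, Long> e : totals.entrySet()) {
                ps.setString(1, e.getKey());
                ps.setLong(2, e.getValue());
                ps.addBatch();
                written++;
                if (++pending == batchSize) {
                    ps.executeBatch(); // send a full batch
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // send the remainder
            }
        }
        return written;
    }
}
```

Note that without a reducer each map task writes only its own partial totals, so if the same key can appear in several input splits, the database side has to combine them (for example with a GROUP BY when the table is queried).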