hadoop-common-user mailing list archives

From Geoffry Roberts <geoffry.robe...@gmail.com>
Subject Re: Hadoop and Hibernate
Date Fri, 02 Mar 2012 18:59:23 GMT
Queries are nothing but inserts.  Create an object, populate it, persist
it.  If it worked, life would be good right now.

I've considered JDBC and may yet take that approach.

re: Hibernate outside of Spring -- I'm getting tired already.

Interesting thing:  I use EMF (Eclipse Modelling Framework).  The
supporting jar files for EMF and Ecore are built into the job.  They are
being found by the Driver(s) and the MR(s) no problemo.  If these work, why
not the Hibernate stuff?  Mystery!

On 2 March 2012 10:50, Tarjei Huse <tarjei@scanmine.com> wrote:

> On 03/02/2012 07:31 PM, Geoffry Roberts wrote:
> > No, I am using 0.21.0 for better performance.  I am interested in
> > DistributedCache so certain libraries can be found during MR processing.
> > As it is now, I'm getting ClassNotFoundException being thrown by the
> > Reducers.  The Driver throws no error; the Reducer(s) do.  It would seem
> > something is not being distributed across the cluster as I assumed it
> > would.  After all, the whole business is in a single, executable jar
> > file.
>
> How complex are the queries you are doing?
>
> Have you considered one of the following:
>
> 1) Use plain jdbc instead of integrating Hibernate into Hadoop.
> 2) Create a local version of the db that can be in the Distributed Cache.
>
> I tried using Hibernate with Hadoop (the queries were not a major part
> of the jobs) but I ran up against so many issues trying to get Hibernate
> to start up within the MR job that I ended up just exporting the tables,
> loading them into memory, and doing queries against them with basic
> HashMap lookups.
>
> My best advice is that if you can, you should consider a way to abstract
> away Hibernate from the job and use something closer to the metal like
> either JDBC or just dump the data to files. Getting Hibernate to run
> outside of Spring and friends can quickly grow tiresome.
>
> T
> >
> > On 2 March 2012 09:46, Kunaal <kunaalbhasin@gmail.com> wrote:
> >
> >> Are you looking to use DistributedCache for better performance?
> >>
> >> On Fri, Mar 2, 2012 at 9:42 AM, Geoffry Roberts
> >> <geoffry.roberts@gmail.com>wrote:
> >>
> >>> This is a tardy response.  I'm spread pretty thinly right now.
> >>>
> >>> DistributedCache
> >>> <http://hadoop.apache.org/common/docs/r1.0.0/mapred_tutorial.html#DistributedCache>
> >>> is apparently deprecated.  Is there a replacement?  I didn't see
> >>> anything about this in the documentation, but then I am still using
> >>> 0.21.0.  I have to for performance reasons.  1.0.1 is too slow and the
> >>> client won't have it.
> >>>
> >>> Also, the DistributedCache
> >>> <http://hadoop.apache.org/common/docs/r1.0.0/mapred_tutorial.html#DistributedCache>
> >>> approach seems only to work from within a Hadoop job, i.e. from within
> >>> a Mapper or a Reducer, but not from within a Driver.  I have libraries
> >>> that I must access from both places.  I take it that I am stuck keeping
> >>> two copies of these libraries in sync--correct?  It's either that, or
> >>> copy them into HDFS, replacing them all at the beginning of each job
> >>> run.
> >>>
> >>> Looking for best practices.
> >>>
> >>> Thanks
> >>>
> >>> On 28 February 2012 10:17, Owen O'Malley <omalley@apache.org> wrote:
> >>>
> >>>> On Tue, Feb 28, 2012 at 5:15 PM, Geoffry Roberts
> >>>> <geoffry.roberts@gmail.com> wrote:
> >>>>
> >>>>> If I create an executable jar file that contains all dependencies
> >>>>> required by the MR job, do all said dependencies get distributed to
> >>>>> all nodes?
> >>>> You can make a single jar and that will be distributed to all of the
> >>>> machines that run the task, but it is better in most cases to use the
> >>>> distributed cache.
> >>>>
> >>>> See
> >>>> http://hadoop.apache.org/common/docs/r1.0.0/mapred_tutorial.html#DistributedCache
> >>>>> If I specify but one reducer, which node in the cluster will the
> >>>>> reducer run on?
> >>>> The scheduling is done by the JobTracker and it isn't possible to
> >>>> control the location of the reducers.
> >>>>
> >>>> -- Owen
> >>>>
> >>>
> >>>
> >>> --
> >>> Geoffry Roberts
> >>>
> >>
> >>
> >> --
> >> "What we are is the universe's gift to us.
> >> What we become is our gift to the universe."
> >>
> >
> >
>
>
> --
> Regards / Med vennlig hilsen
> Tarjei Huse
> Mobil: 920 63 413
>
>
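
Tarjei's export-the-tables workaround above could be sketched roughly like
this.  It is a minimal, hypothetical illustration, not anyone's actual code:
the class name, the tab-separated export format, and the column order are
all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the workaround: instead of starting Hibernate inside the MR
// job, export each table to a delimited text file, load it into memory in
// the task's setup, and answer "queries" with plain HashMap lookups.
public class InMemoryTable {

    private final Map<String, String[]> rowsByKey = new HashMap<>();

    // Each line is one exported row: the key column first, then the
    // remaining columns, separated by tabs.  In a real job these lines
    // would be read from a file shipped with the job jar (or via the
    // DistributedCache) rather than passed in as an array.
    public InMemoryTable(String[] exportedLines) {
        for (String line : exportedLines) {
            String[] cols = line.split("\t");
            rowsByKey.put(cols[0], cols);
        }
    }

    // The HashMap lookup that stands in for a SELECT-by-primary-key;
    // returns null when the key is not present.
    public String[] lookup(String key) {
        return rowsByKey.get(key);
    }

    public static void main(String[] args) {
        InMemoryTable cities = new InMemoryTable(new String[] {
                "1\tOslo\tNO",
                "2\tBergen\tNO" });
        String[] row = cities.lookup("2");
        System.out.println(row[1]);  // Bergen
    }
}
```

Loading once in a Mapper's or Reducer's setup() and doing only get() calls
afterwards keeps the per-record cost to a single hash lookup, which is the
point of the workaround.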


-- 
Geoffry Roberts
