hbase-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: HBase => replication => Hive
Date Fri, 11 Mar 2011 19:51:33 GMT
See Lars George's response.

What I hear is that full table scans that don't take advantage of any HBase features for
predicate pushdown, Bloom filters, etc. are slower. I can buy that. And I'd say don't do it that way.

Isn't the best way to go to first look at the underlying cause of the slowdown? I don't have
much insight into that, so I don't know the probability of getting an improvement. But it seems
the level of effort for doing some kind of continuous export via replication would be at least
as high as digging in there for a bit.

  - Andy

> From: Otis Gospodnetic <otis_gospodnetic@yahoo.com>
> Subject: Re: HBase => replication => Hive
> To: user@hbase.apache.org
> Date: Friday, March 11, 2011, 11:13 AM
> Hi,
> 
> 
> ----- Original Message ----
> 
> > From: Andrew Purtell <apurtell@apache.org>
> > 
> > Pardon, I'm not as familiar with this area as I should be, but
> > 
> > > apparently Hive queries run about x5
> > > slower than queries that go against normal Hive tables.
> > 
> > Is this not a reasonable place to start? Why is this?
> 
> Reasonable?  I don't know. :)  That's really the first thing I was hoping to
> find out.  J-D's reaction makes it sound like this is not unreasonable.
> 
> > > I was wondering if people think it would be possible to
> > > implement HBase=>Hive replication?
> > 
> > This strikes me as non-trivial. If doing this level of effort, why not look
> > into the Hive/HBase integration? Maybe there is something HBase can do to make
> > it faster?
> 
> 
> At this point I don't know how trivial or non-trivial it is yet.  But I thought
> that if John Sichi, who strikes me as a pretty smart fellow, says he's seeing x5
> performance loss and he's the one who worked on the integration, getting from 5
> to 4 or lower may be non-trivial.  HBase => Hive is terra incognita so, who
> knows, maybe it's easy to do. :)
> 
> Otis
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> > Best regards,
> > 
> >     - Andy
> > 
> > Problems worthy of attack prove their worth by hitting back.
> >   - Piet Hein (via Tom White)
> > 
> > 
> > --- On Thu, 3/10/11, Otis Gospodnetic <otis_gospodnetic@yahoo.com> wrote:
> > 
> > > From: Otis Gospodnetic <otis_gospodnetic@yahoo.com>
> > > Subject: HBase => replication => Hive
> > > To: user@hbase.apache.org
> > > Date: Thursday, March 10, 2011, 10:43 PM
> > > Hi,
> > > 
> > > Since HBase has a mechanism to replicate edit logs to
> > > another HBase cluster, I was wondering if people think it
> > > would be possible to implement HBase=>Hive
> > > replication? (and really make the destination pluggable
> > > later on)
> > > 
> > > I'm asking because while one can integrate Hive and HBase
> > > by creating external tables in Hive that actually point to
> > > tables in HBase, apparently Hive queries run about x5
> > > slower than queries that go against normal Hive tables.
> > > 
> > > And because all HBase export options are for 1 table at a
> > > time and not point-in-time snapshots of the whole table,
> > > exporting data from HBase and importing it into Hive doesn't
> > > sound like a viable option.
> > > 
> > > Thanks,
> > > Otis
> > > ----
> > > Sematext :: http://sematext.com/ :: Solr - Lucene - Hadoop
> > > Hadoop ecosystem search :: http://search-hadoop.com/
> > > 
> > > 
> > 
> > 
> >       
> > 
> 


      
