hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject 0.92 release
Date Mon, 25 Jul 2011 21:30:45 GMT
http://s.apache.org/x4 has grown to 40 issues.

We should clean up the above list so that coprocessors can be used by more
people.

I suggest moving HBASE-4060 out of 0.92 release.

On Mon, Jul 25, 2011 at 2:26 PM, Gary Helmling <ghelmling@gmail.com> wrote:

> Unfortunately there's no easy patch set to pull coprocessors into any 0.90
> HBase version (including CDH3 HBase).  The changes are extensive and
> invasive and include RPC protocol changes.  Internally at Trend Micro we run
> a heavily, heavily patched 0.90-based version of HBase that includes
> coprocessors and security.  But that is only possible with a lot of effort
> to keep things up to date with the HBase 0.90 development.
>
> At one point we had made a 0.90-coprocessor branch available, but it's
> simply too much work to keep it up to date.  It's in everyone's best
> interests if we instead focus on getting out a 0.92 release that includes
> coprocessors.
>
> HBase trunk (and by extension 0.92) of course supports running on CDH3, so
> you should have no problem plugging in the new version once HBase 0.92 is
> out.
>
> --gh
>
>
> On Mon, Jul 25, 2011 at 1:23 PM, Paul Nickerson <paul.nickerson@escapemg.com> wrote:
>
> > We currently run on the Cloudera stack. Would this be something that we can
> > pull, compile, and plug right into that stack?
> >
> > ----- Original Message -----
> >
> > From: "Gary Helmling" <ghelmling@gmail.com>
> > To: user@hbase.apache.org
> > Sent: Monday, July 25, 2011 2:02:50 PM
> > Subject: Re: Fanning out hbase queries in parallel
> >
> > Coprocessors are currently only in trunk. They will be in the 0.92 release
> > once we get that out. There's no set date for that, but personally I'll be
> > trying to help get it out sooner rather than later.
> >
> >
> > On Mon, Jul 25, 2011 at 7:37 AM, Michel Segel <michael_segel@hotmail.com> wrote:
> >
> > > Which release(s) have coprocessors enabled?
> > >
> > > Sent from a remote device. Please excuse any typos...
> > >
> > > Mike Segel
> > >
> > > On Jul 24, 2011, at 11:03 PM, Sonal Goyal <sonalgoyal4@gmail.com> wrote:
> > >
> > > > Hi Paul,
> > > >
> > > > Have you taken a look at HBase coprocessors? I think you will find them
> > > > useful.
> > > >
> > > > Best Regards,
> > > > Sonal
> > > > <https://github.com/sonalgoyal/hiho>Hadoop ETL and Data
> > > > Integration<https://github.com/sonalgoyal/hiho>
> > > > Nube Technologies <http://www.nubetech.co>
> > > >
> > > > <http://in.linkedin.com/in/sonalgoyal>
> > > >
> > > > On Mon, Jul 25, 2011 at 8:13 AM, Paul Nickerson <paul.nickerson@escapemg.com> wrote:
> > > >
> > > >>
> > > > >> I would like to implement a multidimensional query system that
> > > > >> aggregates large amounts of data on the fly by fanning out queries
> > > > >> in parallel. It should be fast enough for interactive exploration
> > > > >> of the data and extensible enough to take sets of hundreds or
> > > > >> thousands of dimensions with high cardinality, and aggregate them
> > > > >> from high granularity to low granularity.
> > > > >> Dimensions and their values are stored in the row key. For
> > > > >> instance, row keys look like this:
> > > > >> Foo=bar,blah=123
> > > > >> and each row contains numerical values within their column
> > > > >> families, such as plays=100, versioned by the date of calculation.
> > > > >> Suppose a user wants the top "Foo" values with blah=123, sorted in
> > > > >> descending order by total plays in July. My current thinking is
> > > > >> that a query would get executed by grouping all Foo-prefixed row
> > > > >> keys by region server and sending the query to each of those. Each
> > > > >> region server iterates through all of its row keys that start with
> > > > >> Foo=something,blah=, and passes the query on to all regions
> > > > >> containing blahs that equal 123, which then contain play counts.
> > > > >> Matching row keys, as well as the sum of all their play values
> > > > >> within July, are passed back up the chain and sorted/truncated
> > > > >> when possible.
> > > >>
> > > >>
> > > > >> It seems quite complicated and would involve either modifying the
> > > > >> HBase source code or, at the very least, using the deep internals
> > > > >> of the API. Does this seem like a practical solution, or could
> > > > >> someone offer some ideas?
> > > >>
> > > >>
> > > >> Thank you!
> > >
> >
> >
>
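
For readers following the thread, here is a rough sketch of the fan-out
query Paul describes at the bottom: scan each region for matching row keys,
sum plays per "Foo" value within a date range, then merge the partial sums
and keep the top N. It is an in-memory illustration only and uses no HBase
or coprocessor API. The function names (parse_row_key, scan_region,
top_values) and the data layout (a region as a list of row keys with dated
play counts) are assumptions made up for this example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from datetime import date
import heapq

# Hypothetical stand-in for HBase regions: each region is a list of
# (row_key, [(calc_date, plays), ...]) pairs held in memory.


def parse_row_key(row_key):
    """Split 'Foo=bar,blah=123' into {'Foo': 'bar', 'blah': '123'}."""
    return dict(part.split("=", 1) for part in row_key.split(","))


def scan_region(region, group_dim, filter_dim, filter_value, start, end):
    """Partial aggregation for one region: sum plays per group_dim value."""
    totals = defaultdict(int)
    for row_key, versions in region:
        dims = parse_row_key(row_key)
        if group_dim not in dims or dims.get(filter_dim) != filter_value:
            continue
        for calc_date, plays in versions:
            if start <= calc_date <= end:
                totals[dims[group_dim]] += plays
    return totals


def top_values(regions, group_dim, filter_dim, filter_value, start, end, limit):
    """Fan out to all regions in parallel, merge partial sums, keep top N."""
    merged = defaultdict(int)
    with ThreadPoolExecutor() as pool:
        partials = pool.map(
            lambda r: scan_region(r, group_dim, filter_dim, filter_value, start, end),
            regions,
        )
        for partial in partials:
            for value, plays in partial.items():
                merged[value] += plays
    # Sort and truncate only after merging: the same group value can appear
    # in several regions, so a per-region top N could drop a global winner.
    return heapq.nlargest(limit, merged.items(), key=lambda kv: kv[1])


regions = [
    [
        ("Foo=a,blah=123", [(date(2011, 7, 1), 100), (date(2011, 6, 1), 50)]),
        ("Foo=b,blah=123", [(date(2011, 7, 2), 30)]),
    ],
    [
        ("Foo=b,blah=123,site=x", [(date(2011, 7, 3), 90)]),
        ("Foo=c,blah=999", [(date(2011, 7, 3), 500)]),
    ],
]
result = top_values(
    regions, "Foo", "blah", "123", date(2011, 7, 1), date(2011, 7, 31), 2
)
print(result)
```

In this sketch the per-region step returns full partial sums rather than a
truncated list, which trades more data passed back for a correct global
ranking. With coprocessors, the scan_region step is the part that would run
on the region servers.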
