hbase-dev mailing list archives

From stack <st...@duboce.net>
Subject Re: about realtime map-reduce
Date Mon, 05 Oct 2009 17:08:35 GMT
1. Let's do a JRuby filter!  It'd implement the Filter interface.  Once
launched, we'd keep the interpreter instance around, so performance
shouldn't be too bad thereafter (cache the compiled script?).  For
serialization, we'd pass the JRuby script between client and server.  Every
time a JRuby filter runs, all the lights in the building should flash
"Warning! Warning! You could be hosing your cluster!"
2. On a real-time mapreduce, Jon, how do you think the mapping should work?
Would the work on a region be a 'task'?  If a 'task' failed (regionserver
crash, split, etc.), would it be retried?
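The interpreter-reuse idea in point 1 (compile a filter script once, then reuse it on later scans) could look roughly like the following Java sketch. Everything here is hypothetical: `ScriptFilterCache` is not an HBase or JRuby class, the "compiler" is injected as a plain `Function`, and a `Predicate<byte[]>` over a row stands in for whatever a real compiled filter script would be.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Predicate;

/**
 * Hypothetical sketch (not an HBase or JRuby API): caches compiled filter
 * scripts keyed by their source text, so the expensive compile step runs once
 * per distinct script instead of once per scan.
 */
class ScriptFilterCache {
    private final ConcurrentHashMap<String, Predicate<byte[]>> cache = new ConcurrentHashMap<>();
    // Assumed stand-in for "hand the script to the interpreter and get back something callable".
    private final Function<String, Predicate<byte[]>> compiler;
    private final AtomicInteger compilations = new AtomicInteger();

    ScriptFilterCache(Function<String, Predicate<byte[]>> compiler) {
        this.compiler = compiler;
    }

    /** Returns the compiled filter for this script, compiling it only on first use. */
    Predicate<byte[]> get(String scriptSource) {
        return cache.computeIfAbsent(scriptSource, src -> {
            compilations.incrementAndGet(); // counts real compiles, not cache hits
            return compiler.apply(src);
        });
    }

    /** Number of times the injected compiler has actually been invoked. */
    int compilationCount() {
        return compilations.get();
    }
}
```

The map is unbounded; a regionserver-side version would presumably also need eviction and limits on what a script may do, which is exactly the risk the warning in point 1 is about.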

St.Ack

On Mon, Oct 5, 2009 at 6:37 AM, Jonathan Gray <jlist@streamy.com> wrote:

> The project is new and there isn't much more detail I can provide.
>
> We are currently working on HBASE-1845 and other threading-related issues
> which are the basis for doing much of this.
>
> The hard part is how to package it, how to manage it, how to integrate it,
> how to protect the server, how to handle fault scenarios, etc.  I think
> the best progression will be to continue developing threading and
> filtering and see where it leads.  Much work remains to be done on
> fully-functional filters.
>
> More than anything, our current implementation is a proof of concept.
> Simple stuff works and is pretty quick; we're trying some more complex
> multi-trip stuff now to see how far this can be taken.  When there's
> enough to say, we'll be talking about it.
>
> JG
>
> On Sun, October 4, 2009 9:47 pm, Ryan Rawson wrote:
> > I'm not sure that, in a controlled environment, arbitrary code would be
> > all that bad. I guess DDoSing your own regionserver would be bad, but
> > still.
> >
> > As for real-time map-reduce, that was a thing on Jonathan's slides,
> > and he mentioned it was a top-secret fancy thing he was working on at
> > Streamy. No other details are available unless he chooses to share
> > them.
> >
> > -ryan
> >
> >
> > On Sun, Oct 4, 2009 at 12:06 PM, Andrew Purtell <apurtell@apache.org>
> > wrote:
> >
> >> On a related note, HBASE-1002 talks about generic user filters. But, as
> >> you point out, there are risks with untrusted code execution that have
> >> to be considered even for that restricted case.
> >>
> >> One thing that can be done with some confidence that one user or job
> >> won't DoS everyone else is to allow a fixed set of additive/aggregate
> >> functions to run in a scanner context on the regionservers. This would
> >> avoid the need to send any of the data back to the client if the goal
> >> is counting, summation, averaging, etc. And these functions can be
> >> stacked such that a list of operations on columns are fed into a list
> >> of operations on the row.
> >>
> >> Allowing arbitrary code however is the way to madness. There could be
> >> an option to allow this through bytecode shipping but I do not think
> >> anyone should fool themselves into thinking this is at all safe to do
> >> in production. There is a middle ground of restricted code (e.g. no
> >> backwards branches or cyclical calling dependencies allowed) which is
> >> interesting from both usability and code safety perspectives. There are
> >> some research bytecode rewriting systems which could serve as a
> >> starting point.
> >>
> >>   - Andy
> >>
> >>
> >
> >
>
>
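Andrew Purtell's suggestion of running a fixed set of additive/aggregate functions in a scanner context, with operations on columns feeding operations on the row, might be sketched in Java like this. The class names (`Partial`, `RegionAggregator`) and the choice of summing the selected columns of each row are illustrative assumptions, not an HBase API. Each region folds its rows into a small count/sum/min/max record; only those records would need to travel back to the client, which merges them. The average is derived from sum and count so that merging stays associative.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical partial aggregate: computed per region, merged on the client.
 * Only these four numbers, not the rows themselves, would cross the wire.
 */
final class Partial {
    final long count, sum, min, max;

    Partial(long count, long sum, long min, long max) {
        this.count = count; this.sum = sum; this.min = min; this.max = max;
    }

    /** Identity element: merging with EMPTY leaves any Partial unchanged. */
    static final Partial EMPTY = new Partial(0, 0, Long.MAX_VALUE, Long.MIN_VALUE);

    static Partial of(long v) { return new Partial(1, v, v, v); }

    /** Associative merge, so partials from any number of regions combine in any grouping. */
    Partial merge(Partial o) {
        return new Partial(count + o.count, sum + o.sum,
                Math.min(min, o.min), Math.max(max, o.max));
    }

    /** Average derived from sum and count; NaN when no rows were seen. */
    double average() { return count == 0 ? Double.NaN : (double) sum / count; }
}

/** Hypothetical region-side helper: column step feeds row step. */
final class RegionAggregator {
    /** Column-level step: reduce the selected cells of one row to a single value (here, their sum). */
    static long rowValue(Map<String, Long> row, List<String> columns) {
        long total = 0;
        for (String c : columns) {
            Long v = row.get(c);
            if (v != null) total += v; // missing cells contribute nothing
        }
        return total;
    }

    /** Row-level step: fold every row's value into one Partial for the region. */
    static Partial aggregateRegion(List<Map<String, Long>> rows, List<String> columns) {
        Partial acc = Partial.EMPTY;
        for (Map<String, Long> row : rows) {
            acc = acc.merge(Partial.of(rowValue(row, columns)));
        }
        return acc;
    }
}
```

Restricting users to a closed set like this (rather than arbitrary code) is what makes the DoS risk easier to bound: each function does constant work per cell.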
