incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: [incubator-blur] The new Adhoc command is working though there are a few things hard coded that need to be pulled into the API. (753ab41)
Date Tue, 05 Aug 2014 13:43:25 GMT
On Mon, Aug 4, 2014 at 3:52 PM, Garrett Barton <garrett.barton@gmail.com>
wrote:

> So been messing with the MR style api a little more and I don't really like
> it.  The difference of having multiple things running in the JVM vs
> independent turns out to be a reasonable enough difference that introduces
> a whole lot of 'well we could do...' talk.
>

I have been playing with the MR api and I agree that I do not like it.
Plus it would likely promote misuse.


>
> So instead how about a middle ground.  I still like the concept of a
> BlurContext as it gives us an entry point to bail out of code using the
> existing AtomicBoolean jazz with the progress() method. We could provide a
> few types of BlurContexts, one that hard timed out after a set time, one
> that had a time limit per BlurIndex, and another that had a timeout for
> inactivity (think rt write Commands with no data coming in). We also get
> the counters, always nice, and I'd like to see us move to parameter
> retrieval like MR does with the context.getConf().getxxx() style vs the
> Object[].  Unless someone has a really good reason as to wanting to keep
> Object arrays??
>
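
A rough sketch of the hard-timeout variant, to make the idea concrete. Every name here (HardTimeoutContext, CommandTimeoutException, cancel()) is made up for illustration and is not existing Blur API; the only pieces taken from the discussion are the AtomicBoolean flag and the progress() bail-out point.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical exception used to unwind a command that should stop.
class CommandTimeoutException extends RuntimeException {
  CommandTimeoutException(String message) {
    super(message);
  }
}

// Hypothetical context that stops a command after a fixed wall-clock limit.
class HardTimeoutContext {
  private final AtomicBoolean running = new AtomicBoolean(true);
  private final long deadlineNanos;

  HardTimeoutContext(long timeoutMillis) {
    this.deadlineNanos = System.nanoTime() + timeoutMillis * 1_000_000L;
  }

  // Lets another thread (for example the server) cancel the command.
  void cancel() {
    running.set(false);
  }

  // The command calls this between units of work; it throws to bail out.
  void progress() {
    if (!running.get()) {
      throw new CommandTimeoutException("cancelled");
    }
    if (System.nanoTime() - deadlineNanos >= 0) {
      running.set(false);
      throw new CommandTimeoutException("hard timeout exceeded");
    }
  }
}

class HardTimeoutDemo {
  public static void main(String[] args) throws InterruptedException {
    HardTimeoutContext context = new HardTimeoutContext(50);
    Thread.sleep(100); // simulate work that runs past the limit
    try {
      context.progress();
      System.out.println("still running");
    } catch (CommandTimeoutException e) {
      System.out.println("bailed out: " + e.getMessage());
    }
  }
}
```

The per-BlurIndex and inactivity variants would presumably differ only in when the deadline is reset (on each new index, or on each write), so they could share the same progress() check.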

I don't like a config get model for passing arguments because this is supposed
to be a low latency call.  I think that we could do a simple map of arguments
if that is needed.
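
As a minimal sketch of what a simple map of arguments could look like, assuming a small wrapper with typed lookups. CommandArgs and the argument names below are hypothetical, not part of the current API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for command arguments; a plain in-memory map, no config parsing.
class CommandArgs {
  private final Map<String, Object> values;

  CommandArgs(Map<String, Object> values) {
    // Defensive copy so later changes by the caller do not leak into the command.
    this.values = Collections.unmodifiableMap(new HashMap<>(values));
  }

  // Required argument: throws if it is missing or has the wrong type.
  <T> T get(String name, Class<T> type) {
    Object value = values.get(name);
    if (value == null) {
      throw new IllegalArgumentException("missing argument: " + name);
    }
    return type.cast(value);
  }

  // Optional argument: falls back to the supplied default when absent.
  <T> T get(String name, Class<T> type, T defaultValue) {
    Object value = values.get(name);
    return value == null ? defaultValue : type.cast(value);
  }
}

class CommandArgsDemo {
  public static void main(String[] args) {
    Map<String, Object> raw = new HashMap<>();
    raw.put("field", "fam0.col0");
    raw.put("limit", 10);
    CommandArgs commandArgs = new CommandArgs(raw);

    String field = commandArgs.get("field", String.class);
    int limit = commandArgs.get("limit", Integer.class);
    long minimum = commandArgs.get("minimum", Long.class, 0L);
    System.out.println(field + " " + limit + " " + minimum);
  }
}
```

Compared with Object[], callers no longer depend on argument order, and each lookup is a single hash probe, so it should not add meaningful latency.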

I'm trying to come up with a higher level api that would allow for more
complex calls, but would also handle the process, merge, merge model.  I
will try and post to the wiki at some point today.
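
To be clear about what I mean by process, merge, merge, here is a toy sketch of the flow only. It is not the higher level api itself, and ThreePhaseCommand, CountMatches and ThreePhaseDriver are names invented for this example. A List<String> stands in for a BlurIndex, and nested lists stand in for shard servers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical shape of a command: process each index, merge per server, merge once more.
interface ThreePhaseCommand<S, T> {
  T process(S shard);

  T mergeLocal(Iterable<T> shardResults);

  T mergeFinal(Iterable<T> serverResults);
}

// Counts how many values in each shard equal a given term.
class CountMatches implements ThreePhaseCommand<List<String>, Long> {
  private final String term;

  CountMatches(String term) {
    this.term = term;
  }

  public Long process(List<String> shard) {
    long count = 0;
    for (String value : shard) {
      if (value.equals(term)) {
        count++;
      }
    }
    return count;
  }

  public Long mergeLocal(Iterable<Long> shardResults) {
    return sum(shardResults);
  }

  public Long mergeFinal(Iterable<Long> serverResults) {
    return sum(serverResults);
  }

  private static long sum(Iterable<Long> values) {
    long total = 0;
    for (long value : values) {
      total += value;
    }
    return total;
  }
}

class ThreePhaseDriver {
  // Each inner list is one shard server holding several shards.
  static <S, T> T run(ThreePhaseCommand<S, T> command, List<List<S>> servers) {
    List<T> serverResults = new ArrayList<>();
    for (List<S> shards : servers) {
      List<T> shardResults = new ArrayList<>();
      for (S shard : shards) {
        shardResults.add(command.process(shard)); // process
      }
      serverResults.add(command.mergeLocal(shardResults)); // merge on the shard server
    }
    return command.mergeFinal(serverResults); // merge on the controller
  }

  public static void main(String[] args) {
    List<List<List<String>>> servers = Arrays.asList(
        Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("a")),
        Arrays.asList(Arrays.asList("b", "a")));
    System.out.println(run(new CountMatches("a"), servers));
  }
}
```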

Aaron


> Thoughts?
>
> I can update the wiki to give a cleaner example if anyone thinks that's a
> good idea?
>
> ~Garrett
>
>
> On Fri, Aug 1, 2014 at 7:24 PM, Garrett Barton <garrett.barton@gmail.com>
> wrote:
>
> > How about this?
> >
> > public abstract class Command<T1, T2> implements Serializable {
> >
> >  public abstract void mergeFinal(Iterable<T2> results, BlurContext<T2> context) throws IOException;
> >  public abstract void mergeLocal(Iterable<T1> results, BlurContext<T2> context) throws IOException;
> >  public abstract void processIndex(BlurIndex blurIndex, BlurContext<T1> context) throws IOException;
> >
> > }
> >
> > Where BlurContext<T> looks like:
> >
> > public interface BlurContext<T1> extends Serializable {
> >
> >  public void write(T1 object) throws IOException;
> >  public void progress();
> >  public void incCounter(String counter);
> >  public void setCounter(String counter, long num);
> >
> >  public Object[] getArgs();
> >  public void setArgs(Object[] args);
> > }
> >
> >
> > Probably looks really familiar.. :)
> >
> >  By providing the Iterable interface our implementation behind the scenes
> > could be running through each call to processIndex, that way we don't have
> > to realize the full List<T1> like the current implementation does.  It's a
> > step in the right direction, now real memory usage is contained within the
> > Command as opposed to message passing.  It's not total streaming but we have
> > removed one complete copy of intermediate results from RAM.
> >
> >  I also like the BlurContext idea more and more, we might not know all the
> > things we want to expose as hooks (blockcache, tmp disk access,
> > blurConfig??) up front but this gives us an api compatible way to extend
> > that without junking the core interface.
> >
> >  The one last thing was while talking with Aaron he mentioned maybe
> > separating what the shardserver does from the controller server.  And this
> > is because it might give us more freedom to integrate with other bulk
> > processing/streaming engines which ideally will hit the shards directly and
> > not pull data back via the controllers.
> >  I'm not sure how that would look yet, it's hard to get out of the
> > mindset that shards and controllers look the same api wise.
> >
> > Anyways, hopefully this will spawn more ideas!
> >
> >
> >
> >
> > On Thu, Jul 31, 2014 at 1:30 PM, Tim Williams <williamstw@gmail.com>
> > wrote:
> >
> >> On Thu, Jul 31, 2014 at 12:55 PM, Aaron McCurry <amccurry@gmail.com>
> >> wrote:
> >> > We could do that, however we likely would need a way to have the
> >> > implementation create an initial return object so that a merge could be
> >> > incremental.
> >> >
> >> > For example:
> >> >
> >> > Let's say that we are aggregating counts and we have a custom Counts object
> >> > and we are going to merge each Result as it finishes.
> >> >
> >> > public Counts merge(Counts existing, Result result) {
> >> >   Counts mergedCounts= new Counts();
> >> >   // Do some counting and merging of existing Counts.
> >> >   return mergedCounts;
> >> > }
> >> >
> >> > So we could do one of three things.  We could allow existing to be null if
> >> > it's the first merge call or we could have a second method that doesn't
> >> > take an existing argument.
> >> >
> >> > public Counts initial(Result result) {
> >> > ...
> >> > }
> >> >
> >> > The last option I see is to use varargs like:
> >> >
> >> > public Counts merge(Result result, Counts... existing) {
> >> >   Counts mergedCounts= new Counts();
> >> >   // Do some counting and merging of existing Counts.
> >> >   return mergedCounts;
> >> > }
> >> >
> >> > This is at least a little cleaner in that it's implied that existing could
> >> > be absent or null as well as allowing multiple items to be merged at the
> >> > same time.
> >> >
> >> > What do you think?
> >>
> >> Yeah, they feel kinda awkward... what about having the command hold
> >> its state internally, then merge(Result r) is asking to merge r onto
> >> itself?
> >>
> >> Thanks,
> >> --tim
> >>
> >
> >
>
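
For reference, Tim's suggestion at the bottom of the thread (the command holds its state and merge(Result r) merges r onto itself) might look roughly like the sketch below. Result and CountsCommand are placeholder types invented here, since the thread never defines Result or Counts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-shard result: a map of counter name to count.
class Result {
  final Map<String, Long> counts;

  Result(Map<String, Long> counts) {
    this.counts = counts;
  }
}

// Hypothetical command that accumulates merged counts in its own state.
class CountsCommand {
  private final Map<String, Long> totals = new HashMap<>();

  // Folds one Result into this command; no "existing" argument and no null first call.
  // Synchronized on the assumption that results may arrive from several threads.
  synchronized void merge(Result result) {
    for (Map.Entry<String, Long> entry : result.counts.entrySet()) {
      totals.merge(entry.getKey(), entry.getValue(), Long::sum);
    }
  }

  // Returns a copy so callers cannot modify the internal state.
  synchronized Map<String, Long> totals() {
    return new HashMap<>(totals);
  }
}

class CountsCommandDemo {
  public static void main(String[] args) {
    Map<String, Long> first = new HashMap<>();
    first.put("hits", 2L);
    Map<String, Long> second = new HashMap<>();
    second.put("hits", 3L);
    second.put("misses", 1L);

    CountsCommand command = new CountsCommand();
    command.merge(new Result(first));
    command.merge(new Result(second));
    System.out.println(command.totals());
  }
}
```

This sidesteps the null, initial() and varargs options entirely, at the cost of making the command stateful, so an instance could not be shared across requests.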
