incubator-blur-dev mailing list archives

From Garrett Barton <>
Subject Re: [incubator-blur] The new Adhoc command is working though there are a few things hard coded that need to be pulled into the API. (753ab41)
Date Fri, 01 Aug 2014 23:24:07 GMT
How about this?

public abstract class Command<T1, T2> implements Serializable {

  public abstract void mergeFinal(Iterable<T2> results, BlurContext<T2> context) throws IOException;
  public abstract void mergeLocal(Iterable<T1> results, BlurContext<T2> context) throws IOException;
  public abstract void processIndex(BlurIndex blurIndex, BlurContext<T1> context) throws IOException;

}

Where BlurContext<T> looks like:

public interface BlurContext<T> extends Serializable {

  void write(T object) throws IOException;
  void progress();
  void incCounter(String counter);
  void setCounter(String counter, long num);

  Object[] getArgs();
  void setArgs(Object[] args);

}

Probably looks really familiar.. :)

 By providing the Iterable interface, our implementation behind the scenes
could run each call to processIndex lazily as the merge iterates, so we don't
have to realize the full List<T1> like the current implementation does.  It's
a step in the right direction: real memory usage is now contained within the
Command as opposed to message passing.  It's not total streaming, but we have
removed one complete copy of intermediate results from RAM.
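
As a rough illustration of the lazy Iterable idea above, here is a minimal sketch. It is not Blur code: LazyShardResults and LazyShardResultsDemo are hypothetical names, and a plain java.util.function.Supplier stands in for a per-shard processIndex call. Each shard's work runs only when the merge step pulls the next element, so the full list of intermediate results is never built.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical: an Iterable that runs each per-shard task only when the
// consumer asks for the next result. Supplier stands in for processIndex.
final class LazyShardResults<T> implements Iterable<T> {
  private final List<Supplier<T>> tasks;

  LazyShardResults(List<Supplier<T>> tasks) {
    this.tasks = tasks;
  }

  @Override
  public Iterator<T> iterator() {
    final Iterator<Supplier<T>> it = tasks.iterator();
    return new Iterator<T>() {
      @Override
      public boolean hasNext() {
        return it.hasNext();
      }

      @Override
      public T next() {
        // The shard work happens here, one result at a time.
        return it.next().get();
      }
    };
  }
}

final class LazyShardResultsDemo {
  // A stand-in for mergeLocal: sums per-shard counts while holding only
  // the running total and the current result.
  static long mergeLocal(Iterable<Long> results) {
    long total = 0;
    for (long r : results) {
      total += r;
    }
    return total;
  }

  public static void main(String[] args) {
    List<Supplier<Long>> tasks = new ArrayList<>();
    for (long n : new long[] {3, 4, 5}) {
      tasks.add(() -> n);
    }
    System.out.println(mergeLocal(new LazyShardResults<>(tasks))); // prints 12
  }
}
```

The merge code only sees an Iterable, so the implementation behind it could later become truly streaming without changing the Command signature.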

 I also like the BlurContext idea more and more.  We might not know all the
things we want to expose as hooks (blockcache, tmp disk access,
blurConfig??) up front, but this gives us an API-compatible way to extend
that without junking the core interface.
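
To make the extension point concrete, here is one possible sketch, assuming Java 8 default methods are available. Context, ListContext, ContextDemo and configValue are all hypothetical names, not part of Blur. A hook added to the context later does not break commands or context implementations written earlier.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical context interface; only write and progress mirror the
// proposal above, configValue is an invented later addition.
interface Context<T> {
  void write(T object);

  void progress();

  // Added after the fact; the default body keeps older implementations
  // compiling without changes.
  default String configValue(String key) {
    return null;
  }
}

// An implementation written before configValue existed.
final class ListContext<T> implements Context<T> {
  final List<T> written = new ArrayList<>();

  @Override
  public void write(T object) {
    written.add(object);
  }

  @Override
  public void progress() {
    // No-op in this sketch.
  }
}

final class ContextDemo {
  // The command body depends only on the context object, so new hooks
  // never change this method's parameter list.
  static void emitLengths(List<String> terms, Context<Integer> context) {
    for (String term : terms) {
      context.write(term.length());
      context.progress();
    }
  }
}
```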

 One last thing: while talking with Aaron, he mentioned maybe separating
what the shard server does from the controller server.  That might give us
more freedom to integrate with other bulk processing/streaming engines, which
ideally will hit the shards directly and not pull data back via the
controllers.
 I'm not sure how that would look yet; it's hard to get out of the mindset
that shards and controllers look the same API-wise.

Anyways, hopefully this will spawn more ideas!

On Thu, Jul 31, 2014 at 1:30 PM, Tim Williams <> wrote:

> On Thu, Jul 31, 2014 at 12:55 PM, Aaron McCurry <>
> wrote:
> > We could do that, however we likely would need a way to have the
> > implementation create an initial return object so that a merge could be
> > incremental.
> >
> > For example:
> >
> > Let's say that we are aggregating counts and we have a custom Counts
> > object and we are going to merge each Result as it finishes.
> >
> > public Counts merge(Counts existing, Result result) {
> >   Counts mergedCounts= new Counts();
> >   // Do some counting and merging of existing Counts.
> >   return mergedCounts;
> > }
> >
> > So we could do one of three things.  We could allow existing to be null
> > if it's the first merge call, or we could have a second method that
> > doesn't take an existing argument.
> >
> > public Counts initial(Result result) {
> > ...
> > }
> >
> > The last option I see is to use varargs like:
> >
> > public Counts merge(Result result, Counts... existing) {
> >   Counts mergedCounts= new Counts();
> >   // Do some counting and merging of existing Counts.
> >   return mergedCounts;
> > }
> >
> > This is at least a little cleaner in that it's implied that existing
> > could be absent or null, as well as allowing multiple items to be merged
> > at the same time.
> > What do you think?
> Yeah, they feel kinda awkward... what about having the command hold
> its state internally, then merge(Result r) is asking to merge r onto
> itself?
> Thanks,
> --tim
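
For reference, Tim's suggestion above (the command holds its state and merges each result onto itself) could look roughly like the sketch below. ShardResult and CountsCommand are hypothetical names, and the term-to-count map is an assumed result shape. Because the running state lives in the command, there is no null "existing" argument and no separate initial method.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-shard result: term -> count.
final class ShardResult {
  final Map<String, Long> counts;

  ShardResult(Map<String, Long> counts) {
    this.counts = counts;
  }
}

// Hypothetical command that keeps the merged counts as internal state.
final class CountsCommand {
  private final Map<String, Long> totals = new HashMap<>();

  // Merges one shard result onto the command's own state as it arrives.
  void merge(ShardResult r) {
    for (Map.Entry<String, Long> e : r.counts.entrySet()) {
      totals.merge(e.getKey(), e.getValue(), Long::sum);
    }
  }

  Map<String, Long> totals() {
    return totals;
  }
}
```

The first call to merge needs no special case: it starts from the empty map.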
