incubator-blur-dev mailing list archives

From Garrett Barton <garrett.bar...@gmail.com>
Subject Re: [incubator-blur] The new Adhoc command is working though there are a few things hard coded that need to be pulled into the API. (753ab41)
Date Mon, 04 Aug 2014 19:52:50 GMT
So I've been messing with the MR-style API a little more, and I don't really
like it. The difference between having multiple things running in the same
JVM versus running independently turns out to be significant enough that it
invites a whole lot of 'well, we could do...' talk.

So instead, how about a middle ground? I still like the concept of a
BlurContext, as it gives us an entry point to bail out of code using the
existing AtomicBoolean jazz with the progress() method. We could provide a
few types of BlurContexts: one that times out hard after a set time, one that
has a time limit per BlurIndex, and another that times out after a period of
inactivity (think real-time write Commands with no data coming in). We also
get the counters, which are always nice, and I'd like to see us move to
parameter retrieval like MR does, with the context.getConf().getXxx() style
instead of the Object[]. Unless someone has a really good reason to keep
Object arrays?
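A minimal Java sketch of the context idea above, under stated assumptions: progress() consults a pluggable shouldStop() check and then the shared AtomicBoolean, the clock is injected so the timeouts can be tested, and a tiny CommandConf class stands in for MR-style typed lookups. SketchContext, CommandConf, and the three timeout subclasses are hypothetical names for illustration, not Blur classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical sketch: none of these classes are Blur APIs.
// Typed, MR-style parameter lookup in place of positional Object[] args.
class CommandConf {
  private final Map<String, String> values = new ConcurrentHashMap<>();
  CommandConf set(String key, String value) { values.put(key, value); return this; }
  int getInt(String key, int defaultValue) {
    String v = values.get(key);
    return v == null ? defaultValue : Integer.parseInt(v);
  }
}

abstract class SketchContext {
  // Kill switch, standing in for the existing AtomicBoolean used to bail out of code.
  private final AtomicBoolean running = new AtomicBoolean(true);
  private final Map<String, AtomicLong> counters = new ConcurrentHashMap<>();
  private final CommandConf conf;
  protected final LongSupplier clockMillis;

  protected SketchContext(CommandConf conf, LongSupplier clockMillis) {
    this.conf = conf;
    this.clockMillis = clockMillis;
  }

  // Commands call this in their loops; once a stop condition fires, it keeps throwing.
  public void progress() {
    if (shouldStop(clockMillis.getAsLong())) {
      running.set(false);
    }
    if (!running.get()) {
      throw new IllegalStateException("stopped by " + getClass().getSimpleName());
    }
  }

  public void stop() { running.set(false); } // external cancel
  public void startIndex() {}                // hook: called before each BlurIndex
  public void dataArrived() {}               // hook: called whenever new input shows up
  protected abstract boolean shouldStop(long nowMillis);

  public void incCounter(String name) { counter(name).incrementAndGet(); }
  public void setCounter(String name, long value) { counter(name).set(value); }
  public long getCounter(String name) { return counter(name).get(); }
  public CommandConf getConf() { return conf; }

  private AtomicLong counter(String name) {
    return counters.computeIfAbsent(name, k -> new AtomicLong());
  }
}

// Hard timeout: one fixed budget for the whole command.
class HardTimeoutContext extends SketchContext {
  private final long deadline;
  HardTimeoutContext(CommandConf conf, LongSupplier clock, long limitMillis) {
    super(conf, clock);
    deadline = clock.getAsLong() + limitMillis;
  }
  @Override protected boolean shouldStop(long now) { return now > deadline; }
}

// Per-index limit: the budget restarts whenever startIndex() is called.
class PerIndexTimeoutContext extends SketchContext {
  private final long limitMillis;
  private volatile long indexStart;
  PerIndexTimeoutContext(CommandConf conf, LongSupplier clock, long limitMillis) {
    super(conf, clock);
    this.limitMillis = limitMillis;
    indexStart = clock.getAsLong();
  }
  @Override public void startIndex() { indexStart = clockMillis.getAsLong(); }
  @Override protected boolean shouldStop(long now) { return now - indexStart > limitMillis; }
}

// Inactivity timeout: stops when no data has arrived for too long (e.g. an idle rt write).
class InactivityTimeoutContext extends SketchContext {
  private final long idleMillis;
  private volatile long lastData;
  InactivityTimeoutContext(CommandConf conf, LongSupplier clock, long idleMillis) {
    super(conf, clock);
    this.idleMillis = idleMillis;
    lastData = clock.getAsLong();
  }
  @Override public void dataArrived() { lastData = clockMillis.getAsLong(); }
  @Override protected boolean shouldStop(long now) { return now - lastData > idleMillis; }
}
```

With this shape, a Command would read a parameter as `ctx.getConf().getInt("maxRows", 10)` (a made-up key) instead of casting `args[0]`, and would call `ctx.progress()` inside its loops so that any of the three policies can stop it.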

Thoughts?

I can update the wiki with a cleaner example if anyone thinks that's a
good idea.

~Garrett


On Fri, Aug 1, 2014 at 7:24 PM, Garrett Barton <garrett.barton@gmail.com>
wrote:

> How about this?
>
> public abstract class Command<T1, T2> implements Serializable {
>
>  public abstract void mergeFinal(Iterable<T2> results, BlurContext<T2> context) throws IOException;
>  public abstract void mergeLocal(Iterable<T1> results, BlurContext<T2> context) throws IOException;
>  public abstract void processIndex(BlurIndex blurIndex, BlurContext<T1> context) throws IOException;
>
> }
>
> Where BlurContext<T> looks like:
>
> public abstract class BlurContext<T> implements Serializable {
>
>  public abstract void write(T object) throws IOException;
>  public abstract void progress();
>  public abstract void incCounter(String counter);
>  public abstract void setCounter(String counter, long num);
>
>  public abstract Object[] getArgs();
>  public abstract void setArgs(Object[] args);
> }
>
>
> Probably looks really familiar. :)
>
>  By providing the Iterable interface, our implementation behind the scenes
> could drive each call to processIndex as the results are consumed, so we
> don't have to realize the full List<T1> the way the current implementation
> does. It's a step in the right direction: real memory usage is now contained
> within the Command rather than in message passing. It's not total streaming,
> but we have removed one complete copy of the intermediate results from RAM.
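One way the Iterable could be driven lazily, sketched under assumptions: processIndex is modeled as a plain BiConsumer that emits results through a callback (standing in for context.write), and LazyIndexResults is a hypothetical name. Only the output of the index processed most recently is buffered, which matches the "not total streaming" caveat: one index's results are still realized, but never the full List<T1>.

```java
import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical sketch: runs the per-index work only as the consumer (e.g. mergeLocal)
// pulls results, so the output of all indexes is never held in one List at once.
class LazyIndexResults<I, R> implements Iterable<R> {
  private final List<I> indexes;
  // Stands in for processIndex(blurIndex, context): emits results via the callback,
  // the way context.write(...) would. Null results are not supported (ArrayDeque).
  private final BiConsumer<I, Consumer<R>> processIndex;

  LazyIndexResults(List<I> indexes, BiConsumer<I, Consumer<R>> processIndex) {
    this.indexes = indexes;
    this.processIndex = processIndex;
  }

  @Override
  public Iterator<R> iterator() {
    return new Iterator<R>() {
      private final Iterator<I> pending = indexes.iterator();
      // Holds only the output of the index processed most recently.
      private final ArrayDeque<R> buffer = new ArrayDeque<>();

      @Override
      public boolean hasNext() {
        // Process further indexes only once the buffered output has been consumed.
        while (buffer.isEmpty() && pending.hasNext()) {
          processIndex.accept(pending.next(), buffer::add);
        }
        return !buffer.isEmpty();
      }

      @Override
      public R next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return buffer.poll();
      }
    };
  }
}
```

mergeLocal would then be an ordinary `for (R r : results)` loop; each pass pulls in the next index only after the previous index's output has been used up.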
>
>  I also like the BlurContext idea more and more. We might not know up front
> all the things we want to expose as hooks (blockcache, tmp disk access,
> blurConfig??), but this gives us an API-compatible way to add them later
> without junking the core interface.
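To illustrate why an abstract BlurContext extends more safely than a fixed method signature: a hook added later as a concrete method with a default leaves existing implementations compiling and behaving as before. The class names and the getTmpDir() hook below are hypothetical, chosen only to echo the "tmp disk access" example.

```java
import java.io.File;
import java.util.Optional;

// Hypothetical sketch: the first version of the context only had progress().
abstract class ExtensibleContext {
  public abstract void progress();

  // Added in a later release as a concrete method with a safe default, so every
  // existing subclass and Command keeps compiling and behaving as before.
  public Optional<File> getTmpDir() {
    return Optional.empty();
  }
}

// Written against the original API; it never mentions getTmpDir().
class LegacyContext extends ExtensibleContext {
  int progressCalls;

  @Override
  public void progress() {
    progressCalls++;
  }
}

// A newer implementation that actually offers tmp disk space.
class TmpDirContext extends LegacyContext {
  private final File tmpDir;

  TmpDirContext(File tmpDir) {
    this.tmpDir = tmpDir;
  }

  @Override
  public Optional<File> getTmpDir() {
    return Optional.of(tmpDir);
  }
}
```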
>
>  One last thing: when I was talking with Aaron, he mentioned maybe
> separating what the shard server does from what the controller server does.
> The reason is that it might give us more freedom to integrate with other bulk
> processing/streaming engines, which ideally would hit the shards directly
> rather than pull data back through the controllers.
>  I'm not sure how that would look yet; it's hard to get out of the mindset
> that shards and controllers look the same API-wise.
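Purely as a thought experiment, since the shape is still open: one possible split gives the shard side its own contract, and the controller becomes just one client of it. The interface and class names are hypothetical, and the "work" is modeled as a function of the shard id only to keep the example self-contained.

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical sketch: shard-side and controller-side contracts as separate interfaces.
interface ShardSide {
  // Runs work against one local shard; an external bulk/streaming engine could call this directly.
  <R> R runOnShard(String shardId, Function<String, R> work);
}

interface ControllerSide {
  // Fans work out to many shards and merges the per-shard results.
  <R> R runOnTable(List<String> shardIds, Function<String, R> work, BinaryOperator<R> merge);
}

// One possible controller built purely on top of the shard-side contract.
class FanOutController implements ControllerSide {
  private final ShardSide shards;

  FanOutController(ShardSide shards) {
    this.shards = shards;
  }

  @Override
  public <R> R runOnTable(List<String> shardIds, Function<String, R> work, BinaryOperator<R> merge) {
    return shardIds.stream()
        .map(id -> shards.runOnShard(id, work))
        .reduce(merge)
        .orElseThrow(() -> new IllegalArgumentException("no shards"));
  }
}
```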
>
> Anyways, hopefully this will spawn more ideas!
>
>
>
>
> On Thu, Jul 31, 2014 at 1:30 PM, Tim Williams <williamstw@gmail.com>
> wrote:
>
>> On Thu, Jul 31, 2014 at 12:55 PM, Aaron McCurry <amccurry@gmail.com>
>> wrote:
>> > We could do that; however, we would likely need a way to have the
>> > implementation create an initial return object so that the merge could be
>> > incremental.
>> >
>> > For example:
>> >
>> > Let's say that we are aggregating counts and we have a custom Counts object
>> > and we are going to merge each Result as it finishes.
>> > and we are going to merge each Result as it finishes.
>> >
>> > public Counts merge(Counts existing, Result result) {
>> >   Counts mergedCounts = new Counts();
>> >   // Do some counting and merging of existing Counts.
>> >   return mergedCounts;
>> > }
>> >
>> > So we could do one of three things. We could allow existing to be null if
>> > it's the first merge call, or we could have a second method that doesn't
>> > take an existing argument.
>> >
>> > public Counts initial(Result result) {
>> > ...
>> > }
>> >
>> > The last option I see is to use vargs like:
>> >
>> > public Counts merge(Result result, Counts... existing) {
>> >   Counts mergedCounts = new Counts();
>> >   // Do some counting and merging of existing Counts.
>> >   return mergedCounts;
>> > }
>> >
>> > This is at least a little cleaner, in that it implies that existing could
>> > be absent or null, and it also allows multiple items to be merged at the
>> > same time.
>> >
>> > What do you think?
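The three options can be compared side by side with concrete, hypothetical stand-ins: Result holds one shard's partial counts and Counts is the running total, both backed by a Map<String, Long>. The method names are invented here to keep the options apart in a single class; the email uses merge/initial/merge.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: one shard's partial counts.
class Result {
  final Map<String, Long> counts = new HashMap<>();
  Result put(String key, long n) { counts.put(key, n); return this; }
}

// Hypothetical stand-in: the running total built up by merging Results.
class Counts {
  final Map<String, Long> totals = new HashMap<>();
  void addAll(Map<String, Long> more) { more.forEach((k, v) -> totals.merge(k, v, Long::sum)); }
}

class CountingMerges {
  // Option 1: existing is null on the first merge call.
  static Counts mergeNullable(Counts existing, Result result) {
    Counts merged = new Counts();
    if (existing != null) merged.addAll(existing.totals);
    merged.addAll(result.counts);
    return merged;
  }

  // Option 2: a separate method builds the first Counts, so merge never sees null.
  static Counts initial(Result result) {
    Counts merged = new Counts();
    merged.addAll(result.counts);
    return merged;
  }

  static Counts merge(Counts existing, Result result) {
    Counts merged = initial(result);
    merged.addAll(existing.totals);
    return merged;
  }

  // Option 3: varargs, so existing may be absent, null, or several Counts at once.
  static Counts mergeVarargs(Result result, Counts... existing) {
    Counts merged = initial(result);
    if (existing != null) {
      for (Counts c : existing) {
        if (c != null) merged.addAll(c.totals);
      }
    }
    return merged;
  }
}
```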
>>
>> Yeah, they feel kinda awkward... what about having the command hold
>> its state internally, so that merge(Result r) means merging r onto
>> itself?
>>
>> Thanks,
>> --tim
>>
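Tim's alternative, sketched with a hypothetical CountingCommand that keeps its totals as internal state and takes each shard's partial counts as a Map<String, Long>. Because the state lives on the object, there is no special first call; the trade-off is that each merge site needs its own command instance, and concurrent merges need guarding, which the synchronized methods below assume.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a command that owns its merge state.
class CountingCommand {
  private final Map<String, Long> totals = new HashMap<>();

  // Folds one shard's partial counts into this command's own totals;
  // the first call is no different from any other.
  synchronized void merge(Map<String, Long> result) {
    result.forEach((k, v) -> totals.merge(k, v, Long::sum));
  }

  // Returns a snapshot copy so callers cannot mutate the internal state.
  synchronized Map<String, Long> getTotals() {
    return new HashMap<>(totals);
  }
}
```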
>
>
