incubator-blur-dev mailing list archives

From "Aaron McCurry (JIRA)" <>
Subject [jira] [Commented] (BLUR-18) Rework the MapReduce Library to implement Input/OutputFormats
Date Sun, 14 Oct 2012 00:02:03 GMT


Aaron McCurry commented on BLUR-18:

1. The new Thrift API is incomplete, but assume we add a few methods from the 0.1 API,
such as shardLayout(tableName), which returns a map of shards to servers (shard servers serve
0 or more shards per table).  With that information, one input split could correspond to
one shard.  For example, if a table with 1000 shards is being served on 100 machines,
there would be 1000 splits.  So really you would only need to care about the number of shards
per table.
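The shard-to-split mapping above can be sketched without any Hadoop dependency. This is only a sketch of the idea: the result of the assumed shardLayout(tableName) call is modeled as a plain Map, and ShardSplit is a hypothetical stand-in for a Hadoop InputSplit that carries the serving server as a locality hint.

```java
import java.util.*;

// Sketch: derive one input split per shard from a shardLayout map.
// shardLayout(tableName) is assumed (from the 0.1 API): shard name -> serving server.
public class ShardSplits {

    // Minimal stand-in for a Hadoop InputSplit: the shard to read and
    // the server hosting it (usable as the split's locality hint).
    public static final class ShardSplit {
        public final String shard;
        public final String location;
        public ShardSplit(String shard, String location) {
            this.shard = shard;
            this.location = location;
        }
    }

    // One split per shard, regardless of how many shards a server hosts.
    public static List<ShardSplit> splitsFor(Map<String, String> shardLayout) {
        List<ShardSplit> splits = new ArrayList<>();
        for (Map.Entry<String, String> e : shardLayout.entrySet()) {
            splits.add(new ShardSplit(e.getKey(), e.getValue()));
        }
        return splits;
    }

    public static void main(String[] args) {
        // 4 shards served by 2 machines -> 4 splits, not 2.
        Map<String, String> layout = new LinkedHashMap<>();
        layout.put("shard-0", "server-a");
        layout.put("shard-1", "server-a");
        layout.put("shard-2", "server-b");
        layout.put("shard-3", "server-b");
        System.out.println(splitsFor(layout).size()); // prints 4
    }
}
```

So a 1000-shard table on 100 machines yields 1000 splits, each tagged with the server it should be scheduled near.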

2. No, a shard server potentially serves many shards of a given table.  And if more servers
are added, or some of the servers fail, the indexes will logically move to a new server.

3. My first thought is that the session gets created in the MR driver program, which executes
the query against one of the controller servers.  Then the splits read the results directly
from the shard servers.

4. Yes, but I'm going to change it back to Record.  There was enough confusion when I was
chatting with people on my project to realize that it was a bad name.  :)

5. At this point both are being created.  The idea is that the BlurTuple service is to be used
by external clients, so simplicity/ease of use is the driver for that API, while the BlurShard
service is to be used by internal code such as the controllers and the MR system.  In the past
the shard servers and controller servers presented the same API, but now, since we are in a
state of change, I'm not sure that's necessary going forward.  And for that matter I'm not
totally sold on keeping the internal API Thrift based.  It would probably be easier to provide
a more MR-friendly API to the MR programs.
> Rework the MapReduce Library to implement Input/OutputFormats
> -------------------------------------------------------------
>                 Key: BLUR-18
>                 URL:
>             Project: Apache Blur
>          Issue Type: Improvement
>            Reporter: Aaron McCurry
> Currently the only way to implement indexing is to use the BlurReducer.  A better way
to implement this would be to support Hadoop input/output formats in both the new and old APIs.
 This would allow easier integration with other Hadoop projects such as Hive and Pig.
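To make the proposal concrete, here is a minimal sketch of what a driver program could look like once such formats exist.  BlurInputFormat and BlurOutputFormat are hypothetical names for the classes this issue proposes (they do not exist yet), shown wired into a new-API (org.apache.hadoop.mapreduce) job:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver wiring; BlurInputFormat/BlurOutputFormat are assumed
// names for the formats proposed in BLUR-18, not existing classes.
public class BlurIndexJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "blur-index");
        job.setJarByClass(BlurIndexJob.class);
        // Read rows straight from the table's shards, one split per shard.
        job.setInputFormatClass(BlurInputFormat.class);
        // Write indexes directly, instead of going through BlurReducer.
        job.setOutputFormatClass(BlurOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With the formats exposed this way, tools like Hive and Pig can plug Blur in through their standard storage-handler/load-store mechanisms rather than depending on BlurReducer.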

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
