apex-dev mailing list archives

From Chandni Singh <singh.chan...@gmail.com>
Subject Re: Block reading and data locality
Date Mon, 09 May 2016 21:58:40 GMT
Hi Pramod,

I thought about this, and IMO one way to achieve this a little more
efficiently is to provide some support from the platform and intelligent
partitioning in BlockReader.

1. Platform support: A partition should be able to express on which node it
should be created. The application master then requests the RM to deploy
the partition on that node.

2. Intelligent partitioning: Initially just one instance of BlockReader is
created. When it receives BlockMetadata, it can derive where the new HDFS
blocks are located. It can then create more partitions if there isn't a
BlockReader already running on that node.
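
A minimal sketch of the decision in point 2, under stated assumptions: BlockInfo and LocalityPlanner are hypothetical names used only for illustration and are not Apex classes, and the replica hosts are assumed to have already been resolved for each block. A block counts as covered when any of its replica hosts already runs a reader; otherwise the first replica host is chosen for a new partition. Actually placing a partition on that node would still depend on the platform support described in point 1.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the metadata a reader receives; not the Apex class.
final class BlockInfo {
    final long blockId;
    final List<String> hosts; // nodes holding a replica of this HDFS block

    BlockInfo(long blockId, List<String> hosts) {
        this.blockId = blockId;
        this.hosts = hosts;
    }
}

final class LocalityPlanner {
    // Returns the nodes on which a new reader partition should be requested.
    // A block is covered if any of its replica hosts already has a reader.
    static Set<String> nodesNeedingReaders(List<BlockInfo> blocks,
                                           Set<String> nodesWithReaders) {
        Set<String> covered = new LinkedHashSet<>(nodesWithReaders);
        Set<String> toCreate = new LinkedHashSet<>();
        for (BlockInfo block : blocks) {
            if (block.hosts.isEmpty()) {
                continue; // no location information, nothing to decide
            }
            boolean local = false;
            for (String host : block.hosts) {
                if (covered.contains(host)) {
                    local = true;
                    break;
                }
            }
            if (!local) {
                // Assumption: simply pick the first replica host.
                String chosen = block.hosts.get(0);
                covered.add(chosen);
                toCreate.add(chosen);
            }
        }
        return toCreate;
    }

    public static void main(String[] args) {
        Set<String> readers = new HashSet<>(Arrays.asList("node1"));
        List<BlockInfo> blocks = Arrays.asList(
            new BlockInfo(1L, Arrays.asList("node1", "node2")),
            new BlockInfo(2L, Arrays.asList("node3", "node4")),
            new BlockInfo(3L, Arrays.asList("node4", "node3")),
            new BlockInfo(4L, Arrays.asList("node5")));
        // Nodes that would need a new reader partition.
        System.out.println(nodesNeedingReaders(blocks, readers));
    }
}
```

Picking the first replica host keeps the sketch simple; a real implementation would probably also weigh how loaded each node already is.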

I would like to take this up if there is some consensus to support it.

Chandni

On Mon, May 9, 2016 at 2:56 PM, Sandesh Hegde <sandesh@datatorrent.com>
wrote:

> So the requirement is to mix runtime and deployment decisions.
> How about allowing the operators to request re-deployment based on the
> runtime condition?
>
>
> On Mon, May 9, 2016 at 2:33 PM Pramod Immaneni <pramod@datatorrent.com>
> wrote:
>
> > The file splitter, block reader combination allows for parallel reading of
> > files by multiple partitions by dividing the files into blocks. Does anyone
> > have any ideas on how to have the block readers be data local to the blocks
> > they are reading?
> >
> > I think we will need to spawn block readers on all nodes where the blocks
> > are present (if the readers are reading multiple files, this could mean
> > all the nodes in the cluster) and route the block meta information to the
> > appropriate block reader.
> >
> > Thanks
> >
>
