drill-user mailing list archives

From 熊贻青 <xiong.jag...@gmail.com>
Subject Data locality with regard to hdfs
Date Sun, 09 Mar 2014 05:39:41 GMT
I have been familiarizing myself with the Drill codebase, but I need some
advice on how drillbits are chosen for execution. As a specific case,
I'd like to scan a file on HDFS and merge the results to get a simple sum
of an integer column. The file is already spread across the cluster. Two
points are worth noting:
1. Some affinity might be calculated (statically) with respect to block
placement if the drillbits are running on the same nodes as the HDFS data
nodes.
2. When partial results are ready from all drillbits, some of them need to
be transferred to a single drillbit, and that choice needs some (dynamic)
parameters as input. The process could become more complicated if the
intermediate results are merged in stages.
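
To make the two points concrete, here is a minimal sketch in Python of what
I have in mind. It is not taken from the Drill source; the function names
(node_affinity, assign_blocks, merge_partial_sums), the (size, hosts) block
representation, and the greedy assignment rule are all my own assumptions
for illustration.

```python
from collections import defaultdict


def node_affinity(blocks):
    """Fraction of the file's bytes that each host stores locally.

    blocks: list of (size_bytes, [hosts holding a replica]) tuples.
    This layout is a hypothetical stand-in for HDFS block locations.
    """
    total = sum(size for size, _ in blocks)
    affinity = defaultdict(float)
    for size, hosts in blocks:
        for host in hosts:
            affinity[host] += size / total
    return dict(affinity)


def assign_blocks(blocks, drillbits):
    """Greedy static assignment of blocks to drillbits (an assumed policy).

    Each block goes to the least-loaded drillbit that holds a local
    replica; if none does, it goes to the least-loaded drillbit overall.
    Ties are broken by host name so the result is deterministic.
    """
    load = {d: 0 for d in drillbits}
    assignment = []
    for size, hosts in blocks:
        local = [h for h in hosts if h in load]
        candidates = local or list(drillbits)
        chosen = min(candidates, key=lambda d: (load[d], d))
        load[chosen] += size
        assignment.append(chosen)
    return assignment


def merge_partial_sums(partials, fan_in=2):
    """Merge partial sums in stages, combining up to fan_in per stage."""
    level = list(partials)
    while len(level) > 1:
        level = [sum(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0] if level else 0
```

With fan_in equal to the number of drillbits this collapses to a single
merge on one node; a smaller fan_in gives the staged merge from point 2.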

I can't find the place where the decisions for the above cases are made, so
any pointers to the source or documentation would help!

