hadoop-common-user mailing list archives

From Sofia Georgiakaki <geosofie_...@yahoo.com>
Subject Running queries using index on HDFS
Date Mon, 25 Jul 2011 21:40:47 GMT
Good evening,

I have built an R-tree on HDFS in order to improve the performance of high-selectivity
spatial queries.
The R-tree is composed of a number of HDFS files (each one created by one Reducer, so the
number of files equals the number of reducers), where each file is a subtree of the root
of the R-tree.
I am investigating how to use the R-tree efficiently with respect to the locality of
each file on HDFS (data placement).

I would like to ask whether it is possible to read a file that is on HDFS from a plain Java
application (not MapReduce).
If this is not possible (as I believe), I should either download the files to the local
filesystem (which is not a solution, since the files could be very large) or run the queries
using Hadoop.
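What I have in mind is something along these lines, a rough, untested sketch, assuming the
Hadoop jars and a suitable configuration are available to the application (the namenode URI
and file name below are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReadExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Normally picked up from core-site.xml if it is on the
            // classpath; set explicitly here with a placeholder URI.
            conf.set("fs.default.name", "hdfs://namenode:9000");

            FileSystem fs = FileSystem.get(conf);
            Path subtree = new Path("/rtree/part-00000"); // placeholder file

            FSDataInputStream in = fs.open(subtree);
            try {
                // The stream is seekable, so one node of the subtree could
                // be read at a known byte offset without scanning the file.
                in.seek(0L);
                byte[] buf = new byte[4096];
                int n = in.read(buf);
                System.out.println("read " + n + " bytes from " + subtree);
            } finally {
                in.close();
            }
        }
    }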
To maximise the gain, I should probably process a batch of queries in each job, and run
each query on a node that is "near" the files involved in answering that query.

Can I find the node where each file is located (or at least where most of its blocks are),
and run on that node a reducer that handles these queries? Could DFSClient.getBlockLocations()
help?
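For example, something like the following sketch, which uses the public
FileSystem#getFileBlockLocations() rather than the internal DFSClient call, could list the
datanodes holding each block of a subtree file (the path is again a placeholder), and that
list could feed a locality-aware assignment of query batches to reducers:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockLocationExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path subtree = new Path("/rtree/part-00000"); // placeholder file

            FileStatus status = fs.getFileStatus(subtree);
            BlockLocation[] blocks =
                fs.getFileBlockLocations(status, 0, status.getLen());

            // Each BlockLocation reports the hosts (datanodes) storing the
            // replicas of that block of the file.
            for (BlockLocation block : blocks) {
                System.out.println("offset " + block.getOffset()
                        + ", length " + block.getLength() + ":");
                for (String host : block.getHosts()) {
                    System.out.println("  replica on " + host);
                }
            }
        }
    }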

Thank you in advance,