hadoop-user mailing list archives

From Jens Scheidtmann <jens.scheidtm...@gmail.com>
Subject Re: Running hadoop for processing sources in full sky maps
Date Sun, 21 Jul 2013 20:52:35 GMT
Dear Andrea,

you write:
a few tens of sky maps, each 600 MB = some TB.
a few thousand sources, lat/long for each = let's say 10k = 1 MB.

It sounds to me like you should turn your problem around:
- Distribute your sources to all nodes (the list is small, about 1 MB).
- Work on chunks of your sky map files.
- Search for sources fully covered by the (node-local) input.
- If necessary, load some adjacent sky map files.
- Perform the statistics.
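
As a rough illustration of the "fully covered" step, here is a minimal Python sketch. The names Chunk and split_sources are made up for this example, and it assumes each chunk can be described as a plain lat/long box and that your statistics need a fixed radius (in degrees) around each source. It ignores longitude wrap-around and the shrinking of longitude circles towards the poles, so treat it as a starting point only.

```python
from typing import List, NamedTuple, Tuple

class Chunk(NamedTuple):
    """Hypothetical description of the sky area held in one input chunk."""
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

Source = Tuple[float, float]  # (lat, lon) in degrees

def split_sources(sources: List[Source], chunk: Chunk,
                  radius: float) -> Tuple[List[Source], List[Source]]:
    """Split the sources lying in this chunk into two groups:
    those whose whole neighbourhood (+/- radius) is inside the chunk,
    and those that need data from adjacent sky map files."""
    covered, partial = [], []
    for lat, lon in sources:
        inside = (chunk.lat_min <= lat <= chunk.lat_max
                  and chunk.lon_min <= lon <= chunk.lon_max)
        if not inside:
            continue  # another chunk is responsible for this source
        full = (lat - radius >= chunk.lat_min
                and lat + radius <= chunk.lat_max
                and lon - radius >= chunk.lon_min
                and lon + radius <= chunk.lon_max)
        (covered if full else partial).append((lat, lon))
    return covered, partial
```

Sources in the second list are the ones for which a task would have to load neighbouring files before computing anything.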

Check http://en.wikipedia.org/wiki/Spatial_database#Spatial_index
for a list of possible indexing strategies that give your sky maps some
locality properties. For example, use Z-order + padding to define the areas
to be fed into your statistics algorithms.
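
To make the Z-order idea a bit more concrete, here is a small Python sketch of a Morton key. It assumes your maps can be addressed by integer pixel coordinates on a flat grid and cut into square tiles; the function names and the 16-bit default are my own choices for illustration, not something from your setup.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-order (Morton) key: bit i of x goes to position 2*i,
    bit i of y goes to position 2*i + 1."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def tile_key(px: int, py: int, tile_size: int) -> int:
    """Z-order key of the tile that contains pixel (px, py).
    tile_size is an assumed tile edge length in pixels."""
    return interleave_bits(px // tile_size, py // tile_size)
```

Sorting tiles by this key tends to keep tiles that are neighbours on the sky close together in the file, which is the locality property you are after.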

You could also prepare input files that repeat some of the information
contained in the sky maps, such as overlapping borders around a central
area, so that each input block has enough information to calculate your
statistics for each point contained in the central area.
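
A minimal sketch of such a preprocessing step in Python, assuming a sky map is available as a 2D grid (here simply a list of rows) and that a border of pad pixels is enough for your statistics. The name padded_tiles and its parameters are hypothetical, and the sketch clips at the map edges instead of wrapping around the sphere.

```python
from typing import Iterator, List, Tuple

Grid = List[List[float]]

def padded_tiles(sky: Grid, tile: int, pad: int
                 ) -> Iterator[Tuple[Tuple[int, int], Tuple[int, int], Grid]]:
    """Yield (central_origin, block_origin, block) for every tile.
    central_origin is the top-left pixel of the central area,
    block_origin is the top-left pixel of the padded block."""
    rows, cols = len(sky), len(sky[0])
    for r0 in range(0, rows, tile):
        for c0 in range(0, cols, tile):
            # Extend the central area by `pad` pixels, clipped to the map.
            r_lo, r_hi = max(0, r0 - pad), min(rows, r0 + tile + pad)
            c_lo, c_hi = max(0, c0 - pad), min(cols, c0 + tile + pad)
            block = [row[c_lo:c_hi] for row in sky[r_lo:r_hi]]
            yield (r0, c0), (r_lo, c_lo), block
```

Each block could then be written out as its own input record, so that a task never has to look outside the record it was given. The price is the duplicated border data.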

Whether this preprocessing step pays off depends on how often you're going
to run your statistics (e.g. on different points).

Hope this ignites your creativity,

Jens
