hadoop-mapreduce-user mailing list archives

From Mohit Anchlia <mohitanch...@gmail.com>
Subject Re: Loading file to HDFS with custom chunk structure
Date Wed, 16 Jan 2013 15:43:27 GMT
Look at the block size concept in Hadoop and see if that is what you are looking for.
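For orientation, the HDFS block size only controls how a file is cut by byte count, not by record boundaries. A quick sketch of the arithmetic (sizes are illustrative; 64 MB was the default block size in Hadoop 1.x):

```python
import math

file_size = 10 * 1024**3    # a 10 GB SegY file, as in the question below
block_size = 64 * 1024**2   # HDFS default block size (64 MB in Hadoop 1.x)

# HDFS stores the file as fixed-size blocks spread across the cluster.
num_blocks = math.ceil(file_size / block_size)
# → 160 blocks; note a block boundary can fall in the middle of a trace,
# since HDFS splits on byte offsets only.
```

Raising dfs.block.size at upload time shrinks that block count, but on its own it will not keep the textual/binary headers on every chunk.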

Sent from my iPhone

On Jan 16, 2013, at 7:31 AM, Kaliyug Antagonist <kaliyugantagonist@gmail.com> wrote:

> I want to load a SegY file onto HDFS of a 3-node Apache Hadoop cluster.
> To summarize, the SegY file consists of :
> 3200 bytes textual header
> 400 bytes binary header
> Variable bytes data
> Over 99.99% of the file's size comes from the variable bytes data, which is a
> collection of thousands of contiguous traces. For any SegY file to make sense, it
> must have the textual header + binary header + at least one trace of data. What I
> want to achieve is to split a large SegY file across the Hadoop cluster so that a
> smaller SegY file is available on each node for local processing.
> The scenario is as follows:
> The SegY file is large (above 10 GB) and is resting on the local file system of
> the NameNode machine.
> The file is to be split across the nodes in such a way that each node has a small
> SegY file with a strict structure - 3200 bytes textual header + 400 bytes binary
> header + variable bytes data.
> Obviously, I can't blindly use FSDataOutputStream or hadoop fs -copyFromLocal, as
> these may not preserve the format in which the chunks of the larger file are
> required.
> Please guide me as to how I must proceed.
> Thanks and regards!
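The structure described above can also be produced by a header-aware pre-split before upload: copy the 3200 + 400 byte headers onto every chunk, and cut the body only on whole-trace boundaries. A minimal local sketch (not Hadoop-specific; the trace size is assumed fixed and known here, whereas in real SegY it is derived from fields in the binary header):

```python
TEXT_HEADER = 3200   # SegY textual header, bytes
BIN_HEADER = 400     # SegY binary header, bytes

def split_segy(data, traces_per_chunk, trace_size):
    """Split raw SegY bytes into smaller valid SegY files.

    Each returned piece carries a copy of the textual and binary
    headers followed by a run of whole traces, so no chunk boundary
    ever falls mid-trace.
    """
    headers = data[:TEXT_HEADER + BIN_HEADER]
    body = data[TEXT_HEADER + BIN_HEADER:]
    chunk_bytes = traces_per_chunk * trace_size
    return [headers + body[off:off + chunk_bytes]
            for off in range(0, len(body), chunk_bytes)]
```

Each resulting piece is a self-contained SegY file and could then be uploaded individually (e.g. with hadoop fs -copyFromLocal), so every node processes a structurally complete file.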
