hadoop-common-user mailing list archives

From Hrishikesh Agashe <hrishikesh_aga...@persistent.co.in>
Subject DFS block size
Date Sat, 14 Nov 2009 16:25:06 GMT

The default DFS block size is 64 MB. Does this mean that if I put a file smaller than 64 MB on HDFS,
it will not be divided any further?
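Just to spell out the arithmetic I am assuming (the sizes below are made-up examples, and the helper is my own sketch, not Hadoop code):

```java
// Sketch: how many HDFS blocks a file of a given size occupies,
// assuming the default dfs.block.size of 64 MB.
public class BlockCount {
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // 64 MB

    // Ceiling division: blocks needed to hold fileSize bytes.
    static long blocksFor(long fileSize) {
        if (fileSize == 0) return 0; // an empty file occupies no data block
        return (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(blocksFor(10L * 1024 * 1024)); // 10 MB file -> 1
        System.out.println(blocksFor(64L * 1024 * 1024)); // exactly 64 MB -> 1
        System.out.println(blocksFor(65L * 1024 * 1024)); // 65 MB -> 2
    }
}
```

So my assumption is that any file at or under the block size lives in exactly one block (and, as I understand it, a small file only consumes its actual size on disk, not the full 64 MB).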
I have lots and lots of XMLs and I would like to process them directly. Currently I am converting
them to sequence files (10 XMLs per sequence file) and then putting them on HDFS. However, creating
the sequence files is a very time-consuming process. So if I just ensure that all XMLs are smaller
than 64 MB (or the value of dfs.block.size), will they remain unsplit, so that I can safely process
each one in map/reduce using a SAX parser?
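To illustrate the behavior I am hoping for: if a file is treated as non-splittable, split planning should hand the whole file to a single mapper, no matter how many blocks it spans. A self-contained toy model of that logic (planSplits and the sizes are my own invention, not Hadoop's actual API):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of MapReduce split planning: a non-splittable file always
// becomes a single input split, regardless of how many blocks it spans.
// (planSplits is a hypothetical helper, not the real Hadoop API.)
public class SplitPlanner {
    static final long BLOCK_SIZE = 64L * 1024 * 1024; // 64 MB

    // Returns the byte lengths of the input splits for one file.
    static List<Long> planSplits(long fileSize, boolean splitable) {
        List<Long> splits = new ArrayList<>();
        if (!splitable || fileSize <= BLOCK_SIZE) {
            splits.add(fileSize); // whole file as one split
            return splits;
        }
        long remaining = fileSize;
        while (remaining > 0) {
            long len = Math.min(BLOCK_SIZE, remaining);
            splits.add(len);
            remaining -= len;
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 200 MB file: splittable -> 4 splits; non-splittable -> 1 split.
        System.out.println(planSplits(200L * 1024 * 1024, true).size());
        System.out.println(planSplits(200L * 1024 * 1024, false).size());
    }
}
```

In Hadoop itself, I believe the analogous knob is subclassing FileInputFormat and overriding isSplitable to return false, which would make the whole-file-per-mapper guarantee explicit rather than relying on file sizes staying under the block size.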

If this is not possible, is there a way to speed up the sequence file creation process?
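One thing I have been considering is parallelizing the conversion, since each batch of 10 XMLs is independent. A rough sketch of the batching and threading side (the convert step is a placeholder for the actual sequence file writing, which is omitted here):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: group input files into batches of 10 and convert the batches
// in parallel. The "convert" lambda stands in for whatever writes one
// sequence file per batch; only the batching/threading is shown.
public class ParallelBatcher {
    static final int BATCH_SIZE = 10;

    // Split the file list into consecutive batches of at most BATCH_SIZE.
    static List<List<String>> batches(List<String> files) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < files.size(); i += BATCH_SIZE) {
            out.add(files.subList(i, Math.min(i + BATCH_SIZE, files.size())));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        List<String> xmls = new ArrayList<>();
        for (int i = 0; i < 95; i++) xmls.add("doc" + i + ".xml"); // dummy names

        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<Integer>> results = new ArrayList<>();
        for (List<String> batch : batches(xmls)) {
            results.add(pool.submit(() -> {
                // placeholder for: write one sequence file from this batch
                return batch.size();
            }));
        }
        int converted = 0;
        for (Future<Integer> f : results) converted += f.get();
        pool.shutdown();
        System.out.println(converted + " files in " + batches(xmls).size() + " batches");
    }
}
```

Whether this helps presumably depends on whether the conversion is CPU-bound or I/O-bound; I would be glad to hear if there is a better approach.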

