hadoop-mapreduce-user mailing list archives

From Mohammad Tariq <donta...@gmail.com>
Subject Handling files with unclear boundaries
Date Mon, 06 Aug 2012 15:30:24 GMT
Hello list,

     I need some guidance on how to handle files that have no proper
delimiters or record boundaries. I am trying to process a set of
files in a format that is completely unfamiliar to me (SAS XPT files)
through MR. The one thing that is always fixed is that each time I
have to read exactly 107 bytes. Is it possible to use this length as
a delimiter for creating splits somehow? And if so, which InputFormat
would be appropriate? Many thanks.
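
A minimal sketch of the idea, assuming every record is exactly 107 bytes
and the file starts on a record boundary. It uses plain Java only, no
Hadoop classes; the class and method names are hypothetical. It shows how
a reader handed an arbitrary byte range (a split) could skip forward to
the next multiple of 107 and then read whole records, so that each record
is picked up by exactly one split:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: fixed-length record arithmetic.
public class FixedLengthRecords {
    // Assumed record size, taken from the 107 bytes mentioned above.
    static final int RECORD_LENGTH = 107;

    // First record boundary at or after splitStart.
    static long firstRecordStart(long splitStart, int recordLength) {
        long remainder = splitStart % recordLength;
        return remainder == 0 ? splitStart : splitStart + (recordLength - remainder);
    }

    // Returns every complete record that BEGINS inside [splitStart, splitEnd).
    // A record that starts in this range but ends past splitEnd is still read
    // here; the next split skips it by aligning its own start.
    static List<byte[]> readRecords(byte[] file, long splitStart, long splitEnd,
                                    int recordLength) {
        List<byte[]> records = new ArrayList<>();
        for (long pos = firstRecordStart(splitStart, recordLength);
             pos < splitEnd && pos + recordLength <= file.length;
             pos += recordLength) {
            // int casts are fine for this in-memory sketch only.
            records.add(Arrays.copyOfRange(file, (int) pos, (int) (pos + recordLength)));
        }
        return records;
    }
}
```

In a real job the same alignment would live inside a custom RecordReader,
with the byte array replaced by a seekable stream over the split; a
trailing fragment shorter than 107 bytes is simply ignored in this sketch.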

    Mohammad Tariq
