hadoop-user mailing list archives

From Rex X <dnsr...@gmail.com>
Subject What is the best way to locate the offset and length of all fields in a Hadoop sequential text file?
Date Fri, 22 Jan 2016 05:30:18 GMT
I have a set of sequential text files that back an external Hive table.

They are stored in
/tableName/part-00000
/tableName/part-00001
...

There are about 2000 attributes in the table. I want to process the
data with Hadoop Streaming and MapReduce. The first step is to find the
offset and length of each attribute within a record.

What is the best way to get this information?
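
For illustration, here is a minimal sketch of one possible approach. It
assumes the part files are plain delimited text, one record per line, with
Hive's default field delimiter Ctrl-A (\x01). None of this is confirmed
above; if the table declares a different delimiter or the files are binary
SequenceFiles, this does not apply as written. The names field_spans and
run_mapper are hypothetical helpers, not part of Hadoop or Hive.

```python
# Assumption: fields are separated by Hive's default text delimiter, Ctrl-A.
DELIMITER = b"\x01"


def field_spans(record, delimiter=DELIMITER):
    """Return one (offset, length) pair per field, measured in bytes.

    Offsets are relative to the start of the record, not the file.
    """
    spans = []
    offset = 0
    for field in record.split(delimiter):
        spans.append((offset, len(field)))
        # Skip past this field and the delimiter that follows it.
        offset += len(field) + len(delimiter)
    return spans


def run_mapper(stream, out):
    """Streaming-style mapper body: read records line by line from a binary
    stream and write "offset:length" pairs, tab-separated, one line per record.
    """
    for raw in stream:
        record = raw.rstrip(b"\n")
        spans = field_spans(record)
        out.write("\t".join("%d:%d" % span for span in spans) + "\n")
```

A mapper script built on this would call run_mapper(sys.stdin.buffer,
sys.stdout), since Hadoop Streaming passes input records to the mapper on
standard input. Working on bytes rather than decoded text keeps the offsets
valid for multi-byte characters.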
