hadoop-hdfs-user mailing list archives

From Xiao Li <xell...@aim.com>
Subject OPENFORWRITE Files issue
Date Thu, 13 Feb 2014 01:26:07 GMT

Say I have a text file on HDFS in "OPENFORWRITE, HEALTHY" status, and some process is
appending to it.

It has 4 lines in it, as reported by:

hadoop fs -cat /file | wc -l 

However, when I run a wordcount on this file, only the first line is visible to MapReduce.
Similarly, in Hive, "select count(*) from filetable" returns 1.

If I do "hadoop fs -cp /file /file2", then the copy works as expected (file2 is closed, file
is still open): wordcount sees 5 lines in the input directory (1 from the open file, 4 from
the copied file), and Hive returns 5.

I am wondering if there is anything related to TextInputFormat?

I am using CDH 4.4.0
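
To make the symptom concrete, here is a toy model of one possible explanation. It is only a sketch under an assumption, not a confirmed diagnosis: suppose the job plans its input split from a file length that lags behind the bytes already written to the open file, while a plain stream read (as with "hadoop fs -cat") sees everything. This is plain Python, not Hadoop code; the function names, the sample data, and the 5-byte stale length are all hypothetical.

```python
# Toy model only (not Hadoop code): compare a line reader bounded by a
# possibly stale "reported length" with a reader that streams every byte.

def lines_in_split(data: bytes, reported_len: int) -> list:
    """Return the lines that start before reported_len (hypothetical split end)."""
    lines = []
    pos = 0
    while pos < len(data) and pos < reported_len:
        end = data.find(b"\n", pos)
        if end == -1:          # last line has no trailing newline
            end = len(data)
        lines.append(data[pos:end])
        pos = end + 1          # move to the start of the next line
    return lines

def lines_in_stream(data: bytes) -> list:
    """Return every line, as a plain stream read would see them."""
    return data.splitlines()

data = b"line1\nline2\nline3\nline4\n"   # hypothetical 4-line file
stale_len = 5                            # assumed stale length covering only line 1

print(len(lines_in_stream(data)))            # all lines in the byte stream
print(len(lines_in_split(data, stale_len)))  # lines reachable within the stale length
```

If this is what is happening, the closed copy would report its full length, which would match wordcount seeing all 4 lines from file2.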


Xiao Li
