hadoop-hdfs-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Skippin those gost darn 0 byte files
Date Tue, 22 Jul 2014 22:14:56 GMT
Currently using:

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-hdfs</artifactId>
      <version>2.3.0</version>
    </dependency>


I have this piece of code:

writer = SequenceFile.createWriter(fs, conf, p, Text.class, Text.class,
    CompressionType.BLOCK, codec);

Then I have a piece of code like this...

public static final long SYNC_EVERY_LINES = 1000;

if (meta.getLinesWritten() % SYNC_EVERY_LINES == 0) {
  meta.getWriter().sync();
}
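The sync policy above can be sketched in isolation (PeriodicSync and shouldSync are made-up names; the real code calls meta.getWriter().sync() on the SequenceFile.Writer):

```java
// Minimal sketch of the periodic-sync decision (hypothetical class;
// the actual writer code calls SequenceFile.Writer.sync() instead of
// a callback). Note that linesWritten == 0 also triggers a sync, which
// covers the "sync on first record write" idea from the original mail.
public class PeriodicSync {
    private final long syncEveryLines;

    public PeriodicSync(long syncEveryLines) {
        this.syncEveryLines = syncEveryLines;
    }

    // True on the first record and then once every syncEveryLines records.
    public boolean shouldSync(long linesWritten) {
        return linesWritten % syncEveryLines == 0;
    }
}
```

Worth noting, as far as I know: in 2.x, SequenceFile.Writer.sync() only writes a sync marker into the data stream so readers can resynchronize; it is hflush()/hsync() on the underlying stream that actually pushes bytes to the datanodes, and the length shown by -ls is typically not updated until the file is closed.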


And I commonly see:

[ecapriolo@staging-hadoop-cdh-67-14 ~]$ hadoop dfs -ls
/user/beacon/2014072117
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Found 12 items
-rw-r--r--   3 service-igor supergroup    1065682 2014-07-21 17:50
/user/beacon/2014072117/0bb6cd71-70ac-405a-a8b7-b8caf9af8da1
-rw-r--r--   3 service-igor supergroup    1029041 2014-07-21 17:40
/user/beacon/2014072117/1b0ef6b3-bd51-4100-9d4b-1cecdd565f93
-rw-r--r--   3 service-igor supergroup    1002096 2014-07-21 17:10
/user/beacon/2014072117/34e2acb4-2054-44df-bbf7-a4ce7f1e5d1b
-rw-r--r--   3 service-igor supergroup    1028450 2014-07-21 17:30
/user/beacon/2014072117/41c7aa62-d27f-4d53-bed8-df2fb5803c92
-rw-r--r--   3 service-igor supergroup          0 2014-07-21 17:50
/user/beacon/2014072117/5450f246-7623-4bbd-8c97-8176a0c30351
-rw-r--r--   3 service-igor supergroup    1084873 2014-07-21 17:30
/user/beacon/2014072117/8b36fbca-6f5b-48a3-be3c-6df6254c3db2
-rw-r--r--   3 service-igor supergroup    1043108 2014-07-21 17:20
/user/beacon/2014072117/949da11a-247b-4992-b13a-5e6ce7e51e9b
-rw-r--r--   3 service-igor supergroup     986866 2014-07-21 17:10
/user/beacon/2014072117/979bba76-4d2e-423f-92f6-031bc41f6fbd
-rw-r--r--   3 service-igor supergroup          0 2014-07-21 17:50
/user/beacon/2014072117/b76db189-054f-4dac-84a4-a65f39a6c1a9
-rw-r--r--   3 service-igor supergroup    1040931 2014-07-21 17:50
/user/beacon/2014072117/bba6a677-226c-4982-8fb2-4b136108baf1
-rw-r--r--   3 service-igor supergroup    1012137 2014-07-21 17:40
/user/beacon/2014072117/be940202-f085-45bb-ac84-51ece2e1ba47
-rw-r--r--   3 service-igor supergroup    1028467 2014-07-21 17:20
/user/beacon/2014072117/c336e0c8-76e7-40e7-98e2-9f529f25577b

Sometimes, even though they show as 0 bytes, you can read data from them.
Other times the read blows up with a stack trace I have since lost.
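On the read side, one quick stopgap is to filter zero-length entries out of the listing before handing paths to anything that reads them. A minimal shell sketch (skip_empty is a made-up helper; it assumes the size is the 5th whitespace-separated field and the path the 8th, as in the -ls output above):

```shell
# Drop zero-byte entries from an "hdfs dfs -ls"-style listing and emit
# only the paths (fields per the listing format shown above; adjust if
# your layout differs).
skip_empty() {
    awk 'NF >= 8 && $5 > 0 { print $8 }'
}

# Usage (assuming hdfs is on PATH):
#   hdfs dfs -ls /user/beacon/2014072117 | skip_empty
```

This does not fix the input format itself, but it keeps 0-byte files out of whatever consumes the listing.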


On Tue, Jul 22, 2014 at 5:45 PM, Bertrand Dechoux <dechouxb@gmail.com>
wrote:

> I looked at the source out of curiosity: in the latest version (2.4), the
> header is flushed during writer creation, and the key/value classes are of
> course provided. By 0 bytes, do you really mean even without the header? Or
> 0 bytes of payload?
>
>
> On Tue, Jul 22, 2014 at 11:05 PM, Bertrand Dechoux <dechouxb@gmail.com>
> wrote:
>
>> The header is expected to contain the full names of the key class and value
>> class, so if those are only detected with the first record (?) the file
>> indeed cannot respect its own format.
>>
>> I haven't tried it, but LazyOutputFormat should solve your problem.
>>
>> https://hadoop.apache.org/docs/current/api/index.html?org/apache/hadoop/mapred/lib/LazyOutputFormat.html
>>
>> Regards
>>
>> Bertrand Dechoux
>>
>>
>> On Tue, Jul 22, 2014 at 10:39 PM, Edward Capriolo <edlinuxguru@gmail.com>
>> wrote:
>>
>>> I have two processes: one writes sequence files directly to HDFS, and
>>> the other is a Hive table that reads those files.
>>>
>>> All works well, with the exception that I am only flushing the files
>>> periodically. The SequenceFile input format gets angry when it encounters
>>> 0-byte seq files.
>>>
>>> I was considering a flush and sync on the first record write. I was also
>>> thinking I should just be able to hack the sequence file input format to
>>> skip 0-byte files instead of throwing an exception in readFully(), which it
>>> sometimes does.
>>>
>>> Anyone ever tackled this?
>>>
>>
>>
>
