avro-dev mailing list archives

From "Hari Shreedharan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AVRO-1393) SyncInterval logic always causes blocks to be larger than the sync interval
Date Tue, 05 Nov 2013 01:02:17 GMT
Hari Shreedharan created AVRO-1393:
--------------------------------------

             Summary: SyncInterval logic always causes blocks to be larger than the sync interval
                 Key: AVRO-1393
                 URL: https://issues.apache.org/jira/browse/AVRO-1393
             Project: Avro
          Issue Type: Bug
            Reporter: Hari Shreedharan


If the sync interval of a container file is set to exactly the block size, then the data between
two sync markers will be slightly larger than the block, because the size is checked only after
the data has been written to the stream. This means that the sync interval is effectively the
smallest possible interval between sync markers, not the largest.
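
To make the behaviour concrete, here is a minimal sketch of a "write first, check afterwards"
loop. It is a simplified model and not Avro's actual writer code: the class name
SyncIntervalSketch, the method blockSizes, and the datum sizes are all hypothetical, and each
datum is represented only by its serialized size in bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (assumption): not Avro's real writer, only the check-after-write pattern.
class SyncIntervalSketch {

    // Returns the size of every block that would be emitted for the given datum sizes.
    static List<Integer> blockSizes(int syncInterval, int[] datumSizes) {
        List<Integer> blocks = new ArrayList<>();
        int buffered = 0;
        for (int size : datumSizes) {
            buffered += size;                 // the datum is written to the buffer first
            if (buffered >= syncInterval) {   // only afterwards is the size checked
                blocks.add(buffered);         // block is closed, sync marker would follow
                buffered = 0;
            }
        }
        if (buffered > 0) {
            blocks.add(buffered);             // final partial block, flushed on close
        }
        return blocks;
    }

    public static void main(String[] args) {
        // Sync interval of 100 bytes, four datums of 60 bytes each.
        System.out.println(blockSizes(100, new int[] {60, 60, 60, 60}));
    }
}
```

In this model every block except possibly the last one is at least as large as the sync
interval, and it exceeds the interval whenever the datum that crosses the threshold does not
land exactly on it.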

Since we cannot predict the serialized size of a datum, we can never know by how much the data
will overflow the block. Whatever the amount, this may be more expensive than expected, especially
on systems like HDFS.

Fixing this is difficult without breaking a number of interfaces, so I am opening this JIRA for
discussion with people who have more knowledge of the code.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
