avro-dev mailing list archives

From "Hari Shreedharan (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (AVRO-1393) SyncInterval logic always causes blocks to be larger than the sync interval
Date Wed, 06 Nov 2013 20:27:18 GMT

     [ https://issues.apache.org/jira/browse/AVRO-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Shreedharan resolved AVRO-1393.
------------------------------------

    Resolution: Not A Problem

> SyncInterval logic always causes blocks to be larger than the sync interval
> ---------------------------------------------------------------------------
>
>                 Key: AVRO-1393
>                 URL: https://issues.apache.org/jira/browse/AVRO-1393
>             Project: Avro
>          Issue Type: Bug
>            Reporter: Hari Shreedharan
>
> If the sync interval of a container file is set to exactly the block size, each block
> will be slightly larger than the sync interval, because we check the size only after
> writing the data to the stream. This means that the sync interval is effectively the
> smallest possible interval between sync markers, not the largest.
> Since we cannot predict the serialized size of a datum, we can never know by how much
> the data will overflow the block. Whatever the case, this might be more expensive than
> expected, especially on systems like HDFS.
> Fixing this is difficult without breaking a number of interfaces, so I am opening this
> JIRA for discussion with people who know the code better.
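
The behaviour described above can be illustrated with a small simulation. This is only a sketch of a generic "write first, check afterwards" block writer, not Avro's actual code; the function name block_sizes, the record sizes, and the interval value are all hypothetical and chosen for illustration.

```python
def block_sizes(record_sizes, sync_interval):
    """Simulate a write-then-check block writer (hypothetical model, not Avro's code).

    Returns the byte size of every block that would be flushed.
    """
    blocks = []
    buffered = 0
    for size in record_sizes:
        # The datum is serialized into the buffer first ...
        buffered += size
        # ... and only afterwards is the buffer compared with the interval.
        if buffered >= sync_interval:
            blocks.append(buffered)  # flush the block; a sync marker would follow
            buffered = 0
    if buffered:
        blocks.append(buffered)  # final partial block, flushed on close
    return blocks


# Ten 300-byte records with a 1000-byte sync interval (illustrative numbers).
sizes = block_sizes([300] * 10, sync_interval=1000)
print(sizes)
```

In this model every block except the last is at least as large as the configured interval, and the overshoot depends on the size of whichever record happens to cross the threshold, which is why it cannot be known in advance.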



--
This message was sent by Atlassian JIRA
(v6.1#6144)
