beam-commits mailing list archives

From "Eugene Kirpichov (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2716) AvroReader should refuse dynamic splits while in the last block
Date Wed, 02 Aug 2017 20:31:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111671#comment-16111671 ]

Eugene Kirpichov commented on BEAM-2716:
----------------------------------------

I think this affects only Avro; other file-based formats either don't support dynamic splits
or don't know block lengths upfront.

> AvroReader should refuse dynamic splits while in the last block
> ---------------------------------------------------------------
>
>                 Key: BEAM-2716
>                 URL: https://issues.apache.org/jira/browse/BEAM-2716
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>            Priority: Minor
>
> AvroReader is able to detect when it's in the last block:
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroSource.java#L728
> It could also use this information to avoid wastefully producing dynamic splits starting
> in the range of the current block.
> One way to do this would be to have OffsetRangeTracker have a "claim range" operation:
> claim range of [a, b) is, in terms of correctness, equivalent to claiming "a" (it checks whether
> "a" is within the range), but sets the last claimed position to "b" rather than "a", thus
> protecting more positions from being split away.
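
The "claim range" idea from the description could look roughly like the sketch below. This is
not Beam's OffsetRangeTracker: the class SketchOffsetTracker and its methods tryClaim,
tryClaimRange and trySplitAt are hypothetical names, and the real tracker has more state
(split points, fractions consumed). The sketch only shows how recording "b" instead of "a" as
the last claimed position makes the tracker refuse splits inside the current block.

```java
/** Hypothetical, simplified tracker over [start, stop); not Beam's OffsetRangeTracker. */
final class SketchOffsetTracker {
  private final long start;
  private long stop;                // exclusive end; shrinks when a split succeeds
  private long lastClaimed = -1L;   // positions <= lastClaimed cannot be split away

  SketchOffsetTracker(long start, long stop) {
    if (start < 0 || stop <= start) {
      throw new IllegalArgumentException("need 0 <= start < stop");
    }
    this.start = start;
    this.stop = stop;
  }

  /** Claims the single position a; only a itself becomes protected. */
  synchronized boolean tryClaim(long a) {
    return claim(a, a);
  }

  /**
   * Claims [a, b): the success check is the same as for claiming a,
   * but the last claimed position is set to b, as the issue proposes.
   */
  synchronized boolean tryClaimRange(long a, long b) {
    if (b <= a) {
      throw new IllegalArgumentException("need a < b");
    }
    return claim(a, b);
  }

  private boolean claim(long a, long protectUpTo) {
    if (a < start || a < lastClaimed) {
      throw new IllegalArgumentException("claims must be within range and non-decreasing");
    }
    if (a >= stop) {
      return false;                 // a lies outside the (possibly shrunk) range
    }
    lastClaimed = protectUpTo;
    return true;
  }

  /** Tries to shrink the range to [start, s); refused if s is already protected. */
  synchronized boolean trySplitAt(long s) {
    if (s <= start || s <= lastClaimed || s >= stop) {
      return false;
    }
    stop = s;
    return true;
  }
}
```

With a tracker over [0, 100), tryClaim(10) still lets a split at 25 succeed, whereas
tryClaimRange(10, 40) makes the same split fail. A reader that knows its current block spans
[10, 40) would therefore never hand out a split that starts inside that block.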



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
