orc-dev mailing list archives

From Xiening Dai <xndai....@live.com>
Subject The Orc magic string
Date Fri, 14 Jun 2019 16:52:23 GMT
Hi all,

In the ORC append scenario, the append operation (writing the additional data and the new
footer) needs to be atomic. Otherwise, if it fails partway through, the file tail becomes
unrecognizable. Unfortunately, not all file systems can guarantee atomic writes. When such a
failure does happen, recovering the data written before the append requires locating the
previous file footer by searching backward, and the only way to search for the footer is to
look for the “ORC” magic string. But the current magic string is only three characters long,
so the same string is likely to appear in user data. A match there would lead to parsing a
wrong footer, and the resulting behavior is undefined.
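
To make the failure mode concrete, here is a minimal sketch of the backward search. The byte
layout is a toy one invented for illustration, not the real ORC file layout, and the helper
name find_magic_backward is hypothetical.

```python
MAGIC = b"ORC"


def find_magic_backward(data: bytes, magic: bytes) -> list[int]:
    """Return the offset of every occurrence of magic, last occurrence first."""
    offsets = []
    end = len(data)
    while True:
        pos = data.rfind(magic, 0, end)
        if pos == -1:
            return offsets
        offsets.append(pos)
        # The next match must start strictly before this one.
        end = pos + len(magic) - 1


# Toy file: user rows, a committed footer ending in the magic, then a torn append.
user_rows = b"row1: FORCE majeure; row2: ORCHARD"
committed = user_rows + b"<footer-1>" + MAGIC
torn_append = b"new rows: TORCH, partial foot"  # the append failed before its footer
file_bytes = committed + torn_append

true_offset = len(committed) - len(MAGIC)  # where the real footer magic sits
hits = find_magic_backward(file_bytes, MAGIC)
print(hits, true_offset)
```

In this toy file the first hit when scanning backward lies inside the torn user data
("TORCH"), not at the committed footer, so a recovery tool would have to try to validate
each candidate in turn.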

So I am wondering whether we can change the magic string to a 16-byte UUID. That way we can
safely use it to locate the footer. The idea is very similar to the sync marker in Avro.
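
A rough sketch of how recovery could look with a 16-byte marker follows. The names MARKER
and recover_length and the toy byte layout are assumptions for illustration only. Avro
records its sync marker in the file header; how ORC would fix or record the value is left
open here.

```python
import uuid

# Hypothetical 16-byte marker. uuid4 gives 122 random bits, so an accidental
# match inside user data is vanishingly unlikely, though not impossible.
MARKER = uuid.uuid4().bytes


def recover_length(data: bytes, marker: bytes) -> int:
    """Return the length of the longest prefix that ends with marker, or 0."""
    pos = data.rfind(marker)
    return 0 if pos == -1 else pos + len(marker)


# Toy file: a committed footer ending in the marker, then a torn append.
user_rows = b"FORCE, TORCH, ORCHARD"
committed = user_rows + b"<footer-1>" + MARKER
torn_file = committed + b"more rows: TORCH, partial foo"

# Truncating to this length drops the torn append and keeps the old footer.
safe_length = recover_length(torn_file, MARKER)
print(safe_length == len(committed))
```

The user rows still contain "ORC" several times, but that no longer matters because the
search key is the 16-byte marker rather than a three-character string.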

Thanks.