arrow-dev mailing list archives

From Micah Kornfield <emkornfi...@gmail.com>
Subject [Discuss][Format] Checksum/Hash signature for data
Date Wed, 06 Mar 2019 05:32:41 GMT
Hi Arrow Dev,
As we expand the use-cases for Arrow to move it more across system
boundaries (Flight) and make it live longer (e.g. in the file format), it
seems to make sense to build in a mechanism for data integrity verification
(e.g. a checksum like CRC32 or in some cases a cryptographic hash like
SHA1).

This can be done in a backwards-compatible manner for the actual data buffers
by adding metadata to the headers (this could be a use-case for custom
metadata, but I would prefer to make it explicit).  However, to make sure we
have full coverage, we would need to augment the stream [1] to be something
like:

<metadata_size: int32>
<metadata_flatbuffer: bytes>
<signature_size: int16>
<metadata signature>
<padding>
<message body>

I don't think we should require implementations to actually use this
functionality, but we should make it a possibility (signature size could be
zero, meaning no checksum/hash is provided) and have it be standardized if
possible.
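
To make the proposed layout concrete, here is a minimal Python sketch of
writing and reading one such message. It is only an illustration under
several assumptions that are not settled above: little-endian sizes, a
CRC32 over the metadata bytes as the signature, padding to an 8-byte
boundary, and opaque bytes standing in for the real flatbuffer. The names
encode_message and decode_message are hypothetical.

```python
import struct
import zlib

ALIGNMENT = 8  # assumption: pad so the body starts on an 8-byte boundary


def _padding(length: int) -> int:
    """Number of zero bytes needed to reach the next ALIGNMENT boundary."""
    return (-length) % ALIGNMENT


def encode_message(metadata: bytes, body: bytes, with_checksum: bool = True) -> bytes:
    """Frame metadata and body using the layout sketched in the proposal."""
    # Assumption: the signature is a 4-byte little-endian CRC32 of the metadata.
    signature = struct.pack("<I", zlib.crc32(metadata)) if with_checksum else b""
    header = (
        struct.pack("<i", len(metadata))      # <metadata_size: int32>
        + metadata                            # <metadata_flatbuffer: bytes>
        + struct.pack("<h", len(signature))   # <signature_size: int16>
        + signature                           # <metadata signature>
    )
    return header + b"\x00" * _padding(len(header)) + body


def decode_message(buf: bytes):
    """Parse one framed message, verifying the checksum when one is present."""
    (metadata_size,) = struct.unpack_from("<i", buf, 0)
    offset = 4
    metadata = buf[offset:offset + metadata_size]
    offset += metadata_size
    (signature_size,) = struct.unpack_from("<h", buf, offset)
    offset += 2
    signature = buf[offset:offset + signature_size]
    offset += signature_size
    if signature_size == 4:
        (expected,) = struct.unpack("<I", signature)
        if zlib.crc32(metadata) != expected:
            raise ValueError("metadata checksum mismatch")
    elif signature_size != 0:
        # A signature size of zero means no checksum/hash was provided.
        raise ValueError("unsupported signature size")
    offset += _padding(offset)  # skip <padding>
    return metadata, buf[offset:]
```

A reader that does not care about integrity could skip signature_size
bytes without verifying them, which is what keeps the feature optional.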

Thoughts?

Sorry if this has already been discussed but I couldn't find anything from
searching JIRA or the mailing list archive, and it doesn't look like it is
in the format spec.

Thanks,
Micah

[1] https://arrow.apache.org/docs/ipc.html
