hive-user mailing list archives

From Alan Gates <>
Subject Re: Hive orc use case
Date Mon, 26 Sep 2016 16:41:38 GMT
ORC does not store data row by row.  It decomposes the rows into columns and then stores pointers
to those columns, along with a number of indices and statistics, in a footer at the end of the file.
Because a reader needs that footer, in the simple case you cannot read the file before you close
it, and you cannot append to it afterwards.  We did address both of these issues to support Hive
streaming, but it’s a low-level interface.  If you want to see how Hive streaming handles this,
you could use it as your guide.  The starting point for that is HiveEndPoint in
org.apache.hive.hcatalog.streaming.
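
To make the footer point concrete, here is a minimal toy sketch in plain Java.  It is not the
real ORC format or the ORC API.  The class FooterFileSketch, the ToyWriter type, the readRows
method and the two-column (id, name) layout are all made up for illustration.  The only thing it
is meant to show is that a reader which locates columns through a footer has nothing to work
with until the writer has closed the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Toy footer-based columnar layout. Hypothetical; NOT the real ORC format. */
public class FooterFileSketch {

    /** Buffers two columns (id, name) in memory; nothing is written until close(). */
    static final class ToyWriter {
        private final ByteArrayOutputStream file = new ByteArrayOutputStream();
        private final List<Integer> ids = new ArrayList<>();
        private final List<String> names = new ArrayList<>();
        private boolean closed = false;

        /** Decomposes the row into its columns; no bytes reach the "file" yet. */
        void addRow(int id, String name) {
            if (closed) throw new IllegalStateException("writer is closed");
            ids.add(id);
            names.add(name);
        }

        /** Writes each column, then a footer with the column offsets and row count. */
        void close() throws IOException {
            if (closed) return;
            DataOutputStream out = new DataOutputStream(file);
            int idOffset = out.size();
            for (int id : ids) out.writeInt(id);
            int nameOffset = out.size();
            for (String n : names) out.writeUTF(n);
            int footerStart = out.size();
            out.writeInt(idOffset);
            out.writeInt(nameOffset);
            out.writeInt(ids.size());
            // The last 4 bytes hold the footer length so a reader can find the footer.
            out.writeInt(out.size() - footerStart);
            out.flush();
            closed = true;
        }

        byte[] bytes() { return file.toByteArray(); }
    }

    /** Reads rows back by first locating the footer at the end of the data. */
    static List<String> readRows(byte[] data) throws IOException {
        if (data.length < 4) throw new IOException("no footer: file was not closed");
        ByteBuffer buf = ByteBuffer.wrap(data);
        int footerLen = buf.getInt(data.length - 4);
        int footerStart = data.length - 4 - footerLen;
        int idOffset = buf.getInt(footerStart);
        int nameOffset = buf.getInt(footerStart + 4);
        int rowCount = buf.getInt(footerStart + 8);
        DataInputStream nameColumn = new DataInputStream(
                new ByteArrayInputStream(data, nameOffset, footerStart - nameOffset));
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            int id = buf.getInt(idOffset + 4 * i);
            rows.add(id + "," + nameColumn.readUTF());
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        ToyWriter writer = new ToyWriter();
        writer.addRow(1, "a");
        writer.addRow(2, "b");
        try {
            readRows(writer.bytes());
        } catch (IOException e) {
            System.out.println("before close: " + e.getMessage());
        }
        writer.close();
        System.out.println("after close: " + readRows(writer.bytes()));
    }
}
```

In this toy, appending after close() would also require rewriting the footer, which is the same
reason append is not available in the simple case.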


> On Sep 26, 2016, at 01:18, Amey Barve <> wrote:
> Hi All,
> I have an use case where I need to append either 1 or many rows to orcFile as well as
> read 1 or many rows from it.
> I observed that I cannot read rows from OrcFile unless I close the OrcFile's writer,
> is this correct?
> Why doesn't write actually flush the rows to the orcFile, is there any alternative where
> I write the rows as well as read them without closing the orcFile's writer ?
> Thanks and Regards,
> Amey 
