orc-user mailing list archives

From "Owen O'Malley" <owen.omal...@gmail.com>
Subject Re: Orc writer - continuous memory flush out
Date Mon, 25 Sep 2017 17:36:50 GMT
ORC has to buffer the entire stripe in memory so that it can write the data
in column order rather than row order. If you have large blobs that you
can't buffer, I'd suggest writing them to a side file and storing the
offsets and lengths in the ORC file. That way you can write the large blobs
without spending all of your memory caching them (on either read or write).
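Roughly, that side-file pattern could look like the sketch below. It is a
minimal illustration against the org.apache.orc.Writer API, not code from
this thread; the file names, the three-column schema, and the synthetic
blob bytes are all placeholders.

import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class BlobSideFileSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path orcPath = new Path("rows.orc");     // hypothetical output locations
    Path blobPath = new Path("rows.blobs");

    // The ORC file stores only each blob's offset and length in the side file.
    TypeDescription schema = TypeDescription.fromString(
        "struct<id:bigint,blobOffset:bigint,blobLength:bigint>");

    try (Writer writer = OrcFile.createWriter(orcPath,
             OrcFile.writerOptions(conf).setSchema(schema));
         FSDataOutputStream blobOut = fs.create(blobPath)) {

      VectorizedRowBatch batch = schema.createRowBatch();
      LongColumnVector idCol = (LongColumnVector) batch.cols[0];
      LongColumnVector offCol = (LongColumnVector) batch.cols[1];
      LongColumnVector lenCol = (LongColumnVector) batch.cols[2];

      long offset = 0;
      for (long id = 0; id < 10_000; id++) {
        // Stand-in for streaming a real BLOB out of the database.
        byte[] blob = ("blob-" + id).getBytes(StandardCharsets.UTF_8);
        blobOut.write(blob);                // blob bytes leave memory immediately

        int row = batch.size++;
        idCol.vector[row] = id;
        offCol.vector[row] = offset;        // where this blob starts in the side file
        lenCol.vector[row] = blob.length;   // how many bytes to read back later
        offset += blob.length;

        if (batch.size == batch.getMaxSize()) {
          writer.addRowBatch(batch);        // ORC still buffers a stripe, but it stays small
          batch.reset();
        }
      }
      if (batch.size != 0) {
        writer.addRowBatch(batch);
      }
    }
  }
}

On the read side, the ORC file gives you the (offset, length) pairs, and the
blobs are fetched from the side file only when they are actually needed.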

.. Owen

On Mon, Aug 21, 2017 at 6:44 AM, Ozsvath, Tamas (GE Corporate, consultant) <
tamas.ozsvath@ge.com> wrote:

> Dear Apache users,
>
> We want to create ORC files with org.apache.orc.Writer. Our tests were
> okay until we tried ORC file creation from a database table which
> contained BLOBs. We have tried to change the following settings, but none
> of them helped:
>
>
>
> org.apache.orc.OrcFile.WriterOptions:
>
> bufferSize()
>
> stripeSize()
>
> blockSize()
>
> enforceBufferSize()
>
>
>
> Is there a way to continuously populate the ORC file (flushing it out of
> memory as we go), instead of flushing the data from memory only upon
> closing the file writer? What is the best practice for creating an ORC
> file from a data source which contains BLOBs and can't be handled entirely
> in memory?
>
>
>
> Any information is appreciated!
>
>
>
> Thanks,
> Tamas
>
>
>
