arrow-user mailing list archives

From Nirmala S <nanna.tech.st...@gmail.com>
Subject Re: Caching layer using arrow
Date Wed, 27 Mar 2019 07:06:09 GMT
Now I see there is an ORC adapter for Arrow that can read an ORC file as a table. With this in
place, I intend to use TableBatchReader to read it.

How do I get a single record from TableBatchReader?


> On 22-Mar-2019, at 12:18 AM, Wes McKinney <wesmckinn@gmail.com> wrote:
> 
> hi Nirmala,
> 
> There aren't any tools in the libraries to help you "out of the box",
> so you'll probably have to devise your own metadata storage and state
> management scheme for such a system.
> 
> best
> Wes
> 
> On Thu, Mar 21, 2019 at 9:53 AM Nirmala S <nanna.tech.stuff@gmail.com> wrote:
>> 
>> Hi,
>> 
>> I am trying to build a caching layer using Arrow on top of ORC files. The
>> application will ask for a column of data (which can be of any data type,
>> fixed or variable length) from the cache, and the cache needs to check its
>> metadata to see if the column is already present. If yes, it can return the
>> data to the application. If not, the data needs to be fetched from the ORC
>> files, cached, and then returned to the application. The application is
>> multi-threaded and written in C++. It has a read-only workload.
>>
>> This being the case, what is the best way to maintain the metadata and the
>> data in Arrow? Is there any good practice?
>>
>> If the cache size is smaller than the ORC file size, should I add logic to
>> swap the data using an algorithm like LRU, or is this already present in Arrow?
>> 
>> 
>> Thanks in advance
>> Nirmala
>> 
>> 
>> 
>> 
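
Since the reply above says the libraries offer nothing out of the box for this, here is a minimal sketch of what a hand-rolled column cache with LRU eviction could look like. It uses only the C++ standard library. Everything in it is an assumption made for illustration: the names ColumnCache, GetOrLoad, Contains, ColumnData and the Loader callback are hypothetical, and the byte vector is a stand-in for whatever Arrow object the real cache would hold. The metadata is just a map from column name to cache entry, guarded by a mutex because the application is multi-threaded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for a cached column (assumption). A real cache would more likely
// hold a shared pointer to an Arrow column or array here.
using ColumnData = std::vector<std::uint8_t>;
using ColumnPtr = std::shared_ptr<const ColumnData>;

class ColumnCache {
 public:
  // Hypothetical callback that reads one column from the ORC file.
  using Loader = std::function<ColumnPtr(const std::string&)>;

  explicit ColumnCache(std::size_t capacity_bytes) : capacity_(capacity_bytes) {
    assert(capacity_bytes > 0);
  }

  // Returns the cached column, loading and caching it on a miss.
  // The lock is held during the load, which keeps the sketch simple but
  // serializes concurrent misses.
  ColumnPtr GetOrLoad(const std::string& name, const Loader& load) {
    std::lock_guard<std::mutex> lock(mu_);
    auto it = index_.find(name);
    if (it != index_.end()) {
      // Hit: mark as most recently used by moving the entry to the front.
      lru_.splice(lru_.begin(), lru_, it->second);
      return it->second->second;
    }
    ColumnPtr col = load(name);
    if (!col) return nullptr;  // loader failed, nothing is cached
    lru_.emplace_front(name, col);
    index_[name] = lru_.begin();
    used_ += col->size();
    // Evict from the back (least recently used) until within capacity,
    // but never evict the entry that was just inserted.
    while (used_ > capacity_ && lru_.size() > 1) {
      const Entry& victim = lru_.back();
      used_ -= victim.second->size();
      index_.erase(victim.first);
      lru_.pop_back();
    }
    return col;
  }

  bool Contains(const std::string& name) const {
    std::lock_guard<std::mutex> lock(mu_);
    return index_.count(name) > 0;
  }

 private:
  using Entry = std::pair<std::string, ColumnPtr>;
  std::size_t capacity_;
  std::size_t used_ = 0;
  std::list<Entry> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
  mutable std::mutex mu_;
};
```

Because readers receive a shared pointer to immutable data, a column that is evicted while another thread is still using it stays alive until that thread drops its reference, which suits a read-only workload.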

