avro-dev mailing list archives

From Thiruvalluvan MG <thiru...@yahoo.com>
Subject Re: Seeks with DataFileReader in C++
Date Thu, 24 Jan 2013 16:46:01 GMT

I think it is a good use case. One way to achieve what you want is to:

1. Expose the existing members objectCount_ and byteCount_ of DataFileReaderBase as the methods
size_t objectsRemainingInBlock() and size_t bytesRemainingInBlock() in the DataFileReader class.
2. Add a new method, void skip(size_t n), to the DataFileReader class that skips n objects.
3. If you prefer, you can also add skipBlock() as a shorthand for skip(objectsRemainingInBlock()).
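A minimal sketch of the skip semantics proposed above, assuming a toy in-memory model rather than the real avro::DataFileReader: the class ToyBlockReader, its constructor and decoded() are hypothetical, the byte accessor is not modeled, and "decoding" is only a counter. It shows the intended behavior: whole blocks can be dropped at once, while objects inside the final, partially consumed block still have to be decoded one by one, because Avro binary objects carry no per-object length.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Toy model of the proposed DataFileReader additions. NOT avro::DataFileReader:
// all names here are hypothetical and decoding is simulated by a counter.
class ToyBlockReader {
public:
    // blockObjectCounts[b] is the number of objects stored in block b.
    explicit ToyBlockReader(std::vector<std::size_t> blockObjectCounts)
        : counts_(std::move(blockObjectCounts)) { enter(0); }

    bool atEnd() const { return block_ >= counts_.size(); }
    std::size_t currentBlock() const { return block_; }

    // Proposed accessor: objects not yet read from the current block
    // (the value objectCount_ tracks inside DataFileReaderBase).
    std::size_t objectsRemainingInBlock() const { return remaining_; }

    // Proposed skipBlock(): drop the rest of the current block. A real reader
    // could discard the block's remaining bytes without decoding them.
    void skipBlock() {
        if (!atEnd()) enter(block_ + 1);
    }

    // Proposed skip(n): whole blocks are dropped via skipBlock(); only objects
    // in the last, partially consumed block are decoded individually.
    // Stops quietly at end of file if n exceeds the objects left.
    void skip(std::size_t n) {
        while (n > 0 && !atEnd()) {
            if (n >= remaining_) {
                n -= remaining_;
                skipBlock();
            } else {
                remaining_ -= n;
                decoded_ += n;  // simulated per-object decoding
                n = 0;
            }
        }
    }

    // Number of objects that had to be decoded individually so far.
    std::size_t decoded() const { return decoded_; }

private:
    void enter(std::size_t b) {
        block_ = b;
        remaining_ = atEnd() ? 0 : counts_[b];
    }

    std::vector<std::size_t> counts_;
    std::size_t block_ = 0;
    std::size_t remaining_ = 0;
    std::size_t decoded_ = 0;
};
```

For example, with blocks of 3, 4 and 2 objects, skip(5) drops the first block whole and decodes only two objects of the second block, leaving two objects remaining in it.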

Does it work for you?



From: Daniel Russel <drussel@gmail.com>
To: dev@avro.apache.org; Thiruvalluvan MG <thiru_mg@yahoo.com> 
Sent: Wednesday, 23 January 2013 10:33 PM
Subject: Re: Seeks with DataFileReader in C++
In our case, we have files created from large numbers of frames stored sequentially as records
in a data file. Currently, finding the i-th frame requires going to the beginning and reading
all records until the appropriate one is found. Doing binary search or some sort of index
based search would decrease load times for many operations significantly. It would also make
implementing map-reduce sorts of operations on the data files easier since currently there
is no reliable way to shard the files.
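An index-based lookup for the i-th frame could be sketched as below, assuming the reader can first collect each block's object count. Every Avro data-file block header stores its object count, so in principle one pass over the headers suffices without decoding frames; whether the C++ reader exposes those counts is an assumption. FrameIndex and locate() are hypothetical names. The index keeps the first frame number of each block and binary-searches it.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical frame index: firstFrame_[b] is the number of the first frame
// stored in block b, built from per-block object counts.
class FrameIndex {
public:
    explicit FrameIndex(const std::vector<std::size_t>& blockObjectCounts) {
        std::size_t total = 0;
        for (std::size_t count : blockObjectCounts) {
            firstFrame_.push_back(total);  // prefix sum of preceding counts
            total += count;
        }
        totalFrames_ = total;
    }

    // Returns {block, offset within block} for frame i in O(log blocks).
    std::pair<std::size_t, std::size_t> locate(std::size_t i) const {
        if (i >= totalFrames_) throw std::out_of_range("frame index past end");
        // upper_bound finds the first block starting after frame i; the block
        // just before it is the one containing frame i (empty blocks share a
        // start value and are skipped over automatically).
        auto it = std::upper_bound(firstFrame_.begin(), firstFrame_.end(), i);
        std::size_t block = static_cast<std::size_t>(it - firstFrame_.begin()) - 1;
        return {block, i - firstFrame_[block]};
    }

    std::size_t totalFrames() const { return totalFrames_; }

private:
    std::vector<std::size_t> firstFrame_;
    std::size_t totalFrames_ = 0;
};
```

With blocks of 3, 4 and 2 frames, locate(4) yields block 1, offset 1. Actually jumping to that block would also require recording each block's position in the file, which this sketch leaves out.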

I'll work on the patch, nothing written yet :-)

On Jan 23, 2013, at 4:56 AM, Thiruvalluvan MG <thiru_mg@yahoo.com> wrote:

> Hi Daniel,
> I think it would be nice if you could describe your use case. Yes, we'd be interested in
> seeing your implementation. Since this is an additional feature, it won't affect anyone
> who doesn't use it. Please go ahead and create a ticket and submit a patch.
> Thanks
> Thiru
> ________________________________
> From: Daniel Russel <drussel@gmail.com>
> To: dev@avro.apache.org 
> Sent: Wednesday, 23 January 2013 11:20 AM
> Subject: Seeks with DataFileReader in C++
> From what I can tell, there is no way to do any sort of random access with the C++ DataFileReader
> API. Is this correct? Is someone working on that? If not, and people think this would be a
> generally interesting capability, I'd consider implementing it, as I'd kind of like to have
> it. Thanks.
>              --Daniel