avro-user mailing list archives

From Scott Carey <scottca...@apache.org>
Subject Re: Random access in an avro file
Date Tue, 02 Jul 2013 18:59:39 GMT
There are a couple of other index formats that could apply. You can seek to a
sync marker and scan from there. For example, Avro files can be a target
for Elephant Twin
(http://www.slideshare.net/squarecog/flexible-insitu-indexing-for-hadoop-via-elephant-twin ;
http://gitrep.com/users/twitter/repos/elephant-twin).

However, that is a lightweight index that only marks which blocks contain
records matching the index; it does not locate the exact record.
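
To illustrate what a block-level index gives you, here is a minimal sketch in
plain Java. It is not Elephant Twin's or Avro's API: the class BlockIndex, its
methods, and the offsets in the usage note are all hypothetical. The point is
only that a lookup returns the blocks that may hold a match, and the reader
still has to scan each of those blocks to find the record itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical block-level index: maps an indexed value to the blocks that contain it. */
final class BlockIndex {
    // indexed value -> sorted start offsets of blocks holding at least one matching record
    private final Map<String, SortedSet<Long>> blocksByValue = new HashMap<>();

    /** Records that some record with this value lives in the block starting at blockOffset. */
    void add(String value, long blockOffset) {
        blocksByValue.computeIfAbsent(value, k -> new TreeSet<>()).add(blockOffset);
    }

    /** Returns the block offsets a reader must scan for this value; empty if none can match. */
    List<Long> candidateBlocks(String value) {
        SortedSet<Long> offsets = blocksByValue.get(value);
        if (offsets == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(offsets);
    }
}
```

With this sketch, adding the value "100" at offsets 4096 and 0 and then calling
candidateBlocks("100") yields the two offsets in ascending order. The reader
would seek to each one and filter the records in that block.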

From:  "kulkarni.swarnim@gmail.com" <kulkarni.swarnim@gmail.com>
Reply-To:  "user@avro.apache.org" <user@avro.apache.org>
Date:  Monday, July 1, 2013 10:26 AM
To:  user <user@avro.apache.org>
Subject:  Re: Random access in an avro file

Thanks for the reply Doug.

Out of curiosity, is recording sync markers while writing the file and
then passing these markers to the readers not a good way to achieve
random access in Avro? At least that was my understanding from
reading the javadoc[1], which could be flawed.

[1] http://avro.apache.org/docs/1.3.3/api/java/org/apache/avro/file/DataFileWriter.html#sync()
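
The idea in the question (remember positions while writing, hand them to the
reader later) can be modelled without Avro. The sketch below is plain Java over
a RandomAccessFile, not Avro's container format; the class PositionedLog and
its methods are hypothetical. In Avro itself, the javadoc describes
DataFileWriter.sync() as returning a position that may later be passed to
DataFileReader.seek(long), so the writer-side positions play the role of the
list returned by write() here.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of "record positions at write time, seek at read time". */
final class PositionedLog {
    /** Writes the records to file and returns the byte offset at which each one starts. */
    static List<Long> write(File file, List<String> records) throws IOException {
        List<Long> positions = new ArrayList<>();
        try (RandomAccessFile out = new RandomAccessFile(file, "rw")) {
            out.setLength(0); // start from an empty file
            for (String record : records) {
                positions.add(out.getFilePointer()); // remember where this record begins
                out.writeUTF(record);                // length-prefixed, so it can be read alone
            }
        }
        return positions;
    }

    /** Jumps straight to a previously recorded position and reads the record stored there. */
    static String readAt(File file, long position) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            in.seek(position);
            return in.readUTF();
        }
    }
}
```

The positions have to be kept somewhere outside the data file (a side index),
and the approach only helps if the reader can map a lookup key such as an
employee id to one of those positions.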


On Mon, Jul 1, 2013 at 12:05 PM, Doug Cutting <cutting@apache.org> wrote:
> Avro data files do not generally support random access.
> 
> SortedKeyValueFile supports random access by key.
> 
> http://avro.apache.org/docs/current/api/java/org/apache/avro/hadoop/file/SortedKeyValueFile.Reader.html
> 
> From the documentation:
> 
> "The SortedKeyValueFile is a directory with two files, named 'data'
> and 'index'. The 'data' file is an ordinary Avro container file with
> records. Each record has exactly two fields, 'key' and 'value'. The
> keys are sorted lexicographically. The 'index' file is a small Avro
> container file mapping keys in the 'data' file to their byte
> positions. The index file is intended to fit in memory, so it should
> remain small. There is one entry in the index file for each data block
> in the Avro container file."
> 
> Doug
> 
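
The index layout described in the quoted documentation (sorted keys, one index
entry per data block) can be sketched in plain Java as a sorted map from the
first key of each block to that block's byte position. This is an illustrative
model only, not the SortedKeyValueFile implementation; the class
SparseKeyIndex and its methods are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sparse index: one entry per data block, keyed by the block's first key. */
final class SparseKeyIndex {
    // first key stored in a block -> byte position of that block in the data file
    private final TreeMap<String, Long> firstKeyToPosition = new TreeMap<>();

    /** Registers a block whose first (smallest) key is firstKey. */
    void addBlock(String firstKey, long position) {
        firstKeyToPosition.put(firstKey, position);
    }

    /**
     * Returns the position of the only block that could contain key,
     * or -1 if key sorts before the first key of every block.
     */
    long blockFor(String key) {
        // Because keys are sorted, the block to search is the one with the
        // greatest first key that is less than or equal to the lookup key.
        Map.Entry<String, Long> entry = firstKeyToPosition.floorEntry(key);
        return entry == null ? -1L : entry.getValue();
    }
}
```

After locating the block, a reader would seek to that position and scan forward
until it finds the key or passes the point where the key would have been. The
index stays small because it grows with the number of blocks, not the number of
records.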
> On Mon, Jul 1, 2013 at 8:37 AM, kulkarni.swarnim@gmail.com
> <kulkarni.swarnim@gmail.com> wrote:
>> > Hello,
>> >
>> > Is it possible to have random access to a record in an Avro file? For
>> > instance, say I have an Avro file with a schema containing four fields:
>> > employee id, name, address and phone. While reading the file, is there any
>> > way at all to directly jump to a record with employee id 100 instead of
>> > having to scan the whole file every single time and filter out records?
>> >
>> > Thanks for the help.
>> >
>> > --
>> > Swarnim



-- 
Swarnim 


