avro-user mailing list archives

From Sean Busbey <bus...@cloudera.com>
Subject Re: Avro consumes all memory on box
Date Tue, 27 Oct 2015 18:08:59 GMT
It sounds like the file you are reading is malformed. Could you share
the file or how it was written?
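
For context on why a 500-byte file can ask for so much memory: Avro encodes
an array as a series of blocks, each starting with an item count stored as a
variable-length zig-zag long. The traceback below stops at
"for i in range(block_count)", and on Python 2 range() builds the whole list
up front, so a corrupt count of a few bytes is enough to exhaust memory. The
following is a minimal sketch of that decoding, not the avro library's own
code; read_long and checked_block_count are hypothetical helper names, and
the max_items limit is an assumption for illustration.

```python
import io


def read_long(stream):
    """Decode one Avro long (variable-length, zig-zag encoded) from a byte stream."""
    shift = 0
    accum = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a variable-length long")
        byte = bytearray(chunk)[0]          # works on Python 2 and 3
        accum |= (byte & 0x7F) << shift     # low 7 bits carry data
        if not (byte & 0x80):               # high bit clear: last byte
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)      # undo zig-zag encoding


def checked_block_count(stream, max_items=1000000):
    """Hypothetical guard: refuse block counts above an assumed sanity limit."""
    count = read_long(stream)
    if abs(count) > max_items:
        raise ValueError("implausible array block count: %d" % count)
    return count


# One byte encodes a sensible count of 3.
print(read_long(io.BytesIO(b"\x06")))                  # 3

# Five bytes of garbage decode to a count of 2**31 - 1 items.
print(read_long(io.BytesIO(b"\xfe\xff\xff\xff\x0f")))  # 2147483647
```

Running read_long over the bytes where the array field starts in the
suspect file would show whether the count is plausible.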

On Tue, Oct 27, 2015 at 1:01 PM, web user <webuser1200@gmail.com> wrote:
> I ran this in a VM with much less memory and it immediately failed with a
> memory error:
>
> Traceback (most recent call last):
>   File "testavro.py", line 31, in <module>
>     for r in reader:
>   File "/usr/local/lib/python2.7/dist-packages/avro/datafile.py", line 362,
> in next
>     datum = self.datum_reader.read(self.datum_decoder)
>   File "/usr/local/lib/python2.7/dist-packages/avro/io.py", line 445, in
> read
>     return self.read_data(self.writers_schema, self.readers_schema, decoder)
>   File "/usr/local/lib/python2.7/dist-packages/avro/io.py", line 490, in
> read_data
>     return self.read_record(writers_schema, readers_schema, decoder)
>   File "/usr/local/lib/python2.7/dist-packages/avro/io.py", line 690, in
> read_record
>     field_val = self.read_data(field.type, readers_field.type, decoder)
>   File "/usr/local/lib/python2.7/dist-packages/avro/io.py", line 484, in
> read_data
>     return self.read_array(writers_schema, readers_schema, decoder)
>   File "/usr/local/lib/python2.7/dist-packages/avro/io.py", line 582, in
> read_array
>     for i in range(block_count):
> MemoryError
>
>
> On Tue, Oct 27, 2015 at 1:36 PM, web user <webuser1200@gmail.com> wrote:
>>
>> Hi,
>>
>> I'm doing the following:
>>
>> from avro.datafile import DataFileReader
>> from avro.datafile import DataFileWriter
>> from avro.io import DatumReader
>> from avro.io import DatumWriter
>>
>> def OpenAvroFileToRead(avro_filename):
>>    # Avro container files are binary; return the reader so "with" can use it.
>>    return DataFileReader(open(avro_filename, 'rb'), DatumReader())
>>
>>
>> with OpenAvroFileToRead(avro_filename) as reader:
>>    for r in reader:
>>        ....
>>
>> I have an avro file which is only 500 bytes. I think there is a data
>> structure in there which is null or empty.
>>
>> I put in print statements before and after "for r in reader". On the
>> "for r in reader" statement, it consumes about 400 GB of memory before I
>> have to kill the process.
>>
>> That is 400 GB! I have 1 TB on my server. I have tried this with 1.6.1,
>> 1.7.1, and 1.7.7 and get the same behavior on all three versions.
>>
>> Any ideas on what is causing this?
>>
>> Regards,
>>
>> WU
>
>



-- 
Sean
