avro-user mailing list archives

From Sam Groth <sgr...@yahoo-inc.com>
Subject Re: Avro consumes all memory on box
Date Tue, 27 Oct 2015 18:56:36 GMT
Are you using the Python 2 or the Python 3 version of the avro package? For a redacted schema,
just give the schema with all field names and namespaces changed. If the schema is really long
and complicated, you could just give the part that you suspect is causing issues.

Sam

On Tuesday, October 27, 2015 1:42 PM, web user <webuser1200@gmail.com> wrote:

No, I don't think that is the problem. The same code has worked for reading many other files.
This particular file hits a corner case where one of the data structures has no records in
it, and that is causing a lot of grief for the Python avro routine. The file was generated by
the C++ avro routines.
Regards,
WU
On Tue, Oct 27, 2015 at 2:38 PM, Sam Groth <sgroth@yahoo-inc.com> wrote:

I think you may be missing a "return" when you create your DataFileReader. I have always been
able to read data in Python using the standard methods, so I don't think there is a problem
with the implementation. That said, the Python implementation is significantly slower than
the Java or C implementations.

Sam

On Tuesday, October 27, 2015 1:23 PM, web user <webuser1200@gmail.com> wrote:

Unfortunately, the company I work at has a strict policy about sharing data. Having said that,
I don't think the file is corrupted.

I ran the following command:

java -jar avro-tools-1.7.7.jar tojson testdata.avro

and it generates a file of 1 byte.

I also ran java -jar avro-tools-1.7.7.jar getschema testdata.avro and it gets back the correct
schema. 

Is there any way to keep the Python library from consuming all the memory on the entire
box?

Regards,

WU
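
On the question of keeping a runaway read from taking over the whole box: one general option on Linux, independent of avro, is to cap the process's address space before iterating over the reader. An oversized allocation should then fail quickly with MemoryError (as reported for the smaller VM in the quoted message below) rather than growing unchecked. The following is only a minimal sketch using the standard-library resource module. The helper name cap_address_space and the 4 GiB figure in the usage note are illustrative assumptions, not part of the avro API.

```python
import resource


def cap_address_space(max_bytes):
    """Cap this process's soft virtual-memory limit at max_bytes.

    Hypothetical helper (not part of avro). Unix-only, since the
    resource module is not available on Windows. The limit is only
    ever lowered, never raised. Returns the soft limit now in effect.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    new_soft = max_bytes
    # Keep an existing, stricter soft limit instead of loosening it.
    if soft != resource.RLIM_INFINITY and soft < new_soft:
        new_soft = soft
    resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))
    return new_soft
```

Calling cap_address_space(4 * 1024 ** 3) once at the top of the script, before "for r in reader", would be one way to apply it. This does not fix whatever makes the reader ask for so much memory; it only bounds the damage while the cause is tracked down.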



On Tue, Oct 27, 2015 at 2:08 PM, Sean Busbey <busbey@cloudera.com> wrote:

It sounds like the file you are reading is malformed. Could you share
the file or how it was written?

On Tue, Oct 27, 2015 at 1:01 PM, web user <webuser1200@gmail.com> wrote:
> I ran this in a vm with much less memory and it immediately failed with a
> memory error:
>
> Traceback (most recent call last):
>   File "testavro.py", line 31, in <module>
>     for r in reader:
>   File "/usr/local/lib/python2.7/dist-packages/avro/datafile.py", line 362,
> in next
>     datum = self.datum_reader.read(self.datum_decoder)
>   File "/usr/local/lib/python2.7/dist-packages/avro/io.py", line 445, in
> read
>     return self.read_data(self.writers_schema, self.readers_schema, decoder)
>   File "/usr/local/lib/python2.7/dist-packages/avro/io.py", line 490, in
> read_data
>     return self.read_record(writers_schema, readers_schema, decoder)
>   File "/usr/local/lib/python2.7/dist-packages/avro/io.py", line 690, in
> read_record
>     field_val = self.read_data(field.type, readers_field.type, decoder)
>   File "/usr/local/lib/python2.7/dist-packages/avro/io.py", line 484, in
> read_data
>     return self.read_array(writers_schema, readers_schema, decoder)
>   File "/usr/local/lib/python2.7/dist-packages/avro/io.py", line 582, in
> read_array
>     for i in range(block_count):
> MemoryError
>
>
> On Tue, Oct 27, 2015 at 1:36 PM, web user <webuser1200@gmail.com> wrote:
>>
>> Hi,
>>
>> I'm doing the following:
>>
>> from avro.datafile import DataFileReader
>> from avro.datafile import DataFileWriter
>> from avro.io import DatumReader
>> from avro.io import DatumWriter
>>
>> def OpenAvroFileToRead(avro_filename):
>>    DataFileReader(open(avro_filename, 'r'), DatumReader())
>>
>>
>> with OpenAvroFileToRead(avro_filename) as reader:
>>    for r in reader:
>>        ....
>>
>> I have an avro file which is only 500 bytes. I think there is a data
>> structure in there which is null or empty.
>>
>> I put in print statements before and after "for r in reader". On the
>> "for r in reader" statement it consumes about 400 GB of memory before I
>> have to kill the process.
>>
>> That is 400 GB! I have 1 TB on my server. I have tried this with 1.6.1,
>> 1.7.1 and 1.7.7 and get the same behavior on all three versions.
>>
>> Any ideas on what is causing this?
>>
>> Regards,
>>
>> WU
>
>



--
Sean
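
The traceback quoted above ends at "for i in range(block_count)" inside read_array. In Python 2, range() builds a full list, so one plausible reading (not confirmed anywhere in this thread) is that the reader decoded a very large array block count from a 500-byte file. In the Avro binary encoding, an array is written as a series of blocks. Each block starts with a zig-zag varint long count, a count of zero ends the array, and a negative count means its absolute value is the item count and a byte size follows. The sketch below decodes such a long and applies a sanity check. The names read_long and check_block_count are hypothetical and are not the avro library's API. The default of one byte per item is an assumption that does not hold for zero-byte items such as null.

```python
def read_long(buf, pos=0):
    """Decode one Avro long (zig-zag varint) starting at buf[pos].

    Returns (value, next_pos). Hypothetical helper, not avro's API.
    """
    data = bytearray(buf)  # indexing yields ints on Python 2 and 3
    result = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise EOFError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of this varint
            break
        shift += 7
    # Undo zig-zag: even values are non-negative, odd values negative.
    return (result >> 1) ^ -(result & 1), pos


def check_block_count(count, bytes_remaining, min_item_size=1):
    """Reject a block count that cannot fit in the bytes left to read.

    min_item_size=1 is an assumption; zero-byte items (e.g. null)
    would need min_item_size=0, which disables the check.
    """
    items = abs(count)  # a negative count still describes abs(count) items
    if items * min_item_size > bytes_remaining:
        raise ValueError(
            "block count %d cannot fit in %d remaining bytes"
            % (items, bytes_remaining)
        )
    return items
```

For example, read_long(b"\x80\x01") returns (64, 2), matching the encoding table in the Avro specification, while the five bytes b"\xff\xff\xff\xff\x0f" decode to -2147483648, which check_block_count would reject for a 500-byte file instead of handing it to range().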




   



  