arrow-user mailing list archives

From: Wes McKinney <wesmck...@gmail.com>
Subject: Re: [python] not an arrow file
Date: Mon, 25 Jan 2021 18:52:26 GMT
Hi Ken -- it looks like you aren't calling "writer->Close()" after
writing the last record batch. I think that will fix the issue.
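
For reference, a minimal sketch of the full write path; the helper name
WriteBatches and the schema/batches arguments are illustrative rather than
taken from the attached code:

#include <arrow/api.h>
#include <arrow/io/api.h>
#include <arrow/ipc/api.h>

// Illustrative helper: write a set of record batches to an Arrow IPC file.
arrow::Status WriteBatches(
    const std::shared_ptr<arrow::Schema>& schema,
    const std::vector<std::shared_ptr<arrow::RecordBatch>>& batches,
    const std::string& path) {
  ARROW_ASSIGN_OR_RAISE(auto sink, arrow::io::FileOutputStream::Open(path));
  ARROW_ASSIGN_OR_RAISE(auto writer, arrow::ipc::MakeFileWriter(sink, schema));
  for (const auto& batch : batches) {
    ARROW_RETURN_NOT_OK(writer->WriteRecordBatch(*batch));
  }
  // Close() writes the file footer; without it, readers report
  // "not an Arrow file".
  ARROW_RETURN_NOT_OK(writer->Close());
  return sink->Close();
}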

On Mon, Jan 25, 2021 at 12:48 PM Teh, Kenneth M. <teh@anl.gov> wrote:
>
> Hi Wes,
>
> My C++ code is attached. I tried to also read it from C++ by opening the disk file as a MemoryMappedFile, and I get the same error when I make a RecordBatchReader on the mmap'ed file, i.e., "not an Arrow file".
>
> There must be some magical sequence of writes needed to make the file kosher.
>
> Thanks for helping.
>
> Ken
>
> p.s. I read your blog about relocating to Nashville. Was my stomping grounds back in the 80s. Memories.
>
> ________________________________
> From: Wes McKinney <wesmckinn@gmail.com>
> Sent: Sunday, January 24, 2021 11:41 AM
> To: user@arrow.apache.org <user@arrow.apache.org>
> Subject: Re: [python] not an arrow file
>
> Can you show your C++ code?
>
> On Sun, Jan 24, 2021 at 8:10 AM Teh, Kenneth M. <teh@anl.gov> wrote:
>
> Just started with arrow...
>
> In a C++ program, I wrote a record batch to a file using ipc::MakeFileWriter to create a writer and writer->WriteRecordBatch, then tried to read it in Python with:
>
> import pyarrow as pa
> reader = pa.ipc.open_file("myfile")
>
>
> It raises ArrowInvalid with the message "not an arrow file".
>
> If I write it out as a Table in Feather format, I can read it in Python. But I want to write large files on the order of 100 GB or more and then read them back into Python as pandas dataframes or something similar.
>
> So, I switched to using an ipc writer.
>
> Can someone point me in the right direction?  Thanks.
>
> Ken
