arrow-user mailing list archives

From anders johansson <anders.johans...@tickup.se>
Subject Re: [C++] error when writing Timestamps in NANOS resolution using StreamWriter to parquet files
Date Wed, 09 Dec 2020 11:36:45 GMT
Hi again,

I ran into a similar problem with decimals: if I set the type to
LogicalType::Decimal(4, 4) and try to write a uint32_t, I get the following
error message: "Column converted type mismatch.  Column 'Price' has
converted type[DECIMAL] not 'INT_32'"

When I look at the StreamWriter code (line 186 in stream_writer.cc), it
looks like the function CheckColumn throws this error whenever a value is
written to a column whose type is non-trivial.

As I understand it from the code comments, the converted type is legacy
code, so I guess this check is outdated?
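
To make the behaviour I mean concrete, here is a small stand-alone sketch of
that kind of strict equality check. It is only a model and not the actual
Arrow code: the enum, ColumnNode, CheckConvertedType and TryWrite are
hypothetical names made up for illustration, and the only thing taken from
the real library is the wording of the error message quoted above.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the legacy converted types (only the ones
// that appear in this thread).
enum class ConvertedType { NONE, INT_32, INT_64, DECIMAL };

std::string ToString(ConvertedType t) {
  switch (t) {
    case ConvertedType::NONE:    return "NONE";
    case ConvertedType::INT_32:  return "INT_32";
    case ConvertedType::INT_64:  return "INT_64";
    case ConvertedType::DECIMAL: return "DECIMAL";
  }
  return "UNKNOWN";
}

// Hypothetical stand-in for a schema column: a name plus the converted
// type recorded for it in the schema.
struct ColumnNode {
  std::string name;
  ConvertedType converted_type;
};

// Models the strict check: the converted type implied by the C++ value
// being written must be exactly the one stored in the schema.
void CheckConvertedType(const ColumnNode& node, ConvertedType written) {
  if (written != node.converted_type) {
    throw std::runtime_error(
        "Column converted type mismatch.  Column '" + node.name +
        "' has converted type[" + ToString(node.converted_type) +
        "] not '" + ToString(written) + "'");
  }
}

// Convenience wrapper: returns the error text, or an empty string if
// the check passes.
std::string TryWrite(const ColumnNode& node, ConvertedType written) {
  try {
    CheckConvertedType(node, written);
    return "";
  } catch (const std::runtime_error& e) {
    return e.what();
  }
}
```

In this model, a column whose schema says DECIMAL rejects a plain 32-bit
integer write, and a column whose schema says NONE (which is what the error
in my first mail reports for the nanosecond Timestamp column) rejects a plain
64-bit integer write. Only a column declared with the matching integer type
passes.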

BR,
Anders

On Wed, Dec 9, 2020 at 12:27 PM anders johansson <anders.johansson@tickup.se>
wrote:

> It should be
>   auto time_type = LogicalType::Int(64, true);
>
> On Wed, Dec 9, 2020 at 12:27 PM anders johansson <
> anders.johansson@tickup.se> wrote:
>
>> Hi,
>>
>> Thanks for pointing that out.
>>
>> On Wed, Dec 9, 2020 at 11:20 AM Uwe L. Korn <uwelk@xhochy.com> wrote:
>>
>>> Hello Anders,
>>>
>>> you have twice the same time_type in your mail. I guess one of them
>>> should be different?
>>>
>>> Cheers
>>> Uwe
>>>
>>> On Wed, Dec 9, 2020, at 11:00 AM, anders johansson wrote:
>>>
>>> Hi,
>>>
>>> I am trying to write timestamps, stored as int64_t values representing
>>> UTC-normalized nanoseconds, to a parquet file.
>>>
>>> I'm using the following code:
>>>
>>> auto time_type = LogicalType::Timestamp(true,
>>> LogicalType::TimeUnit::NANOS, false, false);
>>> NodeVector nv;
>>>
>>> nv.push_back(PrimitiveNode::Make("Time", Repetition::REQUIRED,
>>> time_type, Type::INT64));
>>>
>>> but when I try to write to the output stream
>>>
>>> std::shared_ptr<parquet::StreamWriter> parquet_os_;
>>> *parquet_os_ << se.time; /* time is uint64_t */
>>>
>>> I get the following runtime error: "Column converted type mismatch.
>>> Column 'Time' has converted type[NONE] not 'INT_64'"
>>>
>>> Everything works fine if I set:
>>>
>>> auto time_type = LogicalType::Timestamp(true,
>>> LogicalType::TimeUnit::NANOS, false, false);
>>>
>>> but I want it as Time or Timestamp so that I get it in the proper format
>>> when I read the file using pandas in python.
>>>
>>> Thanks,
>>> Anders
>>>
>>>
>>>
