arrow-user mailing list archives

From Weston Pace <weston.p...@gmail.com>
Subject Re: [Cython] Getting and Comparing String Scalars?
Date Wed, 14 Apr 2021 19:57:03 GMT
If you don't need the performance, you could stay in Python (use
to_pylist() for the array or as_py() for scalars).

If you do need the performance, then you're probably better served by
getting the buffers and operating on them directly or, even better, by
making use of the compute kernels:

import pyarrow as pa
import pyarrow.compute as pc
arr = pa.array(['abc', 'ab', 'Xander', None], pa.string())
desired = pa.array(['Xander'], pa.string())
pc.any(pc.is_in(arr, value_set=desired)).as_py()  # True

On Wed, Apr 14, 2021 at 6:29 AM Xander Dunn <xander@xander.ai> wrote:

> This works for getting a c string out of the CScalar:
> ```
>                 name_buffer = (<CBaseBinaryScalar*>GetResultValue(
>                         names.get().GetScalar(batch_row_index)).get()).value
>                 name = <char *>name_buffer.get().data()
> ```
>
>
> On Tue, Apr 13, 2021 at 10:43 PM, Xander Dunn <xander@xander.ai> wrote:
>
>> Here is an example code snippet from a .pyx file that successfully
>> iterates through a CRecordBatch and ensures that the timestamps are
>> ascending:
>> ```
>>             while batch_row_index < batch.get().num_rows():
>>                 timestamp = GetResultValue(
>>                     times.get().GetScalar(batch_row_index))
>>                 new_timestamp = <CTimestampScalar*>timestamp.get()
>>                 current_timestamp = timestamps[name]
>>                 if current_timestamp > new_timestamp.value:
>>                     abort()
>>                 batch_row_index += 1
>> ```
>>
>> However, I'm having difficulty operating on the values in a column of
>> string type. Unlike CTimestampScalar, there is no CStringScalar. Although
>> there is a StringScalar type in C++, it isn't defined in the Cython
>> interface. There is a `CStringType` and a `c_string` type.
>> ```
>>     while batch_row_index < batch.get().num_rows():
>>         name = GetResultValue(names.get().GetScalar(batch_row_index))
>>         name_string = <CStringType*>name.get() # This is wrong
>>         printf("%s\n", name_string) # This prints garbage
>>         if name_string == b"Xander": # Doesn't work
>>             print("found it")
>>         batch_row_index += 1
>> ```
>> How do I get the string value as a C type and compare it to other
>> strings?
>>
>> Thanks,
>> Xander
>>
>
>
