arrow-user mailing list archives

From Martin Percossi <mar...@percossi.com>
Subject xbow - range-v3 views/actions for Arrow C++
Date Mon, 19 Apr 2021 14:54:51 GMT
Hi, I am developing a library, called xbow [*], to provide improved
ergonomics for Arrow C++ while ideally losing no performance. (If you're on
reddit, an upvote would be appreciated too! [**]) With xbow, you can write
code like this:

def_record(suspect,
    (int32_t, id),
    (string, name),
    (double, salary)
);

auto suspects = vector<suspect>{
    {1, "Keyser Söze"s, 1000.0}, {2, "Kobayashi"s, 500.0},   {3, "Fred Fenster"s, 500.0},
    {4, "Jack Baer"s, 100.0},    {5, "Dean Keaton"s, 800.0}, {6, "Michael McManus"s, 100.0},
};
print("input rows: {}\n", suspects);
// below: traverse the rows, changing name to upper case, skipping every other element,
// cycling over rows so that they repeat and taking exactly 20 of these rows, and finally
// this range-v3 range is converted to a regular arrow table.
// This code shows that we can take a bog-standard range-v3 pipeline and convert it to
// an arrow object. This could later, for example, be written to a parquet file (WIP).
const auto table = suspects
                     | views::transform([](auto&& p) -> suspect& {
                        boost::to_upper(p.name);
                        return p;
                       })
                     | views::stride(2)
                     | views::cycle
                     | views::take(20)
                     | xb::arrow::actions::to_table;
// below: note that to_range<suspect>(table) returns a range consisting of chunks, each of which
// is also a range. These chunks correspond exactly to the actual low-level chunks in the
// arrow file. We views::join this range to produce a single, collated range, which we then
// convert to a std::vector<suspect> for the sole reason of printing. Note how easily we
// taped together the chunks! Normally this would be a two-level for loop involving laborious
// extraction of each field, type-casting, urgh!
print("round-tripped rows: {}\n",
    xb::arrow::views::to_range<suspect>(table) | views::join | to<vector<suspect>>);
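
For comparison, here is a rough sketch of the manual "two-level for loop" that the views::join pipeline replaces. It uses only the standard library: `suspect_row`, `chunk` and `flatten` are hypothetical stand-ins invented for illustration, and are neither xbow nor Arrow types. With real Arrow arrays each field access would additionally need a cast to the concrete array type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical row type, standing in for the def_record'ed suspect.
struct suspect_row {
    std::int32_t id;
    std::string name;
    double salary;
};

// Hypothetical chunk: each field lives in its own column, mirroring
// how a chunked columnar table is laid out.
struct chunk {
    std::vector<std::int32_t> id;
    std::vector<std::string> name;
    std::vector<double> salary;
};

// The outer loop walks the chunks, the inner loop walks the rows of each
// chunk, and every field has to be pulled out of its column by hand.
std::vector<suspect_row> flatten(const std::vector<chunk>& chunks) {
    std::vector<suspect_row> rows;
    for (const chunk& c : chunks) {
        // All columns of a chunk must have the same length.
        assert(c.id.size() == c.name.size() && c.id.size() == c.salary.size());
        for (std::size_t i = 0; i < c.id.size(); ++i) {
            rows.push_back({c.id[i], c.name[i], c.salary[i]});
        }
    }
    return rows;
}
```

Calling `flatten` on a list of chunks yields the rows in chunk order, which is the same collation that the join step performs in the pipeline above.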


It would be great to get feedback from other users of Arrow. I have a few
things planned, but want to see if there's interest before I invest more of
my time:

- zero-cost optional-like objects directly using the bitmask memory, to
avoid allocation of temporaries in traversal functions.
- date support (done but not pushed)
- time support (WIP)
- indexes and more dataframe functionality
- integration with Python via PEP 484.
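
To make the first item more concrete, here is a minimal sketch of what an optional-like object reading the bitmask memory directly could look like. `optional_ref` is a hypothetical name and not part of xbow's current API. The sketch assumes the Arrow validity-bitmap layout (least-significant bit first, a set bit meaning non-null) and ignores array offsets for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical non-owning view of one nullable slot in an Arrow-style column.
// It only stores pointers into existing buffers, so creating one allocates nothing.
template <typename T>
class optional_ref {
public:
    optional_ref(const T* values, const std::uint8_t* validity, std::size_t index)
        : values_(values), validity_(validity), index_(index) {}

    // Bit (index % 8) of byte (index / 8) is 1 when the slot is non-null.
    // A null validity pointer is treated as "no slot is null".
    bool has_value() const {
        return validity_ == nullptr ||
               ((validity_[index_ / 8] >> (index_ % 8)) & 1u) != 0;
    }
    explicit operator bool() const { return has_value(); }

    // Dereferencing a null slot is a programming error, as with std::optional.
    const T& operator*() const {
        assert(has_value());
        return values_[index_];
    }

    // Returns the stored value, or the fallback when the slot is null.
    T value_or(T fallback) const {
        return has_value() ? values_[index_] : fallback;
    }

private:
    const T* values_;
    const std::uint8_t* validity_;
    std::size_t index_;
};
```

A traversal function could hand out `optional_ref<double>` values for a salary column without copying the underlying data, and callers would test them with `if (ref)` just as they would a `std::optional`.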

Thanks in advance!

[*] https://github.com/seertaak/xbow
[**]
https://www.reddit.com/r/cpp/comments/mswno0/xbow_rangev3_actions_and_ranges_for_arrow_c/

--

Martin Percossi
