orc-user mailing list archives

From "Marty J. Sullivan" <marty.sulli...@cornell.edu>
Subject Re: Csv-import: filter columns and piping input?
Date Thu, 21 Feb 2019 16:35:29 GMT
Hi Gang,

Thanks for the info. I think it’s sufficient to use other Unix tools to prepare the CSV so that
it only has the columns I’m interested in (cut + tr, for example). I think that piping support
would be a very useful addition to the program, though, because I currently have to save large
amounts of CSV to disk first rather than keeping it in memory.
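
The column-selection step described above can be sketched as a small stdin-to-stdout filter. This is only an illustration under stated assumptions: the function name select_columns and the column list 1, 2, 5 (taken from the example in the original question) are hypothetical, and the sketch uses Python's csv module instead of cut so that quoted fields containing commas are split correctly.

```python
import csv
import sys


def select_columns(rows, keep):
    """Yield each non-empty row reduced to the 1-based column positions in keep."""
    for row in rows:
        if not row:
            # csv.reader yields [] for blank lines; skip them
            continue
        # Raises IndexError if a row has fewer columns than requested
        yield [row[i - 1] for i in keep]


def main():
    keep = [1, 2, 5]  # assumed selection, matching the example in the thread
    reader = csv.reader(sys.stdin)
    writer = csv.writer(sys.stdout, lineterminator="\n")
    writer.writerows(select_columns(reader, keep))


# Only act as a filter when input is actually piped or redirected in
if __name__ == "__main__" and not sys.stdin.isatty():
    main()
```

Saved under a hypothetical name such as select_columns.py, it could be run as `python3 select_columns.py < input.csv > trimmed.csv`. The trimmed file still has to be written to disk before being handed to csv-import, since the tool does not read from stdin.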

I will have to brush up on my C++, but I could probably attempt a contribution in April or
May. No guarantees :)


From: "ustcwg@gmail.com" <ustcwg@gmail.com>
Reply-To: "user@orc.apache.org" <user@orc.apache.org>
Date: Thursday, February 21, 2019 at 4:53 AM
To: "user@orc.apache.org" <user@orc.apache.org>
Subject: Re: Csv-import: filter columns and piping input?

Hi Marty,

No, this tool does not support the features you mentioned. We can add them if you really
want them. You are also welcome to contribute them yourself. :)

Sent from my iPhone

On Feb 20, 2019, at 13:35, Marty J. Sullivan <marty.sullivan@cornell.edu> wrote:

I have gotten the csv-import C++ tool working fine. However, it doesn’t seem like there
is a way to either:

  1.  Pipe CSV input to the program
  2.  Select only specific columns from the CSV file to output to ORC

I notice that I *can* specify a schema that contains, say, only the first three columns of
a CSV file, but if I want to include only, for example, columns 1, 2, and 5, that is not possible.
It would also be great to be able to pipe CSV input via stdin, but it doesn’t seem like this
is supported (unless I’m doing something wrong).

Does anyone have any advice or are there plans to support these features in the future?

Marty Sullivan