drill-dev mailing list archives

From Julien Le Dem <jul...@dremio.com>
Subject Re: How to get started with a new format conversion and representation
Date Fri, 28 Aug 2015 17:44:52 GMT
Hi Edmon,
I would start by picking one of Avro, Thrift or Protobuf to describe a
schema for this data:
http://avro.apache.org/docs/current/#schemas
https://developers.google.com/protocol-buffers/
http://thrift.apache.org/docs/idl
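Before writing that schema, it can help to look at the shape of the parsed data. Below is a minimal sketch under stated assumptions. It assumes the default HL7 v2 delimiters: '\r' between segments, '|' between fields and '^' between components. It deliberately ignores repetitions ('~'), sub-components ('&') and escape sequences. The class Hl7Sketch and its Segment record are hypothetical names for illustration only; they are not part of Drill, Parquet or any HL7 library. The result is a list of segments, each holding a list of fields, each holding a list of components. That nested, repeated structure is what an Avro, Thrift or Protobuf schema would have to model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper, not part of Drill, Parquet or any HL7 library.
 * Splits an HL7 v2 message into segments -> fields -> components, assuming the
 * default delimiters: '\r' between segments, '|' between fields, '^' between
 * components. Repetitions ('~'), sub-components ('&') and escape sequences are
 * not handled.
 */
final class Hl7Sketch {

    /** A segment id such as "PID" plus its fields, each field a list of components. */
    record Segment(String id, List<List<String>> fields) {}

    private Hl7Sketch() {}

    static List<Segment> parse(String message) {
        List<Segment> segments = new ArrayList<>();
        // HL7 uses '\r' as the segment terminator; also accept '\n' and "\r\n".
        for (String line : message.split("\r\n|\r|\n")) {
            if (line.isEmpty()) {
                continue;
            }
            // Limit -1 keeps trailing empty fields, so field positions stay stable.
            String[] parts = line.split("\\|", -1);
            String id = parts[0];
            List<List<String>> fields = new ArrayList<>();
            for (int i = 1; i < parts.length; i++) {
                if (id.equals("MSH") && i == 1) {
                    // MSH-2 holds the encoding characters; keep it unsplit.
                    // In MSH, fields.get(0) is MSH-2 because MSH-1 is the '|' itself.
                    fields.add(List.of(parts[i]));
                } else {
                    // For other segments, fields.get(n - 1) is field n.
                    fields.add(Arrays.asList(parts[i].split("\\^", -1)));
                }
            }
            segments.add(new Segment(id, fields));
        }
        return segments;
    }
}
```

For example, parsing "MSH|^~\\&|SENDER|FAC\rPID|1||12345^^^HOSP||DOE^JOHN" (as a Java string literal) yields two segments. The PID segment has five fields, and fields().get(4), i.e. PID-5, is [DOE, JOHN]. Repeated groups like these map naturally onto repeated fields in whichever schema language you pick.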

From there you can write to Parquet using the matching parquet-mr integration; its tests show how:
https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/test/java/org/apache/parquet/avro/TestSpecificReadWrite.java
https://github.com/apache/parquet-mr/blob/master/parquet-protobuf/src/test/java/org/apache/parquet/proto/ProtoInputOutputFormatTest.java
https://github.com/apache/parquet-mr/blob/master/parquet-thrift/src/test/java/org/apache/parquet/hadoop/thrift/TestInputOutputFormat.java

Julien

On Thu, Aug 27, 2015 at 7:23 PM, Edmon Begoli <ebegoli@gmail.com> wrote:

> This might be more of a question for Parquet folks here than Drill-ers, but
> nevertheless:
>
> I would like to be able to convert EDI HL7 v.2 messages into Parquet
> representation, and make them amenable to Drill querying.
> Here is a sample 837P claim message in HL7 representation (page 8):
> http://www.vitahealth.org/Modules/ShowDocument2.aspx?documentid=545
>
> This is a lengthy topic that I could discuss in detail, but for now I
> would just like to know where and how to get started.
>
> Thank you,
> Edmon
>
