crunch-user mailing list archives

From Josh Wills <josh.wi...@gmail.com>
Subject Re: Reading Hive Tables into PCollection
Date Fri, 29 Jan 2016 18:19:00 GMT
I am sure there is a way to do it using the HS2 thrift APIs, but I've never
done it myself.
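
As a rough sketch of one way this could look: the code below does not use the raw HS2 thrift API, but goes through HiveServer2 over plain JDBC, runs DESCRIBE FORMATTED, and picks out the "Location:" row. The class name HiveTableLocation, both method names, and the JDBC URL shape are hypothetical, and it assumes the Hive JDBC driver is on the classpath. It is untested against a real cluster.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

/** Sketch: look up a Hive table's storage location through HiveServer2 over JDBC. */
class HiveTableLocation {

    /**
     * Scans DESCRIBE FORMATTED rows (col_name, data_type) for the
     * "Location:" entry and returns its value, or null if there is none.
     * Hive pads col_name with spaces, so both cells are trimmed.
     */
    static String extractLocation(List<String[]> rows) {
        for (String[] row : rows) {
            if (row.length >= 2 && row[0] != null && row[1] != null
                    && row[0].trim().equals("Location:")) {
                return row[1].trim();
            }
        }
        return null;
    }

    /**
     * Runs DESCRIBE FORMATTED against HiveServer2 and returns the location.
     * Assumption: jdbcUrl looks like jdbc:hive2://host:10000/default.
     * The table name is concatenated into the statement, so it must come
     * from a trusted source.
     */
    static String fetchLocation(String jdbcUrl, String table) throws SQLException {
        List<String[]> rows = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("DESCRIBE FORMATTED " + table)) {
            while (rs.next()) {
                // Column 1 is col_name, column 2 is data_type.
                rows.add(new String[] {rs.getString(1), rs.getString(2)});
            }
        }
        return extractLocation(rows);
    }
}
```

The returned string could then be wrapped in new Path(...) and handed to OrcFileSource as in the code further down the thread. For a single partition, Hive also accepts DESCRIBE FORMATTED <table> PARTITION (...), which should report that partition's own location in the same row.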

On Fri, Jan 29, 2016 at 10:16 AM, Robinson, Landon - Landon <
landon.t.robinson@lowes.com> wrote:

> On this same note, I still have a similar problem to solve.
> I can point Crunch at an HDFS location and it will ingest/read the Orc
> file just fine.
>
> But is there a way (maybe leveraging Hcat/Hive apis) to get the file
> locations dynamically/from Hive? Can I ask Hcat/Hive about a table and its
> partitions, and have it tell me the file location on HDFS (which I can then pass
> to Crunch to consume the file into the pipeline)?
> ---------------------------------------------------------------------------
> Landon Robinson
> Big Data & Hadoop Engineer
> IT Business Intelligence, Lowe’s Companies Inc.
> ---------------------------------------------------------------------------
>
> From: <Robinson>, LCI <landon.t.robinson@lowes.com>
> Date: Friday, January 29, 2016 at 10:41 AM
> To: LCI <landon.t.robinson@lowes.com>, Apache Crunch Mailing List <
> user@crunch.apache.org>, David Ortiz <dpo5003@gmail.com>
>
> Subject: Re: Reading Hive Tables into PCollection
>
> *Solved:*
>
> Turns out you can use this:
>
> 	private HiveChar acl_idc;
>
> That class is org.apache.hadoop.hive.common.type.HiveChar.
>
> Sorry for all the emails, but hope the findings help someone else!
>
>
> From: <Robinson>, LCI <landon.t.robinson@lowes.com>
> Date: Friday, January 29, 2016 at 10:36 AM
> To: Apache Crunch Mailing List <user@crunch.apache.org>, LCI <
> landon.t.robinson@lowes.com>, David Ortiz <dpo5003@gmail.com>
> Subject: Re: Reading Hive Tables into PCollection
>
> Additionally, we tried allowing those characters to be strings, but get
> the below error. The real issue is getting the Orc ‘char’ to cast to
> something we can use in the Orc structure.
>
> Exception in thread "main" org.apache.crunch.CrunchRuntimeException: Error while reading local file: file:/tmp/crunch-test/000000_0
> at org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:110)
> at org.apache.crunch.io.CompositePathIterable$2.next(CompositePathIterable.java:99)
> at com.google.common.collect.Iterators$5.next(Iterators.java:607)
> at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:266)
> at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:223)
> at org.apache.crunch.impl.mem.collect.MemCollection.<init>(MemCollection.java:79)
> at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:165)
> at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:156)
> at com.lowes.bigdata.closerate.verint.DataQualityDriverTest.run(DataQualityDriverTest.java:57)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at com.lowes.bigdata.closerate.verint.DataQualityDriverTest.main(DataQualityDriverTest.java:36)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
> *Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to org.apache.hadoop.io.Text*
> at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:46)
> at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:26)
> at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:169)
> at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:222)
> at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:190)
> at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:168)
> at org.apache.crunch.fn.CompositeMapFn.map(CompositeMapFn.java:63)
> at org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:108)
> ... 15 more
>
> *Verint1978Record*
>
> public class Verint1978Record {
>
>    private String lct_nbr;
>    private String vid_caa_id;
>    private Integer hrs_nbr;
>    private Integer mte_nbr;
>    private String acl_idc;
>    private Integer sec_dur;
>    private Integer sec_to_pcs;
>    private Integer sec_pcd;
>    private String use_for_rpr_idc;
>    private Integer grp_cnt;
>    private Integer sng_cnt;
>    private String upd_dt;
>    private String upd_id;
>    private String cal_dt;
>
> }
>
>
>
>
> From: <Robinson>, LCI <landon.t.robinson@lowes.com>
> Reply-To: Apache Crunch Mailing List <user@crunch.apache.org>
> Date: Friday, January 29, 2016 at 10:33 AM
> To: David Ortiz <dpo5003@gmail.com>, Apache Crunch Mailing List <
> user@crunch.apache.org>
> Subject: Re: Reading Hive Tables into PCollection
>
> Right, we’ve been trying this with little luck — largely because I get the
> error:
>
> Caused by: java.lang.ClassCastException:
> org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to
> org.apache.hadoop.hive.ql.io.orc.OrcStruct
>
> *Code:*
>
> OrcFileSource<Verint1978Record> source = new OrcFileSource<Verint1978Record>(
>     new Path(inputPath), Orcs.reflects(Verint1978Record.class));
> PCollection<Verint1978Record> persons = pipeline.read(source);
>
> *Verint1978Record*
>
> public class Verint1978Record {
>
>    private String lct_nbr;
>    private String vid_caa_id;
>    private Integer hrs_nbr;
>    private Integer mte_nbr;
>    private Character acl_idc;
>    private Integer sec_dur;
>    private Integer sec_to_pcs;
>    private Integer sec_pcd;
>    private Character use_for_rpr_idc;
>    private Integer grp_cnt;
>    private Integer sng_cnt;
>    private String upd_dt;
>    private String upd_id;
>    private String cal_dt;
>
> }
>
>
> From: David Ortiz <dpo5003@gmail.com>
> Date: Friday, January 29, 2016 at 10:19 AM
> To: LCI <landon.t.robinson@lowes.com>, Apache Crunch Mailing List <
> user@crunch.apache.org>
> Subject: Re: Reading Hive Tables into PCollection
>
> http://hortonworks.com/blog/using-orcfile-cascading-apache-crunch/
>
> Here's the java excerpt from that article to read into Avro class (I'm
> assuming).
>
> // Read an ORCFile using reflection-based serialization (slowest):
> OrcFileSource<Person> source = new OrcFileSource<Person>(
>     new Path(inputPath), Orcs.reflects(Person.class));
> PCollection<Person> persons = pipeline.read(source);
>
> On Fri, Jan 29, 2016 at 10:17 AM Robinson, Landon - Landon <
> landon.t.robinson@lowes.com> wrote:
>
>> Orc format.
>>
>>
>> From: David Ortiz <dpo5003@gmail.com>
>> Reply-To: Apache Crunch Mailing List <user@crunch.apache.org>
>> Date: Thursday, January 28, 2016 at 1:22 PM
>> To: Apache Crunch Mailing List <user@crunch.apache.org>
>> Subject: Re: Reading Hive Tables into PCollection
>>
>> What format are they stored as?
>>
>> On Thu, Jan 28, 2016 at 1:20 PM Robinson, Landon - Landon <
>> landon.t.robinson@lowes.com> wrote:
>>
>>> Crunch Gurus,
>>>
>>> What is the Crunch-convenient or recommended way to read the contents of
>>> a Hive table into a PCollection?
>>> Thanks!
>>> Best,
>>> Landon
>>>
>>> ---------------------------------------------------------------------------
>>> Landon Robinson
>>> Big Data & Hadoop Engineer
>>>
>>> ---------------------------------------------------------------------------
>>> NOTICE: All information in and attached to the e-mails below may be
>>> proprietary, confidential, privileged and otherwise protected from improper
>>> or erroneous disclosure. If you are not the sender's intended recipient,
>>> you are not authorized to intercept, read, print, retain, copy, forward, or
>>> disseminate this message. If you have erroneously received this
>>> communication, please notify the sender immediately by phone
>>> (704-758-1000) or by e-mail and destroy all copies of this message
>>> electronic, paper, or otherwise.
>>>
>>> *By transmitting documents via this email: Users, Customers, Suppliers
>>> and Vendors collectively acknowledge and agree the transmittal of
>>> information via email is voluntary, is offered as a convenience, and is not
>>> a secured method of communication; Not to transmit any payment information
>>> E.G. credit card, debit card, checking account, wire transfer information,
>>> passwords, or sensitive and personal information E.G. Driver's license,
>>> DOB, social security, or any other information the user wishes to remain
>>> confidential; To transmit only non-confidential information such as plans,
>>> pictures and drawings and to assume all risk and liability for and
>>> indemnify Lowe's from any claims, losses or damages that may arise from the
>>> transmittal of documents or including non-confidential information in the
>>> body of an email transmittal. Thank you. *
>>>
