crunch-user mailing list archives

From Micah Whitacre <mkwhita...@gmail.com>
Subject Re: Reading Hive Tables into PCollection
Date Fri, 29 Jan 2016 23:47:42 GMT
Rough guess: using the metastore client[1] you can get the Table, and from
there the StorageDescriptor[2], which holds the table's location.

Something like:
Path path = new Path(client.getTable(namespace, name).getSd().getLocation());

[1] - https://hive.apache.org/javadocs/r0.13.1/api/metastore/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.html#getTable(java.lang.String, java.lang.String)
[2] - https://hive.apache.org/javadocs/r0.12.0/api/org/apache/hadoop/hive/metastore/api/StorageDescriptor.html
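
For partitions, a rough sketch under stated assumptions: if the table uses
Hive's default directory layout, each partition sits under the table location
in key=value subdirectories, so a partition path can be derived from the
location returned by getSd().getLocation() above. The helper below only does
that string assembly. PartitionPaths and defaultPartitionPath are hypothetical
names, not Hive or Crunch APIs. It assumes partition values need no path
escaping, and it will be wrong for any partition whose location was changed
after creation; in that case the location the metastore stores for that
partition is the one to trust.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not part of Hive or Crunch. */
final class PartitionPaths {

    private PartitionPaths() {
    }

    /**
     * Builds "<tableLocation>/<key>=<value>/..." following the iteration
     * order of the given partition spec (pass a LinkedHashMap to keep the
     * partition columns in their declared order).
     *
     * Assumption: default Hive layout, values that need no path escaping.
     */
    static String defaultPartitionPath(String tableLocation, Map<String, String> spec) {
        StringBuilder sb = new StringBuilder(tableLocation);
        // Drop trailing slashes so the result never contains "//" between segments.
        while (sb.length() > 1 && sb.charAt(sb.length() - 1) == '/') {
            sb.setLength(sb.length() - 1);
        }
        for (Map.Entry<String, String> entry : spec.entrySet()) {
            sb.append('/').append(entry.getKey()).append('=').append(entry.getValue());
        }
        return sb.toString();
    }
}
```

With a made-up table location "hdfs://nn/warehouse/db/tbl/" and a spec of
dt=2016-01-29, this returns "hdfs://nn/warehouse/db/tbl/dt=2016-01-29", which
could then be wrapped in a Path and handed to OrcFileSource.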

On Fri, Jan 29, 2016 at 12:19 PM, Josh Wills <josh.wills@gmail.com> wrote:

> I am sure there is a way to do it using the HS2 thrift APIs, but I've
> never done it myself.
>
> On Fri, Jan 29, 2016 at 10:16 AM, Robinson, Landon - Landon <
> landon.t.robinson@lowes.com> wrote:
>
>> On this same note, I still have a similar problem to solve.
>> I can point Crunch at an HDFS location and it will ingest/read the Orc
>> file just fine.
>>
>> But is there a way (maybe leveraging HCat/Hive APIs) to get the file
>> locations dynamically from Hive? Can I ask HCat/Hive about a table and its
>> partitions and have it tell me the file locations on HDFS (which I can then
>> pass to Crunch to read the files into the pipeline)?
>>
>> ---------------------------------------------------------------------------
>> Landon Robinson
>> Big Data & Hadoop Engineer
>> IT Business Intelligence, Lowe’s Companies Inc.
>>
>> ---------------------------------------------------------------------------
>>
>> From: <Robinson>, LCI <landon.t.robinson@lowes.com>
>> Date: Friday, January 29, 2016 at 10:41 AM
>> To: LCI <landon.t.robinson@lowes.com>, Apache Crunch Mailing List <
>> user@crunch.apache.org>, David Ortiz <dpo5003@gmail.com>
>>
>> Subject: Re: Reading Hive Tables into PCollection
>>
>> *Solved:*
>>
>> Turns out you can use this:
>>
>> 	private HiveChar acl_idc;
>>
>> That comes from this class: org.apache.hadoop.hive.common.type.HiveChar
>>
>> Sorry for all the emails, but hope the findings help someone else!
>>
>>
>> ---------------------------------------------------------------------------
>> Landon Robinson
>> Big Data & Hadoop Engineer
>> IT Business Intelligence, Lowe’s Companies Inc.
>>
>> ---------------------------------------------------------------------------
>>
>> From: <Robinson>, LCI <landon.t.robinson@lowes.com>
>> Date: Friday, January 29, 2016 at 10:36 AM
>> To: Apache Crunch Mailing List <user@crunch.apache.org>, LCI <
>> landon.t.robinson@lowes.com>, David Ortiz <dpo5003@gmail.com>
>> Subject: Re: Reading Hive Tables into PCollection
>>
>> Additionally, we tried allowing those characters to be strings, but get
>> the below error. The real issue is getting the Orc ‘char’ to cast to
>> something we can use in the Orc structure.
>>
>> Exception in thread "main" org.apache.crunch.CrunchRuntimeException:
>> Error while reading local file: file:/tmp/crunch-test/000000_0
>> at
>> org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:110)
>> at
>> org.apache.crunch.io.CompositePathIterable$2.next(CompositePathIterable.java:99)
>> at com.google.common.collect.Iterators$5.next(Iterators.java:607)
>> at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:266)
>> at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:223)
>> at
>> org.apache.crunch.impl.mem.collect.MemCollection.<init>(MemCollection.java:79)
>> at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:165)
>> at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:156)
>> at
>> com.lowes.bigdata.closerate.verint.DataQualityDriverTest.run(DataQualityDriverTest.java:57)
>> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>> at
>> com.lowes.bigdata.closerate.verint.DataQualityDriverTest.main(DataQualityDriverTest.java:36)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
>> *Caused by: java.lang.ClassCastException:
>> org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to
>> org.apache.hadoop.io.Text*
>> at
>> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:46)
>> at
>> org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:26)
>> at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:169)
>> at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:222)
>> at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:190)
>> at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:168)
>> at org.apache.crunch.fn.CompositeMapFn.map(CompositeMapFn.java:63)
>> at
>> org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:108)
>> ... 15 more
>>
>> *Verint1978Record*
>>
>> public class Verint1978Record {
>>
>>    private String lct_nbr;
>>    private String vid_caa_id;
>>    private Integer hrs_nbr;
>>    private Integer mte_nbr;
>>    private String acl_idc;
>>    private Integer sec_dur;
>>    private Integer sec_to_pcs;
>>    private Integer sec_pcd;
>>    private String use_for_rpr_idc;
>>    private Integer grp_cnt;
>>    private Integer sng_cnt;
>>    private String upd_dt;
>>    private String upd_id;
>>    private String cal_dt;
>>
>> }
>>
>>
>>
>>
>> ---------------------------------------------------------------------------
>> Landon Robinson
>> Big Data & Hadoop Engineer
>> IT Business Intelligence, Lowe’s Companies Inc.
>>
>> ---------------------------------------------------------------------------
>>
>> From: <Robinson>, LCI <landon.t.robinson@lowes.com>
>> Reply-To: Apache Crunch Mailing List <user@crunch.apache.org>
>> Date: Friday, January 29, 2016 at 10:33 AM
>> To: David Ortiz <dpo5003@gmail.com>, Apache Crunch Mailing List <
>> user@crunch.apache.org>
>> Subject: Re: Reading Hive Tables into PCollection
>>
>> Right, we’ve been trying this with little luck — largely because I get
>> the error:
>>
>> Caused by: java.lang.ClassCastException:
>> org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to
>> org.apache.hadoop.hive.ql.io.orc.OrcStruct
>>
>> *Code:*
>>
>> OrcFileSource<Verint1978Record> source = new OrcFileSource<Verint1978Record>(
>>     new Path(inputPath), Orcs.reflects(Verint1978Record.class));
>> PCollection<Verint1978Record> persons = pipeline.read(source);
>>
>> *Verint1978Record*
>>
>> public class Verint1978Record {
>>
>>    private String lct_nbr;
>>    private String vid_caa_id;
>>    private Integer hrs_nbr;
>>    private Integer mte_nbr;
>>    private Character acl_idc;
>>    private Integer sec_dur;
>>    private Integer sec_to_pcs;
>>    private Integer sec_pcd;
>>    private Character use_for_rpr_idc;
>>    private Integer grp_cnt;
>>    private Integer sng_cnt;
>>    private String upd_dt;
>>    private String upd_id;
>>    private String cal_dt;
>>
>> }
>>
>>
>> ---------------------------------------------------------------------------
>> Landon Robinson
>> Big Data & Hadoop Engineer
>> IT Business Intelligence, Lowe’s Companies Inc.
>>
>> ---------------------------------------------------------------------------
>>
>> From: David Ortiz <dpo5003@gmail.com>
>> Date: Friday, January 29, 2016 at 10:19 AM
>> To: LCI <landon.t.robinson@lowes.com>, Apache Crunch Mailing List <
>> user@crunch.apache.org>
>> Subject: Re: Reading Hive Tables into PCollection
>>
>> http://hortonworks.com/blog/using-orcfile-cascading-apache-crunch/
>>
>> Here's the java excerpt from that article to read into Avro class (I'm
>> assuming).
>>
>> // Read an ORCFile using reflection-based serialization (slowest):
>> OrcFileSource<Person> source = new OrcFileSource<Person>(
>>     new Path(inputPath), Orcs.reflects(Person.class));
>> PCollection<Person> persons = pipeline.read(source);
>>
>> On Fri, Jan 29, 2016 at 10:17 AM Robinson, Landon - Landon <
>> landon.t.robinson@lowes.com> wrote:
>>
>>> Orc format.
>>>
>>> ---------------------------------------------------------------------------
>>> Landon Robinson
>>> Big Data & Hadoop Engineer
>>> IT Business Intelligence, Lowe’s Companies Inc.
>>>
>>> ---------------------------------------------------------------------------
>>>
>>> From: David Ortiz <dpo5003@gmail.com>
>>> Reply-To: Apache Crunch Mailing List <user@crunch.apache.org>
>>> Date: Thursday, January 28, 2016 at 1:22 PM
>>> To: Apache Crunch Mailing List <user@crunch.apache.org>
>>> Subject: Re: Reading Hive Tables into PCollection
>>>
>>> What format are they stored as?
>>>
>>> On Thu, Jan 28, 2016 at 1:20 PM Robinson, Landon - Landon <
>>> landon.t.robinson@lowes.com> wrote:
>>>
>>>> Crunch Gurus,
>>>>
>>>> What is the Crunch-convenient or recommended way to read the contents
>>>> of a Hive table into a PCollection?
>>>> Thanks!
>>>> Best,
>>>> Landon
>>>>
>>>> ---------------------------------------------------------------------------
>>>> Landon Robinson
>>>> Big Data & Hadoop Engineer
>>>>
>>>> ---------------------------------------------------------------------------
>>>> NOTICE: All information in and attached to the e-mails below may be
>>>> proprietary, confidential, privileged and otherwise protected from improper
>>>> or erroneous disclosure. If you are not the sender's intended recipient,
>>>> you are not authorized to intercept, read, print, retain, copy, forward, or
>>>> disseminate this message. If you have erroneously received this
>>>> communication, please notify the sender immediately by phone
>>>> (704-758-1000) or by e-mail and destroy all copies of this message
>>>> electronic, paper, or otherwise.
>>>>
>>>> *By transmitting documents via this email: Users, Customers, Suppliers
>>>> and Vendors collectively acknowledge and agree the transmittal of
>>>> information via email is voluntary, is offered as a convenience, and is not
>>>> a secured method of communication; Not to transmit any payment information
>>>> E.G. credit card, debit card, checking account, wire transfer information,
>>>> passwords, or sensitive and personal information E.G. Driver's license,
>>>> DOB, social security, or any other information the user wishes to remain
>>>> confidential; To transmit only non-confidential information such as plans,
>>>> pictures and drawings and to assume all risk and liability for and
>>>> indemnify Lowe's from any claims, losses or damages that may arise from the
>>>> transmittal of documents or including non-confidential information in the
>>>> body of an email transmittal. Thank you. *
>>>>
>>
>
>
