crunch-user mailing list archives

From "Robinson, Landon - Landon" <landon.t.robin...@lowes.com>
Subject Re: Reading Hive Tables into PCollection
Date Fri, 29 Jan 2016 15:36:39 GMT
Additionally, we tried declaring those char fields as Strings, but got the error below. The
real issue is getting the ORC 'char' type to convert to something we can use in the ORC structure.

Exception in thread "main" org.apache.crunch.CrunchRuntimeException: Error while reading local file: file:/tmp/crunch-test/000000_0
at org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:110)
at org.apache.crunch.io.CompositePathIterable$2.next(CompositePathIterable.java:99)
at com.google.common.collect.Iterators$5.next(Iterators.java:607)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:266)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:223)
at org.apache.crunch.impl.mem.collect.MemCollection.<init>(MemCollection.java:79)
at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:165)
at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:156)
at com.lowes.bigdata.closerate.verint.DataQualityDriverTest.run(DataQualityDriverTest.java:57)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.lowes.bigdata.closerate.verint.DataQualityDriverTest.main(DataQualityDriverTest.java:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to org.apache.hadoop.io.Text
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:46)
at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableStringObjectInspector.getPrimitiveJavaObject(WritableStringObjectInspector.java:26)
at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:169)
at org.apache.crunch.types.orc.OrcUtils.convert(OrcUtils.java:222)
at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:190)
at org.apache.crunch.types.orc.Orcs$ReflectInFn.map(Orcs.java:168)
at org.apache.crunch.fn.CompositeMapFn.map(CompositeMapFn.java:63)
at org.apache.crunch.io.orc.OrcFileReaderFactory$1.next(OrcFileReaderFactory.java:108)
... 15 more


Verint1978Record

public class Verint1978Record {

   private String lct_nbr;
   private String vid_caa_id;
   private Integer hrs_nbr;
   private Integer mte_nbr;
   private String acl_idc;
   private Integer sec_dur;
   private Integer sec_to_pcs;
   private Integer sec_pcd;
   private String use_for_rpr_idc;
   private Integer grp_cnt;
   private Integer sng_cnt;
   private String upd_dt;
   private String upd_id;
   private String cal_dt;

}
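
A note on the trace above: WritableStringObjectInspector is being handed a HiveCharWritable, which suggests the schema derived from the class says "string" where the file holds a Hive char column. One way to check is to list the Java field types of the record class and compare them by hand against the table's column types. What follows is only a minimal sketch using the JDK; FieldTypeLister and SampleRecord are hypothetical names made up for illustration and are not part of Crunch or Hive.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldTypeLister {

    // Hypothetical stand-in for a record class such as Verint1978Record.
    public static class SampleRecord {
        private String lct_nbr;
        private Integer hrs_nbr;
        private Character acl_idc;
    }

    // Returns field name -> simple Java type name for each non-static,
    // non-synthetic field declared directly on the given class.
    public static Map<String, String> fieldTypes(Class<?> clazz) {
        Map<String, String> types = new LinkedHashMap<>();
        for (Field f : clazz.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            types.put(f.getName(), f.getType().getSimpleName());
        }
        return types;
    }

    public static void main(String[] args) {
        // Print one "name : type" line per field for manual comparison.
        for (Map.Entry<String, String> e : fieldTypes(SampleRecord.class).entrySet()) {
            System.out.println(e.getKey() + " : " + e.getValue());
        }
    }
}
```

Running the same method against the real record class and comparing the result with the output of DESCRIBE on the Hive table should show which fields are declared as String but stored as char.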


---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: <Robinson>, LCI <landon.t.robinson@lowes.com>
Reply-To: Apache Crunch Mailing List <user@crunch.apache.org>
Date: Friday, January 29, 2016 at 10:33 AM
To: David Ortiz <dpo5003@gmail.com>, Apache Crunch Mailing List <user@crunch.apache.org>
Subject: Re: Reading Hive Tables into PCollection

Right, we’ve been trying this with little luck, largely because I get the error:

Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.serde2.io.HiveCharWritable cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcStruct

Code:

OrcFileSource<Verint1978Record> source = new OrcFileSource<Verint1978Record>(
    new Path(inputPath), Orcs.reflects(Verint1978Record.class));
PCollection<Verint1978Record> persons = pipeline.read(source);

Verint1978Record

public class Verint1978Record {

   private String lct_nbr;
   private String vid_caa_id;
   private Integer hrs_nbr;
   private Integer mte_nbr;
   private Character acl_idc;
   private Integer sec_dur;
   private Integer sec_to_pcs;
   private Integer sec_pcd;
   private Character use_for_rpr_idc;
   private Integer grp_cnt;
   private Integer sng_cnt;
   private String upd_dt;
   private String upd_id;
   private String cal_dt;

}
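
For the two Character fields above, one possible direction is to take the char value as a String first and convert it explicitly. The sketch below uses only the JDK and rests on assumptions the thread does not confirm: that the value has already been obtained as a java.lang.String by some means, and that acl_idc and use_for_rpr_idc are char(1) columns (the Character field type hints at this, but the declared width is not stated). Hive pads fixed-width char values with trailing spaces, so the helper strips those first. HiveCharFields is a hypothetical name, not a Crunch or Hive class.

```java
public class HiveCharFields {

    // Strips the trailing space padding of a fixed-width char(n) value.
    public static String stripPadding(String padded) {
        if (padded == null) {
            return null;
        }
        int end = padded.length();
        while (end > 0 && padded.charAt(end - 1) == ' ') {
            end--;
        }
        return padded.substring(0, end);
    }

    // Maps a char(1)-style value to a Character.
    // Null or all-blank input yields null; more than one
    // non-padding character is rejected (assumes width 1).
    public static Character toCharacter(String padded) {
        String s = stripPadding(padded);
        if (s == null || s.isEmpty()) {
            return null;
        }
        if (s.length() > 1) {
            throw new IllegalArgumentException(
                "Expected at most one character, got: " + s);
        }
        return s.charAt(0);
    }

    public static void main(String[] args) {
        System.out.println(toCharacter("Y "));
        System.out.println(stripPadding("ab  "));
    }
}
```

This does not by itself fix the ClassCastException; it only covers the last step, once the char column can be read as a string at all.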

---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: David Ortiz <dpo5003@gmail.com>
Date: Friday, January 29, 2016 at 10:19 AM
To: LCI <landon.t.robinson@lowes.com>, Apache Crunch Mailing List <user@crunch.apache.org>
Subject: Re: Reading Hive Tables into PCollection

http://hortonworks.com/blog/using-orcfile-cascading-apache-crunch/

Here's the Java excerpt from that article for reading into an Avro class (I'm assuming).

// Read an ORCFile using reflection-based serialization (slowest):
OrcFileSource<Person> source = new OrcFileSource<Person>(new Path(inputPath),
    Orcs.reflects(Person.class));
PCollection<Person> persons = pipeline.read(source);

On Fri, Jan 29, 2016 at 10:17 AM Robinson, Landon - Landon <landon.t.robinson@lowes.com> wrote:
Orc format.
---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.
---------------------------------------------------------------------------

From: David Ortiz <dpo5003@gmail.com>
Reply-To: Apache Crunch Mailing List <user@crunch.apache.org>
Date: Thursday, January 28, 2016 at 1:22 PM
To: Apache Crunch Mailing List <user@crunch.apache.org>
Subject: Re: Reading Hive Tables into PCollection

What format are they stored as?

On Thu, Jan 28, 2016 at 1:20 PM Robinson, Landon - Landon <landon.t.robinson@lowes.com> wrote:
Crunch Gurus,

What is the Crunch-convenient or recommended way to read the contents of a Hive table into
a PCollection?
Thanks!
Best,
Landon
---------------------------------------------------------------------------
Landon Robinson
Big Data & Hadoop Engineer
---------------------------------------------------------------------------
NOTICE: All information in and attached to the e-mails below may be proprietary, confidential,
privileged and otherwise protected from improper or erroneous disclosure. If you are not the
sender's intended recipient, you are not authorized to intercept, read, print, retain, copy,
forward, or disseminate this message. If you have erroneously received this communication,
please notify the sender immediately by phone (704-758-1000) or by e-mail and destroy all
copies of this message electronic, paper, or otherwise.

By transmitting documents via this email: Users, Customers, Suppliers and Vendors collectively
acknowledge and agree the transmittal of information via email is voluntary, is offered as
a convenience, and is not a secured method of communication; Not to transmit any payment information
E.G. credit card, debit card, checking account, wire transfer information, passwords, or sensitive
and personal information E.G. Driver's license, DOB, social security, or any other information
the user wishes to remain confidential; To transmit only non-confidential information such
as plans, pictures and drawings and to assume all risk and liability for and indemnify Lowe's
from any claims, losses or damages that may arise from the transmittal of documents or including
non-confidential information in the body of an email transmittal. Thank you.