crunch-user mailing list archives

From "Robinson, Landon - Landon"
Subject Help Reading Orc Files
Date Wed, 03 Feb 2016 21:59:03 GMT
Crunch Gurus,

Need some advice. I have experience writing Orc files in Crunch, and I can successfully read
them in Crunch and print them out.
But when I attempt to process them with a DoFn, I get this error. What should I do?

Exception in thread "Thread-5" java.lang.NoSuchFieldError: HIVE_ORC_SPLIT_STRATEGY

Here’s my code:

        // Generating Hadoop Configuration...
        Configuration crunchConf = getConf();

        // Establishing OrcFile Target for Final Output...
        OrcFileTarget target = new OrcFileTarget(new Path(outputPath));

        // Establish Pipeline
        // Generating Crunch Map-Reduce Pipeline...
        Pipeline pipeline = new MRPipeline(DataQualityDriver.class, crunchConf);

        // Establish OrcFileSource (emulates a Java class) linked to HDFS Path
        // Generating Orc File Source around given HDFS path...
        OrcFileSource<Verint1978Record> orcsource = new OrcFileSource<Verint1978Record>(
                new Path(inputPath), Orcs.reflects(Verint1978Record.class));

        // Ingest the Orc File into a PCollection
        // Generating PCollection of Verint1978Record from Data...
        PCollection<Verint1978Record> data =;

        for (Verint1978Record record : data.materialize()){

        // this all works fine until THIS point

        // can’t run these records through a DoFn or write them out without getting the error above

        // this DoFn simply reads the previous PCollection and emits each record back out as a string
        // (just to test the DoFn)

        PCollection<String> newData = data.parallelDo(DataQualityDoFns.DoFn_ProduceSameRecords(),
                for (String record : newData.materialize()){

PipelineResult result = pipeline.done();

DoFn (super lazy):

static DoFn<Verint1978Record, String> DoFn_ProduceSameRecords() {
    return new DoFn<Verint1978Record, String>() {
        @Override
        public void process(Verint1978Record input, Emitter<String> emitter) {
            // Concatenate every field of the record into a single string
            emitter.emit(input.getLct_nbr() + "" + input.getVid_caa_id() + "" + input.getHrs_nbr()
                    + "" + input.getMte_nbr() + "" + input.getAcl_idc() + "" + input.getSec_dur()
                    + "" + input.getSec_to_pcs() + "" + input.getSec_pcd() + "" + input.getUse_for_rpr_idc()
                    + "" + input.getGrp_cnt() + "" + input.getSng_cnt() + "" + input.getUpd_dt()
                    + "" + input.getUpd_id() + "" + input.getCal_dt());
        }
    };
}
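
For what it's worth, a NoSuchFieldError at runtime generally means some class was compiled against one version of a library but is running against a different version that lacks the field. Below is a minimal diagnostic sketch, not a fix, that checks whether a class visible on the classpath exposes a given static field and which jar it was loaded from. The helper names (ClasspathFieldCheck, hasStaticField, loadedFrom) are hypothetical, and the class name org.apache.hadoop.hive.conf.HiveConf$ConfVars is an assumption about where HIVE_ORC_SPLIT_STRATEGY lives; it is not stated anywhere in this message.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.security.CodeSource;

// Hypothetical diagnostic helper; not part of Crunch or Hive.
class ClasspathFieldCheck {

    // True if the class can be loaded and has a public static field with this name.
    // Enum constants count, since they are public static fields.
    static boolean hasStaticField(String className, String fieldName) {
        try {
            // initialize=false: load the class without running its static initializers
            Class<?> cls = Class.forName(className, false,
                    ClasspathFieldCheck.class.getClassLoader());
            Field field = cls.getField(fieldName);
            return Modifier.isStatic(field.getModifiers());
        } catch (ClassNotFoundException | NoSuchFieldException | LinkageError e) {
            return false;
        }
    }

    // Describes where the class was loaded from (usually a jar path).
    static String loadedFrom(String className) {
        try {
            Class<?> cls = Class.forName(className, false,
                    ClasspathFieldCheck.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            return source == null ? "(bootstrap or unknown)"
                                  : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        // Assumed location of the missing field; adjust if it lives elsewhere.
        String className = "org.apache.hadoop.hive.conf.HiveConf$ConfVars";
        String fieldName = "HIVE_ORC_SPLIT_STRATEGY";
        System.out.println("loaded from: " + loadedFrom(className));
        System.out.println("has field:   " + hasStaticField(className, fieldName));
    }
}
```

Running this with the same classpath as the failing job (for example, from inside the driver) would show whether the class being picked up comes from an unexpected jar. If the field is reported missing, a version mismatch between the jars bundled with the job and those supplied by the cluster would be the first thing to look at.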


Landon Robinson
Big Data & Hadoop Engineer
IT Business Intelligence, Lowe’s Companies Inc.

