incubator-hcatalog-user mailing list archives

From Travis Crawford <traviscrawf...@gmail.com>
Subject Re: [VOTE] Release HCatalog 0.4.0-incubating (candidate 1)
Date Fri, 30 Mar 2012 04:07:27 GMT
On Thu, Mar 29, 2012 at 1:44 PM, Sushanth Sowmyan <khorgath@gmail.com> wrote:

> Hi Julien,
>
> In the first drop, our intent was to support text and json natively,
> and support everything hive supported. The next thing we intended to
> work on was the LoadFuncBased* - what kind of LoadFuncs/StoreFuncs
> do you guys use?
>
> I'll be glad to work with you in adding in a LoadFuncBasedSerDe (for
> simple Loaders which are only row translation changes) and/or
> LoadFuncBasedStorageHandler for cases where we need to handle IF/OF
> instantiation semantics/etc.
>


We're primarily using elephant-bird to store records as LZO-compressed
thrift objects:

https://github.com/kevinweil/elephant-bird/blob/master/src/java/com/twitter/elephantbird/pig/load/ThriftPigLoader.java

In our environment storing thrift objects has worked well to communicate
schemas between the various apps producing/consuming data, both batch jobs
and real-time streams.

With the LoadFuncBased-stuff we plugged right in and all our existing data
Just Worked.

What do y'all think is the right path forward? I opened
https://issues.apache.org/jira/browse/HCATALOG-345 if we want to move the
discussion to avoid threadjacking the release thread.

--travis



> -Sushanth
>
> On Wed, Mar 28, 2012 at 11:20 AM, Julien Le Dem <julien@twitter.com> wrote:
> > Hi Sushanth,
> > We are using LoadFuncBasedInputDriver so that we can reuse our existing
> > PigLoaders.
> > So we cannot easily replace that.
> > What's the plan regarding this functionality?
> > Julien
> >
> > On Mar 23, 2012, at 10:14 AM, Sushanth Sowmyan wrote:
> >
> >> Hi Travis,
> >>
> >> LoadFuncBasedInputDriver no longer exists.
> >>
> >> Update the hcatalog jar that pig loads, and you'll see that the
> >> HCatLoader no longer tries to use the StorageDriver information. For
> >> the most part, if you were using LoadFuncBasedInputDriver to load text,
> >> you should be good to go as-is. For json data, there's a bit of work -
> >> you need to modify the table to specify which SerDe to use
> >> (org.apache.hcatalog.data.JsonSerDe).
> >>
> >> -Sushanth
> >>
> >> On Thu, Mar 22, 2012 at 7:01 PM, Travis Crawford
> >> <traviscrawford@gmail.com> wrote:
> >>> This release candidate builds, but I can't test against our metastore
> >>> due to the serde change. What's the recommended migration path? It
> >>> could be "you need to reimport everything" but right now I'm not sure
> >>> how to proceed.
> >>>
> >>>
> >>> 2012-03-23 01:55:58,138 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 6017: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: java.lang.NoSuchMethodException: org.apache.hcatalog.pig.drivers.LoadFuncBasedInputFormat.<init>()
> >>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:282)
> >>> at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:944)
> >>> at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:961)
> >>> at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
> >>> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:880)
> >>> at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:833)
> >>> at java.security.AccessController.doPrivileged(Native Method)
> >>> at javax.security.auth.Subject.doAs(Subject.java:396)
> >>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
> >>> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:833)
> >>> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:807)
> >>> at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
> >>> at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
> >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>> at java.lang.reflect.Method.invoke(Method.java:597)
> >>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigJobControl.mainLoopAction(PigJobControl.java:144)
> >>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigJobControl.run(PigJobControl.java:121)
> >>> at java.lang.Thread.run(Thread.java:662)
> >>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$1.run(MapReduceLauncher.java:231)
> >>> Caused by: java.lang.RuntimeException: java.lang.NoSuchMethodException: org.apache.hcatalog.pig.drivers.LoadFuncBasedInputFormat.<init>()
> >>> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
> >>> at org.apache.hcatalog.mapreduce.HCatBaseInputFormat.getMapRedInputFormat(HCatBaseInputFormat.java:102)
> >>> at org.apache.hcatalog.mapreduce.HCatBaseInputFormat.getSplits(HCatBaseInputFormat.java:159)
> >>> at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:270)
> >>> ... 20 more
> >>> Caused by: java.lang.NoSuchMethodException: org.apache.hcatalog.pig.drivers.LoadFuncBasedInputFormat.<init>()
> >>> at java.lang.Class.getConstructor0(Class.java:2706)
> >>> at java.lang.Class.getDeclaredConstructor(Class.java:1985)
> >>> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:109)
> >>> ... 23 more
> >>>
> >>> --travis
> >>>
> >>>
> >>>
> >>> On Thu, Mar 22, 2012 at 5:59 PM, Alan Gates <gates@hortonworks.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I have created a candidate build for HCatalog 0.4.0-incubating. Keys
> >>>> used to sign the release are available from the KEYS file in
> >>>> HCatalog's SVN. Please download, test, and try it out:
> >>>>
> >>>> http://people.apache.org/~gates/hcatalog-0.4.0-incubating-candidate-1/
> >>>>
> >>>> The release, md5 signature, gpg signature, release notes, and rat
> >>>> report can all be found at the above address.
> >>>>
> >>>> Should we release this? Vote closes on Tuesday, March 27th.
> >>>>
> >>>> Alan.
> >>>
> >>>
> >
>
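
For readers following the quoted stack trace: the innermost frames show a reflective lookup of a no-argument constructor (Class.getDeclaredConstructor) failing with NoSuchMethodException, which is then wrapped in a RuntimeException. The sketch below reproduces that general failure mode in plain Java. It is only an illustration under assumptions: NoArgConstructorDemo, NeedsArgs, HasDefault, and newInstance are hypothetical names, and this is not the Hadoop or HCatalog code itself.

```java
import java.lang.reflect.Constructor;

public class NoArgConstructorDemo {

    // Hypothetical class that only declares a constructor taking an argument.
    static class NeedsArgs {
        final String name;

        NeedsArgs(String name) {
            this.name = name;
        }
    }

    // Hypothetical class with the implicit no-argument constructor.
    static class HasDefault {
    }

    // Look up the no-argument constructor reflectively and invoke it,
    // wrapping any reflective failure in a RuntimeException.
    static <T> T newInstance(Class<T> cls) {
        try {
            Constructor<T> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            return ctor.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Succeeds: HasDefault has a no-argument constructor.
        System.out.println(newInstance(HasDefault.class) != null);

        // Fails: NeedsArgs has no no-argument constructor, so the lookup
        // throws NoSuchMethodException, surfaced here as the cause.
        try {
            newInstance(NeedsArgs.class);
        } catch (RuntimeException e) {
            System.out.println(e.getCause());
        }
    }
}
```

Any class instantiated this way needs a no-argument constructor to be present in the jar that is actually on the classpath, which is consistent with the advice in the thread to update the hcatalog jar that pig loads.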
