incubator-hcatalog-user mailing list archives

From Russell Jurney <russell.jur...@gmail.com>
Subject Re: Which jars to store to HCatalog in Pig?
Date Fri, 22 Jun 2012 22:14:12 GMT
SUCCESS!  Thanks, everyone :)  I'll repay you in good docs.

2012-06-22 21:56:34,124 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 20% complete
2012-06-22 21:57:09,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 22% complete
2012-06-22 22:02:59,547 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 40% complete
2012-06-22 22:10:09,594 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2012-06-22 22:10:09,596 [main] INFO org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion PigVersion UserId StartedAt FinishedAt Features
0.20.205 0.10.0-SNAPSHOT hadoop 2012-06-22 21:47:39 2012-06-22 22:10:09 FILTER

Success!

Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTIme AvgMapTime MaxReduceTime MinReduceTime AvgReduceTime Alias Feature Outputs
job_201206120007_0007 14 0 475 378 425 0 0 0 emails,pairs MULTI_QUERY,MAP_ONLY /tmp/test,from_to_week,

Input(s):
Successfully read 246391 records (4886 bytes) from: "s3://rjurney.public/enron.avro"

Output(s):
Successfully stored 1159680 records (82710669 bytes) in: "/tmp/test"
Successfully stored 1159680 records in: "from_to_week"

Counters:
Total records written : 2319360
Total bytes written : 82710669
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0
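
For reference, the setup quoted further down this thread can be sketched as one small shell script. This is only a sketch under assumptions: the install paths, jar versions, and metastore URI are copied from James's message below and will differ per cluster, join_paths is a hypothetical helper introduced here for readability, and the script prints the pig command instead of running it. Per Daniel's note, PIG_CLASSPATH still seems to be needed when HCatalog is used on the store side.

```shell
#!/bin/sh
# Install locations as quoted in the thread; override them via the environment.
HADOOP_HOME=${HADOOP_HOME:-/usr/lib/hadoop}
HCAT_HOME=${HCAT_HOME:-/usr/local/hcat}
HIVE_HOME=${HIVE_HOME:-/usr/local/hive}

# join_paths: hypothetical helper that joins its arguments with ':'.
join_paths() {
    joined=""
    for p in "$@"; do
        if [ -z "$joined" ]; then
            joined="$p"
        else
            joined="$joined:$p"
        fi
    done
    printf '%s\n' "$joined"
}

# Jar names and versions are the ones from the quoted message (assumption:
# they match what is actually installed on your machine).
PIG_CLASSPATH=$(join_paths \
    "$HCAT_HOME/share/hcatalog/hcatalog-0.4.0.jar" \
    "$HIVE_HOME/lib/hive-metastore-0.9.0.jar" \
    "$HIVE_HOME/lib/libthrift-0.7.0.jar" \
    "$HIVE_HOME/lib/hive-exec-0.9.0.jar" \
    "$HIVE_HOME/lib/libfb303-0.7.0.jar" \
    "$HIVE_HOME/lib/jdo2-api-2.3-ec.jar" \
    "$HIVE_HOME/conf" \
    "$HADOOP_HOME/conf" \
    "$HIVE_HOME/lib/slf4j-api-1.6.1.jar")
export PIG_CLASSPATH

# Metastore URI as given in the thread; replace host and port as needed.
export PIG_OPTS="-Dhive.metastore.uris=thrift://localhost:8787"

# Print the launch command rather than executing it, so the sketch is safe
# to run on a machine without Pig installed.
echo "pig -Dpig.additional.jars=$PIG_CLASSPATH"
```

Keeping the jar list in one place means the same value feeds both PIG_CLASSPATH and -Dpig.additional.jars, which is how the quoted script uses it.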

On Fri, Jun 22, 2012 at 3:09 PM, Russell Jurney <russell.jurney@gmail.com> wrote:

> The job is now running, and did not fail.  This is good :)
>
> Does it take a long time to build a hive table?  The job is kinda slow.
>  But it is EMR ;)
>
> 2012-06-22 21:56:34,124 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 20% complete
> 2012-06-22 21:57:09,198 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 22% complete
> 2012-06-22 22:02:59,547 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 40% complete
>
>
> On Fri, Jun 22, 2012 at 2:31 PM, Travis Crawford <traviscrawford@gmail.com> wrote:
>
>> Huzzah! Nice find Daniel.
>>
>> --travis
>>
>>
>> On Fri, Jun 22, 2012 at 2:21 PM, Daniel Dai <daijy@hortonworks.com>
>> wrote:
>> > It seems we still need PIG_CLASSPATH if we use HCat on store side.
>> >
>> > I have a one line change to make it work:
>> >
>> > Index: src/org/apache/pig/impl/PigContext.java
>> > ===================================================================
>> > --- src/org/apache/pig/impl/PigContext.java    (revision 1351194)
>> > +++ src/org/apache/pig/impl/PigContext.java    (working copy)
>> > @@ -273,6 +273,7 @@
>> >          if (resource != null) {
>> >              extraJars.add(resource);
>> >              PigContext.classloader = createCl(null);
>> > +            Thread.currentThread().setContextClassLoader(PigContext.classloader);
>> >          }
>> >      }
>> >
>> > Daniel
>> >
>> >
>> > On Fri, Jun 22, 2012 at 12:35 PM, Russell Jurney <russell.jurney@gmail.com> wrote:
>> >>
>> >> https://gist.github.com/2970165 has updated information after my
>> >> environment is set up.
>> >>
>> >>
>> >> On Fri, Jun 22, 2012 at 12:20 PM, Daniel Dai <daijy@hortonworks.com>
>> >> wrote:
>> >>>
>> >>> If you can share your session, I'd be happy to take a look.
>> >>>
>> >>> On Fri, Jun 22, 2012 at 12:14 PM, Russell Jurney
>> >>> <russell.jurney@gmail.com> wrote:
>> >>>>
>> >>>> Thanks for your help, https://gist.github.com/2970165 now contains the
>> >>>> 'ls' result too.
>> >>>>
>> >>>>
>> >>>> On Fri, Jun 22, 2012 at 10:14 AM, Travis Crawford
>> >>>> <traviscrawford@gmail.com> wrote:
>> >>>>>
>> >>>>> I just hopped in #hcat on freenode too.
>> >>>>>
>> >>>>> --travis
>> >>>>>
>> >>>>>
>> >>>>> On Fri, Jun 22, 2012 at 10:12 AM, Travis Crawford
>> >>>>> <traviscrawford@gmail.com> wrote:
>> >>>>> > You shouldn't need to add jars on your classpath anymore; I allegedly
>> >>>>> > fixed this in PIG-2532.
>> >>>>> >
>> >>>>> > To double-check, what does "ls" of these globs show:
>> >>>>> >
>> >>>>> > register /usr/local/hcat/share/hcatalog/*.jar
>> >>>>> > register /home/hadoop/hive-0.9.0/lib/*.jar
>> >>>>> >
>> >>>>> > --travis
>> >>>>> >
>> >>>>> >
>> >>>>> > On Fri, Jun 22, 2012 at 9:38 AM, James Estes <james.estes@gmail.com> wrote:
>> >>>>> >> I'm pretty sure 'register'ing isn't what you want. I think you need to
>> >>>>> >> specify the jars on the pig command line with:
>> >>>>> >> -Dpig.additional.jars=...
>> >>>>> >> see the "Running Pig with HCatalog" section here:
>> >>>>> >> http://incubator.apache.org/hcatalog/docs/r0.4.0/loadstore.html
>> >>>>> >>
>> >>>>> >> In short you need to set up a script that does the following:
>> >>>>> >>
>> >>>>> >> export HADOOP_HOME=/usr/lib/hadoop
>> >>>>> >> export HCAT_HOME=/usr/local/hcat
>> >>>>> >> export HIVE_HOME=/usr/local/hive
>> >>>>> >> export PIG_CLASSPATH=$HCAT_HOME/share/hcatalog/hcatalog-0.4.0.jar:$HIVE_HOME/lib/hive-metastore-0.9.0.jar:$HIVE_HOME/lib/libthrift-0.7.0.jar:$HIVE_HOME/lib/hive-exec-0.9.0.jar:$HIVE_HOME/lib/libfb303-0.7.0.jar:$HIVE_HOME/lib/jdo2-api-2.3-ec.jar:$HIVE_HOME/conf:$HADOOP_HOME/conf:$HIVE_HOME/lib/slf4j-api-1.6.1.jar
>> >>>>> >>
>> >>>>> >> export PIG_OPTS=-Dhive.metastore.uris=thrift://localhost:8787
>> >>>>> >>
>> >>>>> >> pig -Dpig.additional.jars=$PIG_CLASSPATH
>> >>>>> >>
>> >>>>> >> James
>> >>>>> >>
>> >>>>> >>
>> >>>>> >> On Thu, Jun 21, 2012 at 10:11 PM, Russell Jurney
>> >>>>> >> <russell.jurney@gmail.com>
>> >>>>> >> wrote:
>> >>>>> >>>
>> >>>>> >>> Now, I'm loading the required jars but not able to store to HCatalog.
>> >>>>> >>>
>> >>>>> >>> https://gist.github.com/2970165
>> >>>>> >>>
>> >>>>> >>> On Mon, Jun 11, 2012 at 6:36 PM, Russell Jurney
>> >>>>> >>> <russell.jurney@gmail.com>
>> >>>>> >>> wrote:
>> >>>>> >>>>
>> >>>>> >>>> Much thanks!
>> >>>>> >>>>
>> >>>>> >>>>
>> >>>>> >>>> On Mon, Jun 11, 2012 at 6:30 PM, Aniket Mokashi
>> >>>>> >>>> <aniket486@gmail.com>
>> >>>>> >>>> wrote:
>> >>>>> >>>>>
>> >>>>> >>>>> Minimum set-
>> >>>>> >>>>> /share/hcatalog/hcatalog*.jar:
>> >>>>> >>>>> /share/hcatalog/lib/hive-metastore*.jar:
>> >>>>> >>>>> /share/hcatalog/lib/thrift-fb303-*.jar:
>> >>>>> >>>>> /share/hcatalog/lib/libthrift*.jar:
>> >>>>> >>>>> /share/hcatalog/lib/hive-exec-*.jar
>> >>>>> >>>>> +HCatalog configuration (hive-site et al)...
>> >>>>> >>>>>
>> >>>>> >>>>> These are all you need to make the Hive client work.
>> >>>>> >>>>>
>> >>>>> >>>>> ~Aniket
>> >>>>> >>>>>
>> >>>>> >>>>> On Mon, Jun 11, 2012 at 4:55 PM, Vandana Ayyalasomayajula
>> >>>>> >>>>> <avandana@yahoo-inc.com> wrote:
>> >>>>> >>>>>>
>> >>>>> >>>>>> Hi Russell,
>> >>>>> >>>>>>
>> >>>>> >>>>>> I have the following jars and hive-site.xml (containing URI info to
>> >>>>> >>>>>> connect to the metastore) in the PIG_CLASSPATH to use Pig with HCatalog.
>> >>>>> >>>>>>
>> >>>>> >>>>>> guava-r09.jar
>> >>>>> >>>>>> hbase-storage-handler-0.1.0.jar
>> >>>>> >>>>>> hcatalog-0.5.0-dev.jar
>> >>>>> >>>>>> hcatSupport-0.5.0-dev.jar
>> >>>>> >>>>>> hive-builtins-<version>.jar
>> >>>>> >>>>>> hive-cli-<version>.jar
>> >>>>> >>>>>> hive-common-<version>.jar
>> >>>>> >>>>>> hive-contrib-<version>.jar
>> >>>>> >>>>>> hive-exec-<version>.jar
>> >>>>> >>>>>> hive-hbase-handler-<version>.jar
>> >>>>> >>>>>> hive-jdbc-<version>.jar
>> >>>>> >>>>>> hive-metastore-<version>.jar
>> >>>>> >>>>>> hive-pdk-<version>.jar
>> >>>>> >>>>>> hive-serde-<version>.jar
>> >>>>> >>>>>> hive-service-<version>.jar
>> >>>>> >>>>>> hive-shims-<version>.jar
>> >>>>> >>>>>> libfb303-<version>.jar
>> >>>>> >>>>>>
>> >>>>> >>>>>> Hope this helps.
>> >>>>> >>>>>> Thanks
>> >>>>> >>>>>> Vandana
>> >>>>> >>>>>>
>> >>>>> >>>>>>
>> >>>>> >>>>>> On Jun 11, 2012, at 3:46 PM, Russell Jurney wrote:
>> >>>>> >>>>>>
>> >>>>> >>>>>> Which jars need to be loaded to store to HCatalog from Pig? I'm
>> >>>>> >>>>>> reading lots of docs:
>> >>>>> >>>>>>
>> >>>>> >>>>>>
>> >>>>> >>>>>> http://incubator.apache.org/hcatalog/docs/r0.2.0/loadstore.html
>> >>>>> >>>>>> http://incubator.apache.org/hcatalog/docs/r0.4.0/api/org/apache/hcatalog/pig/package-summary.html
>> >>>>> >>>>>> https://cwiki.apache.org/confluence/display/HCATALOG/How+To+Test
>> >>>>> >>>>>> http://developer.yahoo.com/blogs/hadoop/posts/2011/04/hcatalog-tables-and-metadata-for-hadoop/
>> >>>>> >>>>>> http://docs.hortonworks.com/HCatalog_Documentation/index.pdf
>> >>>>> >>>>>> http://incubator.apache.org/hcatalog/docs/r0.4.0/inputoutput.html
>> >>>>> >>>>>>
>> >>>>> >>>>>> But I can't find this, and there are a LOT of jars in HCatalog :)
>> >>>>> >>>>>>
>> >>>>> >>>>>> --
>> >>>>> >>>>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>> >>>>> >>>>>>
>> >>>>> >>>>>>
>> >>>>> >>>>>> Vandana Ayyalasomayajula
>> >>>>> >>>>>>
>> >>>>> >>>>>>
>> >>>>> >>>>>>
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>>
>> >>>>> >>>>> --
>> >>>>> >>>>> "...:::Aniket:::... Quetzalco@tl"
>> >>>>> >>>>
>> >>>>> >>>>
>> >>>>> >>>>
>> >>>>> >>>>
>> >>>>> >>>> --
>> >>>>> >>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>> >>>>> >>>
>> >>>>> >>>
>> >>>>> >>>
>> >>>>> >>>
>> >>>>> >>> --
>> >>>>> >>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>> >>>>> >>
>> >>>>> >>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>> >
>> >
>>
>
>
>
> --
> Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
>



-- 
Russell Jurney twitter.com/rjurney russell.jurney@gmail.com datasyndrome.com
