pig-dev mailing list archives

From Daniel Dai <da...@hortonworks.com>
Subject Re: Create a constant relation in a Pig script
Date Wed, 06 Aug 2014 05:17:49 GMT
Quite a few tests use mock.Storage; you can take them as a
reference. E.g.:
http://svn.apache.org/viewvc/pig/trunk/test/org/apache/pig/test/TestOrderBy3.java?view=markup

On Tue, Aug 5, 2014 at 1:08 AM, Alfonso Nishikawa
<alfonso.nishikawa@gmail.com> wrote:
> Hi, Daniel.
>
> Thank you!
> I get the following exception in the tasktracker nodes:
>
> java.io.IOException: no Data anymore for this Script. Has data been reset by another Storage.resetData(pigServer.getPigContext()) ?
>     at org.apache.pig.builtin.mock.Storage.getData(Storage.java:188)
>     at org.apache.pig.builtin.mock.Storage.init(Storage.java:362)
>     at org.apache.pig.builtin.mock.Storage.setLocation(Storage.java:376)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.mergeSplitSpecificConf(PigInputFormat.java:138)
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:112)
>     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:489)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:731)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:416)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
>
> Am I wrong in guessing that mock.Storage only works with a PigServer of
> type ExecType.LOCAL and not ExecType.MAPREDUCE?
>
> I am testing some integration stuff with a pseudo-cluster.
>
> While we are on the subject of mock.Storage, let me mention a bug/feature
> I found: if I specify a location name with underscores ("delete_keys", for
> example), the execution throws an exception at the frontend:
>
> java.net.UnknownHostException: delete_keys is not a valid Inet address
>     at org.apache.hadoop.net.NetUtils.verifyHostnames(NetUtils.java:569)
>     at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:711)
>     at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:4207)
>     at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:701)
>
> Of course, a location called "keys" seems to work fine :)
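
One possible explanation for the underscore failure, offered as an assumption
rather than a confirmed diagnosis: the trace ends in NetUtils.verifyHostnames,
and java.net.URI does not accept underscores in host names, so a host check
built on java.net.URI would reject "delete_keys" while accepting "keys". The
sketch below is not Hadoop's code; HostNameCheck and hasValidHost are
hypothetical names used only to show the java.net.URI behaviour.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class HostNameCheck {

    /**
     * Returns true when java.net.URI can extract a host from the given name.
     * Hypothetical helper: it only illustrates how a URI-based host check
     * treats underscores, and is not taken from Hadoop.
     */
    static boolean hasValidHost(String name) {
        try {
            URI uri = new URI(name);
            if (uri.getHost() == null) {
                // A bare name parses as a path, so retry with a dummy scheme.
                uri = new URI("http://" + name);
            }
            // getHost() stays null when the authority is not a legal host name.
            return uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"keys", "delete_keys", "delete-keys"}) {
            System.out.println(name + " -> " + hasValidHost(name));
        }
    }
}
```

If this is indeed the cause, a location name that avoids underscores (for
example "delete-keys" or "deletekeys") should get past the check.
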
>
> To update the list of solutions, I would add solution 5: update
> mock.Storage so that it works in a PigServer of type ExecType.MAPREDUCE
> (if it really only runs in local mode).
>
> Alfonso Nishikawa
>
>
>
>
> 2014-08-05 2:27 GMT+02:00 Daniel Dai <daijy@hortonworks.com>:
>
>> Does org.apache.pig.builtin.mock.Storage work for you?
>>
>> Thanks,
>> Daniel
>>
>> On Mon, Aug 4, 2014 at 4:53 AM, Alfonso Nishikawa
>> <alfonso.nishikawa@gmail.com> wrote:
>> > Greetings.
>> >
>> > For testing purposes I need to create a relation in a Pig script.
>> > I found 2 questions about this on Stack Overflow:
>> >
>> > http://stackoverflow.com/questions/13414172/how-to-create-a-small-constant-relationtable-in-pig
>> >
>> > http://stackoverflow.com/questions/12423399/define-tuple-datas-in-the-pig-script
>> >
>> > Is there anything about this?
>> >
>> > To summarize, there are 4 possible solutions:
>> >
>> > - Load from a file with the data
>> > - Load an empty file and set data with foreach-generate
>> > - Load from a special loader (ConstantLoader?)
>> > - Add to Pig Latin the construction "A = value".
>> >
>> > I find solution 1 fine, but too much hassle to load only 1 tuple.
>> > I find solution 2 quite dirty.
>> > I find solution 3 feasible.
>> > I find solution 4 feasible, but more work than solution 3.
>> >
>> > Another point is that solution 3 is better than solution 1 when it is
>> > only for 1 or 2 tuples, because that tuple can be created based on
>> > parameters to the script.
>> >
>> > I did not find a related issue in JIRA. Does anyone know of one?
>> >
>> > I can develop solution 3, but before creating any issue I think the best
>> > thing is to ask here :)
>> >
>> > Thanks,
>> >
>> > Alfonso Nishikawa
>>
>> --
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or entity to
>> which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the reader
>> of this message is not the intended recipient, you are hereby notified that
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender immediately
>> and delete it from your system. Thank You.
>>

