pig-user mailing list archives

From Ashutosh Chauhan <hashut...@apache.org>
Subject Re: Is there anything in pig that supports external client to stream out a content of alias? a bit like Hive Thrift server...
Date Wed, 08 Dec 2010 16:41:28 GMT
You didn't mention why PigServer.openIterator() won't work for you.
One of its use cases is exactly what you are describing, and it would
avoid the need to write the ruby wrapper.
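For reference, a minimal sketch of that approach (it assumes Pig's java API is on the classpath and a real cluster behind it; the 'mydata' path and the aliases are made-up placeholders):

```
// Sketch only: requires the Pig jars and a working Hadoop setup;
// 'mydata', A, and B are made-up names.
import java.util.Iterator;
import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

public class DumpAlias {
    public static void main(String[] args) throws Exception {
        PigServer pig = new PigServer("mapreduce");
        pig.registerQuery("A = load 'mydata';");
        pig.registerQuery("B = filter A by $0 > 10;");
        // openIterator() runs the script and hands B's tuples back
        // to this client, with none of the job logging mixed in.
        Iterator<Tuple> itr = pig.openIterator("B");
        while (itr.hasNext()) {
            System.out.println(itr.next().toDelimitedString("\t"));
        }
    }
}
```

Since the tuples come back over the iterator, the client can print them, pipe them on, or trigger further processing directly, with no temporary HDFS location involved.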

On Tue, Dec 7, 2010 at 10:26, Jae Lee <Jae.Lee@forward.co.uk> wrote:
> yeah I came across openIterator(alias) on PigServer.
> basically that's what I'd like to get (a dump of the alias and nothing else) when I execute the pig script.
> I'm currently writing a ruby wrapper that will STORE the alias into a temporary location in hdfs and then do a Hadoop file fetch.
> any better idea?
> J
> On 7 Dec 2010, at 18:16, Ashutosh Chauhan wrote:
>> I am not sure if I understood your requirements clearly, but if you
>> are not looking for a pure PigLatin solution and can work through
>> Pig's java api, then you may want to look at PigServer.
>> http://pig.apache.org/docs/r0.7.0/api/org/apache/pig/PigServer.html
>> Something along the following lines:
>> PigServer pig = new PigServer(pc, true);
>> pig.registerQuery("A = load 'mydata'; ");
>> pig.registerQuery("B = filter A by $0 > 10;");
>> Iterator<Tuple> itr = pig.openIterator("B");
>> while (itr.hasNext()) {
>>   if (Integer.valueOf(25).equals(itr.next().get(0))) {
>>     // trigger further processing.
>>   }
>> }
>> It's obviously not directly usable, but it conveys the general idea. Hope it helps.
>> Ashutosh
>> On Tue, Dec 7, 2010 at 06:40, Jae Lee <Jae.Lee@forward.co.uk> wrote:
>>> Hi,
>>> In our application Hive is used as a database, i.e. a result set from a select query is consumed outside of the hadoop cluster.
>>> The consumption process is not Hadoop-friendly, in that it is network bound, not cpu/disk bound.
>>> I'm in the process of converting the hive query into a pig query to see if it reads better.
>>> What I'm stuck at is picking out the content of a specific alias dump from all the other stuff being logged, so that I can trigger a further process.
>>> STREAM <alias> THROUGH <cmd> seems to be one way to trigger a process; it's just that it seems unsuitable for the kind of process we are looking at, because <cmd> gets run in the hadoop cluster.
>>> any thought?
>>> J
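For reference, the STORE-and-fetch wrapper described earlier in the thread might look roughly like this as a pig script; the input path, the filter, and the temporary output directory are all made-up placeholders:

```
-- sketch only: 'mydata' and '/tmp/pig_out_B' are made-up paths
A = LOAD 'mydata';
B = FILTER A BY $0 > 10;
STORE B INTO '/tmp/pig_out_B';
```

The wrapper would then pull the result out of hdfs with something like `hadoop fs -get /tmp/pig_out_B .`, or stream it with `hadoop fs -cat '/tmp/pig_out_B/part-*'`. This works, but it leaves temporary data to clean up, which is the overhead openIterator() avoids.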
