pig-user mailing list archives

From Jae Lee <Jae....@forward.co.uk>
Subject Re: Is there anything in pig that supports external client to stream out a content of alias? a bit like Hive Thrift server...
Date Wed, 08 Dec 2010 18:33:58 GMT
Oh yes, it will definitely work... it's just that I don't want to write a Java wrapper around
PigServer. I'd rather have a solution that works with a plain vanilla Pig installation...

On 8 Dec 2010, at 16:41, Ashutosh Chauhan wrote:

> You didn't mention why PigServer.openIterator() won't work for you.
> One of its use cases is exactly what you are describing. It would avoid the need
> to write a Ruby wrapper.
> Ashutosh
> On Tue, Dec 7, 2010 at 10:26, Jae Lee <Jae.Lee@forward.co.uk> wrote:
>> Yeah, I came across openIterator(alias) on PigServer.
>> Basically that's what I'd like to get (a dump of the alias and nothing else) when I execute a Pig script.
>> I'm currently writing a Ruby wrapper that will STORE the alias into a temporary location in HDFS and then fetch the file with Hadoop.
>> Any better idea?
>> J
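
[Editor's note: the STORE-then-fetch approach described above can be sketched roughly as follows; the alias names, input file, and output path are hypothetical, and this assumes a plain vanilla Pig and Hadoop installation.]

```
-- myquery.pig: write only the alias we care about to a known HDFS path
A = LOAD 'mydata';
B = FILTER A BY $0 > 10;
STORE B INTO '/tmp/b_out';
```

The wrapper can then run `pig myquery.pig` and fetch the result with `hadoop fs -cat /tmp/b_out/part-*`, which keeps the alias contents on stdout separate from Pig's own logging.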
>> On 7 Dec 2010, at 18:16, Ashutosh Chauhan wrote:
>>> I am not sure I understood your requirements clearly, but if you
>>> are not looking for a pure Pig Latin solution and can work through
>>> Pig's Java API, then you may want to look at PigServer.
>>> http://pig.apache.org/docs/r0.7.0/api/org/apache/pig/PigServer.html
>>> Something along the following lines:
>>> PigServer pig = new PigServer(pc, true);
>>> pig.registerQuery("A = load 'mydata';");
>>> pig.registerQuery("B = filter A by $0 > 10;");
>>> Iterator<Tuple> itr = pig.openIterator("B");
>>> while (itr.hasNext()) {
>>>   // get(0) returns an Object, so compare with equals() rather than ==
>>>   if (Integer.valueOf(25).equals(itr.next().get(0))) {
>>>     // trigger further processing.
>>>   }
>>> }
>>> It's obviously not directly useful, but it conveys the general idea. Hope it helps.
>>> Ashutosh
>>> On Tue, Dec 7, 2010 at 06:40, Jae Lee <Jae.Lee@forward.co.uk> wrote:
>>>> Hi,
>>>> In our application Hive is used as a database, i.e. a result set from a select query is consumed outside of the Hadoop cluster.
>>>> The consumption process is not Hadoop-friendly, in that it is network-bound, not CPU/disk-bound.
>>>> I'm in the process of converting the Hive query into a Pig query to see if it reads better.
>>>> What I'm stuck at is separating the content of a specific alias dump from all the other stuff being logged, so that I can trigger a further process.
>>>> STREAM <alias> THROUGH <cmd> seems to be one way to trigger a process; it's just that it seems unsuitable for the kind of process we are looking at, because the <cmd> gets run inside the Hadoop cluster.
>>>> Any thoughts?
>>>> J
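
[Editor's note: for completeness, the STREAM form mentioned above looks like the following (the alias names and command are hypothetical). Each map or reduce task pipes its portion of the data through the command, which is why the command executes on the cluster nodes rather than on the client, making it unsuitable for a network-bound consumer.]

```
A = LOAD 'mydata';
B = STREAM A THROUGH `process.sh`;
```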
