datafu-dev mailing list archives

From "Sam Steingold (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DATAFU-45) RFE: CartesianProduct
Date Tue, 29 Apr 2014 23:14:15 GMT

    [ https://issues.apache.org/jira/browse/DATAFU-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984938#comment-13984938 ]

Sam Steingold commented on DATAFU-45:
-------------------------------------

I tried that and got an error:
{code}
my_stage1 = foreach my_in {
  keywords = TOKENIZE(many,' ');
  weight = 1.0/(double)SIZE(keywords);
  generate id, keywords.token as keywords, weight as weight;
};
describe my_stage1;
-- my_stage1: {id: chararray,keywords: {(token: chararray)},weight: double}
dump my_stage1;

(1,{(k),(l),(m)},0.3333333333333333)
(3,{(i),(j)},0.5)
(1,{(i),(k)},0.5)
(3,{(l),(i)},0.5)
(1,{(m)},1.0)
(3,{(m),(i),(k)},0.3333333333333333)
(2,{(l),(k),(i)},0.3333333333333333)
(3,{(j),(m)},0.5)
(2,{(k)},1.0)
(3,{(m),(k)},0.5)
(2,{(k),(l)},0.5)
(3,{(l),(m)},0.5)


my_stage2 = foreach my_stage1 {
  keywords = cross keywords, weight;
  generate id, keywords;
};
describe my_stage2;
-- my_stage2: {id: chararray,keywords: {(keywords::token: chararray,null::weight: double)}}
dump my_stage2;

ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias my_stage2.
Backend error : java.lang.Double cannot be cast to org.apache.pig.data.Tuple
{code}
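The backend error appears to come from `cross` treating `weight` as a bag. Nested `cross` expects bags of tuples, but `weight` is a scalar `double`, so at run time Pig seems to try casting it to `Tuple`. This reading is inferred from the error message, not confirmed. The sketch below models the output `my_stage2` was meant to produce: every keyword in a row paired with that row's weight. Bags are modeled as Python lists of tuples, and `attach_weight` is a hypothetical helper, not Pig or DataFu code.

```python
from typing import List, Tuple

# A Pig bag of single-field tuples, e.g. {(k),(l),(m)}, modeled as a list of 1-tuples.
Bag = List[Tuple[str]]


def attach_weight(keywords: Bag, weight: float) -> List[Tuple[str, float]]:
    """Pair each (token) tuple with the row's weight, giving {(token, weight)}.

    Hypothetical helper: equivalent to crossing `keywords` with the
    one-element bag {(weight)}.
    """
    return [(token, weight) for (token,) in keywords]


# A few rows taken from the `dump my_stage1` output above.
rows = [
    ("1", [("k",), ("l",), ("m",)], 1.0 / 3),
    ("3", [("i",), ("j",)], 0.5),
    ("1", [("m",)], 1.0),
]

for row_id, keywords, weight in rows:
    print(row_id, attach_weight(keywords, weight))
```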


> RFE: CartesianProduct
> ---------------------
>
>                 Key: DATAFU-45
>                 URL: https://issues.apache.org/jira/browse/DATAFU-45
>             Project: DataFu
>          Issue Type: New Feature
>            Reporter: Sam Steingold
>
> Given two bags, produce their [Cartesian product|http://en.wikipedia.org/wiki/Cartesian_product]:
> {code}
> B1: bag{T1}
> B2: bag{T2}
> CartesianProduct(B1,B2): bag{(T1,T2)}
> {code}
> Use case:
> {code}
> toks = TOKENIZE((chararray)$0,',');
> kwds = CartesianProduct(toks, {1.0/(double)SIZE(toks)});
> {code}
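The following is a minimal Python sketch of the requested semantics, with each bag modeled as a list of tuples. `cartesian_product` is a hypothetical stand-in for the proposed UDF, not a DataFu API. It assumes each output tuple holds the fields of `T1` followed by the fields of `T2`, mirroring how Pig's `cross` concatenates fields, so the result has `len(b1) * len(b2)` tuples. The use case uses a plain comma split as a stand-in for `TOKENIZE`.

```python
from itertools import product
from typing import Any, List, Tuple

# A bag modeled as a list of tuples.
Bag = List[Tuple[Any, ...]]


def cartesian_product(b1: Bag, b2: Bag) -> Bag:
    """Return one tuple per (t1, t2) pair, holding t1's fields followed by t2's.

    Hypothetical stand-in for the proposed CartesianProduct UDF; the output
    has len(b1) * len(b2) tuples, in b1-major order.
    """
    return [t1 + t2 for t1, t2 in product(b1, b2)]


# Use case from the RFE: weight each token by 1/(number of tokens).
line = "k,l,m"
toks = [(t,) for t in line.split(",") if t]  # stand-in for TOKENIZE(..., ',')
kwds = cartesian_product(toks, [(1.0 / len(toks),)])
print(kwds)
```

Crossing with a one-element bag is what turns the scalar weight into a per-token field. That is why the use case wraps `1.0/(double)SIZE(toks)` in a bag.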



--
This message was sent by Atlassian JIRA
(v6.2#6252)
