pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rohini Palaniswamy <rohini.adi...@gmail.com>
Subject Re: Review Request 55681: [PIG-4963] Add a Bloom join
Date Wed, 18 Jan 2017 22:53:49 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/55681/
-----------------------------------------------------------

(Updated Jan. 18, 2017, 10:53 p.m.)


Review request for pig, Daniel Dai and Adam Szita.


Changes
-------

Made changes to documentation to refer to builtin bloom udfs and reference to Bloom filter
wiki and online bloom filter calculators


Bugs: PIG-4963
    https://issues.apache.org/jira/browse/PIG-4963


Repository: pig


Description
-------

This patch adds a new type of join called bloom. It supports creating multiple bloom filters
partitioned by hashcode of key for parallelism. Two new operators and one Packager implementations
are added.
  POBuildBloomRearrageTez  - Builds the bloom filter for one of the relations of the join
on the map side or writes out the join keys based on the strategy
  BloomPackager - Used in the reducer to create or combine bloom filters and produces the
final bloom filters. 
  POBloomFilterRearrangeTez - Applies the bloom filters to other relations in the join and
filters out data.

More details in the documentation.


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/pig/trunk/src/docs/src/documentation/content/xdocs/basic.xml
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/docs/src/documentation/content/xdocs/perf.xml
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigConfiguration.java 1779329

  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigCombiner.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/EndOfAllInputSetter.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/Packager.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/TezDagBuilder.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezCompiler.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezEdgeDescriptor.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezOperator.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/TezPOPackageAnnotator.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/BloomPackager.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POBloomFilterRearrangeTez.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POBuildBloomRearrangeTez.java
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POLocalRearrangeTez.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/operator/POShuffleTezLoad.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/CombinerOptimizer.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/tez/plan/optimizer/SecondaryKeyOptimizerTez.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOJoin.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanBuilder.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/pigstats/ScriptState.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/pigstats/tez/TezScriptState.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/build.xml 1779329 
  http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/tests/join.conf PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/tests/multiquery.conf 1779329 
  http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/tests/orc.conf 1779329 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestEmptyInputDir.java
1779329 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-1-KeyToReducer.gld
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-1.gld
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-2-KeyToReducer.gld
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-2.gld
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-3-KeyToReducer.gld
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-3.gld
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-4-KeyToReducer.gld
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-4.gld
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-5-KeyToReducer.gld
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-5.gld
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-6-KeyToReducer.gld
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-6.gld
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-7-KeyToReducer.gld
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/tez/TEZC-BloomJoin-7.gld
PRE-CREATION 
  http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/tez/TestTezCompiler.java 1779329


Diff: https://reviews.apache.org/r/55681/diff/


Testing
-------

Unit and e2e tests added for many different scenarios.


Thanks,

Rohini Palaniswamy


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message