giraph-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Igor Kabiljo" <ikabi...@fb.com>
Subject Re: Review Request 22234: support for partitioned input in giraph
Date Thu, 05 Jun 2014 02:29:39 GMT


> On June 4, 2014, 10:53 p.m., Igor Kabiljo wrote:
> > giraph-core/src/main/java/org/apache/giraph/mapping/DefaultEmbeddedLongByteOps.java,
line 47
> > <https://reviews.apache.org/r/22234/diff/1/?file=603360#file603360line47>
> >
> >     it is not clear that unset can ever happen - all vertex IDs and edges are translated
(with target being -1 here)
> >     
> >     I assume you need to add 1, so that if ID is not in the map is treated the same
as if embedTargetInfo was never called (i.e. user creating vertices with specific IDs later)
> 
> Pavan Kumar Athivarapu wrote:
>     sorry, I mean not-set

yeah I understood that - but it is not clear that things can ever be not-set - all input is
going to be transformed (as oppose to just not being in the mapping, which will return -1,
which can be encoded and read as -1 without having +1)


> On June 4, 2014, 10:53 p.m., Igor Kabiljo wrote:
> > giraph-core/src/main/java/org/apache/giraph/mapping/AbstractLongByteOps.java, lines
39-47
> > <https://reviews.apache.org/r/22234/diff/1/?file=603359#file603359line39>
> >
> >     as is, you don't need to override these functions?
> >     
> >     Also, it might be better to add:
> >     byte getEmbededInfo(long id),
> >     and have implementation of getPartition.
> >     
> >     (you could also implement above two methods, through two abstract primitive
methods:
> >     long embedTargetInfo(long id);
> >     long removeTargetInfo(long id);
> 
> Pavan Kumar Athivarapu wrote:
>     If u see subclasses of AbstractLongByteOps -
>     DefaultEmbeddedLongByteOps - this implements both embedTargetInfo & removeTargetInfo,
as it needs them
>     but DefaultLongByteOps - need not, because it computes partition by looking up the
MappingStore everytime it is called
>     
>     sure byte getEmbededInfo(long id) was in the initial implementation I had, just merged
it into getPartition, since that is the only place which uses it
>     
>     so if u want it to return long, that becomes very restrictive, what about generic
vertexIds, which are not longs, the interface MappingStoreOps needs to account for all those
as well right?
>

Well, I would then put those unsupported implementations inside of DefaultLongByteOps, instead
of in AbstractLongByteOps.

Also, seems like it might be better to add MappingStoreOps function boolean hasEmbedding,
instead of having flags giraph.embedInfoVertexIds and giraph.edgeTranslationClass. (better
not to have redudnant flags)
I.e. if hasEmbedding of active MappingStoreOps returns true, embedding should be used throughout.



- Igor


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22234/#review44768
-----------------------------------------------------------


On June 4, 2014, 3:11 p.m., Pavan Kumar Athivarapu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22234/
> -----------------------------------------------------------
> 
> (Updated June 4, 2014, 3:11 p.m.)
> 
> 
> Review request for giraph, Avery Ching, Sergey Edunov, Igor Kabiljo, and Maja Kabiljo.
> 
> 
> Repository: giraph-git
> 
> 
> Description
> -------
> 
> There are many changes here:
> 
> A new input format - MappingInputFormat - and related dependencies has been defined
> New hive input format classes to read Mapping table + some examples which can be used
for running hellopagerank with modifications in test plan have been defined
> Changes to main giraph classes to read the mapping & use it for getting partition
info - in worker, master & partition sections
> New mapping section defined to define & declare MappingStore format & some sample
implementations
> The code can take 2 paths based on what the user wants
> 
> Embed info into vertexId by implementing proper contracts (TranslateEdge.interface +
2 methods in MappingStore)
> then use an EmbeddedGraphPartitioner to directly read worker info off of vertex ids [translate
once & user freely for the rest of app]
> 
> Always read from the MappingStore (this has more overhead probably because of cache misses
in processor) [never translate but pay cost of map lookup each time partition / worker info
is needed]
> 
> 
> Diffs
> -----
> 
>   giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java ec0ddbb 
>   giraph-core/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java bda967d

>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphClasses.java 3337621 
>   giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 6b36418 
>   giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java
95e029d 
>   giraph-core/src/main/java/org/apache/giraph/io/MappingInputFormat.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/io/MappingReader.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/io/internal/WrappedMappingInputFormat.java
PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/io/internal/WrappedMappingReader.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/io/iterables/MappingReaderWrapper.java
PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/mapping/AbstractLongByteOps.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/mapping/DefaultEmbeddedLongByteOps.java
PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/mapping/DefaultLongByteOps.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/mapping/LongByteMappingStore.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/mapping/MappingEntry.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/mapping/MappingStore.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/mapping/MappingStoreOps.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/mapping/package-info.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/mapping/translate/LongByteTranslateEdge.java
PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/mapping/translate/TranslateEdge.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/mapping/translate/package-info.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java 90dc9f3 
>   giraph-core/src/main/java/org/apache/giraph/master/MasterThread.java 15dbe07 
>   giraph-core/src/main/java/org/apache/giraph/partition/DefaultLongBytePartitionerFactory.java
PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/partition/GraphPartitionerFactory.java
4200d79 
>   giraph-core/src/main/java/org/apache/giraph/partition/HashPartitionerFactory.java 7cc5651

>   giraph-core/src/main/java/org/apache/giraph/partition/HashRangePartitionerFactory.java
1eeece7 
>   giraph-core/src/main/java/org/apache/giraph/partition/SimpleIntRangePartitionerFactory.java
8ab692f 
>   giraph-core/src/main/java/org/apache/giraph/partition/SimpleLongRangePartitionerFactory.java
2989598 
>   giraph-core/src/main/java/org/apache/giraph/partition/SimplePartitionerFactory.java
15b0756 
>   giraph-core/src/main/java/org/apache/giraph/partition/SimpleWorkerPartitioner.java
600d7a3 
>   giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java aff7084 
>   giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java 828eac4

>   giraph-core/src/main/java/org/apache/giraph/worker/FullInputSplitCallable.java PRE-CREATION

>   giraph-core/src/main/java/org/apache/giraph/worker/LocalData.java PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/worker/MappingInputSplitsCallable.java
PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/worker/MappingInputSplitsCallableFactory.java
PRE-CREATION 
>   giraph-core/src/main/java/org/apache/giraph/worker/VertexInputSplitsCallable.java e3e04d6

>   giraph-core/src/test/java/org/apache/giraph/partition/SimpleRangePartitionFactoryTest.java
4e19cd2 
>   giraph-hive/src/main/java/org/apache/giraph/hive/HiveGiraphRunner.java 603910b 
>   giraph-hive/src/main/java/org/apache/giraph/hive/common/GiraphHiveConstants.java c7ad63b

>   giraph-hive/src/main/java/org/apache/giraph/hive/common/HiveUtils.java 2388673 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/mapping/AbstractHiveToMapping.java
PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/mapping/HiveMappingInputFormat.java
PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/mapping/HiveMappingReader.java
PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/mapping/HiveToMapping.java PRE-CREATION

>   giraph-hive/src/main/java/org/apache/giraph/hive/input/mapping/SimpleHiveToMapping.java
PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/mapping/examples/LongByteHiveToMapping.java
PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/mapping/examples/LongInt2ByteHiveToMapping.java
PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/mapping/examples/package-info.java
PRE-CREATION 
>   giraph-hive/src/main/java/org/apache/giraph/hive/input/mapping/package-info.java PRE-CREATION

> 
> Diff: https://reviews.apache.org/r/22234/diff/
> 
> 
> Testing
> -------
> 
> ran pagerank jobs multiple times
> mvn clean verify
> 
> 
> Thanks,
> 
> Pavan Kumar Athivarapu
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message