giraph-dev mailing list archives

From "Jianlong Zhong (JIRA)" <j...@apache.org>
Subject [jira] [Created] (GIRAPH-1161) implement random sampling for input splits
Date Thu, 28 Sep 2017 17:49:00 GMT
Jianlong Zhong created GIRAPH-1161:
--------------------------------------

             Summary: implement random sampling for input splits
                 Key: GIRAPH-1161
                 URL: https://issues.apache.org/jira/browse/GIRAPH-1161
             Project: Giraph
          Issue Type: Improvement
            Reporter: Jianlong Zhong
            Priority: Minor


Currently, if we are reading vertex/edge data from multiple tables and only want to read
a fraction of the data (with the giraph.inputSplitSamplePercent conf option), we always get
the first inputSplitSamplePercent percent of the input splits. We should instead use a random
sample of the input splits, so that testing on a sample of the data looks closer to an actual
run on the full data.
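
As a rough illustration of the difference, here is a minimal sketch, not the actual Giraph code. The class name SplitSampler, both method names, and the seed parameter are hypothetical, and the splits are modelled as a generic list so that the example does not depend on Hadoop's InputSplit type. It assumes the percentage is turned into a split count by truncation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper, for illustration only; not part of Giraph.
class SplitSampler {

    // Behaviour described in the issue: keep the first samplePercent
    // percent of the splits, in their original order.
    public static <T> List<T> firstPercent(List<T> splits, float samplePercent) {
        int count = (int) (samplePercent * splits.size() / 100f);
        return new ArrayList<>(splits.subList(0, count));
    }

    // Proposed behaviour: keep a uniformly random samplePercent percent
    // of the splits. The seed is an assumption added here so that a
    // sampled run can be reproduced.
    public static <T> List<T> randomPercent(List<T> splits, float samplePercent, long seed) {
        int count = (int) (samplePercent * splits.size() / 100f);
        // Shuffle a copy so the caller's list is left untouched.
        List<T> shuffled = new ArrayList<>(splits);
        Collections.shuffle(shuffled, new Random(seed));
        return new ArrayList<>(shuffled.subList(0, count));
    }
}
```

With splits coming from several tables in sequence, firstPercent would tend to draw only from the earliest tables, while randomPercent draws from all of them with equal probability per split.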



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)