crunch-dev mailing list archives

From "Attila Sasvari (JIRA)" <>
Subject [jira] [Commented] (CRUNCH-636) Make replication factor for temporary files configurable
Date Thu, 23 Mar 2017 20:03:41 GMT


Attila Sasvari commented on CRUNCH-636:

[~joshwills] One of the new unit tests ({{initialReplicationFactorUsedFromFileSystem()}})
fails on CentOS 6.4 with java version "1.8.0_121" when all the unit tests are run together
with {{mvn test}}. The issue is that when you create a new Hadoop {{Configuration()}}, it checks
for default files to load by default (see the constructor).
Some tests that run earlier in crunch-core leave something behind that confuses {{initialReplicationFactorUsedFromFileSystem()}}.
Please note that I could not reproduce this test failure on Mac OS X.
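
The comment does not say what the earlier tests leave behind, so the following is only a sketch of one way JVM-wide defaults can leak from one test into the next. {{MiniConf}} is a hypothetical stand-in written for this illustration, not Hadoop code; it mimics a no-argument constructor that loads shared defaults and a constructor taking a {{loadDefaults}} flag, similar in spirit to Hadoop's {{Configuration(boolean loadDefaults)}}.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a configuration class with JVM-wide defaults. */
class MiniConf {
    // Shared by every instance in the JVM, like a static list of default resources.
    private static final List<Map<String, String>> DEFAULTS = new ArrayList<>();

    private final Map<String, String> props = new HashMap<>();

    /** Registers defaults that every instance created afterwards will load. */
    static void addDefault(Map<String, String> resource) {
        DEFAULTS.add(resource);
    }

    /** Removes all registered defaults (what a well-behaved test would do on teardown). */
    static void clearDefaults() {
        DEFAULTS.clear();
    }

    /** Loads the shared defaults, as a no-argument constructor typically would. */
    MiniConf() {
        this(true);
    }

    /** Skips the shared defaults when loadDefaults is false. */
    MiniConf(boolean loadDefaults) {
        if (loadDefaults) {
            for (Map<String, String> resource : DEFAULTS) {
                props.putAll(resource);
            }
        }
    }

    int getInt(String key, int fallback) {
        String value = props.get(key);
        return value == null ? fallback : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        // An "earlier test" registers a default and never cleans it up.
        Map<String, String> leftover = new HashMap<>();
        leftover.put("dfs.replication", "1");
        addDefault(leftover);

        // A "later test" that expects the fallback value 3.
        System.out.println(new MiniConf().getInt("dfs.replication", 3));      // sees the leftover
        System.out.println(new MiniConf(false).getInt("dfs.replication", 3)); // isolated from it
    }
}
```

In this model, a test that must not depend on execution order would either construct its configuration without defaults or set every value it asserts on explicitly. Whether the same applies to the failing Crunch test depends on what the earlier tests actually leave behind.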

Does Crunch use amendment patches? Are there plans to use Jenkins to catch issues like this?

> Make replication factor for temporary files configurable
> --------------------------------------------------------
>                 Key: CRUNCH-636
>                 URL:
>             Project: Crunch
>          Issue Type: New Feature
>            Reporter: Attila Sasvari
>            Assignee: Attila Sasvari
>             Fix For: 1.0.0
>         Attachments: CRUNCH-636.01.patch, CRUNCH-636.02.patch, CRUNCH-636.03.patch, CRUNCH-636.04.patch,, test.WordCount_2017-03-08_16.31.55.737.log
> As of now, Crunch does not allow different replication factors for temporary files
and non-temporary files (e.g. final output data of leaf nodes) at the same time. A user
with a large amount of data to process (say, hundreds of gigabytes) might want a
lower replication factor for large temporary files between Crunch jobs.
> We could make this configurable via a new setting (e.g. {{crunch.tmp.dir.replication}}).
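
A minimal sketch of how such a setting might be resolved. It assumes that an unset value falls back to the file system's own default replication. The key name is the example given in the issue description; {{TmpReplication}} and {{resolve}} are hypothetical names, and {{java.util.Properties}} stands in for a Hadoop {{Configuration}} so the example runs without Hadoop on the classpath. This is not the code from the attached patches.

```java
import java.util.Properties;

/** Hypothetical helper: picks the replication factor for temporary files. */
class TmpReplication {
    // Example key name taken from the issue description.
    static final String TMP_REPLICATION_KEY = "crunch.tmp.dir.replication";

    /**
     * Returns the configured replication factor for temporary files, or
     * fsDefault when the setting is absent or blank (assumed fallback).
     */
    static short resolve(Properties conf, short fsDefault) {
        String raw = conf.getProperty(TMP_REPLICATION_KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return fsDefault;
        }
        short value = Short.parseShort(raw.trim());
        if (value < 1) {
            throw new IllegalArgumentException(
                TMP_REPLICATION_KEY + " must be at least 1, got " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        short fsDefault = 3; // assumed file system default for this example

        // Setting absent: the file system default is used.
        System.out.println(resolve(conf, fsDefault));

        // Setting present: temporary files get the lower factor.
        conf.setProperty(TMP_REPLICATION_KEY, "1");
        System.out.println(resolve(conf, fsDefault));
    }
}
```

Final outputs would keep the file system default, so only the intermediate data between jobs is written with the lower factor.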

