spark-issues mailing list archives

From "Bryan Cutler (JIRA)" <>
Subject [jira] [Commented] (SPARK-25344) Break large files into smaller files
Date Fri, 14 Sep 2018 00:10:00 GMT


Bryan Cutler commented on SPARK-25344:

From the mailing list, I think we should agree on a few things first:

1. When should we create a separate test file: one for each module? And how should it be named, e.g. ""?
2. Where should the test files go: in the same dir as the source, or in a subdir named "tests"?
3. Should we start splitting tests immediately as new tests are written? Or incrementally, as subtasks in this JIRA?

> Break large files into smaller files
> ---------------------------------------------
>                 Key: SPARK-25344
>                 URL:
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Imran Rashid
>            Priority: Major
> We've got a ton of tests in one humongous file, rather than breaking it out into smaller files.
> Having one huge file doesn't seem great for code organization, and it also makes the test parallelization in not work as well.  On my laptop, takes 150s, and the next longest test file takes only 20s.  There are similarly large files in other pyspark modules, e.g. sql/, ml/, mllib/, streaming/
> It seems that at least for some of these files, it's already broken into independent test classes, so it shouldn't be too hard to just move them into their own files.
