spark-user mailing list archives

From ayan guha <>
Subject Re: Documentation on "Automatic file coalescing for native data sources"?
Date Fri, 19 May 2017 23:15:59 GMT
I think, like all other read operations, it is driven by the input format
used, and some variation of a combine file input format is used by default.
You can test this by forcing a particular input format that yields one file
per split; you should then end up with the same number of partitions as you
have data files.
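
To make the reasoning concrete, here is a rough toy model in plain Python. It is not Spark's actual code. It only contrasts combine-style packing with one file per split. The function names, the 128 MB target, and the 1000 files of 1 MB each are all assumptions made up for illustration.

```python
# Toy model only: combine-style packing versus one file per split.
# Not Spark's implementation; names and sizes here are hypothetical.

def combined_splits(file_sizes, max_split_bytes):
    """Greedily pack whole files into splits of at most max_split_bytes.

    A file larger than max_split_bytes simply gets a split of its own;
    this sketch never cuts a file into pieces.
    """
    splits, current, current_size = [], [], 0
    for size in sorted(file_sizes, reverse=True):
        # Close the current split when the next file would overflow it.
        if current and current_size + size > max_split_bytes:
            splits.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        splits.append(current)
    return splits


def one_split_per_file(file_sizes):
    """Every file becomes its own split, so splits == files."""
    return [[size] for size in file_sizes]


MB = 1024 * 1024
small_files = [1 * MB] * 1000  # assumed workload: 1000 files of 1 MB each

print(len(combined_splits(small_files, 128 * MB)))  # far fewer splits than files
print(len(one_split_per_file(small_files)))         # same count as files
```

If the real reader behaves like the first function, the partition count tracks total bytes rather than file count; if it behaves like the second, the partition count equals the number of data files, which is the check suggested above.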

On Sat, 20 May 2017 at 5:12 am, Aakash Basu <> wrote:

> Hey all,
> A reply on this would be great!
> Thanks,
> A.B.
> On 17-May-2017 1:43 AM, "Daniel Siegmann" <>
> wrote:
>> When using on a large number of small files, these are
>> automatically coalesced into fewer partitions. The only documentation I can
>> find on this is in the Spark 2.0.0 release notes, where it simply says (
>> "Automatic file coalescing for native data sources"
>> Can anyone point me to documentation explaining what triggers this
>> feature, how it decides how many partitions to coalesce to, and what counts
>> as a "native data source"? I couldn't find any mention of this feature in
>> the SQL Programming Guide and Google was not helpful.
>> --
>> Daniel Siegmann
>> Senior Software Engineer
>> *SecurityScorecard Inc.*
>> 214 W 29th Street, 5th Floor
>> New York, NY 10001
>> --
Best Regards,
Ayan Guha
