incubator-drill-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Parquet file partition size
Date Mon, 08 Sep 2014 17:50:50 GMT
Where are these variables best modified?




On Mon, Sep 8, 2014 at 8:40 AM, Jacques Nadeau <jacques@apache.org> wrote:

> Drill's default behavior is to use estimates to determine the number of
> files that will be written.  The equation is fairly complicated.  However,
> there are three key variables that will impact file splits.  These are:
>
> planner.slice_target: target number of records allowed within a single
>   slice before parallelization is increased (defaults to 1 million in 0.4,
>   100,000 in 0.5)
> planner.width.max_per_node: maximum number of slices run per node
>   (defaults to 0.7 * core count)
> store.parquet.block-size: largest allowed row group when generating
>   Parquet files (defaults to 512 MB)
>
> If you are getting more files than you would like, you can reduce the
> value of planner.width.max_per_node.
>
> It's likely that Jim Scott's experience with a smaller number of files was
> due to running on a machine with a smaller number of cores or the optimizer
> estimating a smaller amount of data in the output.  The behavior is data
> and machine dependent.
>
> thanks,
> Jacques
>
>
> On Mon, Sep 8, 2014 at 8:32 AM, Jim Scott <jscott@maprtech.com> wrote:
>
> > I have created tables with Drill in Parquet format and it created 2
> > files.
> >
> >
> > On Fri, Sep 5, 2014 at 3:46 PM, Jim <jimfcarroll@gmail.com> wrote:
> >
> > >
> > > Actually, it looks like it always breaks it into 6 pieces by default.
> > > Is there a way to make the partition size fixed rather than the number
> > > of partitions?
> > >
> > >
> > > On 09/05/2014 04:40 PM, Jim wrote:
> > >
> > >> Hello all,
> > >>
> > >> I've been experimenting with Drill to load data into Parquet files. I
> > >> noticed rather large variability in the size of each Parquet chunk. Is
> > >> there a way to control this?
> > >>
> > >> The documentation seems a little sparse on configuring some of the
> > >> finer details. My apologies if I missed something obvious.
> > >>
> > >> Thanks
> > >> Jim
> > >>
> > >>
> > >
> >
> >
> > --
> > *Jim Scott*
> > Director, Enterprise Strategy & Architecture
> >
> >
>
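
For reference, one place the three options named in the quoted message can be changed is per session, through SQL. The following is only a minimal sketch: it assumes Drill's `ALTER SESSION SET` statement is used, the helper name `alter_session` is hypothetical, the value 2 for planner.width.max_per_node is purely illustrative, and expressing the 512 MB block size in bytes is an assumption. It only renders the statements as text; it does not connect to Drill.

```python
def alter_session(option, value):
    """Render an ALTER SESSION statement for one option (hypothetical helper)."""
    if isinstance(value, bool):
        literal = "true" if value else "false"
    elif isinstance(value, (int, float)):
        literal = str(value)
    else:
        # Quote strings, doubling any embedded single quotes.
        literal = "'" + str(value).replace("'", "''") + "'"
    # Option names contain dots (and a hyphen), so they are backtick-quoted.
    return "ALTER SESSION SET `{}` = {};".format(option, literal)


# Option names come from the quoted message; the values are illustrative.
settings = {
    "planner.slice_target": 100000,                 # the 0.5 default cited above
    "planner.width.max_per_node": 2,                # illustrative: fewer slices per node
    "store.parquet.block-size": 512 * 1024 * 1024,  # 512 MB, assumed to be given in bytes
}

statements = [alter_session(name, value) for name, value in settings.items()]

for statement in statements:
    print(statement)
```

The rendered statements would then be run from a Drill SQL client before the CREATE TABLE query. Whether a session-level or system-level setting is more appropriate depends on the deployment.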
