drill-dev mailing list archives

From Steven Phillips <ste...@dremio.com>
Subject Re: [DISCUSS] Ideas to improve metadata cache read performance
Date Tue, 27 Oct 2015 19:04:59 GMT
I think we need to come up with a way to push partition pruning to
execution time.  The other solutions may relieve the problem in some cases,
but won't solve the fundamental problem.

For example, even if we do figure out how to use multiple threads for
reading the metadata, that may be fine for a couple hundred thousand files,
but what about when we have millions or tens of millions of files?  It will
still be a huge bottleneck.

I actually think we should use the Drill execution engine to probe the
metadata and generate the work assignments.  We could have an additional
fragment or fragments of the query that would recursively probe the
filesystem, read the metadata, and make assignments, then pipe the
results into the Scanners, which would create readers on the fly.  This way
the query could actually begin doing work before the metadata has even been
fully read.
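
A rough sketch of this idea (editor's illustration, not Drill code; names
like StreamingMetadataProbe and WorkUnit are hypothetical): a producer walks
the directory tree and enqueues a work unit per parquet file as soon as it
is discovered, while scanner threads consume units concurrently, so scanning
can start before the listing has finished.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StreamingMetadataProbe {

      /** Hypothetical work unit; real assignments would carry rowgroup and host info. */
      static final class WorkUnit {
        final Path file;
        WorkUnit(Path file) { this.file = file; }
      }

      private final BlockingQueue<WorkUnit> queue = new LinkedBlockingQueue<>();

      /** Producer side: recursively probe the filesystem and enqueue work as it is found. */
      void probe(FileSystem fs, Path dir) throws Exception {
        for (FileStatus status : fs.listStatus(dir)) {
          if (status.isDirectory()) {
            probe(fs, status.getPath());
          } else if (status.getPath().getName().endsWith(".parquet")) {
            queue.put(new WorkUnit(status.getPath()));  // scanners can pick this up immediately
          }
        }
      }

      /** Consumer side: each scanner thread creates a reader on the fly per work unit. */
      void scanLoop() throws InterruptedException {
        while (true) {
          WorkUnit unit = queue.take();
          // open a parquet reader for unit.file, apply runtime partition pruning, read rows...
        }
      }
    }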

On Mon, Oct 26, 2015 at 2:42 PM, Jacques Nadeau <jacques@dremio.com> wrote:

> My first thought is we've gotten too generous in what we're storing in the
> Parquet metadata file.  Early implementations were very lean, and the file
> seems far larger today.  For example, early implementations didn't keep
> statistics and ignored row groups (files, schema and block locations only).
> If we need multiple levels of information, we may want to stagger (or
> normalize) them in the file.  Also, we may want to think about the minimum
> that must be done in planning.  We could do the file pruning at execution
> time rather than single-tracking these things (makes stats harder though).
>
> I also think we should be cautious around jumping to a conclusion until
> DRILL-3973 provides more insight.
>
> In terms of caching, I'd be more inclined to rely on file system caching
> and make sure serialization/deserialization is as efficient as possible as
> opposed to implementing an application-level cache. (We already have enough
> problems managing memory without having to figure out when we should drop a
> metadata cache :D).
>
> Aside, I always liked this post for entertainment and the thoughts on
> virtual memory: https://www.varnish-cache.org/trac/wiki/ArchitectNotes
>
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
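
To illustrate the staggering/normalizing idea above, one possible two-level
shape keeps a lean per-file summary for planning and pushes the bulky
per-rowgroup statistics into a separate level that is only read when needed.
All names below are hypothetical, a sketch rather than a proposal for the
actual cache schema:

    import java.util.List;
    import java.util.Map;

    public class StaggeredMetadataSketch {

      /** Level 1: minimal per-file information, kept small so planning stays cheap. */
      static class FileSummary {
        String path;
        long rowCount;
        List<String> blockHosts;   // enough for affinity-based assignment
      }

      /** Level 2: bulkier per-rowgroup detail, stored separately (or normalized) and loaded lazily. */
      static class RowGroupDetail {
        long start;
        long length;
        Map<String, ColumnStats> columnStats;
      }

      /** Min/max/null counts used for pruning; could be deferred to execution time. */
      static class ColumnStats {
        Object min;
        Object max;
        long nullCount;
      }
    }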
> On Mon, Oct 26, 2015 at 2:25 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:
>
> > One more thing: for workloads running queries over subsets of the same
> > parquet files, we can consider maintaining an in-memory cache as well,
> > assuming the metadata memory footprint per file is low and parquet files
> > are static, so we would not need to invalidate the cache often.
> >
> > H+
> >
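
As an illustration of such an in-memory cache (editor's sketch, assuming
parquet files are immutable; FileMetadata and loadFooter are hypothetical
placeholders), a bounded Guava LoadingCache keyed by file path would be one
simple shape:

    import java.util.concurrent.TimeUnit;

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;

    public class FooterCache {

      private final LoadingCache<String, FileMetadata> cache = CacheBuilder.newBuilder()
          .maximumSize(100_000)                  // bound the per-drillbit memory footprint
          .expireAfterAccess(1, TimeUnit.HOURS)  // cheap safety valve, not real invalidation
          .build(new CacheLoader<String, FileMetadata>() {
            @Override
            public FileMetadata load(String path) {
              return loadFooter(path);           // read the parquet footer once per file
            }
          });

      public FileMetadata get(String path) throws Exception {
        return cache.get(path);
      }

      /** Placeholder: in reality this would read and decode the parquet footer. */
      private FileMetadata loadFooter(String path) {
        return new FileMetadata();
      }

      static class FileMetadata { /* schema, rowgroup stats, block hosts ... */ }
    }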
> > On Mon, Oct 26, 2015 at 2:10 PM, Hanifi Gunes <hgunes@maprtech.com> wrote:
> >
> > > I am not familiar with the contents of the metadata stored, but if the
> > > deserialization workload fits any of Afterburner's claimed improvement
> > > points [1], it could well be worth trying, given that the claimed gain
> > > in throughput is substantial.
> > >
> > > It could also be a good idea to partition the cache over a number of
> > > files for better parallelization, given that the number of cache files
> > > generated is *significantly* less than the number of parquet files.
> > > Maintaining global statistics seems like an improvement point too.
> > >
> > >
> > > -H+
> > >
> > > 1: https://github.com/FasterXML/jackson-module-afterburner#what-is-optimized
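
For reference, wiring Afterburner in is a one-line change to the ObjectMapper
setup (a minimal sketch; whether it actually helps here depends on how the
cache POJOs are deserialized):

    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.module.afterburner.AfterburnerModule;

    public class MapperSetup {
      public static ObjectMapper newMapper() {
        ObjectMapper mapper = new ObjectMapper();
        // Afterburner replaces reflection-based (de)serialization with generated byte code.
        mapper.registerModule(new AfterburnerModule());
        return mapper;
      }
    }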
> > >
> > > On Sun, Oct 25, 2015 at 9:33 AM, Aman Sinha <amansinha@apache.org> wrote:
> > >
> > >> Forgot to include the link for Jackson's AfterBurner module:
> > >>   https://github.com/FasterXML/jackson-module-afterburner
> > >>
> > >> On Sun, Oct 25, 2015 at 9:28 AM, Aman Sinha <amansinha@apache.org> wrote:
> > >>
> > >> > I was going to file an enhancement JIRA but thought I would discuss
> > >> > here first:
> > >> >
> > >> > The parquet metadata cache file is a JSON file that contains a subset
> > >> > of the metadata extracted from the parquet files.  The cache file can
> > >> > get really large: a few GBs for a few hundred thousand files.
> > >> > I have filed a separate JIRA, DRILL-3973, for profiling the various
> > >> > aspects of planning, including metadata operations.  In the meantime,
> > >> > the timestamps in the drillbit.log output indicate a large chunk of
> > >> > time spent in creating the drill table to begin with, which points to
> > >> > a bottleneck in reading the metadata.  (I can provide performance
> > >> > numbers later once we confirm through profiling.)
> > >> >
> > >> > A few thoughts around improvements:
> > >> >  - The jackson deserialization of the JSON file is very slow.  Can
> > >> > this be sped up?  For instance, the Afterburner module of jackson
> > >> > claims to improve performance by 30-40% by avoiding the use of
> > >> > reflection.
> > >> >  - The cache file read is a single-threaded process.  If we were
> > >> > reading directly from parquet files, we would use a default of 16
> > >> > threads.  What can be done to parallelize the read?  (A sketch
> > >> > follows at the end of this message.)
> > >> >  - Are there any operations that could be done one time during the
> > >> > REFRESH METADATA command?  For instance, examining the min/max values
> > >> > to determine whether a partition column has a single value could be
> > >> > eliminated if we did this computation during the REFRESH METADATA
> > >> > command and stored the summary one time.
> > >> >
> > >> >  - A pertinent question is: should the cache file be stored in a
> > >> > more efficient format such as Parquet instead of JSON?
> > >> >
> > >> > Aman
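
On the question above about parallelizing the cache read: one sketch,
assuming the cache were split into several smaller files (for example one
per top-level directory, as Hanifi suggests), is to deserialize the pieces
on a thread pool and merge the results.  ParquetTableMetadata and
CACHE_READ_THREADS are hypothetical names:

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import com.fasterxml.jackson.databind.ObjectMapper;

    public class ParallelCacheRead {

      private static final int CACHE_READ_THREADS = 16;  // mirror the default scan parallelism
      private static final ObjectMapper MAPPER = new ObjectMapper();

      public static List<ParquetTableMetadata> readAll(List<File> cacheFiles) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(CACHE_READ_THREADS);
        try {
          List<Future<ParquetTableMetadata>> futures = new ArrayList<>();
          for (File f : cacheFiles) {
            // each cache fragment is deserialized independently
            futures.add(pool.submit(() -> MAPPER.readValue(f, ParquetTableMetadata.class)));
          }
          List<ParquetTableMetadata> merged = new ArrayList<>();
          for (Future<ParquetTableMetadata> future : futures) {
            merged.add(future.get());  // collect per-directory metadata
          }
          return merged;
        } finally {
          pool.shutdown();
        }
      }

      /** Placeholder for the deserialized cache entries. */
      public static class ParquetTableMetadata { /* fields elided */ }
    }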
