drill-dev mailing list archives

From Jinfeng Ni <jinfengn...@gmail.com>
Subject Re: Source code for forked parquet library.
Date Tue, 01 Sep 2015 04:22:05 GMT
I heard that there are some issues between filter pushdown and parquet
metadata caching, but I'm not clear on what exactly the problem is, or
whether we have a plan to resolve it. Can you elaborate on what the open
questions are and how they conflict with metadata caching?

The reason I'm looking at filter pushdown is that a query posted to the
user list a couple of days ago performed really badly on Drill 1.1 compared
with another system. We did some comparative analysis and concluded that the
difference mainly comes from Drill lacking parquet filter pushdown.
At least for now, the only way for Drill to match the other system's
performance on that query is to enable filter pushdown.

In the meantime, we also identified some room for improvement in the code
Drill generates at run time for filter evaluation. I'll submit a patch for
review shortly.

Regards,

Jinfeng

On Mon, Aug 31, 2015 at 8:13 PM, Jacques Nadeau <jacques@dremio.com> wrote:

> Given that Julien and Jason are working heavily on a merge into Parquet, I
> strongly suggest holding off on merging other patches around that code (or
> at least working on top of the changes they are making).
>
> I thought there were a number of open questions around the filter pushdown
> and how it related to the metadata caching stuff. Have those been resolved?
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Aug 31, 2015 at 3:25 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
>
> > I'm actually trying Adam's parquet filter pushdown patch (DRILL-1950).
> > That's why I happened to click on a parquet class and hit the above
> > "source code not found" error.
> >
> > Thanks!
> >
> >
> >
> > On Mon, Aug 31, 2015 at 3:20 PM, Jason Altekruse <altekrusejason@gmail.com> wrote:
> >
> > > https://github.com/mapr/incubator-parquet-mr/tree/1.6.0rc3-drill-r0.3
> > >
> > > I am working with Julien Le Dem on getting us off of the fork, but for
> > > now the source code is accessible here. Let me know if you need any
> > > help looking through the parquet code. Is there a particular JIRA you
> > > are trying to address?
> > >
> > > On Mon, Aug 31, 2015 at 3:15 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
> > >
> > > > It seems we are using a forked parquet library. Can someone point me
> > > > to the source code for the forked parquet?
> > > >
> > > > I tried to download the source code within the IDE, and it
> > > > complains with the following:
> > > >
> > > > "Cannot download sources
> > > >
> > > > Sources not found for:
> > > > com.twitter:parquet-column:1.6.0rc3-drill-r0.3"
> > > >
> > > > So it looks like only the compiled jar is published, not the
> > > > source jar.
> > > >
> > >
> >
>
