drill-dev mailing list archives

From Abdel Hakim Deneche <adene...@maprtech.com>
Subject Re: Empty parquet file creation and use in Drill
Date Fri, 18 Mar 2016 06:32:38 GMT
The original motivation was to fix DRILL-3635
<https://issues.apache.org/jira/browse/DRILL-3635>: for some reason Drill
was (and possibly still is) unable to read parquet files that contain only
metadata and no data. To avoid this, we decided not to generate such files
in the first place.

A user recently reported having issues reading such parquet files:
DRILL-4517 <https://issues.apache.org/jira/browse/DRILL-4517>
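
For anyone who wants to check whether a file produced by an external tool
falls into this category, here is a minimal Python sketch; it is not taken
from Drill or from this thread. It relies only on the Parquet file layout (a
4-byte `PAR1` magic at each end, with a 4-byte little-endian footer length
just before the trailing magic) and reports how many bytes sit between the
leading magic and the footer. The function name `parquet_data_bytes` is
hypothetical, and the Thrift-encoded footer is not decoded, so a result of 0
only suggests a metadata-only file; it does not read the row count.

```python
import struct

MAGIC = b"PAR1"  # 4-byte marker at both ends of an unencrypted Parquet file


def parquet_data_bytes(path):
    """Return the number of bytes between the leading magic and the footer.

    Zero suggests the file holds only footer metadata. The footer itself is
    not decoded here, so row counts are never inspected.
    """
    with open(path, "rb") as f:
        f.seek(0, 2)  # jump to the end to learn the file size
        size = f.tell()
        if size < 12:  # magic + footer length + magic is the bare minimum
            raise ValueError("too small to be a Parquet file")
        f.seek(0)
        head = f.read(4)
        f.seek(size - 8)
        tail = f.read(8)  # footer length (4 bytes) + trailing magic (4 bytes)
    if head != MAGIC or tail[4:] != MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    (footer_len,) = struct.unpack("<I", tail[:4])
    data_len = size - 4 - footer_len - 8
    if data_len < 0:
        raise ValueError("footer length exceeds file size")
    return data_len
```

Calling `parquet_data_bytes("some_file.parquet")` (the path is a placeholder)
on a file written by an external tool gives a quick hint as to whether it is
one of the metadata-only files discussed here.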

On Fri, Mar 18, 2016 at 7:22 AM, Khurram Faraaz <kfaraaz@maprtech.com>
wrote:

> I am trying to understand the motivation to not write anything to disk when
> we do a CTAS with a LIMIT 0 query.
>
> And how will Drill treat an empty parquet file (one that has just the
> metadata and no actual data in it) that was generated by some external tool?
> Today we return an Exception for such a case. In the future, do we plan to
> handle this case and return 0 records (or say "no rows found")?
>
> Thanks,
> Khurram
>
> On Fri, Mar 18, 2016 at 10:38 AM, Abdel Hakim Deneche <adeneche@maprtech.com>
> wrote:
>
> > Drill used to create such empty parquet files, but we would get an
> > exception when we try to query them (DRILL-3635
> > <https://issues.apache.org/jira/browse/DRILL-3635>).
> >
> > Drill's current behavior when you do a CTAS with a LIMIT 0 query is to not
> > write anything to disk; when you then try to query the table, you get
> > a "table not found" error message.
> >
> > On Fri, Mar 18, 2016 at 2:53 AM, Khurram Faraaz <kfaraaz@maprtech.com>
> > wrote:
> >
> > > Hello All,
> > >
> > > Currently in Drill 1.7.0:
> > >   - we do not support the creation of empty parquet files;
> > >   - we see an Exception when an empty parquet file is queried using Drill.
> > >
> > > Should we support the creation of an empty parquet file that has just the
> > > metadata information in the parquet footer and no actual data (consider the
> > > CTAS case that uses a LIMIT 0 query)?
> > >
> > > Once such an empty parquet file is created, Drill should also be able to
> > > query that parquet file and report to the user that there are no rows to
> > > return.
> > >
> > > Thanks,
> > > Khurram
> > >



-- 

Abdelhakim Deneche

Software Engineer
