drill-dev mailing list archives

From rahul challapalli <challapallira...@gmail.com>
Subject Re: A possible regression 1.9 / 1.10 when querying Parquet with complex types /nested structures (Map)
Date Sun, 04 Jun 2017 06:54:01 GMT
Jira is always the preferable approach. Thank you.

On Sat, Jun 3, 2017 at 1:38 PM, Stefán Baxter <stefan@activitystream.com>
wrote:

> Hi Rahul,
>
> Sure, but can I perhaps get the files to you directly?
>
> Regards,
>  -Stefán
>
> On Sat, Jun 3, 2017 at 8:13 PM, rahul challapalli <
> challapallirahul@gmail.com> wrote:
>
> > Can you please raise a jira and attach the required files? I can try to
> > reproduce it.
> >
> > Rahul
> >
> > On Jun 3, 2017 6:19 AM, "Stefán Baxter" <stefan@activitystream.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I have a sample data set (a few million records) that is saved to Parquet
> > > in two ways: a simple file structure with primitive types to store
> > > dimensions and metrics (String, Double), and one using nested maps
> > > (String,String and String,Double), respectively.
> > >
> > > Querying the data set with the simple types only:
> > >
> > > select roundTimeStamp(s.occurred_at,'PT1H') as `at`,
> > > sum(metrics_price) as price, sum(metrics_kwh) as kwh from
> > > dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s
> > > group by roundTimeStamp(s.occurred_at,'PT1H')
> > >
> > >
> > > takes: *28.442* sec. (dev. laptop x 1)
> > >
> > >
> > > Same query against the nested structure:
> > >
> > > select roundTimeStamp(s.occurred_at,'PT1H') as `at`,
> > > sum(s.metrics.price) as price, sum(s.metrics.kwh) as kwh from
> > > dfs.asa.`/processed/etactica-dev-p1/entitysamples/metrics/D2017*` as s
> > > group by roundTimeStamp(s.occurred_at,'PT1H')
> > >
> > > takes: *719.810* sec.
> > >
> > > Even counting the number of records takes very, very long if there is a
> > > nested structure involved. (select count(*) from)
> > >
> > > It does not behave like this on our production servers (1.8), but I have
> > > not run this particular test on them (their performance has never been an
> > > issue).
> > >
> > > I have these sample files available if anyone wishes to reproduce this
> > > consistently.
> > >
> > > Regards,
> > >  -Stefán
> > >
> >
>
