drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jason Altekruse <altekruseja...@gmail.com>
Subject Re: Using Apache Drill - Sample Data and Sample Queries - Examples
Date Thu, 14 Aug 2014 21:25:28 GMT
It works, thanks for sharing!


On Thu, Aug 14, 2014 at 2:21 PM, Jim Scott <jscott@maprtech.com> wrote:

> That is correct. It looks like vimeo just finished the conversion.
>
> Try the link again.
>
>
> On Thu, Aug 14, 2014 at 3:02 PM, Jason Altekruse <altekrusejason@gmail.com
> >
> wrote:
>
> > Interestingly it appears to be re-directing to a URL that looks like it
> has
> > a video ID on it. I haven't used vimeo, but is it possible it was
> uploaded
> > but not yet made public?
> >
> > -Jason
> >
> >
> > On Thu, Aug 14, 2014 at 12:57 PM, Timothy Chen <tnachen@gmail.com>
> wrote:
> >
> > > Hi Jim,
> > >
> > > The vimeo link gave me 404 though.
> > >
> > > Tim
> > >
> > > On Thu, Aug 14, 2014 at 12:31 PM, Jim Scott <jscott@maprtech.com>
> wrote:
> > > > I recently gave a presentation on how to use Apache Drill with some
> > > > demonstrations.
> > > >
> > > > The video for the presentation is available here:
> > > > http://vimeo.com/chug/using-apache-drill
> > > > The slides are available here:
> > > >
> > >
> >
> http://www.slideshare.net/ChicagoHUG/using-apache-drill-chug-august-2014-jim-scott
> > > >
> > > > *The data used in this presentation can be found and downloaded
> (1.3GB
> > > CSV
> > > > file and a 100KB CSV file) *
> > > > *here:*
> > > >
> > >
> >
> https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
> > > > https://data.cityofchicago.org/Public-Safety/Sex-Offenders/vc9r-bqvy
> > > >
> > > > Once you start your drill session you can connect to it via:
> > > > http://localhost:8047/
> > > >
> > > >
> > > > *Queries (pay attention to back ticks) *
> > > > To see a full stack trace when errors occur:
> > > > alter session set `exec.errors.verbose`=true;
> > > >
> > > >
> > > > *Looking through the information schema:*SELECT * from
> > > > INFORMATION_SCHEMA.SCHEMATA;
> > > > SELECT * from INFORMATION_SCHEMA.`TABLES`;
> > > > SELECT * from INFORMATION_SCHEMA.`COLUMNS`;
> > > > SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE
> > > > TABLE_SCHEMA = 'sys' and TABLE_NAME = 'options';
> > > >
> > > >
> > > > *How many of each type of incident:*select count(*) as incidents,
> > > > columns[5] as category from
> > > > dfs.`/home/jscott/Downloads/chicago/Crimes_-_2001_to_present.csv`
> group
> > > by
> > > > columns[5] order by incidents desc limit 10;
> > > >
> > > >
> > > > *How many of each type of incident and subtype:*select count(*) as
> > > > incidents, columns[5] as type, columns[6] as subtype from
> > > > dfs.`/home/jscott/Downloads/chicago/Crimes_-_2001_to_present.csv`
> group
> > > by
> > > > columns[6], columns[5] order by incidents desc limit 10;
> > > >
> > > >
> > > > *CREATE A TABLE in PARQUET format:*
> > > >
> > > > *Please note that I created a workspace "db" at my path of
> > /opt/drill/db
> > > > and I gave it a default storage format of parquet. *
> > > >
> > > > create table dfs.db.`crime_data` as select columns[0] as REFID,
> > > columns[1]
> > > > as CaseNumber, columns[2] as DateTime, columns[3] as Block,
> columns[4]
> > as
> > > > IUCR, columns[5] as PrimaryType, columns[6] as Description,
> columns[7]
> > as
> > > > LocationDescription, columns[8] as Arrest, columns[9] as Domestic,
> > > > columns[10] as Beat, columns[11] as District, columns[12] as Ward,
> > > > columns[13] as CommunityArea, columns[14] as FBICode, columns[15] as
> > > > XCoordinate, columns[16] as YCoordinate, columns[17] as YEAR_DATE,
> > > > columns[18] as UpdatedOn, columns[19] as Latitude, columns[20] as
> > > > Longitude, columns[21] as Location from
> > > > dfs.`/home/jscott/Downloads/chicago/Crimes_-_2001_to_present.csv`;
> > > > create table dfs.db.`sex_offenders` as select columns[0] as
> > > LAST,columns[1]
> > > > as FIRST,columns[2] as BLOCK,columns[3] as GENDER,columns[4] as
> > > > RACE,columns[5] as BIRTHDATE,columns[6] as AGE,columns[7] as
> > > > HEIGHT,columns[8] as WEIGHT,columns[9] as VICTIM_MINOR from
> > > > dfs.`/home/jscott/Downloads/chicago/Sex_Offenders.csv`;
> > > >
> > > >
> > > > *Run the previous query on the newly created table:*select count(*)
> as
> > > > incidents, primarytype, description from dfs.db.`crime_data` group by
> > > > description, primarytype order by incidents desc limit 10;
> > > >
> > > >
> > > > *How many crimes occurred on a block with a registered sex
> > > offender?*SELECT
> > > > count(*) as incidents FROM dfs.db.`crime_data` crimes JOIN
> > > > dfs.db.`sex_offenders` offenders ON crimes.block = offenders.block
> > ORDER
> > > BY
> > > > incidents DESC;
> > > >
> > > >
> > > > *What types of incidents occurred?*SELECT count(*) as incidents,
> > > > crimes.primarytype as type, crimes.description as subtype FROM
> > > > dfs.db.`crime_data` crimes JOIN dfs.db.`sex_offenders` offenders ON
> > > > crimes.block = offenders.block GROUP BY crimes.description,
> > > > crimes.primarytype ORDER BY incidents DESC LIMIT 20;
> > > >
> > > >
> > > > *What all is in the BATTERY category?*SELECT count(*) as incidents,
> > > > crimes.description as subtype FROM dfs.db.`crime_data` crimes JOIN
> > > > dfs.db.`sex_offenders` offenders ON crimes.block = offenders.block
> > WHERE
> > > > crimes.primarytype = 'BATTERY' GROUP BY crimes.description ORDER BY
> > > > incidents DESC;
> > > >
> > > >
> > > > *How many total BATTERY incidents?*SELECT count(*) as incidents FROM
> > > > dfs.db.`crime_data` crimes JOIN dfs.db.`sex_offenders` offenders ON
> > > > crimes.block = offenders.block WHERE crimes.primarytype = 'BATTERY';
> > > >
> > > >
> > > > *How many "BATTERY" incidents occurred on those blocks by year (has
> it
> > > > gotten better or worse over time)?*SELECT count(*) as incidents,
> > > > extract(year from to_date(crimes.datetime, 'MM/DD/YYYY hh:mm:ss a'))
> as
> > > > `year` FROM dfs.db.`crime_data` crimes JOIN dfs.db.`sex_offenders`
> > > > offenders ON crimes.block = offenders.block WHERE crimes.primarytype
> =
> > > > 'BATTERY' and crimes.datetime not like '%Date%' GROUP BY extract(year
> > > from
> > > > to_date(crimes.datetime, 'MM/DD/YYYY hh:mm:ss a')) ORDER BY `year`
> ASC;
> > > >
> > > >
> > > > Do you have queries against this data set that you have written that
> > you
> > > > would like to share with others? Reply to the thread...
> > > >
> > > >
> > > > --
> > > > *Jim Scott*
> > > > Director, Enterprise Strategy & Architecture
> > > >
> > > >  <http://www.mapr.com/>
> > > > [image: MapR Technologies] <http://www.mapr.com>
> > >
> >
>
>
>
> --
> *Jim Scott*
> Director, Enterprise Strategy & Architecture
> +1 (347) 746-9281
>
>  <http://www.mapr.com/>
> [image: MapR Technologies] <http://www.mapr.com>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message