incubator-drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jim Scott <jsc...@maprtech.com>
Subject Re: Using Apache Drill - Sample Data and Sample Queries - Examples
Date Thu, 14 Aug 2014 21:21:47 GMT
That is correct. It looks like vimeo just finished the conversion.

Try the link again.


On Thu, Aug 14, 2014 at 3:02 PM, Jason Altekruse <altekrusejason@gmail.com>
wrote:

> Interestingly it appears to be re-directing to a URL that looks like it has
> a video ID on it. I haven't used vimeo, but is it possible it was uploaded
> but not yet made public?
>
> -Jason
>
>
> On Thu, Aug 14, 2014 at 12:57 PM, Timothy Chen <tnachen@gmail.com> wrote:
>
> > Hi Jim,
> >
> > The vimeo link gave me 404 though.
> >
> > Tim
> >
> > On Thu, Aug 14, 2014 at 12:31 PM, Jim Scott <jscott@maprtech.com> wrote:
> > > I recently gave a presentation on how to use Apache Drill with some
> > > demonstrations.
> > >
> > > The video for the presentation is available here:
> > > http://vimeo.com/chug/using-apache-drill
> > > The slides are available here:
> > >
> >
> http://www.slideshare.net/ChicagoHUG/using-apache-drill-chug-august-2014-jim-scott
> > >
> > > *The data used in this presentation can be found and downloaded (1.3GB
> > CSV
> > > file and a 100KB CSV file) *
> > > *here:*
> > >
> >
> https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2
> > > https://data.cityofchicago.org/Public-Safety/Sex-Offenders/vc9r-bqvy
> > >
> > > Once you start your drill session you can connect to it via:
> > > http://localhost:8047/
> > >
> > >
> > > *Queries (pay attention to back ticks) *
> > > To see a full stack trace when errors occur:
> > > alter session set `exec.errors.verbose`=true;
> > >
> > >
> > > *Looking through the information schema:*SELECT * from
> > > INFORMATION_SCHEMA.SCHEMATA;
> > > SELECT * from INFORMATION_SCHEMA.`TABLES`;
> > > SELECT * from INFORMATION_SCHEMA.`COLUMNS`;
> > > SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE
> > > TABLE_SCHEMA = 'sys' and TABLE_NAME = 'options';
> > >
> > >
> > > *How many of each type of incident:*select count(*) as incidents,
> > > columns[5] as category from
> > > dfs.`/home/jscott/Downloads/chicago/Crimes_-_2001_to_present.csv` group
> > by
> > > columns[5] order by incidents desc limit 10;
> > >
> > >
> > > *How many of each type of incident and subtype:*select count(*) as
> > > incidents, columns[5] as type, columns[6] as subtype from
> > > dfs.`/home/jscott/Downloads/chicago/Crimes_-_2001_to_present.csv` group
> > by
> > > columns[6], columns[5] order by incidents desc limit 10;
> > >
> > >
> > > *CREATE A TABLE in PARQUET format:*
> > >
> > > *Please note that I created a workspace "db" at my path of
> /opt/drill/db
> > > and I gave it a default storage format of parquet. *
> > >
> > > create table dfs.db.`crime_data` as select columns[0] as REFID,
> > columns[1]
> > > as CaseNumber, columns[2] as DateTime, columns[3] as Block, columns[4]
> as
> > > IUCR, columns[5] as PrimaryType, columns[6] as Description, columns[7]
> as
> > > LocationDescription, columns[8] as Arrest, columns[9] as Domestic,
> > > columns[10] as Beat, columns[11] as District, columns[12] as Ward,
> > > columns[13] as CommunityArea, columns[14] as FBICode, columns[15] as
> > > XCoordinate, columns[16] as YCoordinate, columns[17] as YEAR_DATE,
> > > columns[18] as UpdatedOn, columns[19] as Latitude, columns[20] as
> > > Longitude, columns[21] as Location from
> > > dfs.`/home/jscott/Downloads/chicago/Crimes_-_2001_to_present.csv`;
> > > create table dfs.db.`sex_offenders` as select columns[0] as
> > LAST,columns[1]
> > > as FIRST,columns[2] as BLOCK,columns[3] as GENDER,columns[4] as
> > > RACE,columns[5] as BIRTHDATE,columns[6] as AGE,columns[7] as
> > > HEIGHT,columns[8] as WEIGHT,columns[9] as VICTIM_MINOR from
> > > dfs.`/home/jscott/Downloads/chicago/Sex_Offenders.csv`;
> > >
> > >
> > > *Run the previous query on the newly created table:*select count(*) as
> > > incidents, primarytype, description from dfs.db.`crime_data` group by
> > > description, primarytype order by incidents desc limit 10;
> > >
> > >
> > > *How many crimes occurred on a block with a registered sex
> > offender?*SELECT
> > > count(*) as incidents FROM dfs.db.`crime_data` crimes JOIN
> > > dfs.db.`sex_offenders` offenders ON crimes.block = offenders.block
> ORDER
> > BY
> > > incidents DESC;
> > >
> > >
> > > *What types of incidents occurred?*SELECT count(*) as incidents,
> > > crimes.primarytype as type, crimes.description as subtype FROM
> > > dfs.db.`crime_data` crimes JOIN dfs.db.`sex_offenders` offenders ON
> > > crimes.block = offenders.block GROUP BY crimes.description,
> > > crimes.primarytype ORDER BY incidents DESC LIMIT 20;
> > >
> > >
> > > *What all is in the BATTERY category?*SELECT count(*) as incidents,
> > > crimes.description as subtype FROM dfs.db.`crime_data` crimes JOIN
> > > dfs.db.`sex_offenders` offenders ON crimes.block = offenders.block
> WHERE
> > > crimes.primarytype = 'BATTERY' GROUP BY crimes.description ORDER BY
> > > incidents DESC;
> > >
> > >
> > > *How many total BATTERY incidents?*SELECT count(*) as incidents FROM
> > > dfs.db.`crime_data` crimes JOIN dfs.db.`sex_offenders` offenders ON
> > > crimes.block = offenders.block WHERE crimes.primarytype = 'BATTERY';
> > >
> > >
> > > *How many "BATTERY" incidents occurred on those blocks by year (has it
> > > gotten better or worse over time)?*SELECT count(*) as incidents,
> > > extract(year from to_date(crimes.datetime, 'MM/DD/YYYY hh:mm:ss a')) as
> > > `year` FROM dfs.db.`crime_data` crimes JOIN dfs.db.`sex_offenders`
> > > offenders ON crimes.block = offenders.block WHERE crimes.primarytype =
> > > 'BATTERY' and crimes.datetime not like '%Date%' GROUP BY extract(year
> > from
> > > to_date(crimes.datetime, 'MM/DD/YYYY hh:mm:ss a')) ORDER BY `year` ASC;
> > >
> > >
> > > Do you have queries against this data set that you have written that
> you
> > > would like to share with others? Reply to the thread...
> > >
> > >
> > > --
> > > *Jim Scott*
> > > Director, Enterprise Strategy & Architecture
> > >
> > >  <http://www.mapr.com/>
> > > [image: MapR Technologies] <http://www.mapr.com>
> >
>



-- 
*Jim Scott*
Director, Enterprise Strategy & Architecture
+1 (347) 746-9281

 <http://www.mapr.com/>
[image: MapR Technologies] <http://www.mapr.com>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message