hive-user mailing list archives

From "Markovitz, Dudu" <>
Subject RE: RegexSerDe with Filters
Date Tue, 21 Jun 2016 11:25:48 GMT

I would suggest creating a single external table with daily partitions and multiple views,
each with the appropriate filtering.
If you'll send me a log sample (~100 rows), I'll send you an example.
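For instance, the layout might look roughly like this (a sketch only — the regex, column names, and partition scheme are placeholders, since the actual log format hasn't been shared):

```sql
-- One external table over the raw logs, using the built-in RegexSerDe.
-- The regex below (timestamp, app, rest-of-line) is a placeholder.
CREATE EXTERNAL TABLE raw_logs (
  ts      STRING,
  app     STRING,
  message STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  'input.regex' = '([^ ]*) ([^ ]*) (.*)'
)
LOCATION '/user/ffffprod/';

-- Register each daily directory as a partition:
ALTER TABLE raw_logs ADD PARTITION (dt = '2016-06-20')
  LOCATION '/user/ffffprod/2016-06-20';

-- Then one view per request type, each with its own filter:
CREATE VIEW store_requests AS
SELECT ts, message
FROM raw_logs
WHERE message LIKE 'STORE Request Received for APPXXXX%';
```

This keeps a single scan definition over the data while each of the ~20 request types gets its own view.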


From: Arun Patel []
Sent: Tuesday, June 21, 2016 1:51 AM
Subject: RegexSerDe with Filters

Hello Hive Experts,

I use Flume to ingest application-specific logs from syslog to HDFS.  Currently, I grep the
HDFS directory for specific patterns (for multiple types of requests) and then create reports.
 However, generating weekly and monthly reports this way is not scalable.

I would like to create multiple external tables on the daily HDFS directory, partitioned by date,
with RegexSerDe, and then create separate Parquet tables for every kind of request.

The question is: how do I create multiple (about 20) RegexSerDe tables on the same data, applying
filters?  This would be just like the 20 grep commands I am running today.

Example:

    hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'STORE Request Received for APPXXXX' | awk '{print $4, $13, $14, $17, $20}'
    hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'SCAN Request Received for APPYYYY' | awk '{print $4, $14, $19, $21, $22}'
    hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'TOTAL TIME' | awk '{print $4, $24}'
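Each grep/awk pipeline maps naturally onto a Hive view: the WHERE clause plays the role of grep, and the SELECT list plays the role of awk's field selection. As a sketch, assuming a single external table `raw_logs` with one column `line STRING` holding each raw log line (a hypothetical name and schema):

```sql
-- Hypothetical equivalent of the first pipeline above.
-- grep 'STORE Request Received for APPXXXX'  ->  WHERE line LIKE ...
-- awk '{print $4, $13, $14, $17, $20}'       ->  SELECT on split() fields
-- Note: Hive's split() array is 0-indexed, awk's fields are 1-indexed.
CREATE VIEW store_requests AS
SELECT split(line, ' ')[3]  AS f4,
       split(line, ' ')[12] AS f13,
       split(line, ' ')[13] AS f14,
       split(line, ' ')[16] AS f17,
       split(line, ' ')[19] AS f20
FROM raw_logs
WHERE line LIKE '%STORE Request Received for APPXXXX%';
```

Defining one such view per pattern gives the ~20 "grep commands" as ~20 views over a single table, rather than 20 separate RegexSerDe tables over the same files.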

I would like to create tables that do this kind of job and then write the output to Parquet.
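For the Parquet step, a filtered view (or the same SELECT written inline) can be materialized with CTAS; the table and view names below are illustrative, not from the thread:

```sql
-- One-off materialization of a filtered view into Parquet:
CREATE TABLE store_requests_pq
STORED AS PARQUET
AS SELECT * FROM store_requests;

-- Or, for recurring daily loads, a partitioned Parquet table refreshed per day:
CREATE TABLE store_requests_daily (f4 STRING, f13 STRING)
PARTITIONED BY (dt STRING)
STORED AS PARQUET;

INSERT OVERWRITE TABLE store_requests_daily PARTITION (dt = '2016-06-20')
SELECT f4, f13 FROM store_requests;
```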

Please let me know how this can be done.  Thank you!
