hive-user mailing list archives

From Arun Patel <>
Subject RegexSerDe with Filters
Date Mon, 20 Jun 2016 22:51:14 GMT
Hello Hive Experts,

I use Flume to ingest application-specific logs from Syslog to HDFS.
Currently, I grep the HDFS directory for specific patterns (for multiple
types of requests) and then create reports.  However, generating Weekly
and Monthly reports this way does not scale.

I would like to create multiple external tables on the daily HDFS
directory, partitioned by date, using RegexSerDe, and then create separate
Parquet tables for each kind of request.

Question is - How do I create multiple (about 20) RegexSerDe tables on the
same data, applying filters?  This would be just like the 20 grep commands
I run today.

Example:
  hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'STORE Request Received for APPXXXX' | awk '{print $4, $13, $14, $17, $20}'
  hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'SCAN Request Received for APPYYYY' | awk '{print $4, $14, $19, $21, $22}'
  hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'TOTAL TIME' | awk '{print $4, $24}'
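Something along these lines is what I have in mind for one of the 20
tables (table name, column names, and the regex are placeholders I made up
for illustration).  As I understand it, RegexSerDe does not drop lines
that fail to match the regex - they come back as all-NULL rows - so the
actual filtering would still happen in the query:

```sql
-- Sketch only: one external table per request type, over the daily
-- directories. Names and regex below are placeholders, not real ones.
CREATE EXTERNAL TABLE store_requests_raw (
  ts STRING,
  detail STRING
)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- one capture group per column; non-matching lines yield NULLs
  "input.regex" = "^(\\S+) .*STORE Request Received for APPXXXX.* (\\S+)$"
)
STORED AS TEXTFILE
LOCATION '/user/ffffprod/';

-- register each day's directory as a partition
ALTER TABLE store_requests_raw
  ADD PARTITION (dt='2016-06-20')
  LOCATION '/user/ffffprod/2016-06-20';
```

But that means 20 such CREATE TABLE statements over the same files, which
is what I am asking about.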

I would like to create tables which do this kind of job and then write the
output to Parquet tables.
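For the Parquet step, I am picturing something like the following sketch,
assuming a raw external table (here store_requests_raw, a placeholder
name) has already been parsed with RegexSerDe; the IS NOT NULL filter
would drop lines the regex did not match:

```sql
-- Sketch only: materialize the parsed rows as a partitioned Parquet table
CREATE TABLE store_requests (
  ts STRING,
  detail STRING
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET;

INSERT OVERWRITE TABLE store_requests PARTITION (dt='2016-06-20')
SELECT ts, detail
FROM store_requests_raw
WHERE dt = '2016-06-20'
  AND ts IS NOT NULL;
```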

Please let me know how this can be done.  Thank you!

