hive-user mailing list archives

From "Markovitz, Dudu" <dmarkov...@paypal.com>
Subject RE: RegexSerDe with Filters
Date Tue, 21 Jun 2016 11:25:48 GMT
Hi

I would suggest creating a single external table with daily partitions and multiple views
each with the appropriate filtering.
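
A minimal sketch of that layout (the table, view, and column names here are hypothetical, and the single-capture regex assumes one log line per record):

```sql
-- Hypothetical raw table: one string column per log line, daily partitions.
CREATE EXTERNAL TABLE raw_logs (line STRING)
PARTITIONED BY (dt STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "(.*)")
LOCATION '/user/ffffprod/';

-- Register each daily directory as a partition.
ALTER TABLE raw_logs ADD PARTITION (dt = '2016-06-20')
LOCATION '/user/ffffprod/2016-06-20';

-- One view per request type, filtering and extracting fields.
-- split(line, ' ')[3] corresponds to awk's $4 (Hive arrays are 0-based).
CREATE VIEW store_requests AS
SELECT split(line, ' ')[3]  AS f4,
       split(line, ' ')[12] AS f13,
       split(line, ' ')[13] AS f14,
       split(line, ' ')[16] AS f17,
       split(line, ' ')[19] AS f20,
       dt
FROM raw_logs
WHERE line LIKE '%STORE Request Received for APPXXXX%';
```

One such view per request type replaces each of the grep pipelines, all reading the same underlying data.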
If you’ll send me a log sample (~100 rows), I’ll send you an example.

Dudu

From: Arun Patel [mailto:arunp.bigdata@gmail.com]
Sent: Tuesday, June 21, 2016 1:51 AM
To: user@hive.apache.org
Subject: RegexSerDe with Filters

Hello Hive Experts,

I use Flume to ingest application-specific logs from Syslog to HDFS.  Currently, I grep the
HDFS directory for specific patterns (for multiple types of requests) and then create reports.
However, generating weekly and monthly reports this way is not scalable.

I would like to create multiple external tables on the daily HDFS directory, partitioned by date,
using RegexSerDe, and then create separate Parquet tables for each kind of request.

The question is: how do I create multiple (about 20) RegexSerDe tables on the same data, each
applying a filter?  This would be just like the 20 grep commands I run today.

Example:
  hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'STORE Request Received for APPXXXX' | awk '{print $4, $13, $14, $17, $20}'
  hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'SCAN Request Received for APPYYYY' | awk '{print $4, $14, $19, $21, $22}'
  hadoop fs -cat /user/ffffprod/2016-06-20/* | grep 'TOTAL TIME' | awk '{print $4, $24}'

I would like to create tables which do this kind of job and then write the output to Parquet
tables.
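
For the Parquet step, roughly what I have in mind (the table and view names below are made up, and this assumes a filtering table or view such as store_requests already exists over the raw data):

```sql
-- Hypothetical Parquet table materialized from a filtering view.
CREATE TABLE store_requests_pq
STORED AS PARQUET
AS SELECT * FROM store_requests;

-- Or, for a daily refresh into an existing Parquet table:
INSERT OVERWRITE TABLE store_requests_pq
SELECT * FROM store_requests WHERE dt = '2016-06-20';
```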

Please let me know how this can be done.  Thank you!

Regards,
Arun