hive-user mailing list archives

From "Elango, Vikram" <Vikram_Ela...@SYNTELINC.COM>
Subject RE: Force number of records per map task
Date Fri, 31 Aug 2012 12:52:06 GMT
Thanks, buddy!

 

Thanks and regards,
Vikram Elango
The Home Depot, 
Nortel no: 0441-3806 

Mobile: +91-8939662345

 

From: John Omernik [mailto:john@omernik.com] 
Sent: Friday, August 31, 2012 5:44 PM
To: user@hive.apache.org
Subject: Force number of records per map task

 

This is going to sound very odd, but I am hoping to use a transform
script in such a way that I pass it a file path, and it reads that file
and produces a bunch of rows in Hive.  In this case the data is pcaps.
I have a location accessible to all nodes, and I want my transform
script to read in a file location and then spit out, for example, the
IP addresses that were seen in the packet capture (using a script I've
already written).  Can I load my file locations into a table in Hive
(one file per row), read that table into a transform script, and have
only one map task per source row?  I don't want my script to parse
several files.  It may make for some poor parallelization, but I am
having trouble forcing such a small record count per map task.

 

Thoughts? 
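
To make the question concrete, here is a rough sketch of the kind of transform script described above. It assumes Hive hands each row to the script as a tab-separated line on stdin and reads tab-separated rows back from stdout. The names `extract_ips` and `transform` are hypothetical; `extract_ips` is only a placeholder for the existing pcap parser, which is not shown in this thread.

```python
import sys


def extract_ips(pcap_path):
    """Hypothetical placeholder for the existing pcap-parsing script."""
    raise NotImplementedError("plug in the real pcap parser here")


def transform(lines, extractor=extract_ips):
    """Yield one 'path<TAB>ip' row for every IP found in each input file."""
    for line in lines:
        # The file path is assumed to be the first (and only) input column.
        path = line.rstrip("\n").split("\t")[0]
        if not path:
            continue  # skip blank input lines
        for ip in extractor(path):
            yield f"{path}\t{ip}"


def main():
    # One input line per file path; one output line per (path, ip) pair.
    for row in transform(sys.stdin):
        print(row)


if __name__ == "__main__":
    main()
```

Saved under an assumed name such as pcap_ips.py, it could be registered with ADD FILE and called along the lines of SELECT TRANSFORM(path) USING 'python pcap_ips.py' AS (path, ip) FROM pcap_files, where the table and column names are made up for illustration. The sketch covers only the script side; it does not by itself limit how many rows each map task receives.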

 

 


