hive-dev mailing list archives

From David Ginzburg
Subject FW: Small file problem and GenMRFileSink1
Date Thu, 30 Jun 2011 16:53:26 GMT

 I'm not sure whether this belongs on hive-dev or hive-user.
 I have a folder with many small files.
 I would like to reduce the number of files the way Hive merges its output.
 I tried to understand from the source of org.apache.hadoop.hive.ql.optimizer.GenMRFileSink1
how to leverage the API to submit a job that merges output files.
 I think I was able to identify
 private void createMergeJob(FileSinkOperator fsOp, GenMRProcContext ctx, String finalName)
 throws SemanticException
 as the entry point to the logic that performs the operation, but I did not find documentation
on how to use it.
 Is there an example that demonstrates the use of this API call?