hadoop-user mailing list archives

From Ranjithkumar Gampa <granji...@gmail.com>
Subject Re: context.write() Vs FSDataOutputStream.writeBytes()
Date Tue, 02 Oct 2012 00:56:09 GMT
Hello all,

Has anybody looked into the topic below? Please reply with your views.


On Fri, Sep 28, 2012 at 1:57 PM, Ranjithkumar Gampa <granjith3@gmail.com> wrote:

> Hi,
> We are using FSDataOutputStream.writeBytes() from map/reduce tasks to write
> directly to the Hive table path instead of using context.write(). This is
> working fine, and so far we have had no problems with this approach.
> We make sure the file names are distinct by appending the taskAttemptId to
> them, and we set speculative execution to 'false' so that two map/reduce
> attempts won't work on the same data and write inconsistent data to HDFS.
> We went with this approach for the reasons below; please let us know if it
> has any disadvantages.
> 1) To avoid having to clean up the _SUCCESS and _LOG files created with the
> reducer/mapper output, which Hive may not like.
> 2) To write some records from the mappers that don't need to participate in
> the reducer logic, which saves some sort and shuffle work. We are exploring
> the Multi Output format, but I think the point above would still need to be
> taken care of.
> 3) We have some special characters in the data, on which we do String
> manipulation using the 'ISO-8859-1' encoding. Using the Text class in
> context.write() does not preserve these characters because of the default
> UTF-8 encoding it uses.
> Please let me know if my understanding is not correct or if there are other
> ways of taking care of the three points above; I am happy to hear and learn.
> Our project uses a mix of Hadoop MR and Hive.
> Thanks in advance.
> Regards,
> Ranjith
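
To make point 3 concrete, here is a minimal sketch of the byte-level difference, using only the JDK and no Hadoop classes. It relies on the fact that FSDataOutputStream extends java.io.DataOutputStream, whose writeBytes() keeps only the low 8 bits of each char, which matches ISO-8859-1 for characters up to U+00FF. The UTF-8 path stands in for what a Text built from a String would hold. The class and method names (EncodingSketch, viaWriteBytes, viaUtf8) are hypothetical and chosen only for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class; not part of Hadoop.
class EncodingSketch {

    // Bytes produced by DataOutputStream.writeBytes(): one byte per char,
    // the high 8 bits of each char are discarded.
    static byte[] viaWriteBytes(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeBytes(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Bytes produced by encoding the same String as UTF-8.
    static byte[] viaUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "caf\u00e9"; // 'e' with acute accent, U+00E9

        byte[] low = viaWriteBytes(s);
        byte[] utf8 = viaUtf8(s);

        // One byte per char for writeBytes(), two bytes for U+00E9 in UTF-8.
        System.out.println(low.length);  // 4
        System.out.println(utf8.length); // 5

        // For chars up to U+00FF, writeBytes() matches ISO-8859-1 encoding.
        System.out.println(
            Arrays.equals(low, s.getBytes(StandardCharsets.ISO_8859_1))); // true
    }
}
```

Note that writeBytes() silently truncates any character above U+00FF, so this equivalence only holds while the data stays within the ISO-8859-1 range.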
