hadoop-mapreduce-user mailing list archives

From "Marc Sturm" <mas9...@nyp.org>
Subject RE: mapreduce line separator question
Date Mon, 09 Apr 2012 21:02:47 GMT
Yes, we are now trying 1.x, and having this option would be great. I have never filed a JIRA
with the ASF, but I will do it.
Thanks,
Marc

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: Monday, April 09, 2012 3:46 PM
To: mapreduce-user@hadoop.apache.org
Subject: Re: mapreduce line separator question

Marc,

The answer depends on the Hadoop version you are running. The following requires
https://issues.apache.org/jira/browse/MAPREDUCE-2254, which is currently present in 0.23 (and
eventually 2.x) and also (last I checked) in CDH3, if you use that:

Simply set "textinputformat.record.delimiter" in your Job's configuration to the exact character
string you need, and it will be used as the record/line delimiter in TextInputFormat. The
string can also be multi-character, and records will be read based on the provided sequence.
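
To illustrate the effect, here is a minimal standalone sketch in plain Java (not Hadoop code) of what splitting on an exact delimiter string means for the \r case. The class and method names (DelimiterSplitDemo, splitRecords) are made up for illustration, and the real reader works on byte streams and input splits rather than in-memory strings. In an actual job, on a version that includes MAPREDUCE-2254, the corresponding setting would be conf.set("textinputformat.record.delimiter", "\n") on the job's Configuration.

```java
import java.util.ArrayList;
import java.util.List;

public class DelimiterSplitDemo {

    // Hypothetical helper: splits data into records on one exact delimiter
    // string. Nothing else is treated as a record terminator, so a "\r"
    // that precedes "\n" stays inside the record.
    static List<String> splitRecords(String data, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> records = new ArrayList<>();
        int start = 0;
        int idx;
        while ((idx = data.indexOf(delimiter, start)) >= 0) {
            records.add(data.substring(start, idx));
            start = idx + delimiter.length();
        }
        if (start < data.length()) {
            // Trailing record that is not followed by a delimiter.
            records.add(data.substring(start));
        }
        return records;
    }

    public static void main(String[] args) {
        // Sample input with Windows-style line endings (made-up data).
        String data = "a\tb\r\nc\td\r\n";
        for (String record : splitRecords(data, "\n")) {
            // Print control characters in escaped form so they are visible.
            System.out.println(record.replace("\r", "\\r").replace("\t", "\\t"));
        }
    }
}
```

With "\n" as the delimiter, each record keeps its trailing \r, which is the behavior asked for in the original question. A multi-character delimiter such as "||" works the same way in this sketch.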

It's presently unavailable in 1.x, but it appears harmless to add, and if you can file
a JIRA with a backport I can review and commit it for a future 1.x update.

On Tue, Apr 10, 2012 at 12:31 AM, Marc Sturm <mas9161@nyp.org> wrote:
> Hi,
>
> I am new to MapReduce and I have a short question: is it possible for
> a MapReduce job to split the lines of a file on \n and ignore \r?
> Basically, in the use case I am looking into, the \r has to be
> included when reading a line.
>
> I am just "playing" with MapReduce on a standalone Hadoop, not using
> HDFS, and I am looking into writing my own LineReader, but I am afraid
> it is much more complicated than this. I could also update each line and
> replace the \r with a \t, but I would rather leave the file and data as is.
>
> Any insight and/or link to the correct documentation will be appreciated.
>
> Thanks,
>
> Marc
>
>
>
>
> ________________________________
> This electronic message is intended to be for the use only of the
> named recipient, and may contain information that is confidential or privileged.
> If you are not the intended recipient, you are hereby notified that
> any disclosure, copying, distribution or use of the contents of this
> message is strictly prohibited. If you have received this message in
> error or are not the named recipient, please notify us immediately by
> contacting the sender at the electronic mail address noted above, and
> delete and destroy all copies of this message. Thank you.



--
Harsh J






