hadoop-mapreduce-user mailing list archives

From Keith Wiley <kwi...@keithwiley.com>
Subject Re: CSV files as input
Date Wed, 22 Feb 2012 23:22:51 GMT
Thanks for responding.  Unfortunately, the data already exists.  I have no way of instituting
limitations on the format, much less reformatting it to suit my needs.  It is true that I
can make some general assumptions about the data (unrealistically long strings are unlikely
to occur), but I can't write a fully robust reader under such assumptions.

The problem is that even if I impose an assumption of limited length strings, that doesn't
prescribe a method for handling the possibility of an error.  If a string really is too long
and the reader fails to detect it, I'm not sure how to ensure that the reader or subsequent
map task fails in a clean fashion.

If I could at least impose an assumption of this sort...and then detect and fail cleanly on
violations of the assumption, that would go a long way.
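To make that concrete, here is a minimal sketch of what "impose the assumption and fail cleanly" could look like. This is plain Java with hypothetical names, not the Hadoop RecordReader API; the 1024-character bound and the CSV conventions (doubled quotes as escapes) are assumptions:

```java
import java.io.IOException;

/**
 * Minimal sketch (hypothetical names, not Hadoop API): read one quoted CSV
 * field from a character buffer while enforcing an assumed maximum length,
 * so a missing or far-away closing quote fails fast with a clean exception
 * instead of silently consuming the rest of the split.
 */
public class BoundedQuotedFieldReader {
    static final int MAX_QUOTED_LEN = 1024; // assumed bound on quoted strings

    /**
     * Returns the contents between the opening quote at `start` and its
     * closing quote. Doubled quotes ("") are treated as an escaped quote,
     * per common CSV convention.
     *
     * @throws IOException if no closing quote appears within MAX_QUOTED_LEN
     *         characters -- the clean failure the length assumption enables.
     */
    static String readQuotedField(char[] buf, int start) throws IOException {
        if (buf[start] != '"') throw new IOException("not at a quote");
        StringBuilder field = new StringBuilder();
        int i = start + 1;
        while (i < buf.length && field.length() <= MAX_QUOTED_LEN) {
            char c = buf[i];
            if (c == '"') {
                if (i + 1 < buf.length && buf[i + 1] == '"') { // escaped ""
                    field.append('"');
                    i += 2;
                    continue;
                }
                return field.toString(); // closing quote found
            }
            field.append(c);
            i++;
        }
        throw new IOException("quoted string exceeds assumed bound of "
            + MAX_QUOTED_LEN + " characters; treating record as malformed");
    }
}
```

A map task can catch the IOException, increment a bad-records counter, and skip (or fail) the record deliberately rather than misparsing everything after it.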

I'll think about it.


On Feb 22, 2012, at 14:59 , Steve Lewis wrote:

> It sounds like you may need to give up a little to make things work. Suppose, for example,
> that you placed a limit on the length of a quoted string, say 1024 characters. The reader
> can then either start at the beginning or read back by, say, 1024 characters to see if the
> start is in a quote and proceed accordingly. If quoted strings can be of arbitrary length,
> there may be no good solution.
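The read-back idea above can be sketched as follows. This is a simplified illustration with hypothetical names, not Hadoop code, and it leans on extra assumptions beyond the 1024-character bound: well-formed comma/newline-delimited CSV and no escaped quotes (in real CSV a closing quote is followed by a delimiter, which is what lets a backward scan classify the nearest quote):

```java
/**
 * Sketch of the back-scan suggestion (hypothetical names, not Hadoop API).
 * Assumes: quoted strings never exceed MAX_QUOTED_LEN, no escaped quotes,
 * and well-formed CSV where a closing quote is immediately followed by a
 * comma or newline.
 */
public class SplitQuoteProbe {
    static final int MAX_QUOTED_LEN = 1024; // assumed bound from the thread

    /** True if position `split` in `buf` falls inside a quoted field. */
    static boolean insideQuote(char[] buf, int split) {
        // A quoted string containing `split` must have opened within the
        // last MAX_QUOTED_LEN characters, so we never scan back further.
        int stop = Math.max(0, split - MAX_QUOTED_LEN - 1);
        for (int i = split - 1; i >= stop; i--) {
            if (buf[i] == '"') {
                // In well-formed CSV a closing quote is followed by a
                // delimiter; an opening quote is not.
                if (i + 1 < buf.length
                        && (buf[i + 1] == ',' || buf[i + 1] == '\n')) {
                    return false; // nearest quote closed a field
                }
                return true; // nearest quote opened a field spanning `split`
            }
        }
        // No quote within the assumed bound: nothing quoted can span here.
        return false;
    }
}
```

A custom InputFormat could use a probe like this at each split boundary to decide whether to skip forward past the enclosing quoted field before emitting records. As the reply notes, if quoted strings are unbounded, no fixed read-back distance is sufficient.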

Keith Wiley     kwiley@keithwiley.com     keithwiley.com    music.keithwiley.com

"I do not feel obliged to believe that the same God who has endowed us with
sense, reason, and intellect has intended us to forgo their use."
                                           --  Galileo Galilei
