hadoop-common-dev mailing list archives

From "Bryan Pendleton (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-550) Text constructor can throw exception
Date Tue, 19 Sep 2006 21:29:22 GMT
Text constructor can throw exception

                 Key: HADOOP-550
                 URL: http://issues.apache.org/jira/browse/HADOOP-550
             Project: Hadoop
          Issue Type: Bug
            Reporter: Bryan Pendleton

I finally got back around to moving my working code to using Text objects.

And, once again, switching to Text (from UTF8) means my jobs are failing. This time, it's better
defined: constructing a Text from a string extracted from Real World data makes the Text
constructor throw a CharacterCodingException. This may be legit - I don't understand UTF well
enough to know what's wrong with the supplied string. I'm assembling a series of strings, some
of which are user-supplied, and something in them causes the Text constructor to barf.
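The report doesn't identify the offending string, so the following is only a guess at the mechanism. One plausible cause (an assumption, not confirmed here) is an unpaired UTF-16 surrogate: a Java String can hold one, but UTF-8 cannot represent it. The sketch below uses only the JDK's java.nio.charset API (not Hadoop's Text) to contrast a strict encoder, which reports such input as a CharacterCodingException, with String.getBytes, which silently substitutes '?'. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not use or reproduce Hadoop's Text internals.
public class StrictUtf8Demo {

    // Strict UTF-8 encoding: malformed or unmappable input aborts with an exception.
    // Assumption: a strict encoder like this is what produces the reported failure.
    static byte[] encodeStrict(String s) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buf = encoder.encode(CharBuffer.wrap(s)); // returned buffer is already flipped
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    // Lenient UTF-8 encoding: String.getBytes replaces bad input with '?' instead of throwing.
    static byte[] encodeLenient(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A lone high surrogate: legal inside a Java String, but not encodable as UTF-8.
        String bad = "user-\uD800-input";
        try {
            encodeStrict(bad);
            System.out.println("strict: encoded");
        } catch (CharacterCodingException e) {
            System.out.println("strict: " + e);
        }
        System.out.println("lenient: " + new String(encodeLenient(bad), StandardCharsets.UTF_8));
    }
}
```

The trade-off is the one argued below: the strict path refuses data it can't represent faithfully, while the lenient path never throws but quietly alters the text.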

However, this is still completely unacceptable. If I need to stuff textual data someplace,
I need the container to *do* it. If user-supplied input can't be stored as a "UTF"-aware
text value, then another container needs to be brought into existence. Sure, I can use a BytesWritable,
but, as its name implies, Text should handle "text". If Text is supposed to == "StringWritable",
then, well, it doesn't, yet.

I admit to being a few weeks behind the bleeding edge at this point, so maybe my particular
Text bug has been fixed, though the only fixes to Text I see are adopting it into more of
the internals of Hadoop. This argument goes double in that case: if we're using Text objects
internally, it should really be a totally solid object - construct one from a String, get
one back, but _never_ throw a content-related Exception. Or, if Text is not the right object
because it's data-sensitive, then I argue we shouldn't use it in any case where data might
kill it - internally, or anywhere else (by default).

Please, don't remove UTF8, for now.

This message is automatically generated by JIRA.
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

