hadoop-common-issues mailing list archives

From "Tom White (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6298) BytesWritable#getBytes is a bad name that leads to programming mistakes
Date Thu, 08 Oct 2009 15:26:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763520#action_12763520 ]

Tom White commented on HADOOP-6298:
-----------------------------------

I don't think this proposal is about changing the API; it's about renaming the method to
describe its contract more accurately. Text.getBytes() behaves differently from String.getBytes():
it returns the backing array, of which only the first getLength() bytes are valid.
This trips up users; see, for example, http://www.nabble.com/can%27t-read-the-SequenceFile-correctly-td21866960.html.
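
To make the pitfall concrete, here is a minimal sketch. ReusableBuffer is a
hypothetical stand-in written for this example, not Hadoop's Text or BytesWritable;
it only assumes the same contract, namely that getBytes() returns a reused backing
array and getLength() gives the number of valid bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a reusable Writable-style buffer; NOT a Hadoop class.
class ReusableBuffer {
    private byte[] bytes = new byte[0];
    private int length = 0;

    // Copy new contents in, reusing the backing array when it is large enough.
    void set(byte[] src) {
        if (src.length > bytes.length) {
            bytes = new byte[src.length];
        }
        System.arraycopy(src, 0, bytes, 0, src.length);
        length = src.length;
    }

    // Returns the backing array, which may be longer than the valid data.
    byte[] getBytes() { return bytes; }

    // Number of valid bytes at the start of getBytes().
    int getLength() { return length; }
}

class GetBytesPitfall {
    // Mistake: treats the whole backing array as the value.
    static String wrong(ReusableBuffer b) {
        return new String(b.getBytes(), StandardCharsets.UTF_8);
    }

    // Correct: reads only the first getLength() bytes.
    static String right(ReusableBuffer b) {
        return new String(b.getBytes(), 0, b.getLength(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ReusableBuffer b = new ReusableBuffer();
        b.set("hello world".getBytes(StandardCharsets.UTF_8));
        b.set("hi".getBytes(StandardCharsets.UTF_8)); // object reused for a shorter value
        System.out.println(wrong(b)); // prints "hillo world" (stale tail from the earlier value)
        System.out.println(right(b)); // prints "hi"
    }
}
```

With a String, getBytes() always returns exactly the encoded value, which is why
the same method name on a reused buffer is easy to misread.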

We could deprecate getBytes() (on BinaryComparable and its subclasses BytesWritable and Text)
in 0.22 and add getPaddedBytes(), as Nathan suggests, with identical functionality.
The next release would then remove getBytes(). Because this is purely a rename, it would
have no impact on efficiency.
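
As a rough sketch of what such a deprecation could look like: the class names
BinaryComparableSketch and FixedBytes below are hypothetical and exist only for
this illustration; this is not the actual Hadoop source or a committed patch.
The old name simply delegates to the new one, so no copying is introduced.

```java
// Hypothetical sketch of the proposed rename; not the actual Hadoop source.
abstract class BinaryComparableSketch {
    /** Returns the backing array; only the first getLength() bytes are valid. */
    public abstract byte[] getPaddedBytes();

    /** Number of valid bytes in the array returned by getPaddedBytes(). */
    public abstract int getLength();

    /**
     * @deprecated Use {@link #getPaddedBytes()}; the returned array may be
     *             longer than the valid data.
     */
    @Deprecated
    public byte[] getBytes() {
        return getPaddedBytes(); // same array, no copy
    }
}

// Minimal concrete subclass so the sketch can be exercised.
class FixedBytes extends BinaryComparableSketch {
    private final byte[] backing;
    private final int length;

    FixedBytes(byte[] backing, int length) {
        this.backing = backing;
        this.length = length;
    }

    @Override public byte[] getPaddedBytes() { return backing; }
    @Override public int getLength() { return length; }

    @SuppressWarnings("deprecation")
    public static void main(String[] args) {
        FixedBytes f = new FixedBytes(new byte[] {1, 2, 3, 0}, 3);
        // Both names return the very same array instance.
        System.out.println(f.getBytes() == f.getPaddedBytes()); // prints "true"
    }
}
```

Callers of the old name would get a compiler deprecation warning pointing them at
the more descriptive method for one release before the old name goes away.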

Nathan, what's the use case for getNonPaddedValue()? Exposing it could make it easy to
write an inefficient program, since copying in maps or reduces is normally expensive.
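
To illustrate the efficiency concern, here is a small sketch. nonPaddedCopy() is a
hypothetical helper in the spirit of getNonPaddedValue(), and sumInPlace() is just
an example of work done directly on the padded array; neither is Hadoop code.

```java
import java.util.Arrays;

class CopyVersusInPlace {
    // Hypothetical helper along the lines of getNonPaddedValue():
    // allocates and fills a new array on every call.
    static byte[] nonPaddedCopy(byte[] padded, int length) {
        return Arrays.copyOf(padded, length);
    }

    // Copy-free alternative: work on the first `length` bytes in place.
    static int sumInPlace(byte[] padded, int length) {
        int sum = 0;
        for (int i = 0; i < length; i++) {
            sum += padded[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] padded = {1, 2, 3, 0, 0, 0}; // 3 valid bytes, 3 bytes of padding
        int length = 3;
        System.out.println(nonPaddedCopy(padded, length).length); // prints "3"
        System.out.println(sumInPlace(padded, length));           // prints "6"
    }
}
```

Called once per record in a map or reduce, the copying variant allocates a new
array per record, whereas the in-place variant allocates nothing.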



> BytesWritable#getBytes is a bad name that leads to programming mistakes
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-6298
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6298
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 0.20.1
>            Reporter: Nathan Marz
>
> Pretty much everyone at Rapleaf who has worked with Hadoop has misused BytesWritable#getBytes
> at some point, not expecting the byte array to be padded. I think we can completely alleviate
> these programming mistakes by deprecating and renaming this method (again) to be more descriptive.
> I propose "getPaddedBytes()" or "getPaddedValue()". It would also be helpful to have a helper
> method "getNonPaddedValue()" that makes a copy into a non-padded byte array.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

