hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1283) Eliminate internal UTF8 to String and vice versa conversions in the name-node.
Date Fri, 15 Jun 2007 23:32:26 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Konstantin Shvachko updated HADOOP-1283:

    Attachment: EliminateUTF8.patch

This patch does all of the above except for item 5. I don't want to change the image and
edits log formats at this point.
AFAIK the UTF8 and BytesWritable serializations differ only in the type of the length field:
UTF8 uses a short, while BytesWritable uses an int.
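
To illustrate the difference in the length field, here is a minimal standalone sketch. It
does not use the Hadoop classes; LengthPrefixDemo and its method names are hypothetical, and
it assumes the payload bytes are the same in both layouts, so only the length prefix differs.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Illustration only: mimics the two length-prefix layouts described above.
// This is not Hadoop code and the names are hypothetical.
final class LengthPrefixDemo {

    // UTF8-style layout as described in the message: 2-byte length, then the bytes.
    static byte[] writeWithShortLength(byte[] payload) {
        if (payload.length > 0xFFFF) {
            throw new IllegalArgumentException("payload too long for a 2-byte length");
        }
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeShort(payload.length);
            out.write(payload);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // BytesWritable-style layout as described in the message: 4-byte length, then the bytes.
    static byte[] writeWithIntLength(byte[] payload) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeInt(payload.length);
            out.write(payload);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] name = "/user/data/file1".getBytes(StandardCharsets.UTF_8);
        System.out.println("short-prefixed size: " + writeWithShortLength(name).length);
        System.out.println("int-prefixed size:   " + writeWithIntLength(name).length);
    }
}
```

Under these assumptions the two encodings of the same name differ by exactly two bytes, which
is why switching between them would still change the on-disk format.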

For the name-node in-memory structures I use a subclass of BytesWritable called StringBytesWritable.
It mostly contains conversion methods from/to String.
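
As a rough sketch of the idea (not the class in the patch), a byte-backed name holder with
String conversions could look like the following. StringBytes and its methods are hypothetical
names, it does not extend Hadoop's BytesWritable, and it assumes UTF-8 encoding. Content-based
equals() and hashCode() are included because such an object would need them to serve as a
map key.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the StringBytesWritable idea: stores a name as
// bytes and converts from/to String on demand. Not the class from the patch.
final class StringBytes {
    private final byte[] bytes;

    // Convert once from String when the value enters the in-memory structure.
    StringBytes(String s) {
        this.bytes = s.getBytes(StandardCharsets.UTF_8);
    }

    // Convert back to String only when a caller needs one.
    String getString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Content-based equality so two instances for the same name match.
    @Override
    public boolean equals(Object o) {
        return o instanceof StringBytes && Arrays.equals(bytes, ((StringBytes) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    public static void main(String[] args) {
        // Example map keyed by the byte-backed name; the value type is a placeholder.
        Map<StringBytes, String> holders = new HashMap<>();
        holders.put(new StringBytes("/user/data/file1"), "client-1");
        System.out.println(holders.get(new StringBytes("/user/data/file1")));
    }
}
```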

I removed implementations of the deprecated obtainLock() and releaseLock() methods in FSNamesystem.
The methods now return OPERATION_FAILED.
Let me know if we need to keep the implementations. Otherwise we should remove them and the
related state on the name-node, such as activeLocks.

> Eliminate internal UTF8 to String and vice versa conversions in the name-node.
> ------------------------------------------------------------------------------
>                 Key: HADOOP-1283
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1283
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.12.0
>            Reporter: Konstantin Shvachko
>         Attachments: EliminateUTF8.patch
> We have internal conversions of those two types inside name-node code. One example:
> NameNode.complete(String src, String clientName)
> then it calls
> FSNamesystem.completeFile(new UTF8(src), new UTF8(clientName));
> which in turn finally calls
> FSDirectory.addNode(path.toString(), newNode )
> and in another place
> FSDirectory.getNode(src.toString())
> So we have several conversions of the same parameter back and forth during computation.
> We should keep the parameter type consistent within different methods.
> The question is, which type should be used: String or Text.
> From previous discussions I remember that Text is more efficient in space and time for
> data. Here we mostly deal with file names and network addresses, which are ASCII.
> Does it make sense to use Text in this case?
> UTF8 is also used as a key in two maps: pendingCreates and leases.
> This should be replaced too.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
