hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7446) HDFS inotify should have the ability to determine what txid it has read up to
Date Wed, 26 Nov 2014 20:14:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226755#comment-14226755
] 

Colin Patrick McCabe commented on HDFS-7446:
--------------------------------------------

bq. It feels like we have a mismatch between the underlying data and our objects. The need
for the VHS-rewind in getTxidBatchSize is one example, what we really want there is an iterator
of EditEvents, with one EditEvents per txid (name is just a suggestion).

Basically, the code in {{getTxidBatchSize}} is converting between what we get back from the
NameNode RPC (a bunch of Events, not grouped by txid) to what we want to return (a batch of
events that all have the same txid).

I guess we could do the conversion earlier, in {{PBHelper#convert(GetEditsFromTxidResponseProto
resp)}}.  This would mean pushing the batch concept into {{EventsList}}.  I'm not sure there's
much benefit in this, though.  {{EventsList}} is used on the {{NameNode}} as well, and the
NN doesn't care about grouping Events of the same txid into batches.  But we'd have to implement
all that logic anyway if we went down this route.  (The NN has its own concept of batching...
how many Events it sends back in a single RPC... and of course it will not split a txid across
RPCs.  But that's the only batching we get out of it.)

I admit that rewinding the iterator feels awkward.  I was looking for a utility function in
{{Iterators}} or something that would help with this, but I didn't find anything.  Java doesn't
have the concept of duplicating iterators that C++ does, so you can't just make a copy of
the iterator and then push only that copy forward.  I wanted to avoid allocating an array
until I knew what size I wanted.  I suppose I could create a {{LinkedList}} rather than an
array, and avoid the rewinding that way.  That's not very efficient, though.  One slightly
awkward function feels like an acceptable tradeoff for memory efficiency.

bq. The txid could also be moved into EditEvents which would also save some bytes.

Most batches are small-- I think the median size will be a batch of size 1, with maybe some
outliers at 2, 3, or 4.  I don't think there's a lot of memory savings to be had by moving
the txid into {{EditEvents}}.

A bigger issue is that we need (well maybe not need, but the code will be really, really awkward
if we don't) to add txid to the protobuf {{EventProto}} structure in {{inotify.proto}}.  This
is because there can be "gaps" in the txids returned by {{EventsListProto}}.  Currently, all
{{EventsListProto}} has is firstTxid, lastTxid, and a list of Events.  But you don't know
what the txid of any of those Events actually is... if firstTxid is 100 and lastTxid is 200,
and you have 50 events, where are they in that sequence?  Who knows.  And if you add this
to the PB, it's extremely awkward not to add it to the Java {{Event}} class as well.

> HDFS inotify should have the ability to determine what txid it has read up to
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-7446
>                 URL: https://issues.apache.org/jira/browse/HDFS-7446
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfsclient
>    Affects Versions: 2.6.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-7446.001.patch
>
>
> HDFS inotify should have the ability to determine what txid it has read up to.  This
will allow users who want to avoid missing any events to record this txid and use it to resume
reading events at the spot they left off.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message