hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-895) Allow hflush/sync to occur in parallel with new writes to the file
Date Fri, 12 Nov 2010 18:19:17 GMT

    [ https://issues.apache.org/jira/browse/HDFS-895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931477#action_12931477 ]

Todd Lipcon commented on HDFS-895:

bq. I still have a question: if lastFlushOffset == bytesCurBlock, when will this condition
be true: oldCurrentPacket != null && currentPacket != null?

I don't think that will ever be true. We do get the case {{oldCurrentPacket == null &&
currentPacket == null}} though when we call flush twice at the beginning of any block. So
I think we can add an assert {{assert oldCurrentPacket == null}} in that else clause.
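To make the else-clause case concrete, here is a hypothetical, simplified model of the flush bookkeeping being discussed. The names ({{currentPacket}}, {{oldCurrentPacket}}, {{bytesCurBlock}}, {{lastFlushOffset}}) mirror the DFSOutputStream fields, but this is an illustrative sketch, not the real client code:

```java
// Simplified model: flushing twice at the start of a block should reach the
// else branch with no pending packet, which is what the proposed assert checks.
class FlushModel {
    Object currentPacket;   // packet being filled, or null if none pending
    long bytesCurBlock;     // bytes written into the current block
    long lastFlushOffset;   // block offset at the time of the last flush

    void write(int nbytes) {
        currentPacket = new Object(); // stand-in for a real data packet
        bytesCurBlock += nbytes;
    }

    void flush() {
        Object oldCurrentPacket = currentPacket;
        if (bytesCurBlock > lastFlushOffset) {
            // New data since the last flush: hand the packet to the streamer.
            lastFlushOffset = bytesCurBlock;
            currentPacket = null;
        } else {
            // No new data. A second flush at the same offset lands here with
            // both packets null -- the assert proposed in the comment above.
            if (oldCurrentPacket != null) {
                throw new AssertionError("pending packet but no new data");
            }
        }
    }

    public static void main(String[] args) {
        FlushModel m = new FlushModel();
        m.write(4);
        m.flush();
        m.flush(); // second flush, nothing new: else branch, assert holds
        System.out.println("lastFlushOffset = " + m.lastFlushOffset);
    }
}
```

Calling {{flush()}} twice with no intervening write exercises the else branch without tripping the assert, matching the "flush twice at the beginning of any block" case described above.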

bq. Please understand I did not mean to give you a hard time

No worries - I agree that this code is very tricky, which is why I'd like to keep the asserts
at this point. The assert guards what we all thought was an invariant: sequence numbers should
increase by exactly one with every packet. Nicolas also reviewed this code in depth a few
months back, which is when we added this new {{currentSeqno--}} bit. If I recall correctly,
we discussed at length whether there was any bug that could skip or repeat a sequence number,
and it was when we added the assert for in-order, no-skip sequence numbers that we found this bug.
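The invariant that the assert guards can be sketched as a standalone checker. This is illustrative only; the names are not the actual DFSOutputStream fields:

```java
// Hypothetical sketch of the invariant: sequence numbers must advance by
// exactly one per packet -- never skip, never repeat.
class SeqnoChecker {
    private long lastSeqno = -1; // -1 means no packet seen yet

    void onPacket(long seqno) {
        if (lastSeqno != -1 && seqno != lastSeqno + 1) {
            throw new AssertionError(
                "seqno " + seqno + " skips or repeats after " + lastSeqno);
        }
        lastSeqno = seqno;
    }

    public static void main(String[] args) {
        SeqnoChecker c = new SeqnoChecker();
        c.onPacket(0);
        c.onPacket(1);
        c.onPacket(2); // in order: fine
        try {
            c.onPacket(4); // skips 3: the invariant check fires
        } catch (AssertionError expected) {
            System.out.println("caught: " + expected.getMessage());
        }
    }
}
```

A check of this shape is what surfaced the bug: once sequence numbers were required to be strictly consecutive, the skip/repeat case could no longer pass silently.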

Would it be better to open a very small JIRA with just the assert and its fix, commit that,
and then commit this patch as an optimization? That would keep the two changes orthogonal
and perhaps easier to understand.

> Allow hflush/sync to occur in parallel with new writes to the file
> ------------------------------------------------------------------
>                 Key: HDFS-895
>                 URL: https://issues.apache.org/jira/browse/HDFS-895
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Todd Lipcon
>             Fix For: 0.22.0
>         Attachments: 895-delta-for-review.txt, hdfs-895-0.20-append.txt, hdfs-895-20.txt,
hdfs-895-review.txt, hdfs-895-trunk.txt, hdfs-895.txt, hdfs-895.txt, hdfs-895.txt
> In the current trunk, the HDFS client methods writeChunk() and hflush/sync are synchronized.
This means that if a hflush/sync is in progress, an application cannot write data to the
HDFS client buffer. This reduces the write throughput of the transaction log in HBase.
> The hflush/sync should allow new writes to happen to the HDFS client even when a hflush/sync
is in progress. It can record the seqno of the message for which it should receive the ack,
indicate to the DataStreamer thread to start flushing those messages, exit the synchronized
section, and just wait for that ack to arrive.
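The structure described in the issue can be sketched as follows. This is a hedged illustration of the idea (capture the target seqno inside the lock, wait for the ack outside it); the names ({{write}}, {{hflush}}, {{ackReceived}}) are stand-ins, not the actual DFSOutputStream API:

```java
// Sketch: writers keep appending while a flusher waits for its ack, because
// the ack wait happens outside the lock that writeChunk would hold.
class NonBlockingFlush {
    private long currentSeqno = 0;
    private long lastAckedSeqno = -1;
    private final Object ackLock = new Object();

    // writeChunk stand-in: assigns the next sequence number under the lock.
    synchronized long write() {
        return currentSeqno++;
    }

    void hflush() throws InterruptedException {
        long toWaitFor;
        synchronized (this) {
            // Record which seqno must be acked; the streamer would be
            // signalled to start flushing here.
            toWaitFor = currentSeqno - 1;
        }
        // Wait for the ack *outside* the write lock, so new writes proceed.
        synchronized (ackLock) {
            while (lastAckedSeqno < toWaitFor) {
                ackLock.wait();
            }
        }
    }

    // Called by the ack-processing thread when a pipeline ack arrives.
    void ackReceived(long seqno) {
        synchronized (ackLock) {
            lastAckedSeqno = seqno;
            ackLock.notifyAll();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        NonBlockingFlush f = new NonBlockingFlush();
        f.write();                 // seqno 0
        long last = f.write();     // seqno 1
        f.ackReceived(last);       // pipeline acks up to seqno 1
        f.hflush();                // returns immediately: already acked
        System.out.println("flushed through seqno " + last);
    }
}
```

The key point is that only the seqno capture happens under the write lock; the potentially long ack wait holds a separate monitor, so writeChunk callers are never blocked behind an in-flight hflush.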

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
