hadoop-hdfs-issues mailing list archives

From "Zhe Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-7729) Add logic to DFSOutputStream to support writing a file in striping layout
Date Tue, 10 Feb 2015 21:55:11 GMT

     [ https://issues.apache.org/jira/browse/HDFS-7729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhe Zhang updated HDFS-7729:
----------------------------
    Attachment: HDFS-7729-005.patch

Thanks Bo, this patch looks much better now! The client striping logic is a complex piece, and
I believe we are getting closer.

Logic:
# {{stripeBlocks}} is a key data structure.
#* I like the current {{BlockingQueue}}-based implementation. It's simple and handles the
most basic scenario, where the streamers work at approximately the same rate.
#* There will be quite a bit of follow-on work to handle failures and slow writers.
#* We should probably bound the size of the blocking queues.
{code}
stripeBlocks[i] = new LinkedBlockingQueue<LocatedBlock>();
{code}
#* We should avoid repeating the {{addBlock}} logic. Maybe we should make {{nextBlockOutputStream}}
work for both contiguous and striped blocks. I've attached a patch to demonstrate the idea; please
let me know if it looks OK. It also includes a few other detailed changes.
# {{blocksForUnitTest}} can be obtained via an RPC call. See example below:
{code}
          List<LocatedBlock> locatedBlocks = 
              cluster.getNameNode().getRpcServer().getBlockLocations(
              TEST_FILE, 0, TEST_FILE_LEN).getLocatedBlocks();
{code}
# The following variables were moved to {{DataStreamer}}, but they are only accessed in the
outer {{DFSOutputStream}} class. I think they should stay in {{DFSOutputStream}}, perhaps
converted to arrays?
{code}
    private long currentSeqno = 0;
    private long lastQueuedSeqno = -1;
    private long lastAckedSeqno = -1;
    private long bytesCurBlock = 0; 
{code}
# {{writeChunk}} is another key method
#* How does the following handle crossing cell boundaries? What if {{sizeOfCellInBuffer}}
is larger than {{cellSize}}?
{code}
      addToCellBuffer(b, offset, len);
      if (sizeOfCellInBuffer[curIdx] == cellSize) {
{code}
#* Right now we need to handle both _cell full_ and _packet full_ conditions. I'm thinking
maybe we should unify cell size and packet size in this phase. We can make cell size configurable
as a follow-on task.
# {{TestDFSOutputStreamStripingLayout}}
#* It should use {{@Before}} and {{@After}} methods like other unit tests.
#* I tried adding a multi-group test and it failed with an {{ArrayIndexOutOfBoundsException}}.
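
On bounding the {{stripeBlocks}} queues (first item above), here is a minimal sketch of what I have in mind. It is not taken from the patch: {{String}} stands in for {{LocatedBlock}}, and the class name, capacity, and timeout are assumptions chosen only for illustration. The point is that a capacity-bounded {{LinkedBlockingQueue}} plus a timed {{offer}} lets the producer notice a slow streamer instead of queueing blocks without limit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Sketch only: String stands in for LocatedBlock, and the capacity below is
// an illustrative assumption, not a value from the patch.
class BoundedStripeQueueSketch {
  static final int QUEUE_CAPACITY = 2;

  // One bounded queue per streamer, mirroring the stripeBlocks array.
  static List<BlockingQueue<String>> newStripeQueues(int numStreamers) {
    List<BlockingQueue<String>> queues = new ArrayList<>();
    for (int i = 0; i < numStreamers; i++) {
      queues.add(new LinkedBlockingQueue<String>(QUEUE_CAPACITY));
    }
    return queues;
  }

  // Returns false if the queue is still full after timeoutMs, so the caller
  // can react to a slow streamer rather than block indefinitely.
  static boolean enqueue(BlockingQueue<String> queue, String block, long timeoutMs) {
    try {
      return queue.offer(block, timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    }
  }

  public static void main(String[] args) {
    BlockingQueue<String> q = newStripeQueues(3).get(0);
    System.out.println(enqueue(q, "blk_1", 10)); // accepted
    System.out.println(enqueue(q, "blk_2", 10)); // accepted, queue now full
    System.out.println(enqueue(q, "blk_3", 10)); // rejected after the timeout
    q.poll();                                    // a streamer drains one block
    System.out.println(enqueue(q, "blk_3", 10)); // accepted again
  }
}
```

What the caller does when {{enqueue}} returns false (retry, fail the streamer, and so on) belongs to the follow-on failure-handling work.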

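On the cell-boundary question in the {{writeChunk}} item above, one possible approach is to split each incoming write at the cell boundary so that a cell buffer can never exceed {{cellSize}}. The sketch below is hypothetical and self-contained: the class and its fields only loosely mirror the patch's {{addToCellBuffer}}, {{sizeOfCellInBuffer}}, and {{curIdx}}, and {{flushCell}} is a placeholder for handing a full cell to its streamer.

```java
import java.nio.ByteBuffer;

// Sketch only: all names here are hypothetical stand-ins, not the patch's code.
class CellBoundarySketch {
  final ByteBuffer[] cellBuffers;
  int curIdx = 0;        // index of the cell currently being filled
  int cellsFlushed = 0;  // number of full cells handed off so far

  CellBoundarySketch(int numDataBlocks, int cellSize) {
    if (numDataBlocks <= 0 || cellSize <= 0) {
      throw new IllegalArgumentException("numDataBlocks and cellSize must be positive");
    }
    cellBuffers = new ByteBuffer[numDataBlocks];
    for (int i = 0; i < numDataBlocks; i++) {
      cellBuffers[i] = ByteBuffer.allocate(cellSize);
    }
  }

  // Copies len bytes starting at offset, splitting the write wherever it
  // would cross a cell boundary.
  void writeChunk(byte[] b, int offset, int len) {
    while (len > 0) {
      ByteBuffer cell = cellBuffers[curIdx];
      int toCopy = Math.min(len, cell.remaining());
      cell.put(b, offset, toCopy);
      offset += toCopy;
      len -= toCopy;
      if (!cell.hasRemaining()) {
        flushCell(curIdx);
        curIdx = (curIdx + 1) % cellBuffers.length;
      }
    }
  }

  // Placeholder: a real implementation would pass the cell to its streamer.
  void flushCell(int idx) {
    cellBuffers[idx].clear();
    cellsFlushed++;
  }

  public static void main(String[] args) {
    CellBoundarySketch out = new CellBoundarySketch(3, 4);
    out.writeChunk(new byte[10], 0, 10); // two full cells plus 2 leftover bytes
    System.out.println("flushed=" + out.cellsFlushed
        + " curIdx=" + out.curIdx
        + " buffered=" + out.cellBuffers[out.curIdx].position());
  }
}
```

With this shape the _cell full_ check becomes {{!cell.hasRemaining()}} rather than an equality test, so a write larger than the space left in the cell is handled by the loop instead of overshooting {{cellSize}}.
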
Nits:
# We usually use 2 spaces to indent. It seems your IDE uses 4 spaces.
# Let's avoid brace-less statements (see Apple's "[goto fail bug|http://www.wired.com/2014/02/gotofail/]").
{code}
          for(int k = 0; k < blockGroupDataBlocks; k++)
            cellBuffers[k].flip();
{code}

> Add logic to DFSOutputStream to support writing a file in striping layout 
> --------------------------------------------------------------------------
>
>                 Key: HDFS-7729
>                 URL: https://issues.apache.org/jira/browse/HDFS-7729
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Li Bo
>            Assignee: Li Bo
>         Attachments: Codec-tmp.patch, HDFS-7729-001.patch, HDFS-7729-002.patch, HDFS-7729-003.patch,
HDFS-7729-004.patch, HDFS-7729-005.patch
>
>
> If a client wants to write a file directly in striping layout, we need to add some logic to
DFSOutputStream. DFSOutputStream needs multiple DataStreamers to write each cell of a stripe
to a remote datanode.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
