hadoop-hdfs-issues mailing list archives

From "James Clampffer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-8746) Reduce the latency of streaming reads by re-using DN connections
Date Fri, 26 Oct 2018 12:41:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-8746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Clampffer updated HDFS-8746:
    Parent Issue: HDFS-14032  (was: HDFS-8707)

> Reduce the latency of streaming reads by re-using DN connections
> ----------------------------------------------------------------
>                 Key: HDFS-8746
>                 URL: https://issues.apache.org/jira/browse/HDFS-8746
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Bob Hansen
>            Assignee: James Clampffer
>            Priority: Major
> The current libhdfspp implementation opens a new connection for each pread. For streaming reads, especially short-buffer reads issued through the C API and even more so once SSL handshake overhead is added, throughput will be dominated by the latency of reconnecting to the DataNodes.
>
> The target use case is a multi-block file that the client application streams sequentially, consuming the data as it arrives from the DN and then discarding it. The data is read into moderately small buffers (~64 KB to ~1 MB) owned by the consumer, and overall throughput is the critical metric.
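
To make the cost described above concrete, here is a minimal sketch (in Python, not libhdfspp code) that counts how many connections a sequential read of one block would open with and without connection reuse. Everything in it is an assumption made for illustration: `FakeDataNodeConnection`, `ConnectionCache`, `stream_block`, and the address `dn1:9866` are hypothetical names, and the fake connection returns placeholder bytes instead of talking to a real DataNode.

```python
class FakeDataNodeConnection:
    """Hypothetical stand-in for a DataNode connection.

    A real connection would pay TCP (and possibly SSL handshake) setup
    cost when it is created; here we only track that it exists.
    """

    def __init__(self, address):
        self.address = address

    def pread(self, offset, length):
        # Placeholder payload: `length` zero bytes instead of real block data.
        return bytes(length)


class ConnectionCache:
    """Hypothetical cache holding one open connection per DataNode address."""

    def __init__(self):
        self.connects = 0   # number of connections opened so far
        self._open = {}     # address -> open connection

    def get(self, address):
        conn = self._open.get(address)
        if conn is None:
            # Cache miss: this is where the connection latency would be paid.
            self.connects += 1
            conn = FakeDataNodeConnection(address)
            self._open[address] = conn
        return conn

    def drop(self, address):
        # Simulates closing the connection after a read.
        self._open.pop(address, None)


def stream_block(cache, address, block_size, buffer_size, reuse):
    """Read one block sequentially in buffer_size chunks; return bytes read."""
    total = 0
    offset = 0
    while offset < block_size:
        length = min(buffer_size, block_size - offset)
        conn = cache.get(address)
        total += len(conn.pread(offset, length))
        offset += length
        if not reuse:
            # Per-pread behaviour: discard the connection after every read.
            cache.drop(address)
    return total
```

With a 1 MiB block and 64 KiB buffers, calling `stream_block(..., reuse=False)` opens one connection per read, while `reuse=True` opens a single connection for the whole block. The smaller the consumer's buffer, the more reads per block, and the more the per-connection setup cost would dominate if connections are not reused.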

