hadoop-hdfs-issues mailing list archives

From "Mingliang Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8555) Random read support on HDFS files using Indexed Namenode feature
Date Tue, 08 Dec 2015 19:11:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15047286#comment-15047286 ]

Mingliang Liu commented on HDFS-8555:

Could you explain what you mean by "... only those blocks which belong to Nicholas out of a
given large block."?
And in the _Description_ example, why are seek and positional read unable to return "those
blocks that potentially have those 10 lines"?
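
For context on the question above: when the byte offset of the wanted data is known, a seek or positional read only needs the blocks that overlap the requested byte range. The sketch below shows that arithmetic only. It assumes a 128 MiB block size and a 100 GiB file for illustration, and `blocks_for_range` is a hypothetical helper, not an HDFS API.

```python
def blocks_for_range(offset, length, block_size):
    """Return indices of the fixed-size blocks overlapping [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


BLOCK_SIZE = 128 * 1024 ** 2   # assumed block size: 128 MiB
FILE_SIZE = 100 * 1024 ** 3    # assumed file size: 100 GiB

# Ceiling division: number of blocks needed to hold the whole file.
total_blocks = -(-FILE_SIZE // BLOCK_SIZE)

# A 1000-byte read starting 50 GiB into the file.
touched = blocks_for_range(offset=50 * 1024 ** 3, length=1000, block_size=BLOCK_SIZE)

print(f"{len(touched)} of {total_blocks} blocks overlap the requested range")
```

The open point in the issue is therefore the case where the offset is not known in advance, which is what the proposed per-block tags would have to address.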

> Random read support on HDFS files using Indexed Namenode feature
> ----------------------------------------------------------------
>                 Key: HDFS-8555
>                 URL: https://issues.apache.org/jira/browse/HDFS-8555
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.5.2
>         Environment: Linux
>            Reporter: amit sehgal
>            Assignee: amit sehgal
>             Fix For: 3.0.0
>   Original Estimate: 720h
>  Remaining Estimate: 720h
> Currently the NameNode provides no support for random reads. With so many tools built
on top of HDFS for exploratory BI and SQL over HDFS, there is a pressing need to reduce
the number of blocks read for a random read.
> E.g., extracting, say, 10 lines' worth of information from a 100GB file should read
only those blocks that can potentially contain those 10 lines.
> This can be achieved by adding a per-block tagging feature to the NameNode: each block written
to HDFS will have tags associated with it, stored in an index.
> When accessed via the indexing feature, the NameNode will use this index natively to reduce the
number of blocks returned to the client.
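
The tagging idea in the description can be illustrated with a toy in-memory index. This is only a sketch under assumptions: the issue does not specify how tags are produced, stored, or queried, and `BlockTagIndex`, `add_block`, and `blocks_for` are hypothetical names, not NameNode APIs.

```python
from collections import defaultdict


class BlockTagIndex:
    """Toy index mapping tags to the block IDs of a single file (hypothetical)."""

    def __init__(self):
        self._blocks_by_tag = defaultdict(set)
        self._all_blocks = []  # block IDs in file order

    def add_block(self, block_id, tags):
        """Record a newly written block together with its tags."""
        self._all_blocks.append(block_id)
        for tag in tags:
            self._blocks_by_tag[tag].add(block_id)

    def blocks_for(self, tag=None):
        """Return block IDs in file order; with a tag, only the matching blocks."""
        if tag is None:
            return list(self._all_blocks)
        matching = self._blocks_by_tag.get(tag, set())
        return [b for b in self._all_blocks if b in matching]


index = BlockTagIndex()
index.add_block("blk_0", {"alice"})
index.add_block("blk_1", {"alice", "bob"})
index.add_block("blk_2", {"carol"})
index.add_block("blk_3", {"bob"})

# An untagged lookup returns every block; a tagged lookup returns a subset.
print(index.blocks_for())
print(index.blocks_for("bob"))
```

In this sketch a tagged lookup hands the client two of four blocks instead of all of them, which is the reduction the description is after. How tags would be kept consistent with block contents is left open here, as it is in the issue.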

This message was sent by Atlassian JIRA
