hadoop-hdfs-issues mailing list archives

From "Rakesh R (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7343) HDFS smart storage management
Date Thu, 06 Apr 2017 13:27:42 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958923#comment-15958923 ]

Rakesh R commented on HDFS-7343:
--------------------------------

Thanks [~zhouwei] for sharing more details about the data points.
bq. Create a table to store the info and insert the table name into table access_count_table.
It looks like a lot of tables will be created to capture the time period details: sec_1...sec_n,
min_1...min_n, hour_1...hour_n, day_1...day_n, month_1...month_12, etc. I hope these tables
will be deleted after performing the aggregation functions. Again, it may exhaust the DB by growing
the number of tables if the aggregation period is longer, right? Just a plain thought to minimize
the number of time-spec tables: how about capturing {{access_time}} as a column field and
updating the {{access_time}} of the respective {{fid}}? I think, using the {{access_time}} attribute,
we would be able to filter the {{fid_access_count}} between a certain {{start_time}}
and {{end_time}}.

Table {{seconds_level}} => composite key of {{access_time}} and {{fid}} to uniquely identify
each row in the table.
||access_time||fid||count||
|sec-2017-03-31-12-59-45|3|1|
|sec-2017-03-31-12-59-45|2|1|
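
Just to make the above filtering idea concrete, here is a rough sketch (plain JDBC; only the {{seconds_level}} table and its {{access_time}}/{{fid}}/{{count}} columns come from the example above, all other names are illustrative) of summing per-{{fid}} access counts between a given {{start_time}} and {{end_time}}:
{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

public class AccessCountQuery {
  /**
   * Sum the access counts per fid recorded in seconds_level within
   * [startTime, endTime]; access_time values are assumed to be the
   * string keys shown above (e.g. "sec-2017-03-31-12-59-45").
   */
  public static Map<Long, Long> countsBetween(Connection conn,
      String startTime, String endTime) throws SQLException {
    String sql = "SELECT fid, SUM(count) AS total "
        + "FROM seconds_level "
        + "WHERE access_time BETWEEN ? AND ? "
        + "GROUP BY fid";
    Map<Long, Long> counts = new HashMap<>();
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setString(1, startTime);
      ps.setString(2, endTime);
      try (ResultSet rs = ps.executeQuery()) {
        while (rs.next()) {
          counts.put(rs.getLong("fid"), rs.getLong("total"));
        }
      }
    }
    return counts;
  }
}
{code}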

Again, for a faster aggregation function, probably we could maintain separate {{tables per unit
of time}} like below. After the aggregate function runs, we could delete those rows used for
aggregation (a rough sketch of such a roll-up step follows the list).

(1) seconds_level
(2) minutes_level
(3) hours_level
(4) days_level
(5) weeks_level
(6) months_level
(7) years_level
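
A rough sketch of how one such roll-up could look (again plain JDBC; table and column names are illustrative, and the minute-key derivation is only a placeholder): aggregate the {{seconds_level}} rows into {{minutes_level}} and then delete the consumed rows in the same transaction.
{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class AccessCountAggregator {
  /**
   * Roll all second-granularity rows older than 'cutoff' up into
   * minutes_level and remove them from seconds_level. The SUBSTR()
   * call is only a placeholder for "truncate access_time to the
   * minute"; the real key derivation depends on how access_time is
   * actually encoded.
   */
  public static void rollUpSecondsToMinutes(Connection conn, String cutoff)
      throws SQLException {
    String insert = "INSERT INTO minutes_level (access_time, fid, count) "
        + "SELECT SUBSTR(access_time, 1, 20), fid, SUM(count) "
        + "FROM seconds_level WHERE access_time < ? "
        + "GROUP BY SUBSTR(access_time, 1, 20), fid";
    String delete = "DELETE FROM seconds_level WHERE access_time < ?";
    boolean autoCommit = conn.getAutoCommit();
    conn.setAutoCommit(false);
    try (PreparedStatement ins = conn.prepareStatement(insert);
         PreparedStatement del = conn.prepareStatement(delete)) {
      ins.setString(1, cutoff);
      ins.executeUpdate();
      del.setString(1, cutoff);
      del.executeUpdate();
      conn.commit();
    } catch (SQLException e) {
      conn.rollback();
      throw e;
    } finally {
      conn.setAutoCommit(autoCommit);
    }
  }
}
{code}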

> HDFS smart storage management
> -----------------------------
>
>                 Key: HDFS-7343
>                 URL: https://issues.apache.org/jira/browse/HDFS-7343
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kai Zheng
>            Assignee: Wei Zhou
>         Attachments: access_count_tables.jpg, HDFSSmartStorageManagement-General-20170315.pdf,
HDFS-Smart-Storage-Management.pdf, HDFSSmartStorageManagement-Phase1-20170315.pdf, HDFS-Smart-Storage-Management-update.pdf,
move.jpg, tables_in_ssm.xlsx
>
>
> As discussed in HDFS-7285, it would be better to have a comprehensive and flexible storage
policy engine that considers file attributes, metadata, data temperature, storage type, EC codec,
available hardware capabilities, user/application preference, etc.
> Modified the title for re-purpose.
> We'd extend this effort a bit and aim to work on a comprehensive solution that provides a
smart storage management service for convenient, intelligent and effective use of erasure coding
or replicas, the HDFS cache facility, HSM offerings, and all kinds of tools (balancer, mover,
disk balancer and so on) in a large cluster.



