hadoop-hdfs-issues mailing list archives

From "Elek, Marton (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDDS-1084) Ozone Recon Service
Date Wed, 27 Feb 2019 11:30:00 GMT

    [ https://issues.apache.org/jira/browse/HDDS-1084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16779157#comment-16779157 ]

Elek, Marton commented on HDDS-1084:

Thank you very much [~swagle] for uploading the design docs. It looks very promising. I think
it will be a very important (if not the most important) and useful part of the Ozone stack.

I have a few comments (not proposals, just brainstorming ideas):

1. I am a big fan of the [copyset|https://web.stanford.edu/~skatti/pubs/usenix13-copysets.pdf]
and [tiered replication|https://www.usenix.org/system/files/conference/atc15/atc15-paper-cidon.pdf] papers.
At a basic level they can provide some information about the likelihood of data loss,
based on the distinct datanode sets (e.g. container 1 is replicated to the datanode
set d1,d2,d3, container 2 is replicated to d3,d4,d5) and the number of the containers/data

We have already discussed with [~anu] and [~nandakumar131] how these findings can be used to replicate
the closed containers in a safer way. I think the Recon server could also do some analysis of
these questions (long-term).

E.g.: "3 independent node failures will cause data loss with 90% probability on this cluster"

Or: "any of the 3 racks can be turned off without any data loss"
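A minimal sketch of how such statements could be derived from the container placement. It is not taken from the design doc: the function names, the `placement` and `rack_of` maps and the brute-force enumeration are assumptions for illustration only, and it treats every failure set of the given size as equally likely. It uses the two-container example above.

```python
from itertools import combinations


def copysets(placement):
    """Return the distinct datanode sets (copysets) used by any container."""
    return {frozenset(nodes) for nodes in placement.values()}


def loss_probability(placement, all_nodes, failures):
    """Fraction of equally likely failure sets of size `failures` that contain
    every replica of at least one container (brute force, small clusters only)."""
    sets = copysets(placement)
    total = 0
    lossy = 0
    for failed in combinations(sorted(all_nodes), failures):
        total += 1
        if any(cs <= set(failed) for cs in sets):
            lossy += 1
    return lossy / total if total else 0.0


def racks_safe_to_turn_off(placement, rack_of):
    """Racks that do not hold all replicas of any container."""
    sets = copysets(placement)
    return {
        rack
        for rack in set(rack_of.values())
        if not any(all(rack_of[node] == rack for node in cs) for cs in sets)
    }


# Hypothetical placement, matching the example in the comment above.
placement = {
    "container1": ["d1", "d2", "d3"],
    "container2": ["d3", "d4", "d5"],
}
nodes = {"d1", "d2", "d3", "d4", "d5"}
# Hypothetical rack assignment (not from the source).
rack_of = {"d1": "r1", "d2": "r2", "d3": "r3", "d4": "r1", "d5": "r2"}

print(len(copysets(placement)))                # 2 distinct copysets
print(loss_probability(placement, nodes, 3))   # 2 of 10 triples -> 0.2
print(sorted(racks_safe_to_turn_off(placement, rack_of)))
```

On a real cluster the enumeration would have to be replaced by sampling or by the closed-form estimates from the papers, since the number of failure sets grows combinatorially.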

2. I saw a demo of the Ceph UI. Embedding Grafana dashboards in the HTML UI worked very well there.
We already have some Grafana dashboard definitions in hadoop-ozone/dist/src/main/compose/common/grafana
which display the metrics from Prometheus.

  a.) Until a full-featured Ozone Console is implemented, this seems to be an easy way to display
any data from the Recon DB.
  b.) Later it could be easy to adopt the existing dashboards in an Ozone UI. The easiest way
to provide powerful statistics is to embed Grafana (IMHO).
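As a rough illustration of what embedding could look like: Grafana can render a single panel under its `/d-solo/<uid>/<slug>` path, which can be placed in an iframe. The helper below is only a sketch; the function name, the Grafana URL, the dashboard UID/slug and the panel id are hypothetical, and depending on the Grafana version embedding may have to be enabled in its configuration.

```python
from html import escape
from urllib.parse import quote, urlencode


def grafana_panel_iframe(base_url, dashboard_uid, slug, panel_id,
                         width=600, height=300):
    """Build an <iframe> tag that shows one Grafana panel (sketch only)."""
    query = urlencode({"orgId": 1, "panelId": panel_id})
    src = (f"{base_url.rstrip('/')}/d-solo/"
           f"{quote(dashboard_uid)}/{quote(slug)}?{query}")
    # Escape the URL so that '&' is valid inside the HTML attribute.
    return (f'<iframe src="{escape(src, quote=True)}" '
            f'width="{width}" height="{height}" frameborder="0"></iframe>')


# Hypothetical values; none of them come from the Ozone source tree.
print(grafana_panel_iframe("http://localhost:3000", "ozone-overview", "ozone", 2))
```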

3. Similar to Grafana, I would expect at least one Prometheus instance to run together
with Ozone (in production). We have native Prometheus support (we have Prometheus metric endpoints
and all the Hadoop metrics can be saved to Prometheus). We can use it as an (optional) source
of additional data (e.g. to detect unreliable datanodes and propose changes). This is not required
for the existing queries in the design doc, but it can be considered in the future.
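A sketch of how Recon could consume such data, assuming the standard Prometheus HTTP API (`/api/v1/query`) and its instant-vector JSON response. The metric name, the threshold, the instance labels and the helper names are made up for illustration; the sample response is hand-written, not real output.

```python
import json
from urllib.parse import urlencode


def instant_query_url(prometheus_url, promql):
    """URL for a Prometheus instant query."""
    return f"{prometheus_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def instances_above(response_text, threshold):
    """Return the 'instance' labels whose sample value exceeds the threshold."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise ValueError("Prometheus query failed")
    flagged = []
    for sample in body["data"]["result"]:
        _timestamp, value = sample["value"]  # the value arrives as a string
        if float(value) > threshold:
            flagged.append(sample["metric"].get("instance", "unknown"))
    return sorted(flagged)


# Hypothetical metric name; Ozone may not expose anything called this.
url = instant_query_url("http://prometheus:9090",
                        "increase(hypothetical_datanode_io_errors_total[1h])")

# Hand-written sample in the shape of an instant-vector response.
sample_response = json.dumps({
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"instance": "dn1:9882"}, "value": [1551267000, "0"]},
        {"metric": {"instance": "dn2:9882"}, "value": [1551267000, "17"]},
    ]},
})
print(url)
print(instances_above(sample_response, threshold=5))  # ['dn2:9882']
```

In a real setup the response text would come from an HTTP GET on the generated URL; it is left out here so that the sketch runs without a Prometheus server.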

But these are just ideas; nothing needs to be done as of now. Thanks again for the work on this
great feature.

> Ozone Recon Service
> -------------------
>                 Key: HDDS-1084
>                 URL: https://issues.apache.org/jira/browse/HDDS-1084
>             Project: Hadoop Distributed Data Store
>          Issue Type: New Feature
>          Components: fsck
>    Affects Versions: 0.4.0
>            Reporter: Siddharth Wagle
>            Assignee: Siddharth Wagle
>            Priority: Major
>         Attachments: Ozone_Recon_Design_V1_Draft.pdf
> Recon Server at a high level will maintain a global view of Ozone that is not available
> from SCM or OM. Things like how many volumes exist; and how many buckets exist per volume;
> which volume has maximum buckets; which are buckets that have not been accessed for a year,
> which are the corrupt blocks, which are blocks on data nodes which are not used; and answer
> similar queries.
