hadoop-hdfs-issues mailing list archives

From "Ming Ma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-8940) Support for large-scale multi-tenant inotify service
Date Tue, 01 Sep 2015 03:20:46 GMT

    [ https://issues.apache.org/jira/browse/HDFS-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724724#comment-14724724 ]

Ming Ma commented on HDFS-8940:

Thanks [~cmccabe] [~andrew.wang] [~surendrasingh] [~ajithshetty] for the great comments! Before
I posted the design, we had a similar discussion at Twitter around ZK scalability and whether
we should use a notification mechanism or regular polling. We don't have a definite answer
yet, as it hasn't been tested at production scale, but let me try to answer these questions.

bq. Do we need to separate the ownership of the path filter from the set of users that can
access the path filter? 
The scenario we want to enable is to separate {{INotifyPathFilterInfo}} creation (only admins
can do that) from modification of {{INotifyPathFilterInfo}}'s {{allowedSubscribers}} (the
owner can do that). That way, we can provide some degree of self-service, although it isn't
yet clear how useful this will be.
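To make the ownership split concrete, here is a minimal sketch of what the filter record might look like. The field names beyond {{name}}, {{pathGlob}} and {{allowedSubscribers}}, and the owner-check logic, are assumptions for illustration, not the actual design:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the filter record. Admins create the filter;
// the designated owner may later edit allowedSubscribers on their own.
class INotifyPathFilterInfo {
    private final String name;       // unique identifier for the filter
    private final String pathGlob;   // paths this filter matches
    private final String owner;      // user allowed to edit subscribers
    private final Set<String> allowedSubscribers = new HashSet<>();

    INotifyPathFilterInfo(String name, String pathGlob, String owner) {
        this.name = name;
        this.pathGlob = pathGlob;
        this.owner = owner;
    }

    String getName() { return name; }

    // Only the owner may modify the subscriber list (self-service).
    boolean addSubscriber(String caller, String subscriber) {
        if (!owner.equals(caller)) {
            return false;
        }
        return allowedSubscribers.add(subscriber);
    }

    boolean isAllowed(String user) {
        return allowedSubscribers.contains(user);
    }

    // Filters are identified by name alone, so names must be unique.
    @Override
    public boolean equals(Object o) {
        return o instanceof INotifyPathFilterInfo
            && name.equals(((INotifyPathFilterInfo) o).name);
    }

    @Override
    public int hashCode() { return Objects.hash(name); }
}
```

Note that name-based equality is what makes a name-only {{removePathFilterInfo}} API sufficient, which ties into the next two questions.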

bq. Do INotifyPathFilterInfo objects need to have unique names?
{{INotifyPathFilterInfo}}’s {{name}} member is used to identify the filter. Let me update
the doc to make it clear.

bq. It seems like removePathFilterInfo should only take the filter name rather than the whole
filter structure
Good point.

bq. It will just cause too many problems for projects downstream that already have their own
version of Curator on their CLASSPATHs.
Good point about the dependency. The motivation comes from YARN's experience and the fact
that people have been fixing various corner-case issues in the RM.

bq. I assume that the "inotify service outage" detection will rely on some kind of periodic
"heartbeat" activity being done by the inotifyZK process.
The heartbeat mechanism will be provided by ZK's ephemeral node functionality. Let me add
more description there.
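For readers unfamiliar with the mechanism: ZK deletes a session's ephemeral nodes when that session's heartbeats stop, and watchers on those nodes are notified. The snippet below is a toy in-memory model of that contract, not the real ZooKeeper client API; the {{/inotify/service}} path is a made-up example:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (NOT the ZooKeeper API) of the ephemeral-node liveness
// contract: a znode created as EPHEMERAL disappears automatically when
// its owning session expires, so subscribers watching e.g.
// /inotify/service can treat node deletion as "service down".
class EphemeralModel {
    // znode path -> owning session id
    private final Map<String, Long> ephemerals = new HashMap<>();

    void createEphemeral(String path, long sessionId) {
        ephemerals.put(path, sessionId);
    }

    boolean exists(String path) {
        return ephemerals.containsKey(path);
    }

    // When a session expires (missed heartbeats), ZK deletes every
    // ephemeral node owned by that session; watchers are then notified.
    void expireSession(long sessionId) {
        ephemerals.values().removeIf(owner -> owner == sessionId);
    }
}
```

The point is that the inotifyZK process gets outage detection for free from ZK session management, with no extra heartbeat protocol of its own.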

bq. Any comment on experiences scaling ZK up to this number of clients / rpc load?
For our production ZK services, we can easily handle 10k watchers, because ZK scales well
for read load.

bq. I agree that Kafka might still need ZK for leader election and tracking the txid, but
the ZK load should be a lot lower.
Kafka might not help much for this specific low-message-volume large-scale broadcast scenario,
as it still needs to create lots of ZK watchers on different znodes. Kafka's scalability
is useful when the system generates lots of messages and partitioned consumers can process
these messages in parallel. Feel free to correct me, or ask around, if this assessment is wrong.

bq. Since a path might match multiple path filters, there could be significant write amplification.
If we allowed users to create whatever filters they want, that would be problematic. This design
only allows admins to create new filters, which mitigates the issue.
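To illustrate the amplification: one namespace event is written once per matching filter, so the fan-out equals the number of filters whose glob matches the path. The sketch below uses the JDK's glob matcher as a stand-in (HDFS path filters would have their own matching rules); the glob patterns are made up:

```java
import java.nio.file.FileSystems;
import java.nio.file.Paths;
import java.util.List;

// Illustration of write amplification: the number of ZK writes for one
// event equals the number of filters whose glob matches the event path.
// Uses java.nio glob syntax as a stand-in for HDFS path filter matching.
class FanOut {
    static long matchingFilters(String eventPath, List<String> globs) {
        return globs.stream()
            .map(g -> FileSystems.getDefault().getPathMatcher("glob:" + g))
            .filter(m -> m.matches(Paths.get(eventPath)))
            .count();
    }
}
```

With admin-curated filters the filter count stays small and reviewable, so this fan-out is bounded; user-created filters would make it unbounded.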

bq. Seems like the path filters could just be kept in ZK since the NN doesn't use them?
Not storing the filter definitions in ZK simplifies management; the system can bootstrap with
a clean ZK instance, given that we continue to maintain the no-config-in-ZK pattern for HDFS.
In addition, storing the filter definitions outside the actual notification delivery channel
allows us to easily test other non-ZK systems, including the read-inotify-from-standby approach.

bq. I bet a SbNN could handle a lot of read-only INotify load. We could also make the SbNN
tail in-progress edit log segments to minimize staleness. Also find a way to allow stale INotify
reads but not other read RPCs.
Good points. In addition, the client-side RPC layer will need to change to create a connection
to the SBN for inotify requests.

bq. Can we think about HDFS-8933 also in new inotify design?
Perhaps we can modify {{INotifyPathFilterInfo}} to include a new member variable {{Collection<EventType>
eventTypes;}} and make {{pathGlob}} and {{eventTypes}} optional (but at least one of the fields
should be specified.)
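A sketch of what that extension could look like, with the at-least-one-selector rule enforced at construction time. The event type names and the "empty means all types" convention are assumptions for illustration:

```java
import java.util.Collection;
import java.util.EnumSet;

// Hypothetical extension for HDFS-8933: filters may select on event
// type as well as path. Event type names here are illustrative only.
class FilterWithEventTypes {
    enum EventType { CREATE, CLOSE, RENAME, DELETE, METADATA }

    final String pathGlob;                  // optional
    final Collection<EventType> eventTypes; // optional

    FilterWithEventTypes(String pathGlob, Collection<EventType> eventTypes) {
        // At least one of the two selectors must be specified.
        if ((pathGlob == null || pathGlob.isEmpty())
            && (eventTypes == null || eventTypes.isEmpty())) {
            throw new IllegalArgumentException(
                "pathGlob or eventTypes must be set");
        }
        this.pathGlob = pathGlob;
        this.eventTypes = eventTypes;
    }

    boolean wantsEvent(EventType t) {
        // Absent/empty eventTypes is taken to mean "all event types".
        return eventTypes == null || eventTypes.isEmpty()
            || eventTypes.contains(t);
    }
}
```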

bq. would the "read-inotify-from-standby" solution scale just as well as this proposed ZK
I don't have the numbers yet, as neither solution has been implemented. For the SBN approach,
as the number of SBNs increases, it will put more pressure on the network, since each SBN pulls
all edits from the JournalNodes (IO should be OK, given that recent edits should be in the OS
page cache). In the ZK approach, the events written to ZK have already been filtered, so the
volume is much smaller and has little impact on the network. Even though ZK isn't good at high
write volume, it scales quite well for read and watcher scenarios by adding ZK observers.
Another aspect is that "read-inotify-from-standby" requires regular polling at the RPC layer,
so most of these polls are unnecessary when update volumes for these filters are low. In
comparison, ZK session management should be more lightweight.
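A back-of-envelope way to frame the polling-vs-watch trade-off (all numbers below are illustrative assumptions, not measurements):

```java
// Back-of-envelope comparison of steady-state request load for the two
// designs. Inputs are illustrative assumptions, not measured values.
class LoadEstimate {
    // Polling: every subscriber polls on a fixed interval, whether or
    // not anything changed, so load is constant even when idle.
    static double pollRpcPerSec(int subscribers, double pollIntervalSec) {
        return subscribers / pollIntervalSec;
    }

    // Watches: delivery traffic is driven by actual filtered events;
    // when nothing changes, only cheap ZK session heartbeats remain.
    static double watchNotifiesPerSec(double eventsPerSec,
                                      int subscribersPerEvent) {
        return eventsPerSec * subscribersPerEvent;
    }
}
```

For example, 10,000 subscribers polling every 10 seconds is a constant 1,000 RPC/s even when nothing has changed, while the watch-based load drops to roughly zero when the filtered event volume is low.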

Given the above discussion, I plan to go ahead and prototype both the ZK-based approach and
the "read-inotify-from-standby" approach and do some perf comparison.

> Support for large-scale multi-tenant inotify service
> ----------------------------------------------------
>                 Key: HDFS-8940
>                 URL: https://issues.apache.org/jira/browse/HDFS-8940
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ming Ma
>         Attachments: Large-Scale-Multi-Tenant-Inotify-Service.pdf
> HDFS-6634 provides the core inotify functionality. We would like to extend that to provide
a large-scale service that tens of thousands of clients can subscribe to.

This message was sent by Atlassian JIRA
