nifi-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NIFI-3356) Provide a newly refactored provenance repository
Date Tue, 14 Feb 2017 20:06:41 GMT

    [ https://issues.apache.org/jira/browse/NIFI-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15866543#comment-15866543 ]

ASF GitHub Bot commented on NIFI-3356:
--------------------------------------

Github user olegz commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/1493#discussion_r101130301
  
    --- Diff: nifi-framework-api/src/main/java/org/apache/nifi/provenance/IdentifierLookup.java ---
    @@ -0,0 +1,88 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *     http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.nifi.provenance;
    +
    +import java.util.Collections;
    +import java.util.HashMap;
    +import java.util.List;
    +import java.util.Map;
    +
    +/**
    + * Provides a mechanism for obtaining the identifiers of components, queues, etc.
    + */
    +public interface IdentifierLookup {
    +
    +    /**
    +     * @return the identifiers of components that may generate Provenance Events
    +     */
    +    List<String> getComponentIdentifiers();
    +
    +    /**
    +     * @return a list of component types that may generate Provenance Events
    +     */
    +    List<String> getComponentTypes();
    +
    +    /**
    +     *
    +     * @return the identifiers of FlowFile Queues that are in the flow
    +     */
    +    List<String> getQueueIdentifiers();
    +
    +    default Map<String, Integer> invertQueueIdentifiers() {
    +        return invertList(getQueueIdentifiers());
    +    }
    +
    +    default Map<String, Integer> invertComponentTypes() {
    +        return invertList(getComponentTypes());
    +    }
    +
    +    default Map<String, Integer> invertComponentIdentifiers() {
    +        return invertList(getComponentIdentifiers());
    +    }
    +
    +    default Map<String, Integer> invertList(final List<String> values) {
    --- End diff --
    
    A List can contain duplicate entries, so several indexes may correspond to the same value, while a Map<String, Integer> can hold only one index per value. Just wanted to make sure that this is acceptable.
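
To make the concern concrete, here is a minimal sketch. The body of invertList is not shown in the diff above, so the implementation below is an assumption (each value mapped to its list index through a plain HashMap), and the class name InvertListSketch, the helper invertListKeepFirst, and the sample identifiers are hypothetical. Under that assumption, a duplicated value silently keeps only its last index:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: the real body of IdentifierLookup.invertList is not
// shown in the diff, so this only illustrates one plausible implementation.
final class InvertListSketch {

    // Assumed behavior: map each value to its index in the list.
    // With Map.put, a later duplicate overwrites the earlier index.
    static Map<String, Integer> invertList(final List<String> values) {
        final Map<String, Integer> inverted = new HashMap<>(values.size());
        for (int i = 0; i < values.size(); i++) {
            inverted.put(values.get(i), i);
        }
        return inverted;
    }

    // Alternative: keep the first index seen for each value instead.
    static Map<String, Integer> invertListKeepFirst(final List<String> values) {
        final Map<String, Integer> inverted = new HashMap<>(values.size());
        for (int i = 0; i < values.size(); i++) {
            inverted.putIfAbsent(values.get(i), i);
        }
        return inverted;
    }

    public static void main(final String[] args) {
        // Made-up identifiers; "queue-a" appears at indexes 0 and 2.
        final List<String> ids = Arrays.asList("queue-a", "queue-b", "queue-a");

        System.out.println(invertList(ids).size());                  // prints 2
        System.out.println(invertList(ids).get("queue-a"));          // prints 2
        System.out.println(invertListKeepFirst(ids).get("queue-a")); // prints 0
    }
}
```

If the identifiers are guaranteed to be unique, both variants behave the same and the question is moot. If duplicates are possible, the choice between last-wins and first-wins (or rejecting duplicates outright) probably deserves a line in the Javadoc.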


> Provide a newly refactored provenance repository
> ------------------------------------------------
>
>                 Key: NIFI-3356
>                 URL: https://issues.apache.org/jira/browse/NIFI-3356
>             Project: Apache NiFi
>          Issue Type: Task
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>             Fix For: 1.2.0
>
>
> The Persistent Provenance Repository has been redesigned a few different times over several years. The original design for the repository was to provide storage of events and sequential iteration over those events via a Reporting Task. After that, we added the ability to compress the data so that it could be held longer. We then introduced the notion of indexing and searching via Lucene. We've since made several more modifications to try to boost performance.
> At this point, however, the repository is still the bottleneck for many flows that handle large volumes of small FlowFiles. We need a new implementation that is based around the current goals for the repository and that can provide better throughput.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
