From: Mike Thomsen <mikerthomsen@gmail.com>
Date: Mon, 24 Feb 2020 17:28:26 +0000
Subject: Re: [DISCUSS] Advanced search capabilities
To: dev@nifi.apache.org
Reply-To: dev@nifi.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="00000000000091f135059f55b348"

--00000000000091f135059f55b348
Content-Type: text/plain; charset="UTF-8"

FWIW, having spent a lot of time in the last year working with graph
database ingestion, I really don't see this story ending well for
replacing Lucene.

On Mon, Feb 24, 2020 at 5:01 PM Otto Fowler wrote:

> +1 for the “bring your own golden hammer” approach
>
> On February 24, 2020 at 11:46:14, Mike Thomsen (mikerthomsen@gmail.com)
> wrote:
>
> Another thing I forgot to throw out there was that you have an issue of
> latency if you use Janus or Neo4j. Lucene will almost certainly have
> substantially lower latency for updating and querying the provenance
> data if you were to do a bake-off between the two to power a provenance
> repository.
>
> That said, if you care more about being able to query with Cypher or
> Gremlin than having raw performance, you could write a custom
> provenance repository. They are pluggable.
>
> On Sat, Feb 22, 2020 at 7:00 AM Martin Ebert wrote:
>
> > Hi Mike,
> > that is a fair point. You would actually raise the minimum
> > requirements of NiFi accordingly if you wanted to use a graph. As an
> > additional application, as we are currently planning, Neo4j is
> > nevertheless a good choice and there is nothing to be said against
> > making it open source. The open source version of Neo4j should be
> > sufficient for this.
> >
> > Mike Thomsen wrote on Sat, Feb 22, 2020, 02:36:
> >
> > > Martin,
> > >
> > > In theory, a graph database would be superior here. Absolutely. In
> > > practice, none of the tech out there is better than the current
> > > Lucene-based approach in terms of ease of development and
> > > integration and low memory footprint. Adding Neo4j or JanusGraph
> > > would cause a huge jump in the minimum requirements to run NiFi.
> > > Possibly to the point where Xms and Xmx would have to start at 2GB
> > > for people getting started.
> > >
> > > It's been a long time since I've played with Atlas and the Atlas
> > > integration, but if that doesn't work you can build in support for
> > > Cypher and Gremlin by adding -Pinclude-graph to a 1.10 or 1.11
> > > build. In 1.10, one of the NARs was overlooked in that profile, so
> > > you'd need to add it back to the profile. That was fixed in 1.11.
> > > The ExecuteGraphQuery processor will allow you to execute Cypher or
> > > Gremlin commands/scripts depending on which controller
> > > service/driver you configure.
> > >
> > > On Fri, Feb 21, 2020 at 6:42 PM Martin Ebert wrote:
> > >
> > > > We are still thinking about building a graph-based search (Neo4j)
> > > > on top of NiFi. It would also be fantastic to have it within
> > > > NiFi.
> > > >
> > > > There are plenty of examples:
> > > >
> > > > https://blog.grandstack.io/using-neo4js-full-text-search-with-graphql-e3fa484de2ea
> > > >
> > > > The idea could go in this direction, though of course much more
> > > > rudimentary. One would then be able either to have only the
> > > > results displayed as text or to discover connections by
> > > > exploration (graph layout). The built-in data lineage function of
> > > > NiFi would also benefit from the power of Neo4j.
> > > >
> > > > Simon Bence wrote on Fri, Feb 21, 2020, 19:00:
> > > >
> > > > > Dear Community,
> > > > >
> > > > > In my project, I use a relatively high number of processors and
> > > > > process groups. The current search function in the NiFi UI has
> > > > > no capability to narrow the results by group, which would make
> > > > > the results more relevant, so I would like to propose a
> > > > > possible solution. If you have any comments on this, please do
> > > > > not hesitate to share them.
> > > > >
> > > > > The general approach would be to keep the current text box and
> > > > > extend the server-side capabilities to process the search query
> > > > > in a manner similar to how, for example, Google search behaves.
> > > > > I would call these extensions "filters". For now I am
> > > > > interested in the ones I mention below, but I think it would
> > > > > take only a small amount of work to extend the solution with
> > > > > further ones.
> > > > >
> > > > > In order to distinguish the filters from the rest of the search
> > > > > query, I propose to put them at the beginning of the query and
> > > > > use the [a-zA-Z0-9\.]{1..n}\:[a-zA-Z0-9\.]{1..n} format. For
> > > > > example, a filter might look like the following: lorem:ipsum
> > > > >
> > > > > With this addition, the search query should look like the
> > > > > following:
> > > > >
> > > > > filter1:value filter2:value rest of the query
> > > > >
> > > > > As for processing the filters, I suggest the following
> > > > > behaviour:
> > > > >
> > > > > - Without filters the current behaviour should be kept
> > > > > - Everything after the filters should be handled as the search
> > > > >   term
> > > > > - After the first "non filter word", anything should be
> > > > >   considered part of the search term (meaning: to keep the text
> > > > >   parsing simple, I would not go in the direction of supporting
> > > > >   filters at the end of the query, etc.)
> > > > > - The ordering of the filters should have no effect on the
> > > > >   result
> > > > > - Filter duplications should be eliminated
> > > > > - In case a filter appears multiple times in the query, the
> > > > >   first occurrence will be used
> > > > > - Unknown filters should be ignored
> > > > > - A query consisting only of filters will return no results; at
> > > > >   least one character must appear as the search term
> > > > >
> > > > > Suggested filters:
> > > > >
> > > > > scope
> > > > > Narrows the search based on the user's currently active process
> > > > > group. The allowed values are "all" and "here". "all" produces
> > > > > the current behaviour, so no filtering happens, while "here"
> > > > > should use the current process group as the "root" of the
> > > > > search, ignoring everything else (including the parent group).
> > > > > Note: This needs a minimal frontend change because, as far as I
> > > > > can see, the current group is currently not sent with the
> > > > > search query.
> > > > >
> > > > > group
> > > > > Narrows the search to a given process group, if it exists. The
> > > > > behaviour is recursive, so the result will include the
> > > > > contained groups as well. If the group does not exist, the
> > > > > result list should be empty.
> > > > >
> > > > > properties
> > > > > Controls whether property values are included or not. If not
> > > > > provided, the property values will be included. This is because
> > > > > in a lot of cases a huge number of results come from property
> > > > > names.
> > > > >
> > > > > - Valid values for inclusion: yes, true, include, 1
> > > > > - Valid values for exclusion: no, none, false, exclude, 0
> > > > >
> > > > > It is possible that the range of possible values should be
> > > > > limited (and made unambiguous), but I see merit in
> > > > > "permissiveness" here as it is simpler to remember.
> > > > >
> > > > > Also, some examples:
> > > > >
> > > > > 1.
> > > > > scope:here properties:exclude lorem ipsum
> > > > > This should search only in the current group (and its
> > > > > children), excluding properties, and return the components
> > > > > containing the "lorem ipsum" expression.
> > > > >
> > > > > 2.
> > > > > group:myGroup someQuery
> > > > > This should find the components matching the someQuery
> > > > > expression, but only within the myGroup group, even if it is
> > > > > not the active one.
> > > > >
> > > > > 3.
> > > > > scope:all properties:include lorem
> > > > > This should behave the same as "lorem" without filters.
> > > > >
> > > > > Thanks for reading, I am interested to hear your opinion!
> > > > >
> > > > > Kind regards,
> > > > > Bence

--00000000000091f135059f55b348--
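
The filter rules proposed in the quoted message above can be sketched in
Python roughly as follows. This is only an illustration under stated
assumptions, not NiFi code: the names parse_search_query, ParsedQuery and
KNOWN_FILTERS are hypothetical, the {1..n} in the proposed pattern is read
as "one or more", unknown filters are consumed and dropped, and an
unrecognised value for the properties filter falls back to the default
(include).

```python
import re
from dataclasses import dataclass, field

# Token shape from the proposal, reading {1..n} as "one or more" (assumption).
FILTER_TOKEN = re.compile(r"([a-zA-Z0-9.]+):([a-zA-Z0-9.]+)")

# Filter names suggested in the proposal; anything else is ignored.
KNOWN_FILTERS = {"scope", "group", "properties"}

# Spellings the proposal lists for excluding property values.
EXCLUDE_VALUES = {"no", "none", "false", "exclude", "0"}


@dataclass
class ParsedQuery:
    """Hypothetical container for the parsed search query."""
    term: str
    filters: dict = field(default_factory=dict)

    @property
    def has_search_term(self) -> bool:
        # A query made only of filters must not produce results.
        return len(self.term) > 0

    @property
    def include_properties(self) -> bool:
        # Property values are included unless explicitly excluded.
        # Assumption: unrecognised values behave like the default.
        value = self.filters.get("properties", "include").lower()
        return value not in EXCLUDE_VALUES


def parse_search_query(raw: str) -> ParsedQuery:
    tokens = raw.split()
    filters = {}
    index = 0
    # Consume leading filter tokens; stop at the first non-filter word.
    while index < len(tokens):
        match = FILTER_TOKEN.fullmatch(tokens[index])
        if match is None:
            break
        name, value = match.group(1), match.group(2)
        # Unknown filters are dropped; for duplicates the first one wins.
        if name in KNOWN_FILTERS and name not in filters:
            filters[name] = value
        index += 1
    # Everything after the filters is the search term
    # (runs of whitespace are collapsed to single spaces here).
    return ParsedQuery(term=" ".join(tokens[index:]), filters=filters)
```

A caller would check has_search_term before searching and return an empty
result list when it is false. Note that, as in the proposal itself, a
search term whose first word happens to look like name:value is treated as
a filter.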