nifi-issues mailing list archives

From "Frederik Petersen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NIFI-6322) Evaluator Objects are rebuilt on every call even when a CompiledExpression is used
Date Thu, 06 Jun 2019 12:58:00 GMT

    [ https://issues.apache.org/jira/browse/NIFI-6322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857624#comment-16857624 ]

Frederik Petersen commented on NIFI-6322:
-----------------------------------------

After running into some issues with [https://github.com/apache/nifi/pull/3500], I decided to close it
and work on a different solution in [https://github.com/apache/nifi/pull/3518]. More details
can be found in those PRs.
The new PR also contains important test cases.

Looking forward to a review.

> Evaluator Objects are rebuilt on every call even when a CompiledExpression is used
> ----------------------------------------------------------------------------------
>
>                 Key: NIFI-6322
>                 URL: https://issues.apache.org/jira/browse/NIFI-6322
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.9.2
>            Reporter: Frederik Petersen
>            Priority: Major
>              Labels: expression-language, performance
>         Attachments: Selection_094.png, image.png
>
>          Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> Hi, 
> While doing some CPU sampling in our production environment, we encountered some strange results. It seems that, during the evaluation of NiFi expressions, modifying a _HashSet_ is the most expensive operation.
> !Selection_094.png!
> This seems implausible, given all the other work involved in evaluating NiFi expressions.
> After reviewing the code and profiling further, it appears that this _HashSet_ modification is performed far more often than necessary: it happens on every single evaluation.
> !image.png!
>  This profiling output was produced with the following unit test:
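> {code:java}
> import java.util.Collections;
>
> import org.apache.nifi.processors.standard.RouteOnAttribute;
> import org.apache.nifi.util.TestRunner;
> import org.apache.nifi.util.TestRunners;
> import org.junit.Test;
>
> public class RouteOnAttributeExpressionTest { // test class name is illustrative
>     @Test
>     public void testSimple() {
>         final TestRunner runner = TestRunners.newTestRunner(new RouteOnAttribute());
>         runner.setProperty(RouteOnAttribute.ROUTE_STRATEGY, RouteOnAttribute.ROUTE_ANY_MATCHES.getValue());
>         // Dynamic property whose expression is evaluated once per FlowFile
>         runner.setProperty("filter", "${literal('b'):equals(${a})}");
>         // Enqueue 500 empty FlowFiles, each carrying the attribute a=b
>         for (int i = 0; i < 500; i++) {
>             runner.enqueue(new byte[0], Collections.singletonMap("a", "b"));
>         }
>         runner.run(500);
>     }
> }{code}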
> The key question is: why are the _Evaluator_ objects (and everything related to them) built twice?
>  - Once in _ExpressionCompiler.compile()_
>  - Once again in _CompiledExpression.evaluate()_
> In other words, every call to _CompiledExpression.evaluate()_ creates a new _ExpressionCompiler_ and triggers expensive calls. Why not simply reuse the _Evaluator_ objects that were already built and stored in the _CompiledExpression_?
> Is there a specific design decision behind this? There appears to be room for performance improvement, especially for heavily used processors.
> On our live system, where we perform expensive tasks such as language detection and mail parsing, this behavior still accounts for most of the CPU time consumed by expression language evaluation.
> Thank you very much for looking into this.
>  
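The reuse proposed in the quoted description can be illustrated in plain Java. This is a minimal, hypothetical sketch, not NiFi's actual _ExpressionCompiler_ or _CompiledExpression_ code: `Evaluator`, `buildEvaluator`, `RebuildingExpression`, `ReusingExpression` and the `BUILDS` counter are made-up names, and the counter merely stands in for the expensive build work (such as the _HashSet_ modification seen in the profile). It contrasts rebuilding the evaluator tree on every `evaluate()` call with building it once at compile time and storing it.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class EvaluatorReuseSketch {

    // Hypothetical, heavily simplified stand-in for an evaluator: maps FlowFile attributes to a result.
    interface Evaluator extends Function<Map<String, String>, Object> {}

    // Counts how often an evaluator tree is built; stands in for the expensive build work.
    static final AtomicInteger BUILDS = new AtomicInteger();

    // Hypothetical builder for one fixed expression shape: ${literal('<literal>'):equals(${<attributeName>})}
    static Evaluator buildEvaluator(String literal, String attributeName) {
        BUILDS.incrementAndGet();
        Evaluator attributeRef = attributes -> attributes.get(attributeName);
        return attributes -> literal.equals(attributeRef.apply(attributes));
    }

    // Pattern described in the issue: built once when "compiled", then rebuilt on every evaluate().
    static final class RebuildingExpression {
        private final String literal;
        private final String attributeName;

        RebuildingExpression(String literal, String attributeName) {
            this.literal = literal;
            this.attributeName = attributeName;
            buildEvaluator(literal, attributeName); // compile-time build, result is not kept
        }

        Object evaluate(Map<String, String> attributes) {
            return buildEvaluator(literal, attributeName).apply(attributes); // rebuilt on each call
        }
    }

    // Proposed pattern: build the tree once and store it for reuse.
    static final class ReusingExpression {
        private final Evaluator root;

        ReusingExpression(String literal, String attributeName) {
            this.root = buildEvaluator(literal, attributeName);
        }

        Object evaluate(Map<String, String> attributes) {
            return root.apply(attributes); // no rebuild; assumes evaluators hold no per-call state
        }
    }

    public static void main(String[] args) {
        Map<String, String> attributes = Collections.singletonMap("a", "b");

        BUILDS.set(0);
        RebuildingExpression rebuilding = new RebuildingExpression("b", "a");
        for (int i = 0; i < 500; i++) {
            rebuilding.evaluate(attributes);
        }
        System.out.println("rebuilding: builds=" + BUILDS.get()); // builds=501

        BUILDS.set(0);
        ReusingExpression reusing = new ReusingExpression("b", "a");
        for (int i = 0; i < 500; i++) {
            reusing.evaluate(attributes);
        }
        System.out.println("reusing: builds=" + BUILDS.get()); // builds=1
    }
}
```

Running `main` reports 501 builds for the rebuilding variant (one at compile time plus one per call) versus a single build for the reusing variant. Reusing a stored tree is only safe if the evaluators keep no per-call mutable state; a real fix would have to confirm that for NiFi's evaluators.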



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
