nifi-dev mailing list archives

From Michael Moser <moser...@gmail.com>
Subject Re: Purpose of Disallowing Attribute Expression
Date Thu, 12 May 2016 21:46:58 GMT
Hi,

This was discussed a bit in the past in NIFI-1077 [1], when
ConvertCharacterSet was improved to support expression language.  Each
request like this needs a JIRA ticket to spur action.

A related improvement that would help here is to have the IdentifyMimeType
processor detect character encodings on text data.  Apache Tika can do
this with an EncodingDetector [2], so why not take advantage of it, since
Tika is already part of IdentifyMimeType?  I think this would be cool, so I
wrote NIFI-1874 [3].

-- Mike

[1] - https://issues.apache.org/jira/browse/NIFI-1077
[2] - https://tika.apache.org/1.12/api/org/apache/tika/detect/EncodingDetector.html
[3] - https://issues.apache.org/jira/browse/NIFI-1874



On Thu, May 12, 2016 at 3:52 PM, dale.chang13 <dale.chang13@outlook.com>
wrote:

> Joe Witt wrote
> > It is generally quite easy to enable for Property Descriptors which
> > accept user supplied strings.  And this is one that does seem like a
> > candidate.  Were you wanting it to look at a flowfile attribute to be
> > the way of indicating the character set?
> >
> > Thinking through this example the challenges that come to mind are:
> > - What to do if the flow file doesn't have the charset indicated as an
> > attribute?
> > - What to do if the charset indicated by the flowfile attribute isn't
> > supported?
> >
> > There are various cases to consider, is all, and your idea is a good one
> > to pursue in my view.  We had wanted to make it an enumerated value
> > at one point so users could only select from known/valid charsets.
> > But your idea is good too.
>
> Yes, setting the character set or other properties as a flowfile attribute
> would be helpful. I have already tweaked ExtractText to support
> expression language, to provide UTF-8 as the default character set, and to
> make specifying the character set optional.
>
> I suppose the ExtractText processor could route to an "invalid character
> set" relationship if there is a conflict. That would require a character
> set detection service at the least though.
>
> I only asked because our constraint was to use as much out-of-the-box
> functionality and as few custom processors as possible, for maintenance's
> sake.
>
> Would it be possible to implement this change (more properties supporting
> expression language) in future releases? I know it would warrant an
> in-depth discussion of the goals that NiFi would like to achieve.
>
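
To make the two challenges in the quoted thread concrete (the attribute is
missing, or it names an unsupported charset), here is a minimal sketch in
plain Java. It is not NiFi code: the CharsetResolution class, the Route
enum, the resolve method and the "charset" attribute name are all
hypothetical, and a plain Map stands in for flowfile attributes. Falling
back to UTF-8 and routing to an "invalid character set" outcome follow the
ideas in the quoted message; a real processor might choose differently.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper, not part of NiFi: decides which charset to use
// based on an attribute value, and where the flowfile should be routed.
final class CharsetResolution {

    // Stand-ins for processor relationships (names are assumptions).
    enum Route { SUCCESS, INVALID_CHARSET }

    static final class Result {
        final Route route;
        final Charset charset; // null when route is INVALID_CHARSET

        Result(Route route, Charset charset) {
            this.route = route;
            this.charset = charset;
        }
    }

    // attributes: stand-in for flowfile attributes.
    // attributeName: the attribute expected to carry the charset name.
    static Result resolve(Map<String, String> attributes, String attributeName) {
        String name = attributes.get(attributeName);

        // Case 1: no charset attribute. Assume UTF-8 as the default.
        if (name == null || name.trim().isEmpty()) {
            return new Result(Route.SUCCESS, StandardCharsets.UTF_8);
        }

        // Case 2: attribute present. Accept it only if this JVM supports it.
        String trimmed = name.trim();
        try {
            if (Charset.isSupported(trimmed)) {
                return new Result(Route.SUCCESS, Charset.forName(trimmed));
            }
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that are illegal in charset names;
            // treat it the same as an unsupported charset.
        }
        return new Result(Route.INVALID_CHARSET, null);
    }
}
```

Calling CharsetResolution.resolve(attributes, "charset") returns the charset
to use together with the route to take. This only validates a name someone
else supplied; it does not detect the actual encoding of the content, which
is the part that would need something like Tika's EncodingDetector.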
