nifi-users mailing list archives

From Anuj Handa <anujha...@gmail.com>
Subject Re: Replace Text
Date Sat, 18 Jun 2016 16:57:26 GMT
Hi,

I have attached the XML document and the XSLT I am using to transform it into
JSON.
When I look at the encoding in Notepad++, it says it is UCS-2 Little
Endian.
If I convert the file to ANSI, it works fine.
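
For anyone reproducing this: "UCS-2 Little Endian" in Notepad++ generally corresponds to UTF-16LE, so the file is probably not UTF-8 at all. The Python sketch below is only an illustration and is not part of the NiFi flow. It guesses the encoding from a byte-order mark (BOM), transcodes to UTF-8, and shows how a text delimiter maps to its UTF-16LE hex form. The function names and the no-BOM fallback are assumptions made for the example.

```python
import codecs


def guess_codec(data: bytes, fallback: str = "utf-8") -> str:
    """Guess a Python codec name from a leading byte-order mark (BOM)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # strips the UTF-8 BOM while decoding
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"  # consumes the BOM and picks the byte order from it
    return fallback  # assumption: the caller knows the encoding when no BOM is present


def to_utf8(data: bytes, fallback: str = "utf-8") -> bytes:
    """Decode with the guessed codec and re-encode as UTF-8 without a BOM."""
    return data.decode(guess_codec(data, fallback)).encode("utf-8")


def delimiter_hex(text: str) -> str:
    """Hex form of a delimiter as it would appear inside a UTF-16LE file."""
    return text.encode("utf-16-le").hex()


if __name__ == "__main__":
    # Hypothetical sample: a BOM followed by a tiny UTF-16LE document.
    sample = codecs.BOM_UTF16_LE + "<POSTransaction xmlns=''/>".encode("utf-16-le")
    print(to_utf8(sample))
    print(delimiter_hex("<POSTransaction xmlns"))
```

A fragment produced by splitting a UTF-16LE file will usually have no BOM, so it would need fallback="utf-16-le". If the XML declaration names an encoding, it may also need updating after transcoding.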

Any help would be greatly appreciated.

Anuj

On Tue, Jun 14, 2016 at 11:12 AM, Anuj Handa <anujhanda@gmail.com> wrote:

> Hi Bryan,
>
> This is the error. When I copy the data from "Data Provenance" into a
> new file, it works fine because the file is no longer UTF-8.
>
> org.apache.nifi.processor.exception.ProcessException: IOException thrown
> from TransformXml[id=f618472b-78dd-4c18-a582-c0a5c111383c]:
> java.io.IOException: net.sf.saxon.trans.XPathException:
> org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 2; The markup
> in the document preceding the root element must be well-formed.
>         at
> org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2190)
> ~[na:na]
>         at
> org.apache.nifi.processors.standard.TransformXml.onTrigger(TransformXml.java:138)
> ~[nifi-standard-processors-0.6.1.jar:0.6.1]
>         at
> org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
> [nifi-api-0.6.1.jar:0.6.1]
>         at
> org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1059)
> [nifi-framework-core-0.6.1.jar:0.6.1]
>         at
> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136)
> [nifi-framework-core-0.6.1.jar:0.6.1]
>         at
> org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47)
> [nifi-framework-core-0.6.1.jar:0.6.1]
>         at
> org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:123)
> [nifi-framework-core-0.6.1.jar:0.6.1]
>         at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> [na:1.8.0_77]
>         at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> [na:1.8.0_77]
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> [na:1.8.0_77]
>         at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> [na:1.8.0_77]
>         at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [na:1.8.0_77]
>         at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [na:1.8.0_77]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> Caused by: java.io.IOException: net.sf.saxon.trans.XPathException:
> org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 2; The markup
> in the document preceding the root element must be well-formed.
>         at
> org.apache.nifi.processors.standard.TransformXml$1.process(TransformXml.java:161)
> ~[nifi-standard-processors-0.6.1.jar:0.6.1]
>         at
> org.apache.nifi.controller.repository.StandardProcessSession.write(StandardProcessSession.java:2172)
> ~[na:na]
>         ... 13 common frames omitted
> Caused by: net.sf.saxon.trans.XPathException:
> org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 2; The markup
> in the document preceding the root element must be well-formed.
>         at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:460)
> ~[Saxon-HE-9.6.0-5.jar:na]
>         at net.sf.saxon.event.Sender.send(Sender.java:171)
> ~[Saxon-HE-9.6.0-5.jar:na]
>         at net.sf.saxon.Controller.transform(Controller.java:1692)
> ~[Saxon-HE-9.6.0-5.jar:na]
>         at
> net.sf.saxon.s9api.XsltTransformer.transform(XsltTransformer.java:547)
> ~[Saxon-HE-9.6.0-5.jar:na]
>         at
> net.sf.saxon.jaxp.TransformerImpl.transform(TransformerImpl.java:179)
> ~[Saxon-HE-9.6.0-5.jar:na]
>         at
> org.apache.nifi.processors.standard.TransformXml$1.process(TransformXml.java:159)
> ~[nifi-standard-processors-0.6.1.jar:0.6.1]
>         ... 14 common frames omitted
> Caused by: org.xml.sax.SAXParseException: The markup in the document
> preceding the root element must be well-formed.
>         at
> com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
> ~[na:1.8.0_77]
>         at
> com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
> ~[na:1.8.0_77]
>         at
> com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
> ~[na:1.8.0_77]
>         at
> com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
> ~[na:1.8.0_77]
>         at
> com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1437)
> ~[na:1.8.0_77]
>         at
> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:883)
> ~[na:1.8.0_77]
>         at
> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
> ~[na:1.8.0_77]
>         at
> com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:118)
> ~[na:1.8.0_77]
>         at
> com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
> ~[na:1.8.0_77]
>         at
> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
> ~[na:1.8.0_77]
>         at
> com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
> ~[na:1.8.0_77]
>         at
> com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
> ~[na:1.8.0_77]
>         at
> com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
> ~[na:1.8.0_77]
>
>
>
> On Tue, Jun 14, 2016 at 9:50 AM, Bryan Bende <bbende@gmail.com> wrote:
>
>> What is the error you are getting from TransformXML?
>>
>> On Tue, Jun 14, 2016 at 9:38 AM, Anuj Handa <anujhanda@gmail.com> wrote:
>>
>>> Does anybody have any thoughts on UTF-8 flow files with XML transformation and
>>> other processors?
>>>
>>> Anuj
>>>
>>> On Mon, Jun 13, 2016 at 4:45 PM, Anuj Handa <anujhanda@gmail.com> wrote:
>>>
>>>> So it seems like it's a UTF-8 issue. When I changed the string to use
>>>> Hex instead of Text and used the hex code with 00 (2 bytes per character),
>>>> SplitContent worked.
>>>>
>>>> <POSTransaction xmlns is the string I was looking to split on, which
>>>> translates into the following hex code:
>>>>
>>>> *3c0050004f0053005400720061006e00730061006300740069006f006e00200078006d006c006e007300*
>>>>
>>>> TransformXml is now failing, I think because of the UTF-8. I know I
>>>> had it working with a normal ASCII file.
>>>>
>>>> Do I need to specify somewhere that the flow files are UTF-8, or is it smart
>>>> enough to figure it out on its own?
>>>> Based on some reading, I see that some processors expect UTF-8, so the
>>>> next question would be: do all processors support UTF-8?
>>>>
>>>> Anuj
>>>>
>>>>
>>>>
>>>> On Mon, Jun 13, 2016 at 3:01 PM, Anuj Handa <anujhanda@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks Joe. Unfortunately, since my XML has namespaces (xmlns), that
>>>>> approach won't work.
>>>>> Any thoughts on why the split doesn't work using the tag? Does it accept
>>>>> UTF-8 flow files?
>>>>>
>>>>> Anuj
>>>>>
>>>>> On Mon, Jun 13, 2016 at 2:50 PM, ski n <raymondmeester@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> You can also make your input XML well-formed by creating a custom
>>>>>> root element (e.g. <PostTransactions>...xmldocuments</PostTransactions>)
>>>>>> and then use the SplitXml processor (or just the transformation
>>>>>> step).
>>>>>>
>>>>>> 2016-06-13 18:04 GMT+02:00 Anuj Handa <anujhanda@gmail.com>:
>>>>>>
>>>>>>> I have a text file which contains multiple XML documents, each of which
>>>>>>> starts with <POSTransaction xmlns.
>>>>>>> I am trying to break each of the XML docs into its own flow file so I
>>>>>>> can then evaluate the XML, convert it into JSON, and then load it into a
>>>>>>> database.
>>>>>>>
>>>>>>> I tried just SplitContent and that didn't work. The file is UTF-8;
>>>>>>> not sure if that plays into it. I am running NiFi on Linux and
>>>>>>> the file is also local on Linux.
>>>>>>>
>>>>>>> [image: Inline image 1]
>>>>>>>
>>>>>>> this is my entire workflow.
>>>>>>>
>>>>>>> [image: Inline image 2]
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jun 13, 2016 at 11:43 AM, Joe Percivall <
>>>>>>> joepercivall@yahoo.com> wrote:
>>>>>>>
>>>>>>>> Awesome, and what processor were you planning to use to split on
>>>>>>>> "#|#|#"? The SplitContent processor[1] can be used to split the content on
>>>>>>>> a sequence of text characters which could split on "<POSTransaction xmlns"
>>>>>>>> without needing to add "#|#|#".
>>>>>>>>
>>>>>>>> Also I see "xmlns" and think this is an xml file you are trying to
>>>>>>>> split. If so are you by chance trying to split evenly on each child? If so
>>>>>>>> the "SplitXml" processor[2] would easily take care of that.
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitContent/index.html
>>>>>>>> [2]
>>>>>>>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitXml/index.html
>>>>>>>>
>>>>>>>> Joe
>>>>>>>> - - - - - -
>>>>>>>> Joseph Percivall
>>>>>>>> linkedin.com/in/Percivall
>>>>>>>> e: joepercivall@yahoo.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Monday, June 13, 2016 11:26 AM, Anuj Handa <anujhanda@gmail.com>
>>>>>>>> wrote:
>>>>>>>> Yes that's exactly correct.
>>>>>>>>
>>>>>>>>
>>>>>>>> > On Jun 13, 2016, at 11:14 AM, Joe Percivall <
>>>>>>>> joepercivall@yahoo.com> wrote:
>>>>>>>> >
>>>>>>>> > Sorry I got a bit confused, in your original question you said
>>>>>>>> > that you wanted to append the value and I took it that you just wanted to
>>>>>>>> > append the value to the end of the line or text.
>>>>>>>> >
>>>>>>>> > Let me try and restate your goal so I'm sure I understand,
>>>>>>>> > ultimately you want to split the incoming FlowFile on each occurrence of
>>>>>>>> > "<POSTransaction xmlns" and you are planning on using ReplaceText to add
>>>>>>>> > "#|#|#" before each occurrence so that it will be easy to split?
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Joe
>>>>>>>> > - - - - - -
>>>>>>>> > Joseph Percivall
>>>>>>>> > linkedin.com/in/Percivall
>>>>>>>> > e: joepercivall@yahoo.com
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Monday, June 13, 2016 11:05 AM, Anuj Handa <
>>>>>>>> anujhanda@gmail.com> wrote:
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > Anuj
>>>>>>>> > Hi Joe,
>>>>>>>> >
>>>>>>>> > I modified the process per your suggestion, but it only replaces
>>>>>>>> > the first occurrence. There are multiple such tags which it doesn't
>>>>>>>> > replace.
>>>>>>>> > When I used evaluation mode Line-by-Line, it appended it to every
>>>>>>>> > line in the file and not just to the one I wanted.
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On Mon, Jun 13, 2016 at 10:40 AM, Joe Percivall <
>>>>>>>> joepercivall@yahoo.com> wrote:
>>>>>>>> >
>>>>>>>> > Hello,
>>>>>>>> >>
>>>>>>>> >> In order to use ReplaceText[1] to solely append a value to the
>>>>>>>> >> end of the entire text then change the "Replacement Strategy" to "Append"
>>>>>>>> >> and leave "Evaluation Mode" as "Entire Text". This will take whatever is
>>>>>>>> >> the "Replacement Value" and append it as a literal (without interpreting
>>>>>>>> >> back-references) to the end of the text.
>>>>>>>> >>
>>>>>>>> >> Alternatively, if you want to append to the end of each line
>>>>>>>> >> then change "Evaluation Mode" to "Line-by-Line".
>>>>>>>> >>
>>>>>>>> >> [1]
>>>>>>>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ReplaceText/index.html
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> Hope that helps,
>>>>>>>> >> Joe
>>>>>>>> >> - - - - - -
>>>>>>>> >> Joseph Percivall
>>>>>>>> >> linkedin.com/in/Percivall
>>>>>>>> >> e: joepercivall@yahoo.com
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> On Monday, June 13, 2016 10:05 AM, Anuj Handa <
>>>>>>>> anujhanda@gmail.com> wrote:
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >>
>>>>>>>> >> Hi,
>>>>>>>> >>
>>>>>>>> >> I am trying to read a file and then use ReplaceText to append a
>>>>>>>> >> string so I can split the line in the next step. I am unable to make
>>>>>>>> >> ReplaceText work.
>>>>>>>> >> The flow file is going through as success without the string
>>>>>>>> >> being appended or replaced.
>>>>>>>> >>
>>>>>>>> >> Any thoughts on what I could be doing wrong?
>>>>>>>> >>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
