camel-issues mailing list archives

From "Aki Yoshida (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CAMEL-7468) Make xmlTokenizer more xml-aware so that it can handle more flexible structures
Date Tue, 27 May 2014 16:46:03 GMT

    [ https://issues.apache.org/jira/browse/CAMEL-7468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14009880#comment-14009880 ]

Aki Yoshida commented on CAMEL-7468:
------------------------------------

I added a new version that uses the StAX parser to search for the target token and extract
the token from its underlying buffer directly.

As XML tokenizing is inherently different from non-XML tokenizing, I created a separate language
and expression for this new XML tokenizer.

I noticed there is a difference in the behavior of XMLStreamReader.getLocation() between woodstox
(com.ctc.wstx.sr.ValidatingStreamReader) and JDK (com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl).
Namely, woodstox returns the location at the beginning of the token whereas JDK returns the
location at the end of the token. For example, when at START_ELEMENT, woodstox returns the
position of "<" of that start tag, whereas JDK returns the position of ">" of that tag.
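
To make the difference easy to observe, here is a minimal sketch that prints the character offset reported at the first START_ELEMENT by whichever StAX implementation XMLInputFactory.newInstance() picks up. The class name LocationDemo and the sample document are made up for illustration; only the standard javax.xml.stream API is used. The offset may be -1 if the implementation does not track it.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Camel or of the patch.
class LocationDemo {

    // Returns the character offset reported by getLocation() when the reader
    // is positioned at the first START_ELEMENT, or -1 if no element is found.
    static int offsetAtFirstStartElement(XMLInputFactory factory, String xml) {
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocation().getCharacterOffset();
                    }
                }
                return -1;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so that callers do not need a checked-exception clause.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        String xml = "<root><child>text</child></root>";
        // The printed offset depends on the StAX implementation on the classpath.
        System.out.println(factory.getClass().getName()
                + " reports offset " + offsetAtFirstStartElement(factory, xml));
    }
}
```

Running this once with woodstox on the classpath and once without it should show the two conventions side by side.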

I need to get this behavior clarified and I'll probably need to add an auto-detect mechanism.
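
One possible shape for such an auto-detect step, as a rough sketch only and not what the patch does: parse a tiny probe document and check whether the offset at the first START_ELEMENT is 0 (location at the beginning of the token) or greater than 0 (location at or after the end of the token). The names LocationModeDetector and Mode are hypothetical, and the check assumes that an implementation reports one of the two conventions described above.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical detector, not part of Camel or of the patch.
class LocationModeDetector {

    enum Mode { TOKEN_START, TOKEN_END, UNKNOWN }

    // Probes the given factory with a fixed document whose first start tag
    // begins at offset 0, so a reported offset of 0 means "beginning of token".
    static Mode detect(XMLInputFactory factory) {
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader("<a></a>"));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        int offset = reader.getLocation().getCharacterOffset();
                        if (offset < 0) {
                            return Mode.UNKNOWN; // offset not tracked
                        }
                        return offset == 0 ? Mode.TOKEN_START : Mode.TOKEN_END;
                    }
                }
                return Mode.UNKNOWN;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            return Mode.UNKNOWN; // probe failed, caller must fall back
        }
    }

    public static void main(String[] args) {
        System.out.println(detect(XMLInputFactory.newInstance()));
    }
}
```

The result could be computed once per factory and cached, since the convention is a property of the parser implementation rather than of the input.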


> Make xmlTokenizer more xml-aware so that it can handle more flexible structures
> -------------------------------------------------------------------------------
>
>                 Key: CAMEL-7468
>                 URL: https://issues.apache.org/jira/browse/CAMEL-7468
>             Project: Camel
>          Issue Type: Improvement
>          Components: camel-core
>            Reporter: Aki Yoshida
>            Assignee: Aki Yoshida
>             Fix For: 2.14.0
>
>
> The existing xmlTokenizer can tokenize an XML document using the specified element tag
> name and produce a series of tokens that are either the child tokens with the injected namespace
> declarations from its parent node or the tokens wrapped in their ancestor elements.
> That implementation has several limitations:
> - a specific namespace cannot be specified.
> - a specific hierarchy cannot be specified.
> - the wrap mode assumes each token to have the same ancestor path.
> This patch will remove these limitations.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
