camel-users mailing list archives

From Romain Manni-Bucau <rmannibu...@gmail.com>
Subject Re: XML Splitting with streaming (Was: a bit of stax)
Date Mon, 23 May 2011 15:43:56 GMT
IMHO it should be merged into the xpath component, with an added parameter.

However, the other solution you mention is interesting too: creating a
dedicated component around StAX, which parses XML in a rather different way
from DOM or SAX.

- Romain

2011/5/23 Xavier Coulon <xcoulon@gmail.com>

> Hello,
>
> I haven't posted anything about the component below on JIRA yet, as I was
> thinking a bit more about it over the weekend...
> Should it be a separate component, as shown below (named "stax" because of
> the underlying technology it uses), or should it be merged into the existing
> "xpath" component? The former may be a bit confusing for API users; the
> latter would require more work but would be cleaner.
>
> What do you think?
> Regards,
> Xavier
>
> On Wed, May 18, 2011 at 5:11 PM, Xavier Coulon <xcoulon@gmail.com> wrote:
>
> > Hello,
> >
> > As a complement to Romain's contribution (he is a colleague of mine, but
> > in a different team), I would like to submit another component to the
> > Camel project. This component splits XML inputs with streaming, which,
> > according to the documentation, is not possible yet. The splitting rule is
> > an XPath expression, and the input source can be a GenericFile or an
> > InputStream.
> >
> > The code is based on 3 classes, so I have put it directly in this message
> > (I have only excluded the JUnit tests):
> >
> > import java.io.File;
> > import java.io.FileInputStream;
> > import java.io.FileNotFoundException;
> > import java.io.InputStream;
> >
> > import javax.xml.stream.XMLStreamException;
> >
> > import org.apache.camel.Exchange;
> > import org.apache.camel.Expression;
> > import org.apache.camel.component.file.GenericFile;
> > import org.slf4j.Logger;
> > import org.slf4j.LoggerFactory;
> >
> > public class StaxExpressionBuilder implements Expression {
> >
> >     private static final Logger LOGGER = LoggerFactory
> >             .getLogger(StaxExpressionBuilder.class);
> >
> >     /** The XPath value that input stream elements must match to be split. */
> >     private final String path;
> >
> >     public StaxExpressionBuilder(String path) {
> >         this.path = path;
> >     }
> >
> >     @SuppressWarnings("unchecked")
> >     @Override
> >     public <T> T evaluate(Exchange exchange, Class<T> type) {
> >         Object body = exchange.getIn().getBody();
> >         try {
> >             InputStream inputStream = null;
> >             if (body instanceof GenericFile) {
> >                 GenericFile<File> file = (GenericFile<File>) body;
> >                 inputStream = new FileInputStream(file.getFile());
> >             } else if (body instanceof InputStream) {
> >                 inputStream = (InputStream) body;
> >             }
> >             if (inputStream != null) {
> >                 return (T) new StaxIterator(inputStream, path);
> >             }
> >             LOGGER.error("No input stream for message body of type "
> >                     + (body == null ? "null" : body.getClass().getCanonicalName()));
> >         } catch (FileNotFoundException e) {
> >             LOGGER.error("Failed to read incoming file", e);
> >         } catch (XMLStreamException e) {
> >             LOGGER.error("Failed to create StAX iterator on incoming message", e);
> >         }
> >         return null;
> >     }
> > }
> >
> > --------------------------------
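> > import java.io.IOException;
> > import java.io.InputStream;
> > import java.util.ArrayList;
> > import java.util.Iterator;
> > import java.util.List;
> > import java.util.NoSuchElementException;
> > import java.util.concurrent.atomic.AtomicInteger;
> >
> > import javax.xml.stream.XMLEventReader;
> > import javax.xml.stream.XMLInputFactory;
> > import javax.xml.stream.XMLStreamException;
> > import javax.xml.stream.events.Attribute;
> > import javax.xml.stream.events.Characters;
> > import javax.xml.stream.events.EndElement;
> > import javax.xml.stream.events.StartElement;
> > import javax.xml.stream.events.XMLEvent;
> >
> > import org.slf4j.Logger;
> > import org.slf4j.LoggerFactory;
> >
> > public class StaxIterator implements Iterator<String> {
> >
> >     private static final Logger LOGGER = LoggerFactory
> >             .getLogger(StaxIterator.class);
> >
> >     private final AtomicInteger counter = new AtomicInteger(0);
> >     private final InputStream inputStream;
> >     private final XMLEventReader eventReader;
> >     private final XPathLocation currentLocation = new XPathLocation();
> >     private final List<String> matchPaths;
> >     private final XMLInputFactory inputFactory = XMLInputFactory.newInstance();
> >
> >     private String nextItem = null;
> >
> >     /** @param paths one or more path expressions separated by '|' */
> >     public StaxIterator(InputStream inputStream, String paths)
> >             throws XMLStreamException {
> >         this.inputStream = inputStream;
> >         this.matchPaths = new ArrayList<String>();
> >         for (String path : paths.split("\\|")) {
> >             this.matchPaths.add(path.trim());
> >         }
> >         this.eventReader = inputFactory.createXMLEventReader(inputStream);
> >         // Read one item ahead so that hasNext() can answer without parsing.
> >         this.nextItem = readNextItem();
> >     }
> >
> >     @Override
> >     public boolean hasNext() {
> >         return nextItem != null;
> >     }
> >
> >     @Override
> >     public String next() {
> >         if (nextItem == null) {
> >             throw new NoSuchElementException();
> >         }
> >         String currentItem = this.nextItem;
> >         this.nextItem = readNextItem();
> >         return currentItem;
> >     }
> >
> >     /** Reads events until the next matching element is complete; null at end of input. */
> >     private String readNextItem() {
> >         try {
> >             StringBuilder itemBuilder = null;
> >             String itemLocation = null;
> >             while (eventReader.hasNext()) {
> >                 XMLEvent event = eventReader.nextEvent();
> >                 if (event.isStartElement()) {
> >                     StartElement element = event.asStartElement();
> >                     currentLocation.appendSegment(element.getName().getLocalPart());
> >                     // Start recording when entering an element that matches a path.
> >                     if (itemBuilder == null && currentLocation.matches(matchPaths)) {
> >                         itemBuilder = new StringBuilder();
> >                         itemLocation = currentLocation.getLocation();
> >                     }
> >                     startRecording(itemBuilder, element);
> >                 } else if (event.isCharacters()) {
> >                     record(itemBuilder, event.asCharacters());
> >                 } else if (event.isEndElement()) {
> >                     endRecordingElement(itemBuilder, event.asEndElement());
> >                     // If we reach the end of the recorded element, the item is complete.
> >                     boolean itemComplete = itemBuilder != null
> >                             && currentLocation.getLocation().equals(itemLocation);
> >                     currentLocation.removeLastSegment();
> >                     if (itemComplete) {
> >                         counter.incrementAndGet();
> >                         return itemBuilder.toString();
> >                     }
> >                 }
> >             }
> >         } catch (XMLStreamException e) {
> >             LOGGER.error("Failed to read item #" + counter.get()
> >                     + " from input stream", e);
> >         }
> >         // End of input or parse error: there are no more items.
> >         closeQuietly();
> >         return null;
> >     }
> >
> >     private void endRecordingElement(StringBuilder itemBuilder,
> >             EndElement endElement) {
> >         if (itemBuilder == null) {
> >             return;
> >         }
> >         itemBuilder.append("</").append(endElement.getName().getLocalPart())
> >                 .append(">");
> >     }
> >
> >     private void record(StringBuilder itemBuilder, Characters characters) {
> >         if (itemBuilder == null) {
> >             return;
> >         }
> >         // getData() returns unescaped text, so it must be escaped again.
> >         itemBuilder.append(escape(characters.getData()));
> >     }
> >
> >     private void startRecording(StringBuilder itemBuilder, StartElement element) {
> >         if (itemBuilder == null) {
> >             return;
> >         }
> >         // Only local names are kept: namespace prefixes and declarations are dropped.
> >         itemBuilder.append("<").append(element.getName().getLocalPart());
> >         @SuppressWarnings("unchecked")
> >         Iterator<Attribute> attributes = element.getAttributes();
> >         while (attributes.hasNext()) {
> >             Attribute attr = attributes.next();
> >             itemBuilder.append(" ").append(attr.getName().getLocalPart())
> >                     .append("=\"").append(escape(attr.getValue())).append("\"");
> >         }
> >         itemBuilder.append(">");
> >     }
> >
> >     private static String escape(String text) {
> >         return text.replace("&", "&amp;").replace("<", "&lt;")
> >                 .replace(">", "&gt;").replace("\"", "&quot;");
> >     }
> >
> >     private void closeQuietly() {
> >         try {
> >             eventReader.close();
> >         } catch (XMLStreamException e) {
> >             LOGGER.warn("Failed to close StAX reader", e);
> >         }
> >         try {
> >             inputStream.close();
> >         } catch (IOException e) {
> >             LOGGER.warn("Failed to close input stream", e);
> >         }
> >     }
> >
> >     @Override
> >     public void remove() {
> >         throw new UnsupportedOperationException(
> >                 "remove() is not supported by this Iterator: it only reads StAX input.");
> >     }
> > }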
> >
> > --------------------------------
> > import java.util.List;
> >
> > import org.apache.commons.lang.StringUtils;
> >
> > /** Tracks the location of the current element, e.g. "/foo/bar", while streaming. */
> > public class XPathLocation {
> >
> >     private static final String NODE_SEPARATOR = "/";
> >
> >     private static final String DOUBLE_NODE_SEPARATOR = "//";
> >
> >     /** Location with its initial value (the document root). */
> >     private String location = NODE_SEPARATOR;
> >
> >     /** Creates a location positioned at the document root. */
> >     public XPathLocation() {
> >         super();
> >     }
> >
> >     /**
> >      * Full constructor.
> >      *
> >      * @param value
> >      *            initial value
> >      */
> >     public XPathLocation(String value) {
> >         super();
> >         this.location = value;
> >     }
> >
> >     public String getLocation() {
> >         return location;
> >     }
> >
> >     public String appendSegment(String segment) {
> >         location = new StringBuilder(location).append(NODE_SEPARATOR)
> >                 .append(segment).toString();
> >         // Appending to the root "/" produces "//segment": collapse it.
> >         location = location.replace(DOUBLE_NODE_SEPARATOR, NODE_SEPARATOR);
> >         return location;
> >     }
> >
> >     public String removeLastSegment() {
> >         location = StringUtils.substringBeforeLast(location, NODE_SEPARATOR);
> >         if (location.isEmpty()) {
> >             location = NODE_SEPARATOR;
> >         }
> >         return location;
> >     }
> >
> >     /**
> >      * Returns true if one of the given patterns matches the current location,
> >      * false otherwise.
> >      *
> >      * @param orPatterns
> >      *            the given patterns
> >      * @return true or false
> >      */
> >     public boolean matches(final List<String> orPatterns) {
> >         for (String pattern : orPatterns) {
> >             if (matches(pattern)) {
> >                 return true;
> >             }
> >         }
> >         return false;
> >     }
> >
> >     /**
> >      * Returns true if the given pattern matches the current location, false
> >      * otherwise.
> >      *
> >      * @param pattern
> >      *            the given pattern
> >      * @return true or false
> >      */
> >     public boolean matches(final String pattern) {
> >         if (pattern == null || pattern.isEmpty()) {
> >             return false;
> >         } else if (pattern.startsWith(NODE_SEPARATOR)) {
> >             return matchStartWith(pattern);
> >         } else if (pattern.contains(DOUBLE_NODE_SEPARATOR)) {
> >             return matchContains(pattern);
> >         } else {
> >             // Relative pattern: true when the current element is a direct child
> >             // of an element whose location ends with the pattern.
> >             String lastSegments = StringUtils.substringAfterLast(location,
> >                     pattern + NODE_SEPARATOR);
> >             return (!lastSegments.isEmpty()) && location.endsWith(lastSegments)
> >                     && !lastSegments.contains(NODE_SEPARATOR);
> >         }
> >     }
> >
> >     private boolean matchContains(String pattern) {
> >         String firstSegments = StringUtils.substringBefore(pattern,
> >                 DOUBLE_NODE_SEPARATOR) + NODE_SEPARATOR;
> >         String lastSegments = NODE_SEPARATOR
> >                 + StringUtils.substringAfter(pattern, DOUBLE_NODE_SEPARATOR);
> >
> >         return location.contains(firstSegments)
> >                 && location.endsWith(lastSegments)
> >                 && location.indexOf(lastSegments, firstSegments.length()) >= (location
> >                         .indexOf(firstSegments) + firstSegments.length() - NODE_SEPARATOR
> >                         .length());
> >     }
> >
> >     private boolean matchStartWith(String pattern) {
> >         if (pattern.startsWith(DOUBLE_NODE_SEPARATOR)) {
> >             // "//foo/bar": the location must end with the whole "/foo/bar"
> >             // segments, so that "/xfoo/bar" does not match.
> >             return location.endsWith(NODE_SEPARATOR
> >                     + StringUtils.substringAfter(pattern, DOUBLE_NODE_SEPARATOR));
> >         } else {
> >             // Absolute pattern: exact match.
> >             return pattern.equals(location);
> >         }
> >     }
> > }
> > --------------------------------
> >
> > In the code, here is how we use it:
> >
> > public class MyRouteBuilder extends RouteBuilder {
> >
> >     @Override
> >     public void configure() {
> >         from("file:..").split(stax("//foo/bar")).streaming().to(...);
> >     }
> >
> >     private Expression stax(String path) {
> >         return new StaxExpressionBuilder(path);
> >     }
> > }
> >
> > Here's how it works:
> > - when splitting the incoming message body, the Expression returned by
> >   stax() evaluates to a new type of Iterator;
> > - when streaming, the iterator's next() method is called. Using StAX
> >   internally, it moves through the input stream and keeps track of the
> >   locations of the elements it traverses;
> > - when an element's location matches the given path expression, the
> >   iterator 'records' the input stream content and returns it at the end of
> >   the element.
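The record-at-match technique described in this list can also be sketched with the JDK's StAX API alone. This is a hypothetical, simplified variant, not Xavier's code: the names StaxSplitSketch and split are made up, it only matches exact absolute paths such as "/orders/order" (not the "//foo/bar" patterns), and it delegates serialization to an XMLEventWriter, which escapes text and attributes, instead of building strings by hand. Each item is passed to a callback as soon as its closing tag is read, so only the current item is buffered.

```java
import java.io.InputStream;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

/** Hypothetical sketch: streams each element found at an absolute path such as "/orders/order". */
public final class StaxSplitSketch {

    private StaxSplitSketch() {
    }

    public static void split(InputStream in, String absolutePath, Consumer<String> onItem)
            throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        Deque<String> path = new ArrayDeque<>(); // local names from the root to the current element
        StringWriter buffer = null;              // non-null while an item is being recorded
        XMLEventWriter writer = null;
        int depth = 0;                           // element depth inside the recorded item

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                path.addLast(event.asStartElement().getName().getLocalPart());
                if (writer == null && toPath(path).equals(absolutePath)) {
                    buffer = new StringWriter();
                    writer = outputFactory.createXMLEventWriter(buffer);
                }
                if (writer != null) {
                    depth++;
                }
            }
            if (writer != null) {
                // Copy the event unchanged; the writer escapes text and attribute values.
                writer.add(event);
            }
            if (event.isEndElement()) {
                path.removeLast();
                if (writer != null && --depth == 0) {
                    writer.flush();
                    writer.close();
                    onItem.accept(buffer.toString()); // item complete: hand it on right away
                    writer = null;
                    buffer = null;
                }
            }
        }
        reader.close();
    }

    private static String toPath(Deque<String> path) {
        StringBuilder sb = new StringBuilder();
        for (String segment : path) {
            sb.append('/').append(segment);
        }
        return sb.toString();
    }
}
```

Tracking the depth inside the recorded item, rather than re-testing the path at each end tag, keeps a nested element with the same name from closing the item early.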
> >
> > Note that the stax() method is part of my RouteBuilder, but it could be
> > moved to the RouteBuilder superclass for generic usage.
> >
> >
> > What do you think?
> > Is this something you're interested in ?
> >
> > Best regards,
> > Xavier
> >
> > On Fri, May 13, 2011 at 8:21 AM, Romain Manni-Bucau
> > <rmannibucau@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> Thank you Richard and Claus for your feedback.
> >>
> >> I changed the class loading code and the NPE catch, and added the XMLUtil
> >> class to get the tag name.
> >>
> >> I added support for InputStream as input (adding some converters), but the
> >> problem is that Camel already has a lot of converters, and you can easily
> >> end up loading the whole file back into memory if you don't take care.
> >>
> >> - Romain
> >>
> >> 2011/5/13 Claus Ibsen <claus.ibsen@gmail.com>
> >>
> >> > Hi
> >> >
> >> > Yeah it does look very cool. Good work.
> >> >
> >> > Would be great if the StaxComponent could also cater for non-file-based
> >> > inputs. You may already have the message body as a Source. But that can
> >> > always be improved.
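Supporting non-file bodies mostly comes down to opening a StAX reader on whatever the body already is. A minimal sketch, assuming the body is an InputStream, Reader, byte[], String or a StreamSource backed by a stream or reader; the helper name BodyToStax.toEventReader is hypothetical, and other Source types (DOMSource, SAXSource) are simply rejected here rather than converted.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;

import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper: opens a StAX event reader on common non-file message body types. */
public final class BodyToStax {

    private BodyToStax() {
    }

    public static XMLEventReader toEventReader(Object body) throws XMLStreamException {
        // A new factory per call keeps the sketch free of thread-safety assumptions.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        if (body instanceof InputStream) {
            return factory.createXMLEventReader((InputStream) body);
        } else if (body instanceof Reader) {
            return factory.createXMLEventReader((Reader) body);
        } else if (body instanceof byte[]) {
            return factory.createXMLEventReader(new ByteArrayInputStream((byte[]) body));
        } else if (body instanceof String) {
            return factory.createXMLEventReader(new StringReader((String) body));
        } else if (body instanceof StreamSource) {
            // Unwrap the stream or reader instead of relying on createXMLEventReader(Source),
            // whose support is optional for StAX implementations.
            StreamSource source = (StreamSource) body;
            if (source.getInputStream() != null) {
                return factory.createXMLEventReader(source.getInputStream());
            }
            if (source.getReader() != null) {
                return factory.createXMLEventReader(source.getReader());
            }
        }
        String type = body == null ? "null" : body.getClass().getName();
        throw new IllegalArgumentException("Cannot stream message body of type " + type);
    }
}
```

A String or byte[] body is already fully in memory, so streaming it only avoids building a DOM; the real gain is for InputStream and Reader bodies.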
> >> >
> >> > And yes as Richard mention the class loading should use the
> >> > ClassResolver. You can get it from the CamelContext. exchange -> camel
> >> > context -> class resolver.
> >> >
> >> > And the stuff that finds the annotations. We may have some common code
> >> > for that. Or later refactor that into a util class.
> >> >
> >> > Anyway keep it up.
> >> >
> >> >
> >> > On Fri, May 13, 2011 at 1:29 AM, Richard Kettelerij
> >> > <richardkettelerij@gmail.com> wrote:
> >> > > Hi Romain,
> >> > >
> >> > > Nice work. I've taken a look at your component. A few minor suggestions
> >> > > for improvement, in case you want to contribute it to Apache:
> >> > >
> >> > > - The component currently uses getContextClassLoader().loadClass() for
> >> > > classloading. Camel actually has an abstraction to make this portable
> >> > > across various runtime environments. You can just replace it with
> >> > > org.apache.camel.spi.ClassResolver#resolveClass().
> >> > >
> >> > > - Avoid catching the NullPointerException in the StAXJAXBIteratorExpression.
> >> > >
> >> > > - Do you plan to add a DSL method for the StAXJAXBIteratorExpression
> >> > > (requires patching camel-core)? Then you could write, for example,
> >> > > "split(stax(Record.class))" in your route.
> >> > >
> >> > > Regards,
> >> > > Richard
> >> > >
> >> > > On Thu, May 12, 2011 at 5:55 PM, Romain Manni-Bucau
> >> > > <rmannibucau@gmail.com> wrote:
> >> > >
> >> > >> Hi all,
> >> > >>
> >> > >> I worked a bit on StAX (thanks to Claus for his advice).
> >> > >>
> >> > >> You can find what I've done here:
> >> > >>
> >> > >> http://code.google.com/p/rmannibucau/source/browse/camel/camel-stax/
> >> > >>
> >> > >> The test shows what can be done with it:
> >> > >>
> >> > >> http://code.google.com/p/rmannibucau/source/browse/camel/camel-stax/src/test/java/org/apache/camel/stax/test/StAXRouteTest.java
> >> > >>
> >> > >>   - validation using SAX (it just needs a converter);
> >> > >>   - parsing using a SAX ContentHandler and a StAX stream reader (a simple
> >> > >>     component);
> >> > >>   - parsing of sub-trees into JAXB objects, using a StAX event reader for
> >> > >>     the whole tree and JAXB for the sub-objects.
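The second item pairs a SAX ContentHandler with a StAX stream reader. The linked sources are not reproduced in this thread, so what follows is only one possible reading of that idea: a hypothetical adapter (StaxToSax.drive) that pulls events from an XMLStreamReader and pushes them into any ContentHandler. It assumes prefix-mapping callbacks, comments and processing instructions can be ignored.

```java
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/** Hypothetical adapter: pushes the events of a StAX stream reader into a SAX ContentHandler. */
public final class StaxToSax {

    private StaxToSax() {
    }

    public static void drive(XMLStreamReader reader, ContentHandler handler)
            throws XMLStreamException, SAXException {
        handler.startDocument();
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    AttributesImpl attributes = new AttributesImpl();
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        String local = reader.getAttributeLocalName(i);
                        attributes.addAttribute(emptyIfNull(reader.getAttributeNamespace(i)), local,
                                qName(reader.getAttributePrefix(i), local), "CDATA",
                                reader.getAttributeValue(i));
                    }
                    handler.startElement(emptyIfNull(reader.getNamespaceURI()), reader.getLocalName(),
                            qName(reader.getPrefix(), reader.getLocalName()), attributes);
                    break;
                }
                case XMLStreamConstants.END_ELEMENT:
                    handler.endElement(emptyIfNull(reader.getNamespaceURI()), reader.getLocalName(),
                            qName(reader.getPrefix(), reader.getLocalName()));
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    handler.characters(reader.getTextCharacters(), reader.getTextStart(),
                            reader.getTextLength());
                    break;
                default:
                    // Comments, processing instructions and other events are not forwarded,
                    // and no startPrefixMapping/endPrefixMapping calls are made.
                    break;
            }
        }
        handler.endDocument();
    }

    private static String qName(String prefix, String localName) {
        return prefix == null || prefix.isEmpty() ? localName : prefix + ":" + localName;
    }

    private static String emptyIfNull(String s) {
        return s == null ? "" : s;
    }
}
```

The pull side stays in control: the caller decides when to start driving the handler, which is what makes it possible to mix StAX navigation with existing SAX handlers.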
> >> > >>
> >> > >>
> >> > >> - Romain
> >> > >>
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Claus Ibsen
> >> > -----------------
> >> > FuseSource
> >> > Email: cibsen@fusesource.com
> >> > Web: http://fusesource.com
> >> > CamelOne 2011: http://fusesource.com/camelone2011/
> >> > Twitter: davsclaus
> >> > Blog: http://davsclaus.blogspot.com/
> >> > Author of Camel in Action: http://www.manning.com/ibsen/
> >> >
> >>
> >
> >
> >
> > --
> > Xavier
> >
>
>
>
> --
> Xavier
>
