apex-dev mailing list archives

From Isha Arkatkar <icarkat...@gmail.com>
Subject Re: XmlParser operator in Malhar 3.2 and 3.3
Date Tue, 10 May 2016 00:25:57 GMT
Hi Ram,


   Were you able to resolve this issue? I debugged the problem a little
and found the root cause: there is indeed a bug in the XmlParser operator.
The clazz field is marked transient in the Parser superclass, so its value
is dropped when the operator is serialized and deserialized, and it is null
by the time setup() is called. The unit tests did not catch this because
they do not exercise application serialization. The fix is to remove the
transient modifier from the clazz field in the Parser class.
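
To illustrate the effect, here is a minimal sketch, not the Malhar code.
ParserLike is a hypothetical stand-in for the Parser superclass, and plain
Java serialization stands in for whatever mechanism the platform uses to
ship the operator to the container:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientFieldDemo {

    // Hypothetical stand-in for the Parser superclass (not Malhar code).
    static class ParserLike implements Serializable {
        private static final long serialVersionUID = 1L;

        // Mirrors the reported bug: transient fields are skipped on write.
        transient Class<?> transientClazz;

        // Mirrors the proposed fix: a regular field is written and restored.
        Class<?> clazz;
    }

    // Serializes the object to a byte array and reads it back, roughly
    // what happens between populateDAG() and setup().
    static ParserLike roundTrip(ParserLike in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ParserLike) ois.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        ParserLike before = new ParserLike();
        before.transientClazz = String.class;
        before.clazz = String.class;

        ParserLike after = roundTrip(before);
        System.out.println(after.transientClazz); // null: the value was dropped
        System.out.println(after.clazz);          // class java.lang.String
    }
}
```

A null clazz then reaches JAXBContext.newInstance() in setup(), which is
consistent with the IllegalArgumentException in the stack trace.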

  Moreover, I think you are right about the DocumentBuilder class, though I
am afraid I cannot remember the reason for adding parsedOutput as an output
port. Could you use the output port 'out' in the superclass as before?
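
On the DocumentBuilder point: parse(String) treats its argument as a URI, so
XML held in a String has to be wrapped, for example in an InputSource over a
StringReader. A minimal sketch, with hypothetical class and method names
(this is not the operator's actual code):

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ParseXmlString {

    // Parses XML text held in a String. Passing the String directly to
    // DocumentBuilder.parse(String) would be interpreted as a URI.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Convenience helper: name of the document's root element.
    static String rootName(String xml) {
        return parse(xml).getDocumentElement().getNodeName();
    }

    public static void main(String[] args) {
        // The element names here are made up for the example.
        String xml = "<employee><name>test</name></employee>";
        System.out.println(rootName(xml)); // employee
    }
}
```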

 To give a little background on the move of this operator from
malhar-contrib to malhar-library: originally, while adding XSD schema
validation, I changed the dependency from XStream to JAXB. Since the XML
parser no longer needed any additional dependencies, I moved the class to
malhar-library. This introduced some of the issues you saw.
In retrospect, I wonder whether it makes sense to revert this class to its
3.2 version, if the XStream usage was more straightforward.

Thanks,
Isha

On Mon, May 9, 2016 at 10:05 AM, Munagala Ramanath <ram@datatorrent.com>
wrote:

> Looks like *XmlParser* operator in 3.3 is broken in a couple of ways:
>
> 1. It uses *DocumentBuilder* and related classes but supplies the XML input
> string to *DocumentBuilder.parse()*. But that method takes a File,
> InputSource or URI, _not_ an XML string:
>
> https://docs.oracle.com/javase/7/docs/api/javax/xml/parsers/DocumentBuilder.html
> 2. It overrides *setup()* and within it invokes
> *JAXBContext.newInstance(getClazz());* which fails if the *clazz* field is
> null; this was not the case with the 3.2 version -- still trying to figure
> out why *clazz* is null even after I explicitly set it to a non-null value
> in *populateDAG()*.
>
> I'll create a JIRA and add more details there.
>
> Ram
>
> On Sun, May 8, 2016 at 7:02 PM, Munagala Ramanath <ram@datatorrent.com>
> wrote:
>
> > Hi,
> >
> > I wrote a small app to exercise the XmlParser operator. The app works fine
> > with Malhar 3.2 but fails with 3.3 with an exception like this:
> >
> > java.lang.IllegalArgumentException
> > at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:637)
> > at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:584)
> > at com.datatorrent.lib.parser.XmlParser.setup(XmlParser.java:135)
> > at com.datatorrent.lib.parser.XmlParser.setup(XmlParser.java:63)
> > at com.datatorrent.stram.engine.Node.setup(Node.java:161)
> > at com.datatorrent.stram.engine.StreamingContainer.setupNode(StreamingContainer.java:1287)
> > at com.datatorrent.stram.engine.StreamingContainer.access$100(StreamingContainer.java:92)
> > at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1361)
> >
> > The operator has moved to the library module in 3.3 from contrib and there
> > are other changes as well, so I made the minor changes needed to
> > accommodate the move but to no avail. I tried both 3.2.0 and 3.3.0 of
> > apex-core, and tried adding JAXB annotations to the Employee class, but
> > nothing seems to make any difference -- I get the same exception.
> >
> > My app for 3.3 (slightly different for 3.2) looks like this:
> > -------------------------------------
> >   public void populateDAG(DAG dag, Configuration conf)
> >   {
> >     Gen gen = dag.addOperator("generator", new Gen());
> >
> >     // configure parser
> >     XmlParser parser = dag.addOperator("parser", new XmlParser());
> >     parser.setClazz(Employee.class);
> >
> >     ConsoleOutputOperator cons = dag.addOperator("console", new ConsoleOutputOperator());
> >
> >     dag.addStream("input", gen.output, parser.in).setLocality(Locality.CONTAINER_LOCAL);
> >     dag.addStream("data", parser.parsedOutput, cons.input).setLocality(Locality.CONTAINER_LOCAL);
> >   }
> > ----------------------------------------
> >
> > Both versions of the project are in branch *add-xmlparse* at:
> >   *git@github.com:amberarrow/examples.git*
> >
> > Does anybody know the right way to use this operator in 3.3?
> >
> > Thanks.
> >
> > Ram
> >
>
