commons-user mailing list archives

From Simone Tripodi <simonetrip...@apache.org>
Subject Re: [digester] java.lang.NullPointerException only for a specific file
Date Tue, 29 Mar 2011 12:06:04 GMT
Hi Patrick,
I'd say: it depends! I don't know the domain you're working on, but I'd
say that once you import the XML into the Lucene index you don't need the
XML anymore.

Do you need the data to be persisted so it can be reused later?
Then use a DB.
Do you only need to analyze the documents to populate the Lucene index?
Then avoid the DB, you don't need yet another layer!

This question is strictly related to your architecture and not to
Digester. If I were you I'd stop coding for a little while and go back
to analysis.
HTH, just my 2 cents,
Simo

http://people.apache.org/~simonetripodi/
http://www.99soft.org/



On Tue, Mar 29, 2011 at 11:20 AM, Patrick Diviacco
<patrick.diviacco@gmail.com> wrote:
> hey Simone,
>
> I was wondering whether it would be better to import my XML doc into a
> database and work with MySQL.
>
> I guess it is faster to scan a MySQL database with Java than an XML
> doc, what do you think?
>
> I'm using Digester combined with Apache Lucene to perform queries (all
> together they are 65MB in an XML file) against a collection (65MB in XML
> again).
>
> thanks
>
>
>
> On 28 March 2011 17:20, Simone Tripodi <simonetripodi@apache.org> wrote:
>
>> Hi Patrick,
>> take a look at this example[1]: all you have to do is obtain a
>> ContentHandler instance as shown, then invoke SAX events while
>> parsing the original document.
>> It's more efficient and consumes less memory.
>> Simo
>>
>> [1] http://www.stylusstudio.com/xmldev/200502/post20440.html
>>
>> http://people.apache.org/~simonetripodi/
>> http://www.99soft.org/
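
A minimal sketch of the ContentHandler approach described in the message above, assuming the standard JDK `TransformerHandler` is used as the ContentHandler. The class name `StreamingXmlWriter`, the method `writeItems`, and the `items`/`item` element names are hypothetical and do not come from the thread or from the linked example. Each record is written to the output as soon as its SAX events are fired, so the whole document is never held in a StringBuffer.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

public class StreamingXmlWriter {

    /** Writes an items/item document to 'out' by firing SAX events one record at a time. */
    public static void writeItems(Iterable<String> items, Writer out) throws Exception {
        // The JDK's default TransformerFactory also implements SAXTransformerFactory.
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        // A TransformerHandler is a ContentHandler that serializes the events it receives.
        TransformerHandler handler = factory.newTransformerHandler();
        handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        handler.setResult(new StreamResult(out));

        AttributesImpl noAttributes = new AttributesImpl();
        handler.startDocument();
        handler.startElement("", "items", "items", noAttributes);
        for (String item : items) {
            char[] text = item.toCharArray();
            handler.startElement("", "item", "item", noAttributes);
            // Special characters such as '&' are escaped by the serializer.
            handler.characters(text, 0, text.length);
            handler.endElement("", "item", "item");
        }
        handler.endElement("", "items", "items");
        handler.endDocument();
    }

    public static void main(String[] args) throws Exception {
        StringWriter out = new StringWriter();
        writeItems(Arrays.asList("a", "b & c"), out);
        System.out.println(out);
    }
}
```

For a large dataset the `Writer` would typically wrap a file stream rather than a `StringWriter`; the in-memory writer is used here only to keep the sketch self-contained.
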
>>
>>
>>
>> On Mon, Mar 28, 2011 at 4:56 PM, Patrick Diviacco
>> <patrick.diviacco@gmail.com> wrote:
>> > hi!
>> >
>> > What should I use instead of StringBuffer ?
>> >
>> > Any example or tutorial ?
>> >
>> > thanks
>> > Patrick
>> >
>> > On 28 March 2011 16:53, Simone Tripodi <simonetripodi@apache.org> wrote:
>> >
>> >> Hi Patrick,
>> >> nice to know you quickly fixed the issue before anybody could
>> >> offer help! :)
>> >>
>> >> As a side note, I would suggest considering a different solution
>> >> for the XML generation than the StringBuffer: since you're
>> >> parsing a large dataset, streaming data while parsing would
>> >> improve performance and reduce memory consumption.
>> >>
>> >> Just my 2 cents, have a nice day,
>> >> Simo
>> >>
>> >> http://people.apache.org/~simonetripodi/
>> >> http://www.99soft.org/
>> >>
>> >>
>> >>
>> >> On Mon, Mar 28, 2011 at 2:28 PM, Patrick Diviacco
>> >> <patrick.diviacco@gmail.com> wrote:
>> >> > I've solved it. The issue was a row in the train.xml file. To find it,
>> >> > I printed the source file rows while processing. However, that was
>> >> > possible only because the parsing takes 4 minutes.
>> >> >
>> >> > I'm wondering how to debug such issues with a much bigger text file.
>> >> >
>> >> > thanks
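
One possible way to locate a bad row in a much larger file without printing every row is to let the parser report its position when the per-record code fails. The sketch below uses only the JDK's SAX API, not Digester, and that choice is an assumption rather than what the poster's code does. The class `LocatingHandler`, the `item` element name, and the `process` method are hypothetical stand-ins for the real record handling.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class LocatingHandler extends DefaultHandler {
    private Locator locator;
    private final StringBuilder text = new StringBuilder();

    @Override
    public void setDocumentLocator(Locator locator) {
        // Supplied by the parser before parsing starts; tracks the current position.
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (!"item".equals(qName)) {
            return;
        }
        try {
            process(text.toString());
        } catch (RuntimeException e) {
            // Attach the parser position so the failing row can be found directly.
            throw new SAXParseException("Failed to process <item>", locator, e);
        }
    }

    /** Hypothetical per-record logic; here it simply rejects blank items. */
    protected void process(String value) {
        if (value.trim().isEmpty()) {
            throw new NullPointerException("blank item");
        }
    }

    /** Returns the line number of the first failing record, or -1 if none fails. */
    public static int failingLine(String xml) throws Exception {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new LocatingHandler());
            return -1;
        } catch (SAXParseException e) {
            return e.getLineNumber();
        }
    }
}
```

Calling `failingLine` on a document returns the line of the element whose processing threw, which narrows the search to one row regardless of file size.
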
>> >> >
>> >> > On 28 March 2011 14:14, Patrick Diviacco <patrick.diviacco@gmail.com> wrote:
>> >> >
>> >> >> And these are the files:
>> >> >>
>> >> >> http://dl.dropbox.com/u/72686/test.xml
>> >> >>
>> >> >> http://dl.dropbox.com/u/72686/train.xml
>> >> >>
>> >> >> thanks
>> >> >>
>> >> >>
>> >> >> On 28 March 2011 14:13, Patrick Diviacco <patrick.diviacco@gmail.com> wrote:
>> >> >>
>> >> >>> Hi,
>> >> >>>
>> >> >>> I have a 74MB XML document that I've split into 2 docs: 52MB and 22MB
>> >> >>> respectively.
>> >> >>>
>> >> >>> I'm parsing the file using the Commons Digester library, and everything
>> >> >>> works perfectly for the small file, but I get a NullPointerException
>> >> >>> with the big one.
>> >> >>>
>> >> >>> I don't think the issue is the code because it works for the small
>> >> >>> file... I guess the problem is with the file itself.
>> >> >>>
>> >> >>> I've parsed the files with the same parser, so I don't think the files
>> >> >>> have issues either.
>> >> >>>
>> >> >>> In conclusion, I don't know where the issue is. This is the code:
>> >> >>> http://pastie.org/1726063
>> >> >>>
>> >> >>> This is the exception
>> >> >>> SEVERE: End event threw exception
>> >> >>> java.lang.reflect.InvocationTargetException
>> >> >>>     at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>> >> >>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >> >>>     at java.lang.reflect.Method.invoke(Method.java:597)
>> >> >>>     at org.apache.commons.beanutils.MethodUtils.invokeMethod(MethodUtils.java:216)
>> >> >>>     at org.apache.commons.digester.SetNextRule.end(SetNextRule.java:220)
>> >> >>>     at org.apache.commons.digester.Rule.end(Rule.java:257)
>> >> >>>     at org.apache.commons.digester.Digester.endElement(Digester.java:1345)
>> >> >>>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:601)
>> >> >>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1782)
>> >> >>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2938)
>> >> >>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
>> >> >>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
>> >> >>>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
>> >> >>>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
>> >> >>>     at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
>> >> >>>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
>> >> >>>     at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
>> >> >>>     at org.apache.commons.digester.Digester.parse(Digester.java:1871)
>> >> >>>     at CentroidGenerator.main(CentroidGenerator.java:137)
>> >> >>> Caused by: java.lang.NullPointerException
>> >> >>>     at CentroidGenerator.nextItem(CentroidGenerator.java:62)
>> >> >>>     ... 19 more
>> >> >>> Exception in thread "main" java.lang.NullPointerException
>> >> >>>     at org.apache.commons.digester.Digester.createSAXException(Digester.java:3363)
>> >> >>>     at org.apache.commons.digester.Digester.createSAXException(Digester.java:3389)
>> >> >>>     at org.apache.commons.digester.Digester.endElement(Digester.java:1348)
>> >> >>>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:601)
>> >> >>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1782)
>> >> >>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2938)
>> >> >>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
>> >> >>>     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
>> >> >>>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
>> >> >>>     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
>> >> >>>     at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
>> >> >>>     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205)
>> >> >>>     at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522)
>> >> >>>     at org.apache.commons.digester.Digester.parse(Digester.java:1871)
>> >> >>>     at CentroidGenerator.main(CentroidGenerator.java:137)
>> >> >>> Caused by: java.lang.NullPointerException
>> >> >>>     at CentroidGenerator.nextItem(CentroidGenerator.java:62)
>> >> >>>     at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>> >> >>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >> >>>     at java.lang.reflect.Method.invoke(Method.java:597)
>> >> >>>     at org.apache.commons.beanutils.MethodUtils.invokeMethod(MethodUtils.java:216)
>> >> >>>     at org.apache.commons.digester.SetNextRule.end(SetNextRule.java:220)
>> >> >>>     at org.apache.commons.digester.Rule.end(Rule.java:257)
>> >> >>>     at org.apache.commons.digester.Digester.endElement(Digester.java:1345)
>> >> >>>     ... 12 more
>> >> >>>
>> >> >>> thanks
>> >> >>>
>> >> >>
>> >> >>
>> >> >
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
>> >> For additional commands, e-mail: user-help@commons.apache.org
>> >>
>> >>
>> >
>>
>>
>>
>


