manifoldcf-user mailing list archives

From <karl.wri...@nokia.com>
Subject RE: java.lang.NullPointerException while trying to crawl RSS feed to Solr
Date Wed, 21 Jul 2010 14:10:03 GMT
The response you are seeing from the Solr webapp is quite unhelpful.  Tomcat usually has several
logs you can dig through (stdout captures, stderr captures, etc.), but they differ somewhat
based on what platform you are using.  If you can't find any exceptions there, then remember
that Solr has a configurable logging setup that may also be useful, but you'll have to refer
to the Solr documentation for how to set that up and where to look.  I usually just run
Solr using the Jetty example server, so I'm not going to be much help to you, I'm afraid.

Karl
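
For anyone following the advice above to dig through the servlet container's logs, here is a minimal sketch of a scanner that lists every line mentioning an exception or a SEVERE entry. It is not part of LCF or Solr. The class name LogScan, the two search strings, and the idea of passing the log directory on the command line are assumptions made for illustration; the real log location depends on the platform and on how Tomcat was installed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of LCF or Solr.
public class LogScan {
    /** Returns "file:line: text" for every line that mentions an exception or a SEVERE entry. */
    public static List<String> findSuspectLines(Path logDir) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(logDir)) {
            files = listing.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            // ISO-8859-1 maps every byte to a character, so odd bytes in a log never abort the scan.
            List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
            for (int i = 0; i < lines.size(); i++) {
                String line = lines.get(i);
                // Assumed markers: Java exception class names and java.util.logging's SEVERE level.
                if (line.contains("Exception") || line.contains("SEVERE")) {
                    hits.add(file.getFileName() + ":" + (i + 1) + ": " + line);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LogScan <log-directory>");
            return;
        }
        findSuspectLines(Paths.get(args[0])).forEach(System.out::println);
    }
}
```

Pointing it at the Tomcat logs directory covers the catalina log and any stdout or stderr captures that live alongside it in one pass.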


From: ext Jens Bengtsson [mailto:jens.bengtsson@findwise.se]
Sent: Wednesday, July 21, 2010 10:04 AM
To: connectors-user@incubator.apache.org
Subject: RE: java.lang.NullPointerException while trying to crawl RSS feed to Solr

OK.

The Solr connection still says "Connection working".

Looking at the Tomcat log I can't find any Solr exceptions. Pardon my ignorance, but is there
any log other than the catalina log I should be looking at?


Here's some output from the simple history:

Internal Server Error 07-21-2010 15:56:50.283 document ingest (solr) http://giantbomb.com/transformers-war-for-cybertron/61-29405/...reviews/ 500 133674 327
Internal Server Error 07-21-2010 15:56:50.032 document ingest (solr) http://giantbomb.com/podcast/?podcast_id=160 500 82109 380
Internal Server Error 07-21-2010 15:56:49.787 document ingest (solr) http://giantbomb.com/modnation-racers/61-26848/reviews/ 500 131147 303
Internal Server Error 07-21-2010 15:56:49.583 document ingest (solr) http://giantbomb.com/sin-punishment-star-successor/61-23966/r...eviews/ 500 99252 297
Internal Server Error 07-21-2010 15:56:49.458 document ingest (solr) http://giantbomb.com/podcast/?podcast_id=163 500 82524 280
Internal Server Error 07-21-2010 15:56:49.216 document ingest (solr) http://giantbomb.com/podcast/?podcast_id=156 500 82493 328
Internal Server Error 07-21-2010 15:56:47.772 document ingest (solr) http://giantbomb.com/ufc-undisputed-2010/61-29376/reviews/ 500 132435 342
Internal Server Error

Jens

From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: den 21 juli 2010 15:55
To: connectors-user@incubator.apache.org
Subject: RE: java.lang.NullPointerException while trying to crawl RSS feed to Solr

What that means is that the response coming from Solr is not the expected XML either.  It
sounds like it is just plain old HTML, which is strange if you are actually talking to Solr.

When you view your Solr connection in the LCF UI, does it still say "Connection working"?

The error code of 500 you reported is also almost certainly coming from Solr, so you should
be able to get a stack trace from it that would explain the problem.  There may well be additional
Solr arguments you need to add to the connection to make everything work.  The stack trace
will tell us what the problem actually is.

An alternative might be to look at the Simple History report for one of the failed Solr indexing
attempts - that may well list the actual response back from Solr as the error text.

Karl


From: ext Jens Bengtsson [mailto:jens.bengtsson@findwise.se]
Sent: Wednesday, July 21, 2010 9:49 AM
To: connectors-user@incubator.apache.org
Subject: RE: java.lang.NullPointerException while trying to crawl RSS feed to Solr

Missed this in the console:

[Fatal Error] :115:120: The element type "HR" must be terminated by the matching end-tag "</HR>".
org.apache.lcf.core.interfaces.LCFException: XML parsing error: The element type "HR" must be terminated by the matching end-tag "</HR>".
        at org.apache.lcf.core.common.XMLDoc.init(XMLDoc.java:369)
        at org.apache.lcf.core.common.XMLDoc.<init>(XMLDoc.java:317)
        at org.apache.lcf.agents.output.solr.HttpPoster.getResponse(HttpPoster.java:537)

So there's a parsing error for the XML.
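
The kind of failure in the trace above can be reproduced in isolation. The sketch below is not LCF's XMLDoc or HttpPoster code; it only feeds an HTML fragment with an unclosed HR tag to the JDK's standard XML parser and reports the parser's message. The class name, the method name, and both sample strings are made up for illustration. With the parser bundled in the JDK, a "[Fatal Error]" line is typically also printed to stderr, much like the one in the console output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of LCF.
public class HtmlAsXmlDemo {
    /** Returns null if the text is well-formed XML, otherwise the parser's error message. */
    public static String xmlParseError(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(text)));
            return null;
        } catch (SAXParseException e) {
            // Well-formedness violations, such as an unclosed element, end up here.
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up stand-in for a container's HTML error page; HTML allows <HR> without a close tag.
        String htmlErrorPage = "<html><body><h1>HTTP Status 500</h1><HR></body></html>";
        // Made-up stand-in for a well-formed XML reply.
        String xmlReply = "<response><int name=\"status\">0</int></response>";

        System.out.println("HTML page: " + xmlParseError(htmlErrorPage));
        System.out.println("XML reply: " + xmlParseError(xmlReply));
    }
}
```

An HTML error page is legal HTML but not well-formed XML, so a strict XML parser stops at the first element that is never closed.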

From: Jens Bengtsson [mailto:jens.bengtsson@findwise.se]
Sent: den 21 juli 2010 15:36
To: connectors-user@incubator.apache.org
Subject: RE: java.lang.NullPointerException while trying to crawl RSS feed to Solr

So I updated and this is the error I get in the log now:

Service interruption reported for job 1279719088042 connection 'Giantbomb RSS': Error 500
from ingestion request; ingestion will be retried again later

From: Jens Bengtsson [mailto:jens.bengtsson@findwise.se]
Sent: den 21 juli 2010 14:41
To: connectors-user@incubator.apache.org
Subject: RE: java.lang.NullPointerException while trying to crawl RSS feed to Solr

No worries!

I'm very thankful for your help.

Jens

From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: den 21 juli 2010 14:34
To: connectors-user@incubator.apache.org
Subject: RE: java.lang.NullPointerException while trying to crawl RSS feed to Solr

Yesterday should be fine.  I overlooked something and have checked in a fix.  My apologies.

Karl



From: ext Jens Bengtsson [mailto:jens.bengtsson@findwise.se]
Sent: Wednesday, July 21, 2010 8:26 AM
To: connectors-user@incubator.apache.org
Subject: RE: java.lang.NullPointerException while trying to crawl RSS feed to Solr

That's strange, because I did the checkout from https://svn.apache.org/repos/asf/incubator/lcf/trunk
yesterday, and I did an update today and rebuilt everything, so things should be in sync with
trunk.

Jens

From: karl.wright@nokia.com [mailto:karl.wright@nokia.com]
Sent: den 21 juli 2010 13:55
To: connectors-user@incubator.apache.org
Subject: RE: java.lang.NullPointerException while trying to crawl RSS feed to Solr

Hi Jens,

The trace you gave me is out of date wrt trunk by at least a month.  Would you be willing
to synch up to the latest LCF, and see how you do with that?  If you still see a trace, I'd
be happy to analyze it and perhaps check in a patch.

Karl


From: Wright Karl (Nokia-MS/Cambridge)
Sent: Wednesday, July 21, 2010 6:51 AM
To: jens.bengtsson@findwise.se; connectors-user@incubator.apache.org
Subject: RE: java.lang.NullPointerException while trying to crawl RSS feed to Solr


The 'connection working' from RSS doesn't mean much.  But the 'connection working' from Solr
means that LCF could talk to Solr and do a ping.



In any case, you should never see an NPE from LCF, so I am going to look into this at the earliest
opportunity.  It is possible that the NPE is masking some other error, but maybe it is just
broken.



Karl
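
The ping mentioned above can also be tried by hand. This is a minimal sketch, assuming Solr's standard /admin/ping handler is enabled and using a hypothetical base URL of http://localhost:8983/solr; it is not the code LCF itself runs. A 200 here only shows that Solr is reachable and answering pings. It says nothing about whether document updates will be accepted, which fits with the connection reporting "Connection working" while indexing still fails.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper, not part of LCF or Solr.
public class SolrPingCheck {
    /** Appends the (assumed) standard ping handler path to a Solr base URL. */
    public static String pingUrl(String solrBaseUrl) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        return base + "/admin/ping";
    }

    /** Issues a GET and returns the HTTP status code. */
    public static int statusOf(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // The default base URL is an assumption; pass the real one as the first argument.
        String url = pingUrl(args.length > 0 ? args[0] : "http://localhost:8983/solr");
        try {
            System.out.println(url + " -> HTTP " + statusOf(url));
        } catch (IOException e) {
            System.out.println(url + " -> ping failed: " + e);
        }
    }
}
```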



--- original message ---

From: "ext Jens Bengtsson" <jens.bengtsson@findwise.se>
Subject: java.lang.NullPointerException while trying to crawl RSS feed to Solr
Date: July 21, 2010
Time: 6:38:7  AM


Hi!

I have set up a connector against an RSS feed with output to a Solr server. The repository connection
and the output connection both report that the connection is OK.

When I run the job it seems to retrieve the RSS feed and process everything as it should,
but the data does not seem to get indexed into Solr.

If I look in the lcf log file I find the following:
Error tossed: null
java.lang.NullPointerException
        at org.apache.lcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:976)

I guess there's an error when it tries to post the data to Solr, but I can't figure out what
the problem is. If I look at the catalina log for the Tomcat instance where Solr runs, I can't find
any errors or anything else relevant.

Does anyone have any tips?

