commons-user mailing list archives

From Thufir <hawat.thu...@gmail.com>
Subject NNTPClient.retrieveArticleBody returns MalformedServerReplyException
Date Sat, 24 Mar 2012 10:44:55 GMT
What's the correct way to get an article body?

I'm using java.util.logging.Logger to write the 
org.apache.commons.net.MalformedServerReplyException I catch to a log file:

<record>
  <date>2012-03-24T03:09:35</date>
  <millis>1332583775299</millis>
  <sequence>1</sequence>
  <logger>gwene.LogUtils</logger>
  <level>INFO</level>
  <class>gwene.LogUtils</class>
  <method>logArticles</method>
  <thread>1</thread>
  <message>Could not parse response code.
Server Reply: &lt;p&gt;Alex &amp;#8220;Hurricane&amp;#8221; 
Higgins, transformer of snooker, died on July 24th, aged
...text snipped...
mercilessly, one by one.  ...&lt;/p&gt;&lt;div 
class="feedflare"&gt;</message>
</record>


The server reply is *exactly* what I'm missing: the content of the 
article.  Code and full output:

https://gist.github.com/2180843

I'm guessing that the HTML is throwing things off?  What does 
NNTPClient.retrieveArticleBody expect?  After all, anything can be in an 
NNTP post.
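
For context on the error text: NNTP replies begin with a three-digit 
status code (RFC 3977), so "Could not parse response code" suggests the 
client read a line of article body where it expected a status line.  A 
minimal sketch of that check follows.  parseReplyCode is a hypothetical 
helper for illustration only, not Commons Net's actual parser, and the 
sample reply lines are made up:

```java
final class ReplyCodeDemo {

    // Hypothetical helper: returns the three-digit status code at the start
    // of a reply line, or -1 when the line does not start with three digits.
    static int parseReplyCode(String line) {
        if (line == null || line.length() < 3) {
            return -1;
        }
        for (int i = 0; i < 3; i++) {
            char c = line.charAt(i);
            if (c < '0' || c > '9') {
                return -1;
            }
        }
        return Integer.parseInt(line.substring(0, 3));
    }

    public static void main(String[] args) {
        // A status line for a BODY command starts with code 222.
        System.out.println(parseReplyCode("222 0 <id@example.invalid> body follows"));
        // A line of HTML body text has no status code to parse.
        System.out.println(parseReplyCode("<p>Alex Higgins, transformer of snooker"));
    }
}
```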

Now, what I'm really after, I suppose, is the server reply, because that 
has the body of the NNTP article.  Surely, though, that's not the way 
to use org.apache.commons.net.nntp.NNTPClient; I just can't find the 
correct way.  Hence this kludge of logging the 
MalformedServerReplyException instead of reading the body properly.

I suppose it's possible to log everything, and then parse the log file, 
but that seems like a very complex way of doing a simple thing.

The API documentation for NNTPClient assumes a knowledge of NNTP which, 
unfortunately, I don't have.  I've looked through the example code and 
don't see any samples where article bodies are parsed.  The closest I 
see is NNTPClient.retrieveArticleBody:

https://commons.apache.org/net/api-3.1/org/apache/commons/net/nntp/NNTPClient.html#retrieveArticleBody%28java.lang.String%29

However, that just gives me the malformed-reply exception shown above.  
Since Pan can connect to gmane fine, the server itself presumably isn't 
the problem.  Also, the text that comes back from 
NNTPClient.retrieveArticleBody matches what I see in the Pan newsreader, 
which is what I'm after: namely, the body of the article.

What is the correct way to grab the article body?  I've looked through 
the API quite thoroughly.

Surely there must be an example that reads the article body, not just 
the header, or at least one that uses a BufferedReader to read the 
article body into a String.  If there is, I can't find it, and I don't 
see a better method available through the API.
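
A minimal sketch of reading a body into a String.  It assumes, going by 
the Commons Net Javadoc, that NNTPClient.retrieveArticleBody returns a 
Reader (or null if the article does not exist) that has to be read to 
the end and closed before any other command is sent on the same 
connection.  readFully is a hypothetical helper name, and a StringReader 
stands in for the real reader so the example runs without a server:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class ArticleBodyReader {

    // Hypothetical helper: drains the reader completely, closes it, and
    // returns everything that was read, one '\n' after each line.
    static String readFully(Reader reader) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for (assumed usage, not verified against a server):
        //   Reader r = client.retrieveArticleBody(articleId);
        //   if (r == null) { /* article does not exist */ }
        Reader r = new StringReader("<p>first line</p>\n<p>second line</p>\n");
        System.out.print(readFully(r));
    }
}
```

The point of draining the reader before doing anything else is that the 
body arrives on the same connection as command replies, so a command 
issued too early may see body text where it expects a status line.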



thanks,

Thufir
