hc-dev mailing list archives

From "Rob Tice" <rob.t...@k-int.com>
Subject uri problems
Date Tue, 29 Apr 2003 23:00:55 GMT
Hi there,

I am using HttpClient as the basis for analysing a very large number of
web pages of many different kinds.

I have come across several patterns that cause HttpClient problems.

Many of the pages I am analysing have spaces or '^' in the query part of
the URL. I have had to change the query BitSet to allow these characters,
as HttpClient was blowing up with the following:

 

org.apache.commons.httpclient.URIException: escaped query not valid
    at org.apache.commons.httpclient.URI.setRawQuery(URI.java:3201)
    at org.apache.commons.httpclient.URI.setEscapedQuery(URI.java:3221)
    at org.apache.commons.httpclient.HttpMethodBase.getURI(HttpMethodBase.java:337)
    at com.k_int.OpenHarvest.robot.JHarvestRobot.processHttp(JHarvestRobot.java:408)
    at com.k_int.OpenHarvest.robot.JHarvestRobot.processNext(JHarvestRobot.java:108)
    at com.k_int.OpenHarvest.robot.JHarvestRobot.run(JHarvestRobot.java:725)

This is the change I made

 

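    // protected static final BitSet query = uric; // this was the code

    protected static final BitSet query = new BitSet(256); // changed rob

    static {
        query.or(uric);   // start from everything uric already allows
        query.set('^');   // additionally allow '^'
        query.set(0x20);  // additionally allow the space character
    }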
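
To show the effect of the change in isolation, here is a minimal,
self-contained sketch. It does not use the HttpClient classes: URIC is a
simplified stand-in for the library's uric set (assumed here to be the
RFC 2396 reserved and unreserved characters plus '%'), and the names
QueryBitSetSketch and allAllowed are made up for the example.

```java
import java.util.BitSet;

public class QueryBitSetSketch {

    // Assumption: simplified stand-in for the library's uric set
    // (RFC 2396 reserved + unreserved characters, plus '%' for escapes).
    static final BitSet URIC = new BitSet(256);

    // Relaxed set: everything in URIC plus '^' and the space character.
    static final BitSet QUERY = new BitSet(256);

    static {
        for (char c = 'a'; c <= 'z'; c++) URIC.set(c);
        for (char c = 'A'; c <= 'Z'; c++) URIC.set(c);
        for (char c = '0'; c <= '9'; c++) URIC.set(c);
        for (char c : ";/?:@&=+$,-_.!~*'()%".toCharArray()) URIC.set(c);

        QUERY.or(URIC);
        QUERY.set('^');
        QUERY.set(0x20);
    }

    // Hypothetical helper: true if every character of s is in the given set.
    static boolean allAllowed(String s, BitSet allowed) {
        for (int i = 0; i < s.length(); i++) {
            if (!allowed.get(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String q = "q=a b^c";
        System.out.println(allAllowed(q, URIC));   // false: space and '^' rejected
        System.out.println(allAllowed(q, QUERY));  // true: relaxed set accepts both
    }
}
```

Note that this only demonstrates which characters each set accepts; it
says nothing about whether servers will accept an unescaped space or '^'
on the wire.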

 

Over to you guys :-) What do you want to do?

 

 

Regards

 

Rob Tice

Rob.tice@k-int.com

 

 

