lucene-solr-user mailing list archives

From Jack Krupansky <jack.krupan...@gmail.com>
Subject Re: Solr response error 403 when I try to index medium.com articles
Date Wed, 30 Mar 2016 00:01:24 GMT
Medium switches from http to https, so you would need the logic for dealing
with https security handshakes.
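
On the user-agent theory in the quoted message: one way to test it is to fetch the page outside Solr with an explicit User-Agent header and see whether the 403 goes away. The sketch below is only an illustration under assumptions. The class name, the User-Agent string, and the timeouts are hypothetical and do not come from Solr. It uses the JDK's own HttpURLConnection, which performs the https handshake itself for https URLs using the default trust store.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class FetchWithUserAgent {
    // Hypothetical value; use whatever identifies your crawler honestly.
    static final String USER_AGENT = "Mozilla/5.0 (compatible; ExampleIndexer/1.0)";

    // Prepares the connection without sending anything yet.
    // The request only goes out when the response is first read.
    static HttpURLConnection open(String pageUrl) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(pageUrl).openConnection();
        conn.setRequestProperty("User-Agent", USER_AGENT);
        conn.setConnectTimeout(10_000); // assumed timeout, in milliseconds
        conn.setReadTimeout(10_000);    // assumed timeout, in milliseconds
        return conn;
    }

    // Downloads the page body, failing with the status code on any non-200 response.
    static byte[] fetch(String pageUrl) throws IOException {
        HttpURLConnection conn = open(pageUrl);
        int status = conn.getResponseCode();
        if (status != HttpURLConnection.HTTP_OK) {
            throw new IOException("HTTP " + status + " for URL: " + pageUrl);
        }
        try (InputStream in = conn.getInputStream();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] body = fetch(args[0]);
        System.out.println("Fetched " + body.length + " bytes");
    }
}
```

If the fetch succeeds this way, the downloaded bytes could then be sent to the extracting handler in the request body rather than asking Solr to fetch the URL itself. If it still returns 403, the user agent is probably not the cause.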

-- Jack Krupansky

On Tue, Mar 29, 2016 at 7:54 PM, Jeferson dos Anjos <
jefersonanjos@packdocs.com> wrote:

> I'm trying to index some pages from Medium, but I get a 403 error. I
> believe this is because Medium does not accept Solr's user agent. Has
> anyone experienced this? Do you know how to change it?
>
> I appreciate any help.
>
> <lst name="responseHeader">
> <int name="status">500</int>
> <int name="QTime">94</int>
> </lst>
> <lst name="error">
> <str name="msg">
> Server returned HTTP response code: 403 for URL: https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1
> </str>
> <str name="trace">
> java.io.IOException: Server returned HTTP response code: 403 for URL: https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1
>   at sun.reflect.GeneratedConstructorAccessor314.newInstance(Unknown Source)
>   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
>   at java.lang.reflect.Constructor.newInstance(Unknown Source)
>   at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source)
>   at sun.net.www.protocol.http.HttpURLConnection$10.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at sun.net.www.protocol.http.HttpURLConnection.getChainedException(Unknown Source)
>   at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
>   at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
>   at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
>   at org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:87)
>   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:158)
>   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
>   at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:291)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2006)
>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:413)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:204)
>   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>   at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>   at org.eclipse.jetty.server.Server.handle(Server.java:368)
>   at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>   at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>   at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
>   at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
>   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
>   at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>   at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>   at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>   at java.lang.Thread.run(Unknown Source)
> Caused by: java.io.IOException: Server returned HTTP response code: 403 for URL: https://medium.com/@producthunt/10-mac-menu-bar-apps-you-can-t-live-without-df087d2c6b1
>   at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
>   at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
>   at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(Unknown Source)
>   at java.net.URLConnection.getContentType(Unknown Source)
>   at sun.net.www.protocol.https.HttpsURLConnectionImpl.getContentType(Unknown Source)
>   at org.apache.solr.common.util.ContentStreamBase$URLStream.getStream(ContentStreamBase.java:84)
>   ... 33 more
> </str>
> <int name="code">500</int>
> </lst>
> </response>
>
>
> Jeferson M. dos Anjos
> CEO of Packdocs
> P.S.: Keep your files alive with Packdocs (www.packdocs.com)
>
