Date: Tue, 16 Jul 2013 21:58:49 +0000 (UTC)
From: "Lewis John McGibbney (JIRA)"
To: dev@any23.apache.org
Reply-To: dev@any23.apache.org
Subject: [jira] [Commented] (ANY23-165) "Invalid content" error if TITLE precedes encoding declaration in the document

[ https://issues.apache.org/jira/browse/ANY23-165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13710340#comment-13710340 ]

Lewis John McGibbney commented on ANY23-165:
--------------------------------------------

Hi Andrey,

I wonder if you are able to look into the code. From a development perspective, we are spread relatively thin here. I for one have not come across this problem to date, so I will not be jumping to address it. If you were able to provide a patch, it would be really great.
Thank you

> "Invalid content" error if TITLE precedes encoding declaration in the document
> ------------------------------------------------------------------------------
>
>                 Key: ANY23-165
>                 URL: https://issues.apache.org/jira/browse/ANY23-165
>             Project: Apache Any23
>          Issue Type: Bug
>          Components: encoding
>    Affects Versions: 0.8.0
>         Environment: Linux 2.6.18-308.11.1.el5 #1 SMP Tue Jul 10 08:48:43 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Andrey Kutuzov
>              Labels: encoding
>             Fix For: 0.9.0
>
>         Attachments: kinopoisk.html.gz
>
>
> When any23 is asked to extract semantics from a web document which is not in UTF-8 and where TITLE precedes encoding declaration, any23 fails with error "Invalid content '"
> Example of such an URL:
> http://www.kinopoisk.ru/film/565993/
> Compressed dump of this page is attached.
> any23 http://www.kinopoisk.ru/film/565993/
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
> ------------------------------------------------------------------------
> Apache Any23 :: rover
> ------------------------------------------------------------------------
> @prefix dcterms: .
> dcterms:title "Ïèðàíüè 3DD" .
> ------------------------------------------------------------------------
> Apache Any23 FAILURE
> Execution terminated with errors: Invalid content ''
> Total time: 1s
> Finished at: Mon Jul 15 20:31:14 MSK 2013
> Final Memory: 67M/479M
> ------------------------------------------------------------------------

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira