From: Florent André
Reply-To: florent@apache.org
Date: Mon, 13 Jun 2011 17:42:03 +0200
To: stanbol-dev@incubator.apache.org
Subject: Re: Entityhub : Can't retrieve entity with a #
Message-ID: <4DF62FCB.8020201@apache.org>
References: <4DF5EF53.8090908@4sengines.com>

Hi Rupert,

Thanks for testing it on your side.

I investigated, compared the IPTC configuration with mine, and found the problem!

It comes from these lines in indexing.properties:

# the entity prefixes are used to determine if an entity needs to be searched
# on a referenced site. If not specified requests for any entity will be
# forwarded to this referenced site.
# use ';' to seperate multiple values
#org.apache.stanbol.entityhub.site.entityPrefix=http://example.org/resource;urn:mycompany:

Having read this comment, I left the line commented out (no entityPrefix specified), because I understood that in that case all requests are forwarded to the site... and that's fine! :)

But in fact, with this configuration, the entity prefixes in the Felix configuration "Apache Stanbol Entityhub Referenced Site Configuration" are set by default to:
- http://dbpedia.org/resource/
- http://dbpedia.org/ontology/

So IMO either there is a bug in the code, or the comment should be changed.

During this investigation I also "discovered" these (not closely related to this problem):

1) There is a typo in mapping.txt:
==> change
# copy dc:titel to rdfs:label
dc:titel > rdfs:label
==> to
# copy dc:title to rdfs:label
dc:title > rdfs:label

2) In the Felix console, when I try to modify the "Apache Stanbol Entityhub Referenced Site Configuration" of an imported index, there is an AJAX error on save:
The request failed: [object XMLDocument]
(I see this when I try to modify the entity prefixes of my imported index.)

Please tell me if you would prefer Jira tickets for these issues (if they really are issues).

Thanks for your help.
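
To make the prefix behaviour concrete, here is a minimal sketch of the check as the comment in indexing.properties describes it. It is not the Entityhub implementation: the class `EntityPrefixCheck` and the method `siteHandles` are hypothetical names used only for illustration, and the real code may match prefixes differently.

```java
import java.util.List;

class EntityPrefixCheck {

    // Hypothetical helper (an assumption, not Stanbol code): a site is asked
    // for an entity if no prefixes are configured, or if the entity id starts
    // with at least one configured prefix.
    static boolean siteHandles(List<String> entityPrefixes, String entityId) {
        if (entityPrefixes.isEmpty()) {
            return true; // "If not specified requests for any entity will be forwarded"
        }
        for (String prefix : entityPrefixes) {
            if (entityId.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String id = "http://www.test.fr/terminology#space_mathematiques_1306341820765";

        // The defaults observed in the Felix configuration
        List<String> defaults = List.of(
                "http://dbpedia.org/resource/",
                "http://dbpedia.org/ontology/");

        System.out.println(siteHandles(defaults, id));                       // false
        System.out.println(siteHandles(List.of(), id));                      // true
        System.out.println(siteHandles(List.of("http://www.test.fr/"), id)); // true
    }
}
```

Under this assumption, the dbpedia defaults would exclude every http://www.test.fr/ id, which would match the "not found" answers I was getting.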
++

On 06/13/2011 03:54 PM, Rupert Westenthaler wrote:
> Hi florent
>
> Using a '#' in the URI has the disadvantage that browsers will not
> send the part behind the hash to the server, because they assume that
> they need to download the whole document and navigate to the anchor
> within the document.
>
> Using curl (or javascript) I think the full URL should be sent to the
> server (I was not able to find good information about this, but at
> least "curl -v" says that it sends the whole URL to the server).
> However, on the server side Jersey also does not provide the #{anchor}
> part of the URL.
> Sending
>> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
> will pass only "http://www.test.fr/terminology" to a method annotated with
>
> @GET
> @Path("/entity")
> public Response getEntity(@QueryParam(value = "id") String id) {
> // get the Entity
> ...
>
> URL encoding the '#' to '%23' causes Jersey to pass
> "http://www.test.fr/terminology#entity_gradient_1306341921902".
>
> In this case the query for an entity with this ID is correctly passed
> to the ReferencedSite ('#' not '%23'). So if you send '%23' and the
> indexed Entity uses '#' it should work as long as Entities are cached
> locally. If a remote service is used, then the same problem of the '#'
> reappears for the remote service.
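
The '#' versus '%23' behaviour described above can be reproduced on the client side with plain JDK classes. The following is only a sketch: the class `EntityIdEncoding` and the method `lookupUrl` are hypothetical names, the endpoint is the one from the curl examples in this thread, and nothing here is Entityhub or Jersey code.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class EntityIdEncoding {

    // Hypothetical helper: build the lookup URL with the entity id
    // percent-encoded, so that '#' becomes '%23' and stays in the query.
    static String lookupUrl(String endpoint, String entityId) {
        return endpoint + "?id=" + URLEncoder.encode(entityId, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String endpoint = "http://localhost:8080/entityhub/sites/entity";
        String id = "http://www.test.fr/terminology#entity_gradient_1306341921902";

        // Unencoded: everything after '#' is parsed as the URI fragment,
        // so it is no longer part of the "id" query parameter.
        URI raw = URI.create(endpoint + "?id=" + id);
        System.out.println(raw.getQuery());    // id=http://www.test.fr/terminology
        System.out.println(raw.getFragment()); // entity_gradient_1306341921902

        // Encoded: the whole id stays in the query; getQuery() decodes it.
        URI encoded = URI.create(lookupUrl(endpoint, id));
        System.out.println(encoded.getRawQuery());
        System.out.println(encoded.getQuery());
        System.out.println(encoded.getFragment()); // null
    }
}
```

Encoding the whole id (not only the '#') is the safer habit for a client, since other reserved characters in an id could otherwise be misread as part of the request URL.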
>
> To test on my side I have done the following:
> * renamed the Entities of the IPTC worldregions from
>   "http://cv.iptc.org/newscodes/worldregion/r001" to
>   "http://cv.iptc.org/newscodes/worldregion#r001"
> * indexed the IPTC using the indexing tools
> * installed the index to the entityhub
> * curl -v "http://localhost:8080/entityhub/sites/entity?id=http://cv.iptc.org/newscodes/worldregion%23r001"
>
> Assuming that
>> curl
>> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23space_mathematiques_1306341820765"
>> ==> answer is
>> Entity with ID
>> 'http://www.test.fr/terminology#space_mathematiques_1306341820765' not found
>> an any referenced site
>>
> happened on a referenced site with a full cache (e.g. as created by the
> Indexing Utility), I was not able to reproduce the error. If the
> referenced site uses a remote service to dereference entity ids (e.g.
> the Cool URI) this might happen. In this case I suggest testing the
> remote service directly.
>
> best
> Rupert Westenthaler
>
>
> On Mon, Jun 13, 2011 at 1:06 PM, florent andré
> wrote:
>> Hi Rupert,
>> Hope you are fine.
>>
>> I have another problem...
>> In my SKOS, entities are identified by a #, like this:
>>
>> rdf:about="http://www.test.fr/terminology#entity_gradient_1306341921902">
>> rdf:resource="http://www.test.fr/terminology#entity_operateur_mathematique_1306341918995"/>
>> GRADIENT
>> rdf:resource="http://www.test.fr/terminology#space_mathematiques_1306341820765"/>
>>
>>
>>
>> And I can't manage to find the entity with the entity endpoint.
>>
>> * With the # char:
>> curl
>> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
>> ==> answer is
>> Entity with ID 'http://www.test.fr/terminology' not found an any referenced
>> site
>>
>> ==> the part after the # is removed
>>
>> * With the # replaced by %23 (the URL-encoded equivalent):
>> curl
>> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23space_mathematiques_1306341820765"
>> ==> answer is
>> Entity with ID
>> 'http://www.test.fr/terminology#space_mathematiques_1306341820765' not found
>> an any referenced site
>>
>> ==> the whole id is kept, but still not found...
>> The result is the same if I URL-encode the whole entity id.
>>
>> Is this a bug, or am I doing something wrong?
>>
>> Thanks.
>> ++
>>
>>
>
>
>