stanbol-dev mailing list archives

From Florent André <flor...@apache.org>
Subject Re: Entityhub : Can't retrieve entity with a #
Date Mon, 13 Jun 2011 15:42:03 GMT
Hi Rupert,

Thanks for testing it on your side.

I investigated and compared the IPTC configuration with mine, and found the problem!

It comes from this line in indexing.properties:
# the entity prefixes are used to determine if an entity needs to be searched
# on a referenced site. If not specified requests for any entity will be
# forwarded to this referenced site.
# use ';' to seperate multiple values
#org.apache.stanbol.entityhub.site.entityPrefix=http://example.org/resource;urn:mycompany:

Based on this comment, I initially left it commented out (no entityPrefix
specified), because I understood that in that case all requests would be
forwarded to this site... and that's fine! :)

But in fact, with this configuration, the entity prefixes in the Felix
configuration "Apache Stanbol Entityhub Referenced Site Configuration"
are set by default to:
- http://dbpedia.org/resource/
- http://dbpedia.org/ontology/

So IMO, either there is a bug in the code or the comment should be changed.
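The difference between the documented and the observed behaviour can be modelled in a few lines. This is a hypothetical Java sketch: the class and method names are invented, and a plain `startsWith` match against each configured prefix is assumed. It is not the Entityhub implementation. It shows why the dbpedia defaults make entities from another namespace unreachable.

```java
import java.util.List;

/**
 * Hypothetical model of the entityPrefix check described in the
 * indexing.properties comment (assumption: simple startsWith matching).
 * Not the actual Stanbol Entityhub code.
 */
public class EntityPrefixCheck {

    /** Returns true if a request for entityId should be sent to the site. */
    public static boolean isForwarded(List<String> entityPrefixes, String entityId) {
        if (entityPrefixes.isEmpty()) {
            // documented behaviour: no prefixes configured -> forward every request
            return true;
        }
        for (String prefix : entityPrefixes) {
            if (entityId.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String id = "http://www.test.fr/terminology#entity_gradient_1306341921902";
        // defaults observed in the Felix console when entityPrefix was left commented out
        List<String> dbpediaDefaults = List.of(
                "http://dbpedia.org/resource/", "http://dbpedia.org/ontology/");

        System.out.println(isForwarded(List.of(), id));        // true
        System.out.println(isForwarded(dbpediaDefaults, id));  // false
        System.out.println(isForwarded(List.of("http://www.test.fr/terminology#"), id)); // true
    }
}
```

With the dbpedia defaults, the second call returns false. This matches the "not found" symptom. Configuring the site's own namespace as the prefix makes the check pass.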

During this investigation I also "discovered" these issues (not closely
related to this problem):

1) There is a typo in mapping.txt:
==> change
# copy dc:titel to rdfs:label
dc:titel > rdfs:label
==> to
# copy dc:title to rdfs:label
dc:title > rdfs:label
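A simplified model of a "source > target" copy rule shows why the typo matters. The class below is invented for illustration and is not the indexing tool's parser. In this model, a misspelled source property matches nothing, so no rdfs:label values are produced and nothing signals the mistake.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of one "source > target" copy rule from
 * mapping.txt. It only illustrates the effect of a misspelled source property.
 */
public class CopyMapping {

    /** Applies a rule such as "dc:title > rdfs:label" to an entity's properties. */
    public static Map<String, List<String>> apply(String rule, Map<String, List<String>> entity) {
        String[] parts = rule.split(">");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 'source > target': " + rule);
        }
        String source = parts[0].trim();
        String target = parts[1].trim();

        // work on a mutable copy so the input map is left untouched
        Map<String, List<String>> result = new HashMap<>();
        entity.forEach((k, v) -> result.put(k, new ArrayList<>(v)));

        List<String> values = entity.get(source);
        if (values != null) {
            // copy (not move) the source values to the target property
            result.computeIfAbsent(target, k -> new ArrayList<>()).addAll(values);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> entity = Map.of("dc:title", List.of("GRADIENT"));
        System.out.println(apply("dc:titel > rdfs:label", entity).get("rdfs:label")); // null
        System.out.println(apply("dc:title > rdfs:label", entity).get("rdfs:label")); // [GRADIENT]
    }
}
```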

2) In the Felix console, when I try to modify the "Apache Stanbol
Entityhub Referenced Site Configuration" of an imported index, there
is an AJAX error on save:
The request failed:
[object XMLDocument]

(I see this when I try to modify the entity prefixes of my imported index.)

Please let me know if you would prefer Jira tickets for these issues (if
they really are issues).

Thanks for your help.
++


On 06/13/2011 03:54 PM, Rupert Westenthaler wrote:
> Hi Florent
>
> Using a '#' in the URI has the disadvantage that browsers will not
> send the part after the hash to the server, because they assume that
> they need to download the whole document and navigate to the anchor
> within the document.
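A minimal sketch using the JDK's `java.net.URI` shows the same split Florent observed. The class and method names are hypothetical. The sketch assumes the client strips the fragment before sending, as the HTTP specification expects. With an unencoded '#', the id value stops at "http://www.test.fr/terminology" and the rest becomes the fragment.

```java
import java.net.URI;

/**
 * Illustrative helper (hypothetical name): computes the request target
 * (path + query) that goes on the HTTP request line. Everything after an
 * unencoded '#' is the URI fragment and is not part of it.
 */
public class FragmentDemo {

    public static String requestTarget(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on bad syntax
        String query = uri.getRawQuery();
        return query == null ? uri.getRawPath() : uri.getRawPath() + "?" + query;
    }

    public static void main(String[] args) {
        String url = "http://localhost:8080/entityhub/sites/entity"
                + "?id=http://www.test.fr/terminology#entity_gradient_1306341921902";
        URI uri = URI.create(url);
        System.out.println(uri.getFragment()); // entity_gradient_1306341921902
        System.out.println(requestTarget(url));
        // /entityhub/sites/entity?id=http://www.test.fr/terminology
    }
}
```

The `id` query parameter therefore only ever contains "http://www.test.fr/terminology". This matches the "Entity with ID 'http://www.test.fr/terminology' not found" response in the original report.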
>
> Using curl (or JavaScript) I think the full URL should be sent to the
> server (I was not able to find good information about this, but at
> least "curl -v" says that it sends the whole URL to the server).
> However, on the server side Jersey does not provide the #{anchor}
> part of the URL either.
> Sending
>> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
> will pass only "http://www.test.fr/terminology" to a method annotated with
>
>      @GET
>      @Path("/entity")
>      public Response getEntity(@QueryParam(value = "id") String id) {
>          // get the Entity
>          ...
>
> URL-encoding the '#' as '%23' causes Jersey to pass the full ID
> "http://www.test.fr/terminology#entity_gradient_1306341921902".
>
> In this case the query for an entity with this ID is correctly passed
> to the ReferencedSite (with '#', not '%23'). So if you pass '%23' and the
> indexed Entity uses '#', it should work as long as Entities are cached
> locally. If a remote service is used, then the same problem with the '#'
> reappears for the remote service.
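A client can build the encoded lookup URL with the JDK's `URLEncoder`. The sketch below requires Java 10+ and uses the endpoint from this thread. The class name is illustrative. `URLDecoder` stands in, roughly, for the decoding a JAX-RS container applies to a `@QueryParam` value. The ID arrives back with '#', as described above.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/**
 * Minimal sketch (Java 10+): percent-encode the entity ID so the '#'
 * travels as %23 instead of starting a URI fragment.
 */
public class EncodedLookup {

    static final String ENDPOINT = "http://localhost:8080/entityhub/sites/entity";

    /** Builds the request URL with the entity ID encoded as a query value. */
    public static String lookupUrl(String entityId) {
        return ENDPOINT + "?id=" + URLEncoder.encode(entityId, StandardCharsets.UTF_8);
    }

    /** Roughly what the server does before handing the value to the resource method. */
    public static String decodedId(String encodedValue) {
        return URLDecoder.decode(encodedValue, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String id = "http://www.test.fr/terminology#entity_gradient_1306341921902";
        String url = lookupUrl(id);
        System.out.println(url);
        // ...entity?id=http%3A%2F%2Fwww.test.fr%2Fterminology%23entity_gradient_1306341921902
        String encodedValue = url.substring(url.indexOf("?id=") + 4);
        System.out.println(decodedId(encodedValue).equals(id)); // true
    }
}
```

The curl examples in this thread encode only the '#'. `URLEncoder` also encodes ':' and '/', and the server decodes those the same way.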
>
> To test on my side I have done the following:
> * renamed the Entities of the IPTC worldregions from
> "http://cv.iptc.org/newscodes/worldregion/r001" to
> "http://cv.iptc.org/newscodes/worldregion#r001"
> * indexed the IPTC using the indexing tools
> * installed the index to the entityhub
> * curl -v "http://localhost:8080/entityhub/sites/entity?id=http://cv.iptc.org/newscodes/worldregion%23r001"
>
> Assuming that
>> curl "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23space_mathematiques_1306341820765"
>> ==>  answer is
>> Entity with ID
>> 'http://www.test.fr/terminology#space_mathematiques_1306341820765' not found
>> an any referenced site
>>
> happened on a referenced site with a full cache (e.g. as created by the
> Indexing Utility), I was not able to reproduce the error. If the
> referenced site uses a remote service to dereference entity IDs (e.g.
> the Cool URI), this might happen. In this case I suggest testing the
> remote service directly.
>
> best
> Rupert Westenthaler
>
>
> On Mon, Jun 13, 2011 at 1:06 PM, florent andré
> <florent.andre-dev@4sengines.com>  wrote:
>> Hi Rupert,
>> Hope you are fine.
>>
>> I have another problem...
>> In my SKOS, entities are identified by a '#', like this:
>>
>>   <rdf:Description
>> rdf:about="http://www.test.fr/terminology#entity_gradient_1306341921902">
>>     <skos:broader
>> rdf:resource="http://www.test.fr/terminology#entity_operateur_mathematique_1306341918995"/>
>>     <skos:prefLabel>GRADIENT</skos:prefLabel>
>>     <skos:inScheme
>> rdf:resource="http://www.test.fr/terminology#space_mathematiques_1306341820765"/>
>>     <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
>>   </rdf:Description>
>>
>> And I can't manage to find the entity with the entity endpoint.
>>
>> * With the '#' character:
>> curl "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
>> ==>  answer is
>> Entity with ID 'http://www.test.fr/terminology' not found an any referenced
>> site
>>
>> ==>  the part after the # is removed
>>
>> * Replacing the # with %23 (its URL-encoded equivalent):
>> curl "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23space_mathematiques_1306341820765"
>> ==>  answer is
>> Entity with ID
>> 'http://www.test.fr/terminology#space_mathematiques_1306341820765' not found
>> an any referenced site
>>
>> ==>  the whole ID is kept, but the entity is still not found...
>> The result is the same if I URL-encode the entire entity ID.
>>
>> Is this related to a bug, or is it something I am doing wrong?
>>
>> Thanks.
>> ++
>>
>>
>
>
>
