stanbol-dev mailing list archives

From Rupert Westenthaler <rupert.westentha...@gmail.com>
Subject Re: Entityhub : Can't retrieve entity with a #
Date Mon, 13 Jun 2011 16:38:55 GMT
On Mon, Jun 13, 2011 at 5:53 PM, Florent André <florent@apache.org> wrote:
> Yep,
>
> To continue on the "# case", I observe this "strange" thing:
>
> When requesting with # or %23, I always get correct metadata values, but
> - with #: the representation field is not correct
> - with %23: the representation field is OK
>

For ReferencedSites, metadata is generated automatically from the
metadata defined for the site (e.g. copyright, attribution, cache
status ...).
The type "foaf:Document" is used as the rdf:type of the metadata. The
"dc:subject" relation is currently used to link the metadata to the
entity. However, I have already changed this in my local version to
"entityhub:about", because it caused problems with entities that also
define this property.

> (Note: I use a fully cached referenced site created with the indexing utility.)
>
> Request and answer details :
>
> A) When requested with #
> $ curl
> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
> {
>    "id": "http:\/\/www.test.fr\/terminology",
>    "site": "gasoil",
>    "representation": {"id": "http:\/\/www.test.fr\/terminology"},
>    "metadata": {
>        "id": "http:\/\/www.test.fr\/terminology.meta",
>        "http:\/\/www.iks-project.eu\/ontology\/rick\/model\/isChached": [{
>            "type": "value",
>            "value": "true"
>        }],
>        "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type": [{
>            "type": "reference",
>            "value": "http:\/\/xmlns.com\/foaf\/0.1\/Document"
>        }],
>        "http:\/\/purl.org\/dc\/terms\/subject": [{
>            "type": "reference",
>            "value": "http:\/\/www.test.fr\/terminology"
>        }]
>    }
> }
>
As noted in my first response, everything after the '#' is ignored.
This request therefore returns the entity with the id
"http://www.test.fr/terminology". It looks like this entity actually
exists but does not define any data, most likely because this URI is
referenced in your SKOS file and is therefore returned by the Triple
Store as an "entity" during indexing.

>
> ============
> B) when requested with %23
>
> $ curl
> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23entity_gradient_1306341921902"
> {
>    "id": "http:\/\/www.test.fr\/terminology#entity_gradient_1306341921902",
>    "site": "gasoil",
>    "representation": {
>        "id":
> "http:\/\/www.test.fr\/terminology#entity_gradient_1306341921902",
>        "http:\/\/www.w3.org\/2004\/02\/skos\/core#broader": [{
>            "type": "reference",
>            "value":
> "http:\/\/www.test.fr\/terminology#entity_operateur_mathematique_1306341918995"
>        }],
>        "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type": [{
>            "type": "reference",
>            "value": "http:\/\/www.w3.org\/2004\/02\/skos\/core#Concept"
>        }],
>        "http:\/\/www.w3.org\/2004\/02\/skos\/core#inScheme": [{
>            "type": "reference",
>            "value":
> "http:\/\/www.test.fr\/terminology#space_mathematiques_1306341820765"
>        }],
>        "http:\/\/www.w3.org\/2000\/01\/rdf-schema#label": [{
>            "type": "text",
>            "value": "GRADIENT"
>        }],
>        "http:\/\/www.w3.org\/2004\/02\/skos\/core#prefLabel": [{
>            "type": "text",
>            "value": "GRADIENT"
>        }]
>    },
>    "metadata": {
>        "id":
> "http:\/\/www.test.fr\/terminology#entity_gradient_1306341921902.meta",
>        "http:\/\/www.iks-project.eu\/ontology\/rick\/model\/isChached": [{
>            "type": "value",
>            "value": "true"
>        }],
>        "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type": [{
>            "type": "reference",
>            "value": "http:\/\/xmlns.com\/foaf\/0.1\/Document"
>        }],
>        "http:\/\/purl.org\/dc\/terms\/subject": [{
>            "type": "reference",
>            "value":
> "http:\/\/www.edf.fr\/terminology#entity_gradient_1306341921902"
>        }]
>    }
> }

This is the actual entity as requested.
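
To make the difference between the two requests concrete, here is a
minimal Java sketch (my own illustration, not Entityhub code). It
assumes a client that builds the request URL itself; the class and
method names are hypothetical, and it needs Java 10+ for
URLEncoder.encode(String, Charset). It shows that a generic URI parser
treats everything after an unencoded '#' as the fragment, and that
percent-encoding the id keeps it inside the "id" query parameter.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class EntityIdEncoding {

    // ReferencedSiteManager endpoint as used in this thread.
    static final String ENDPOINT = "http://localhost:8080/entityhub/sites/entity";

    // Query string a URI parser sees when the id is appended unencoded.
    static String rawQuery(String entityId) {
        return URI.create(ENDPOINT + "?id=" + entityId).getQuery();
    }

    // Request URL with the id percent-encoded ('#' becomes "%23").
    static String requestUrl(String entityId) {
        return ENDPOINT + "?id=" + URLEncoder.encode(entityId, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String id = "http://www.test.fr/terminology#entity_gradient_1306341921902";

        // Unencoded: the part after '#' is the fragment, not part of the query.
        System.out.println(rawQuery(id)); // id=http://www.test.fr/terminology

        // Encoded: the whole id travels in the query parameter.
        System.out.println(requestUrl(id));
    }
}
```

Encoding the whole id (not only the '#') is used here because it is the
simplest safe choice for a query parameter value.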


BTW:

Rather than using the ReferencedSiteManager

    http://localhost:8080/entityhub/sites/entity?id={id}

it would be better to directly use the ReferencedSite

    http://localhost:8080/entityhub/site/{siteId}/entity?id={id}

because if you have other ReferencedSites that do not define Entity
prefixes, the request would actually be sent to more than one site
before it is answered.
If you know which site holds the entity you are looking for, it is
always better to use that site directly.
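
A rough sketch of how a client could build the site-specific URL
follows. The class and method names are hypothetical, the site id
"gasoil" is the one from your responses, and I assume the site id
contains no characters that would need escaping in the path.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class SiteEntityUrl {

    // Base URL of the Entityhub as used in this thread.
    static final String BASE = "http://localhost:8080/entityhub";

    // Builds /site/{siteId}/entity?id={id} with a percent-encoded id.
    // Assumption: siteId is a plain name that needs no path escaping.
    static String forSite(String siteId, String entityId) {
        return BASE + "/site/" + siteId + "/entity?id="
                + URLEncoder.encode(entityId, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(forSite("gasoil",
                "http://www.test.fr/terminology#entity_gradient_1306341921902"));
    }
}
```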

best
Rupert Westenthaler


>
>
> ++
>
>
> On 06/13/2011 03:54 PM, Rupert Westenthaler wrote:
>>
>> Hi florent
>>
>> Using a '#' in the URI has the disadvantage that browsers will not
>> send the part after the hash to the server, because they assume that
>> they need to download the whole document and then navigate to the
>> anchor within the document.
>>
>> Using curl (or JavaScript), I think the full URL should be sent to the
>> server (I was not able to find good information about this, but at
>> least "curl -v" says that it sends the whole URL to the server).
>> However, on the server side Jersey also does not provide the #{anchor}
>> part of the URL.
>> Sending
>>>
>>>
>>> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
>>
>> will pass only "http://www.test.fr/terminology" to a method annotated
>> with
>>
>>     @GET
>>     @Path("/entity")
>>     public Response getEntity(@QueryParam(value = "id") String id) {
>>         // get the Entity
>>         ...
>>
>> URL-encoding the '#' as '%23' causes Jersey to pass
>> "http://www.test.fr/terminology#entity_gradient_1306341921902".
>>
>> In this case the query for an entity with this ID is correctly passed
>> to the ReferencedSite (with '#', not '%23'). So if you pass '%23' and
>> the indexed Entity uses '#', it should work as long as Entities are
>> cached locally. If a remote service is used, then the same '#' problem
>> reappears for the remote service.
>>
>> To test on my side I have done the following:
>> * renamed the Entities of the IPTC worldregions from
>> "http://cv.iptc.org/newscodes/worldregion/r001" to
>> "http://cv.iptc.org/newscodes/worldregion#r001"
>> * indexed the IPTC using the indexing tools
>> * installed the index to the entityhub
>> * curl -v
>> "http://localhost:8080/entityhub/sites/entity?id=http://cv.iptc.org/newscodes/worldregion%23r001"
>>
>> Assuming that
>>>
>>> curl
>>>
>>> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23space_mathematiques_1306341820765"
>>> ==>  answer is
>>> Entity with ID
>>> 'http://www.test.fr/terminology#space_mathematiques_1306341820765' not
>>> found
>>> an any referenced site
>>>
>> happened on a referenced site with a full cache (e.g. as created by the
>> Indexing Utility), I was not able to reproduce the error. If the
>> referenced site uses a remote service to dereference entity ids (e.g.
>> the Cool URI), this might happen. In this case I suggest testing the
>> remote service directly.
>>
>> best
>> Rupert Westenthaler
>>
>>
>> On Mon, Jun 13, 2011 at 1:06 PM, florent andré
>> <florent.andre-dev@4sengines.com>  wrote:
>>>
>>> Hi Rupert,
>>> Hope you are fine.
>>>
>>> I have another problem...
>>> In my SKOS file, entities are identified with a #, like this:
>>>
>>>  <rdf:Description
>>> rdf:about="http://www.test.fr/terminology#entity_gradient_1306341921902">
>>>    <skos:broader
>>>
>>> rdf:resource="http://www.test.fr/terminology#entity_operateur_mathematique_1306341918995"/>
>>>    <skos:prefLabel>GRADIENT</skos:prefLabel>
>>>    <skos:inScheme
>>>
>>> rdf:resource="http://www.test.fr/terminology#space_mathematiques_1306341820765"/>
>>>    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
>>>  </rdf:Description>
>>>
>>> And I can't manage to find the entity with the entity endpoint.
>>>
>>> * With the # char :
>>> curl
>>>
>>> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
>>> ==>  answer is
>>> Entity with ID 'http://www.test.fr/terminology' not found an any
>>> referenced
>>> site
>>>
>>> ==>  the part after the # is removed
>>>
>>> * With replacement of the # by %23 (the urlencode equivalent) :
>>> curl
>>>
>>> "http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23space_mathematiques_1306341820765"
>>> ==>  answer is
>>> Entity with ID
>>> 'http://www.test.fr/terminology#space_mathematiques_1306341820765' not
>>> found
>>> an any referenced site
>>>
>>> ==>  the whole id is kept, but it is still not found...
>>> The result is the same if I URL-encode the whole entity id.
>>>
>>> Is this related to a bug, or am I doing something wrong?
>>>
>>> Thanks.
>>> ++
>>>
>>>
>>
>>
>>
>



-- 
| Rupert Westenthaler             rupert.westenthaler@gmail.com
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
