incubator-jena-dev mailing list archives

From "Mattmann, Chris A (388J)" <>
Subject Re: Implementing GeoSPARQL
Date Tue, 03 Apr 2012 02:04:04 GMT
Hi Paolo,

On Apr 2, 2012, at 2:01 AM, Paolo Castagna wrote:

> Hi Chris
> Mattmann, Chris A (388J) wrote:
>>> In my head, SIS (which I do not know very well) is a low-level geo indexing
>>> library which could be used to provide the indexing capability for a GeoSPARQL
>>> implementation.
>> Yeah that's what I was thinking too.
> Ok.
> I've played with Lucene spatial capabilities for this sort of thing in the
> past. My knowledge of Apache SIS is very limited. In particular, it is not
> clear to me how/when things are persisted on disk. My impression is that SIS
> loads the entire index into RAM when it starts and serializes it out at the end.
> Am I right? (I hope not. :-))

Well, what's the end? SIS does in fact build a quadtree index, and it can store
that index to disk and update it when requested. SIS is both an API and a web
service at the moment. It can certainly be improved significantly, but it's
functional for now.
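For readers unfamiliar with the data structure mentioned above, here is a minimal point quadtree over lat/long bounds. It is a hypothetical illustration of the technique only, not the Apache SIS API or its implementation; the class name, the leaf capacity of 4, and the depth cap are all assumptions, and persistence to disk is left out.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal point quadtree over lat/long bounds (hypothetical sketch, not the
 * Apache SIS API). Persistence to disk is deliberately omitted.
 */
public class QuadTree {
    public record Point(String id, double lat, double lon) {}

    private static final int CAPACITY = 4;   // assumed: points per leaf before splitting
    private static final int MAX_DEPTH = 20; // stops endless splitting on duplicate points

    private final double minLat, minLon, maxLat, maxLon;
    private final int depth;
    private final List<Point> points = new ArrayList<>();
    private QuadTree[] children; // null while this node is a leaf

    public QuadTree(double minLat, double minLon, double maxLat, double maxLon) {
        this(minLat, minLon, maxLat, maxLon, 0);
    }

    private QuadTree(double minLat, double minLon, double maxLat, double maxLon, int depth) {
        this.minLat = minLat; this.minLon = minLon;
        this.maxLat = maxLat; this.maxLon = maxLon;
        this.depth = depth;
    }

    private boolean contains(double lat, double lon) {
        return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
    }

    /** Inserts p; returns false if p lies outside this node's bounds. */
    public boolean insert(Point p) {
        if (!contains(p.lat(), p.lon())) return false;
        if (children == null) {
            if (points.size() < CAPACITY || depth >= MAX_DEPTH) {
                points.add(p);
                return true;
            }
            split();
        }
        for (QuadTree child : children) {
            if (child.insert(p)) return true;
        }
        return false; // not reached: the four children tile this node's bounds
    }

    /** Splits a full leaf into four quadrants and pushes its points down. */
    private void split() {
        double midLat = (minLat + maxLat) / 2, midLon = (minLon + maxLon) / 2;
        children = new QuadTree[] {
            new QuadTree(minLat, minLon, midLat, midLon, depth + 1),
            new QuadTree(minLat, midLon, midLat, maxLon, depth + 1),
            new QuadTree(midLat, minLon, maxLat, midLon, depth + 1),
            new QuadTree(midLat, midLon, maxLat, maxLon, depth + 1)
        };
        for (Point q : points) {
            for (QuadTree child : children) {
                if (child.insert(q)) break;
            }
        }
        points.clear();
    }

    /** Adds every stored point inside the query box to out, skipping disjoint subtrees. */
    public void query(double qMinLat, double qMinLon, double qMaxLat, double qMaxLon, List<Point> out) {
        if (qMaxLat < minLat || qMinLat > maxLat || qMaxLon < minLon || qMinLon > maxLon) return;
        for (Point p : points) {
            if (p.lat() >= qMinLat && p.lat() <= qMaxLat && p.lon() >= qMinLon && p.lon() <= qMaxLon) {
                out.add(p);
            }
        }
        if (children != null) {
            for (QuadTree child : children) child.query(qMinLat, qMinLon, qMaxLat, qMaxLon, out);
        }
    }
}
```

The point of the structure is that a bounding-box query only descends into quadrants that overlap the box, so most of the tree is skipped for small query areas.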

> If that is the case, it could be an issue for large indexes.

Possibly -- not sure -- I guess we'll find out. If it falls down, it can always be
made more scalable; it's just a basic spatial data structure.

BTW, SIS started out by importing the Lucene spatial code written by
Patrick O'Leary, one of the creators of SIS. Patrick created Local Lucene
and Local Solr, the first appearance of this functionality in Lucene.

>>> I know that ARQ (i.e. the SPARQL query engine available in Jena) can
>>> provide you with a SPARQL 1.1 engine and extension points to use other
>>> custom indexes (such as SIS in this case).
>>> What exactly do you mean by "integrating with Any23"?
>>> Do you mean crawling the web and extracting lat/long from web pages?
>> Yep that's what I was thinking -- maybe doing it in Any23, and/or Tika.
> Doing a web crawl to extract locations out of web pages using Any23|Tika seems
> quite a useful thing for certain use cases.
> In other scenarios people might already have a large dataset with locations in
> it or people might want to leverage datasets such as Geonames, Freebase,
> DBPedia, Yahoo GeoPlanet, etc. so crawling in these use cases is less important.

Yeah potentially. But representing those geo locations and coordinates in a common
way is important.
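To make the "common representation" point concrete, here is a naive sketch of both halves of the pipeline discussed above: pulling coordinates out of HTML `<meta name="geo.position">` / `<meta name="ICBM">` tags, then emitting them as GeoSPARQL-style N-Triples using `geo:hasGeometry`, `geo:asWKT` and `geo:wktLiteral`. It is not an Any23 or Tika API; the class, method names and example IRIs are hypothetical, and a regex over raw HTML is only a stand-in for a real DOM-based extractor (it assumes double-quoted attributes with `name` before `content`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: HTML geo meta tags -> GeoSPARQL WKT N-Triples. */
public class GeoMetaToWkt {
    static final String GEO = "http://www.opengis.net/ont/geosparql#";

    // Matches <meta name="geo.position" content="lat;long"> and <meta name="ICBM" content="lat, long">.
    // Brittle by design: assumes double-quoted attributes and name before content.
    private static final Pattern GEO_META = Pattern.compile(
        "<meta\\s+name=\"(?:geo\\.position|ICBM)\"\\s+content=\""
            + "\\s*(-?\\d+(?:\\.\\d+)?)\\s*[;,]\\s*(-?\\d+(?:\\.\\d+)?)\\s*\"",
        Pattern.CASE_INSENSITIVE);

    /** Returns {lat, long} pairs in document order. */
    static List<double[]> extract(String html) {
        List<double[]> out = new ArrayList<>();
        Matcher m = GEO_META.matcher(html);
        while (m.find()) {
            out.add(new double[] {Double.parseDouble(m.group(1)), Double.parseDouble(m.group(2))});
        }
        return out;
    }

    /** WKT in GeoSPARQL's default CRS84 puts longitude first, then latitude. */
    static String toWktLiteral(double lat, double lon) {
        if (lat < -90 || lat > 90 || lon < -180 || lon > 180) {
            throw new IllegalArgumentException("coordinate out of range");
        }
        return "\"POINT(" + lon + " " + lat + ")\"^^<" + GEO + "wktLiteral>";
    }

    /** Two N-Triples: feature -> geometry, geometry -> WKT literal. */
    static String toNTriples(String featureIri, String geometryIri, double lat, double lon) {
        return "<" + featureIri + "> <" + GEO + "hasGeometry> <" + geometryIri + "> .\n"
             + "<" + geometryIri + "> <" + GEO + "asWKT> " + toWktLiteral(lat, lon) + " .\n";
    }

    public static void main(String[] args) {
        String html = "<head><meta name=\"geo.position\" content=\"34.2;-118.17\"></head>";
        for (double[] p : extract(html)) {
            // example.org IRIs are placeholders for the crawled page and its geometry node
            System.out.print(toNTriples("http://example.org/page", "http://example.org/page#geom", p[0], p[1]));
        }
    }
}
```

The axis-order swap in `toWktLiteral` is the part most easily gotten wrong: sources such as the W3C Basic Geo vocabulary give latitude and longitude separately, while a CRS84 WKT point is written longitude first.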

>>> Where will you store those RDF statements?
>> It looks like Any23 would store to Sesame -- is that the case?
> Probably.
> It would be nice to have pluggable RDF stores in Any23, but that is another
> story. :-)

Yep, thanks for filing that. I'm sure someone will get around to it :)

>>> How can you implement the GeoSPARQL spec without (re)using a SPARQL
>>> query engine (such as ARQ)?
>> I need that too :) I just don't understand it as well (and understand the Any23/Tika
>> and SIS part better). I'll have to learn Jena it looks like though, you game to 
>> help me out?
> A very old prototype which shows you how you can extend ARQ is here:

Awesome thanks for the link. I'll check it out.

> It is just a prototype and it is using ARQ's property functions rather than
> filter functions (and it is using Lucene spatial rather than SIS). But, it
> is IMHO a good starting point to see how you could have ARQ using a custom
> index to perform spatial searches.


> At the time, I used Lucene spatial because it was the only non-(L)GPL
> alternative I found (I did not know about SIS then).

No worries. We created SIS b/c we didn't think this functionality was unique
to Lucene and we wanted an ALv2 licensed toolkit. It's been around since
February 2010. See here:

> I did not implement GeoSPARQL for simplicity: I just wanted a proof of
> concept, and IMHO the most important use case is searching for things around
> a point and returning results sorted by distance.
> For GeoSPARQL (which I need to go back and read properly), do we need custom
> FILTER functions or property functions (or both)?

I guess I need to read the spec more to find out :)
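The use case Paolo singles out (things around a point, sorted by distance) can be sketched without any index at all. The following is a brute-force illustration only, not ARQ, SIS or GeoSPARQL code; the class and method names are hypothetical, and it assumes a spherical Earth (mean radius 6371 km) with the haversine great-circle formula. In a real implementation a spatial index such as a quadtree would first prune candidates with a bounding box, and only the survivors would be measured and sorted.

```java
import java.util.Comparator;
import java.util.List;

/** Hypothetical brute-force "nearby, sorted by distance" sketch. */
public class NearbySearch {
    public record Place(String name, double lat, double lon) {}

    static final double EARTH_RADIUS_KM = 6371.0; // mean radius; spherical approximation

    /** Great-circle distance in km via the haversine formula. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // clamp guards against tiny floating-point overshoot above 1.0
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Places within radiusKm of (lat, lon), nearest first. */
    static List<Place> nearby(List<Place> places, double lat, double lon, double radiusKm) {
        return places.stream()
            .filter(p -> haversineKm(lat, lon, p.lat(), p.lon()) <= radiusKm)
            .sorted(Comparator.comparingDouble((Place p) -> haversineKm(lat, lon, p.lat(), p.lon())))
            .toList();
    }
}
```

Whether this ends up exposed as a property function or a FILTER function is exactly the open question above; the distance computation itself is the same either way.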

>>> IMHO geo location (as well as free text) are two SPARQL extensions which
>>> are very useful in loads of use cases.
>> Yep I'm super excited to get this implemented. You interested in helping? I think
>> we can bring together Tika, Any23, Jena and SIS here...
> I am interested in learning more about SIS. At the moment I have no idea how
> much effort is necessary to implement GeoSPARQL, or whether that spec is going
> to be implemented elsewhere by other RDF stores.

Well even if it is implemented elsewhere by other RDF stores, it wouldn't stop me
from implementing something in SIS. I think it's a great use case for SIS and something
of use to the broader community. The fact that we can't name a ton of other RDF
stores that implement GeoSPARQL is an indication to me that it's not supported
or in widespread use -- so the time is ripe I guess.

> At the moment, I cannot put much effort into this. But if something similar à la
> LARQ and/or GeoARQ is useful and I can help, I'll do it.
> I see two main use cases here:
> 1. Crawling the web and building a dataset of statements with locations.
> 2. Indexing a dataset of statements with locations and extending SPARQL to
>    perform queries over it.
> For 1. you need Any23|Tika (and a crawler) and, eventually, an RDF store.
> For 2. you need SIS and a SPARQL query engine (which of course uses an RDF store).
> If I were trying to implement GeoSPARQL, I would start with 2., SIS and ARQ.

Thanks, will do.


Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
