lucene-dev mailing list archives

From "Evgeny Shadchnev" <>
Subject Geographical indexing in Lucene
Date Mon, 01 Oct 2007 15:40:51 GMT

As part of my MSc project this summer I developed geoLucene, a
modified version of Lucene (based on 2.3-dev checked out on 05.07)
that can index geographical data using R-trees. It has been shown to
be faster on geographic queries than the unmodified Lucene. The code
is hosted at

This is an alpha release and it is not intended for production use.
However, I hope somebody will find it useful and will develop it
further. Though the code is far from being finished, I'm unable to
continue developing this project.

geoLucene is able to index documents in two indexes. First, a document
is indexed using an inverted index, as usual. Second, it may be (if
the developer so wishes) indexed using a spatial index, an R-tree. The
spatial index can be searched and the results may be combined with the
results from the inverted index.
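One way to picture the combination step is as an intersection of
document numbers returned by the two indexes. The following is only a
minimal sketch under that assumption; CombinedResults and intersect
are hypothetical names and are not taken from geoLucene.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of geoLucene.
final class CombinedResults {
    private CombinedResults() {}

    /**
     * Returns the document numbers that appear in both result lists,
     * keeping the order of the inverted-index (text) hits.
     */
    static List<Integer> intersect(List<Integer> textHits, List<Integer> spatialHits) {
        Set<Integer> spatial = new HashSet<>(spatialHits);
        List<Integer> combined = new ArrayList<>();
        for (Integer doc : textHits) {
            if (spatial.contains(doc)) {
                combined.add(doc);
            }
        }
        return combined;
    }
}
```

Keeping the order of the text hits means a relevance ranking from the
inverted index would survive the spatial filtering.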

I'll explain how it works by providing the code. So, the search is
done as follows:

GeoSearchArea gsa = new GeoSearchArea();
// GeoQuery extends Query; searchMode is intersection or enclosure
GeoQuery geoQuery = new GeoQuery(gsa, searchMode);
IndexSearcher rTreeSearcher = new IndexSearcher(dir);
// The right-hand side was lost in the archive; a plain search call is assumed
Hits rTreeResults = rTreeSearcher.search(geoQuery);
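
The two search modes can be illustrated on plain bounding boxes. This
is a sketch only: BoundingBox is a hypothetical class, it assumes that
"enclosure" means the search area fully contains the indexed object,
and it ignores areas that wrap around the 180th meridian. The real
GeoSearchArea may behave differently.

```java
// Hypothetical type for illustration; geoLucene's GeoSearchArea may differ.
final class BoundingBox {
    final double minLat;
    final double minLon;
    final double maxLat;
    final double maxLon;

    BoundingBox(double minLat, double minLon, double maxLat, double maxLon) {
        if (minLat > maxLat || minLon > maxLon) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.minLat = minLat;
        this.minLon = minLon;
        this.maxLat = maxLat;
        this.maxLon = maxLon;
    }

    /** Intersection mode: the two boxes share at least one point. */
    boolean intersects(BoundingBox other) {
        return minLat <= other.maxLat && other.minLat <= maxLat
            && minLon <= other.maxLon && other.minLon <= maxLon;
    }

    /** Enclosure mode (assumed meaning): the other box lies entirely inside this one. */
    boolean encloses(BoundingBox other) {
        return minLat <= other.minLat && other.maxLat <= maxLat
            && minLon <= other.minLon && other.maxLon <= maxLon;
    }
}
```

Every enclosed box also intersects the search area, so enclosure is
the stricter of the two modes.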

The data is indexed as follows:

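Document doc = new Document();
// Last argument lost in the archive; Field.Index.UN_TOKENIZED is an assumption
doc.add(new Field("latitude", Double.toString(lat), Field.Store.NO,
    Field.Index.UN_TOKENIZED));
doc.add(new Field("longitude", Double.toString(lon), Field.Store.NO,
    Field.Index.UN_TOKENIZED));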

Here, "latitude" and "longitude" are hard-coded field names (changing
them requires a recompile) that should contain the geocoordinates.
Documents that contain such coordinates are intercepted and indexed in
an R-tree before being indexed in the inverted index.
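
The interception step can be sketched roughly as follows. This is not
geoLucene's code: InterceptingIndexer and SpatialEntry are hypothetical
names, a map of field names to values stands in for a Document, and
plain lists stand in for the R-tree and the inverted index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the indexing path; not geoLucene's actual classes.
final class InterceptingIndexer {
    static final String LAT_FIELD = "latitude";   // hard-coded, as in the post
    static final String LON_FIELD = "longitude";

    /** One point in the spatial index, tied to a document number. */
    static final class SpatialEntry {
        final int docNumber;
        final double lat;
        final double lon;

        SpatialEntry(int docNumber, double lat, double lon) {
            this.docNumber = docNumber;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Stand-in for the R-tree. */
    final List<SpatialEntry> spatialEntries = new ArrayList<>();
    /** Stand-in for the inverted index: documents in insertion order. */
    final List<Map<String, String>> invertedIndexDocs = new ArrayList<>();

    /** Adds a document to the index(es) and returns its document number. */
    int addDocument(Map<String, String> fields) {
        int docNumber = invertedIndexDocs.size();
        String lat = fields.get(LAT_FIELD);
        String lon = fields.get(LON_FIELD);
        if (lat != null && lon != null) {
            // Intercept: index spatially first, under the same document number.
            spatialEntries.add(new SpatialEntry(
                docNumber, Double.parseDouble(lat), Double.parseDouble(lon)));
        }
        invertedIndexDocs.add(fields);
        return docNumber;
    }
}
```

Using the same document number in both structures is what would allow
results from the two indexes to be combined later.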

This project was a part of my MSc course. More details about the
architecture, implementation and the performance results may be found
in my MSc thesis available online:
There is a web demo, which is available at (IE, Firefox and the latest
Konqueror only). The web demo runs in a multitasking environment
(and the data is cached), so run a number of queries before averaging
the search times.

Examples of how to index and search files may be found in file (in the
root directory of the SVN repository). I used this file to run
benchmarks.

This project is still in alpha and there are many changes to be made
before it can be called stable, but I must warn you that it lacks one
essential feature: I did not implement a document deletion method.
However, there is a fairly simple way of implementing it (preserving
coherent document numbers in both indexes). It is described in the
"Future Work" section of my thesis. Therefore, this version of the
library can only be used to evaluate the performance of an R-tree
(it's significantly faster than the inverted index on geographic
queries).
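
To illustrate what "coherent document numbers" could mean in practice,
here is one possible scheme. It is not the method from the thesis,
which is not reproduced here. It assumes deletions are only marked in
a shared bit set, so document numbers never shift in either index, and
hits from both indexes are filtered against the marks. DeletionMarks
is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch; not the deletion method described in the thesis.
final class DeletionMarks {
    private final BitSet deleted = new BitSet();

    /** Marks a document as deleted without renumbering anything. */
    void delete(int docNumber) {
        deleted.set(docNumber);
    }

    boolean isDeleted(int docNumber) {
        return deleted.get(docNumber);
    }

    /** Drops deleted documents from a hit list coming from either index. */
    List<Integer> filterLive(List<Integer> hits) {
        List<Integer> live = new ArrayList<>();
        for (Integer doc : hits) {
            if (!deleted.get(doc)) {
                live.add(doc);
            }
        }
        return live;
    }
}
```

Because nothing is renumbered, a document number obtained from the
R-tree would still refer to the same document in the inverted index.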

As I said at the beginning, I am unable to develop this project any
further, so I am releasing it on SourceForge and I'm willing to pass
the code to anybody who wants to improve it. There is no documentation,
but my thesis is (I hope) quite a good explanation of what's going on
inside geoLucene. I'm happy to answer any questions and generally help
with the project, if somebody wishes to develop it further.

Evgeny Shadchnev
