lucene-dev mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject Re: Geographical indexing in Lucene
Date Fri, 05 Oct 2007 10:31:39 GMT
Hi Evgeny,

you may look at http://www.panFMP.org

This software uses a similar approach for very fast range queries without
modifying Lucene. It works by storing the double values in the index in a
specially encoded form at several different precisions, similar to the
well-known tries.
It may be very interesting to compare your algorithm with mine!
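
To illustrate the general idea, here is a minimal sketch. It is not panFMP's actual encoding: the class name, the "shift:hex" term format and the 8-bit step are assumptions made up for this example. Each double is mapped to a long that sorts in the same order, and one term is emitted per precision level by dropping low-order bits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not panFMP code, names and term format are assumptions.
final class TriePrecisionSketch {

    /** Maps a double to a long whose signed order matches the order of the doubles. */
    static long toSortableLong(double value) {
        long bits = Double.doubleToLongBits(value);
        // For negative values, flip the 63 non-sign bits so larger magnitudes sort lower.
        return bits ^ ((bits >> 63) & 0x7fffffffffffffffL);
    }

    /** Emits one term per precision level; a larger shift means a coarser value. */
    static List<String> precisionTerms(double value, int step) {
        long sortable = toSortableLong(value);
        List<String> terms = new ArrayList<>();
        for (int shift = 0; shift < 64; shift += step) {
            // The shift is part of the term so that precision levels never collide.
            terms.add(shift + ":" + Long.toHexString(sortable >>> shift));
        }
        return terms;
    }

    public static void main(String[] args) {
        for (String term : precisionTerms(1.5, 8)) {
            System.out.println(term);
        }
    }
}
```

A range query can then match a few coarse terms for the middle of the range and fine terms only near its two ends, instead of enumerating every distinct value in the range.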

Nice work,

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

Hello,

As part of my MSc project this summer I developed geoLucene, a
modified version of Lucene (based on 2.3-dev checked out on 05.07)
that can index geographical data using R-trees. It has been shown to
be faster on geographic queries than the unmodified Lucene. The code
is hosted at https://sourceforge.net/projects/geolucene/.

This is an alpha release and it is not intended for production use.
However, I hope somebody will find it useful and will develop it
further. Though the code is far from being finished, I'm unable to
continue developing this project.

geoLucene is able to index documents in two indexes. First, a document
is indexed using the inverted index, as usual. Second, it may also be
indexed (if the developer so wishes) using a spatial index, an R-tree.
The spatial index can be searched, and its results may be combined with
the results from the inverted index.

I'll explain how it works by showing the code. A search is done as
follows:

GeoSearchArea gsa = new GeoSearchArea();
gsa.setWest(x1);
gsa.setEast(x2);
gsa.setSouth(y1);
gsa.setNorth(y2);
// GeoQuery extends Query; the search mode is intersection or enclosure
GeoQuery geoQuery = new GeoQuery(gsa, searchMode);
IndexSearcher rTreeSearcher = new IndexSearcher(dir);
Hits rTreeResults = rTreeSearcher.search(geoQuery);
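
The two search modes can be illustrated with plain rectangle tests. This is only a sketch and not geoLucene code: the Box class and both method names are hypothetical, and it assumes that "enclosure" means the indexed object lies entirely inside the search area. It also ignores areas that cross the 180th meridian.

```java
// Hypothetical sketch: Box, intersects and encloses are not part of geoLucene.
final class GeoSearchModesSketch {

    /** Axis-aligned rectangle; argument order follows the GeoSearchArea setters. */
    static final class Box {
        final double west, east, south, north;

        Box(double west, double east, double south, double north) {
            if (west > east || south > north) {
                throw new IllegalArgumentException("inverted box");
            }
            this.west = west;
            this.east = east;
            this.south = south;
            this.north = north;
        }
    }

    /** Intersection mode: the two boxes share at least one point. */
    static boolean intersects(Box query, Box candidate) {
        return query.west <= candidate.east && candidate.west <= query.east
            && query.south <= candidate.north && candidate.south <= query.north;
    }

    /** Enclosure mode (assumed meaning): the candidate lies fully inside the query. */
    static boolean encloses(Box query, Box candidate) {
        return query.west <= candidate.west && candidate.east <= query.east
            && query.south <= candidate.south && candidate.north <= query.north;
    }

    public static void main(String[] args) {
        Box query = new Box(0, 10, 0, 10);
        Box straddling = new Box(8, 12, 8, 12);
        System.out.println("intersects: " + intersects(query, straddling));
        System.out.println("encloses:   " + encloses(query, straddling));
    }
}
```

An R-tree applies the same kind of test to the bounding boxes of its nodes, which lets it skip whole subtrees that cannot contain a match.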

The data is indexed as follows:

Document doc = new Document();
doc.add(new Field("latitude", Double.toString(lat), Field.Store.NO,
Field.Index.UN_TOKENIZED));
doc.add(new Field("longitude", Double.toString(lon), Field.Store.NO,
Field.Index.UN_TOKENIZED));
indexWriter.addDocument(doc);

Here, "latitude" and "longitude" are hard-coded (recompile to change)
field names that should contain geocoordinates. The documents that
contain such coordinates are intercepted and indexed in an R-tree
before being indexed in the inverted index.
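
The interception step can be sketched roughly as follows. This is a simplified illustration rather than the actual geoLucene internals: the class is hypothetical, a HashMap stands in for the R-tree and a list stands in for the inverted index. The point it shows is that both structures are keyed by the same document number.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain collections stand in for the R-tree and the inverted index.
final class DualIndexSketch {
    static final String LAT = "latitude";   // hard-coded field names, as in the text
    static final String LON = "longitude";

    /** Stand-in for the inverted index: position in the list is the document number. */
    final List<Map<String, String>> invertedDocs = new ArrayList<>();
    /** Stand-in for the R-tree: document number -> {latitude, longitude}. */
    final Map<Integer, double[]> spatialPoints = new HashMap<>();

    /** Adds a document to both structures and returns its document number. */
    int addDocument(Map<String, String> fields) {
        int docId = invertedDocs.size();
        String lat = fields.get(LAT);
        String lon = fields.get(LON);
        if (lat != null && lon != null) {
            // Intercept documents that carry coordinates before regular indexing.
            spatialPoints.put(docId,
                new double[] {Double.parseDouble(lat), Double.parseDouble(lon)});
        }
        invertedDocs.add(fields);
        return docId;
    }

    public static void main(String[] args) {
        DualIndexSketch index = new DualIndexSketch();
        Map<String, String> doc = new HashMap<>();
        doc.put(LAT, "51.5");
        doc.put(LON, "-0.12");
        System.out.println("document number: " + index.addDocument(doc));
    }
}
```

Documents without both coordinate fields simply skip the spatial structure and are indexed in the inverted index only.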

This project was a part of my MSc course. More details about the
architecture, implementation and the performance results may be found
in my MSc thesis available online:
http://www.doc.ic.ac.uk/~es106/thesis/MultidimensionalIndexingInLucene.pdf.
There is a web demo, which is available at
http://geolucene.virtual.vps-host.net/ (IE, Firefox and latest
Konqueror only). The web-demo runs in a multitasking environment
(furthermore, the data is cached), so make a number of queries before
averaging the search times.

Examples of how to index and search files can be found in the
TestGeoSearchSpeed.java file (in the root directory of the SVN
repository). I used this file to run the benchmarks.

This project is still in alpha, and many changes are needed before it
can be called stable. I must also warn you that it lacks one essential
feature: I did not implement document deletion. However, there is a
fairly simple way of implementing it (preserving coherent document
numbers in both indexes), described in the "Future Work" section of my
thesis. For now, this version of the library is only suitable for
evaluating the performance of an R-tree (it's significantly faster
than the inverted index).

As I said at the beginning, I am unable to develop this project any
further, so I am releasing it on SourceForge and am willing to pass the
code to anybody who wants to improve it. There is no documentation, but
my thesis is (I hope) a fairly good explanation of what's going on
inside geoLucene. I'm happy to answer any questions and generally help
with the project if somebody wishes to develop it further.

Regards,
Evgeny Shadchnev

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org