lucene-dev mailing list archives

From "Chuck Williams" <ch...@manawiz.com>
Subject RE: GIS
Date Sun, 31 Oct 2004 17:47:25 GMT
A colleague of mine just remarked that the indexing problem for
geographical retrieval is a solved problem.  One algorithm is described
in Tom Mitchell's book Machine Learning:
http://www.amazon.com/exec/obidos/ASIN/0070428077/qid=1099244886/sr=2-1/ref=pd_ka_b_2_1/102-6518692-8636163

The algorithm in question is a version of the k-nearest neighbor
problem, which my colleague has seen implemented for search in at least
one commercial company.
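A minimal brute-force sketch of k-nearest-neighbor retrieval over geotagged documents may make the remark above concrete. It assumes a spherical Earth (mean radius 6371 km) and great-circle distance via the haversine formula; the names `GeoKnn`, `GeoDoc` and `nearest` are hypothetical, and nothing here is taken from Mitchell's book or from Lucene. It scans every document, so it only illustrates the definition; an index would need some spatial structure to avoid the full scan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical brute-force k-nearest-neighbor search by great-circle distance. */
final class GeoKnn {
    /** Mean Earth radius in km; assumes a spherical Earth. */
    static final double EARTH_RADIUS_KM = 6371.0;

    /** A document id with a position in decimal degrees (hypothetical type). */
    static final class GeoDoc {
        final String id;
        final double lat;
        final double lon;

        GeoDoc(String id, double lat, double lon) {
            this.id = id;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Pairs a document with its distance from the query point. */
    private static final class Scored {
        final GeoDoc doc;
        final double distKm;

        Scored(GeoDoc doc, double distKm) {
            this.doc = doc;
            this.distKm = distKm;
        }
    }

    /** Great-circle distance in km using the haversine formula. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        // Clamp guards against rounding pushing sqrt(a) slightly above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Returns up to k documents nearest to (lat, lon), closest first; O(n log k). */
    static List<GeoDoc> nearest(List<GeoDoc> docs, double lat, double lon, int k) {
        List<GeoDoc> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // Max-heap on distance: the head is the farthest of the k best seen so far.
        PriorityQueue<Scored> heap = new PriorityQueue<>(
                Comparator.comparingDouble((Scored s) -> s.distKm).reversed());
        for (GeoDoc d : docs) {
            double dist = haversineKm(lat, lon, d.lat, d.lon);
            if (heap.size() < k) {
                heap.add(new Scored(d, dist));
            } else if (dist < heap.peek().distKm) {
                heap.poll();
                heap.add(new Scored(d, dist));
            }
        }
        List<Scored> best = new ArrayList<>(heap);
        best.sort(Comparator.comparingDouble((Scored s) -> s.distKm));
        for (Scored s : best) {
            result.add(s.doc);
        }
        return result;
    }
}
```

Keeping only a heap of size k avoids sorting the whole collection when k is much smaller than the number of documents.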

It's an expensive book -- it's likely the algorithm could be found via
online searches for free.  Also, there are probably technical
discussions and specs in more specialized geographical information
system literature.

Chuck

  > -----Original Message-----
  > From: Chuck Williams [mailto:chuck@manawiz.com]
  > Sent: Sunday, October 31, 2004 9:41 AM
  > To: Lucene Developers List
  > Cc: scoob@mindinmotiontech.com
  > Subject: RE: GIS
  > 
  > I for one would love to have this functionality, i.e. would use it
  > immediately if available and efficient.  It seems the biggest problem
  > is how you are going to index the information.  If you store and index
  > the latitude and longitude for a geographically positioned document,
  > and then want to find all such documents within a spherical rectangle
  > or circle, how do you find the candidates?  As far as I know, Lucene
  > does range searches now by expanding a range into a list of all the
  > indexed values within that range.  This is clearly not a reasonable
  > approach for latitudes and longitudes, assuming you need precision on
  > the values (which I do).  There are potentially reasonable indexing
  > approaches that occur to me (e.g. in addition to the precise lat/lon,
  > store with each object its grid label in a few different discrete
  > lat/lon grids, or use a B-tree index of some kind), but this is
  > probably a solved problem somewhere in the field of geographical
  > information systems.
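The first suggestion (grid labels at a few resolutions stored alongside the precise coordinates) can be sketched as follows. The cell sizes (10, 1 and 0.1 degrees) and the label format `g<level>_<row>_<col>` are assumptions chosen for illustration, and `GridLabels` is a hypothetical helper, not Lucene code. The idea is that each label is indexed as an ordinary untokenized term, so a rectangle query becomes an OR over a small number of cell terms instead of an expansion over every precise value; the candidates are then checked against the exact latitude and longitude.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical multi-resolution grid labels for lat/lon points. */
final class GridLabels {
    /** Assumed cell sizes in degrees, coarse to fine. */
    static final double[] CELL_DEGREES = {10.0, 1.0, 0.1};

    private static long row(int level, double lat) {
        return (long) Math.floor((lat + 90.0) / CELL_DEGREES[level]);
    }

    private static long col(int level, double lon) {
        return (long) Math.floor((lon + 180.0) / CELL_DEGREES[level]);
    }

    private static String label(int level, long row, long col) {
        return "g" + level + "_" + row + "_" + col;
    }

    /** Label of the cell containing (lat, lon) at the given level. */
    static String cellLabel(int level, double lat, double lon) {
        return label(level, row(level, lat), col(level, lon));
    }

    /** One label per level, to be indexed next to the precise coordinates. */
    static List<String> labelsFor(double lat, double lon) {
        List<String> labels = new ArrayList<>();
        for (int level = 0; level < CELL_DEGREES.length; level++) {
            labels.add(cellLabel(level, lat, lon));
        }
        return labels;
    }

    /**
     * Labels of all cells at one level that intersect the rectangle.
     * Does not handle rectangles that cross the 180th meridian.
     */
    static List<String> coveringCells(int level, double minLat, double minLon,
                                      double maxLat, double maxLon) {
        List<String> cells = new ArrayList<>();
        for (long r = row(level, minLat); r <= row(level, maxLat); r++) {
            for (long c = col(level, minLon); c <= col(level, maxLon); c++) {
                cells.add(label(level, r, c));
            }
        }
        return cells;
    }
}
```

A query would pick the coarsest level that still yields a manageable number of cells, OR those cell terms together, and filter the hits on exact coordinates. Because indexing and querying use the same monotonic floor computation, a point inside the rectangle always falls in one of the covering cells.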
  > 
  > After the indexing, the next interesting question would seem to be
  > the scoring, although this seems a much simpler issue.  E.g., a score
  > related to the distance from the center of the query region would
  > seem to be appropriate.  There should be a mechanism analogous to the
  > current coord factor so that this could be tuned or turned off,
  > depending on the needs of particular queries within the application.
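One possible shape for such a tunable distance factor is sketched below, assuming a hyperbolic fall-off chosen only for illustration: the factor is 1 at the center, 0.5 at a configurable `halfDistanceKm`, and it can be switched off so the text score passes through unchanged, in the spirit of disabling coord. `DistanceBoost` and its methods are hypothetical names; this is not Lucene's Similarity API.

```java
/** Hypothetical distance-based score adjustment; not part of Lucene. */
final class DistanceBoost {
    /**
     * For distanceKm >= 0, returns a factor in (0, 1]: 1 at the query center and
     * 1 / (1 + d / halfDistanceKm) elsewhere, so it is exactly 0.5 at
     * d == halfDistanceKm. Returns 1 when the adjustment is disabled.
     */
    static float distanceFactor(double distanceKm, double halfDistanceKm, boolean enabled) {
        if (!enabled) {
            return 1.0f;
        }
        if (halfDistanceKm <= 0) {
            throw new IllegalArgumentException("halfDistanceKm must be positive");
        }
        return (float) (1.0 / (1.0 + distanceKm / halfDistanceKm));
    }

    /** Multiplies a text relevance score by the distance factor. */
    static float combinedScore(float textScore, double distanceKm,
                               double halfDistanceKm, boolean enabled) {
        return textScore * distanceFactor(distanceKm, halfDistanceKm, enabled);
    }
}
```

A multiplicative factor keeps text relevance dominant for nearby documents; `halfDistanceKm` is the per-query tuning knob.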
  > 
  > My $0.02,
  > 
  > Chuck
  > 
  >   > -----Original Message-----
  >   > From: Guillermo Payet [mailto:gpayet@localharvest.org]
  >   > Sent: Sunday, October 31, 2004 9:34 AM
  >   > To: lucene-dev@jakarta.apache.org
  >   > Cc: scoob@mindinmotiontech.com
  >   > Subject: GIS
  >   >
  >   > Hello,
  >   >
  >   > I'm new here, so first of all I'd like to say hello to everyone.
  >   >
  >   > So, hi there...
  >   >
  >   > I just spent two days trying to get Lucene to handle
  >   > "geographically constricted" searches for our website. (Check out
  >   > www.localharvest.org)
  >   >
  >   > I got close, but no cigar. (It works, but it is very slow.)
  >   >
  >   > We need to be able to do searches only within a geographically
  >   > limited set of documents.  (In this case, our member listings.)
  >   >
  >   > So... I'd like to volunteer to add the needed functions in Lucene to:
  >   >
  >   >   - build a LatLonField class for geographical coordinates
  >   >   - build a LatLonRectTerm (or whatever) to define matches
  >   >     within a latitude/longitude-defined rectangle.
  >   >   - build a LatLonRadiusTerm (or whatever) to define all matches
  >   >     within X distance from a point (lat,lon).
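The two proposed match conditions reduce to simple predicates on a document's stored coordinates. The sketch below assumes decimal degrees, a spherical Earth of radius 6371 km, and rectangles given as (minLat, minLon, maxLat, maxLon), where minLon > maxLon means the rectangle crosses the 180th meridian. `LatLonMatch`, `inRect` and `inRadius` are hypothetical names; they only express the test a LatLonRectTerm or LatLonRadiusTerm would have to apply, not how Lucene would find the candidates.

```java
/** Hypothetical match predicates for the proposed rectangle and radius queries. */
final class LatLonMatch {
    /** Mean Earth radius in km; assumes a spherical Earth. */
    static final double EARTH_RADIUS_KM = 6371.0;

    /** True if (lat, lon) is inside the rectangle; minLon > maxLon wraps the 180th meridian. */
    static boolean inRect(double lat, double lon,
                          double minLat, double minLon, double maxLat, double maxLon) {
        if (lat < minLat || lat > maxLat) {
            return false;
        }
        if (minLon <= maxLon) {
            return lon >= minLon && lon <= maxLon;
        }
        // Wrapping case, e.g. minLon = 170, maxLon = -170.
        return lon >= minLon || lon <= maxLon;
    }

    /** Great-circle distance in km (haversine formula). */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = phi2 - phi1;
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** True if (lat, lon) is within radiusKm of the center, measured along the sphere. */
    static boolean inRadius(double lat, double lon,
                            double centerLat, double centerLon, double radiusKm) {
        return haversineKm(centerLat, centerLon, lat, lon) <= radiusKm;
    }
}
```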
  >   >
  >   > We're now doing all of this through MySQL, which works "ok", but
  >   > leaves a lot to be desired for the relevance of search results for
  >   > a lot of searches.  I've already written all the spherical trig
  >   > functions to do these searches accurately, and I'd love to port
  >   > them into Lucene.
  >   >
  >   > So my questions are:
  >   >
  >   >  - Has there been any talk about doing this before?
  >   >  - Is this a bad idea for any reason?
  >   >  - What would be the right approach to do this?
  >   >
  >   > The fact that Lucene stores and indexes (or so it seems) all terms
  >   > as Strings and that there is no NumericTerm makes me think that I
  >   > might be missing something and that this might be a much bigger
  >   > deal than I think?
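Because terms are compared as strings, one way around the missing numeric type is to encode numbers so that string order matches numeric order. A minimal sketch, assuming a fixed precision of one millionth of a degree (the scale and the nine-digit width are choices made here, and `LatLonTerms` is a hypothetical helper): shift each coordinate to be non-negative, scale it to a long, and zero-pad it to a fixed width. Range queries over such terms would still enumerate every indexed value in the range, so this makes range semantics correct without making them cheap.

```java
import java.util.Locale;

/** Hypothetical order-preserving string encoding for latitudes and longitudes. */
final class LatLonTerms {
    /** Assumed precision: 1e-6 degree (about 0.11 m of latitude). */
    static final long SCALE = 1_000_000L;

    /** Shifts to a non-negative value, scales, and zero-pads to 9 digits (max 360,000,000). */
    private static String encode(double degrees, double offset) {
        long units = Math.round((degrees + offset) * SCALE);
        // Locale.ROOT keeps ASCII digits regardless of the default locale.
        return String.format(Locale.ROOT, "%09d", units);
    }

    static String encodeLat(double lat) {
        if (!(lat >= -90.0 && lat <= 90.0)) { // also rejects NaN
            throw new IllegalArgumentException("latitude out of range: " + lat);
        }
        return encode(lat, 90.0);
    }

    static String encodeLon(double lon) {
        if (!(lon >= -180.0 && lon <= 180.0)) { // also rejects NaN
            throw new IllegalArgumentException("longitude out of range: " + lon);
        }
        return encode(lon, 180.0);
    }

    static double decodeLat(String term) {
        return Long.parseLong(term) / (double) SCALE - 90.0;
    }

    static double decodeLon(String term) {
        return Long.parseLong(term) / (double) SCALE - 180.0;
    }
}
```

Naively, "9.0" sorts after "10.0" as a string; the encoded forms "099000000" and "100000000" sort correctly. Decoding recovers the value to within half a unit of the chosen precision.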
  >   >
  >   > 	--G
  >   >
  >   >
  >   >
  >   >
  >   > --
  >   > Guillermo Payet
  >   > L O C A L  H A R V E S T
  >   > http://www.localharvest.org
  >   >
  >   > Every morning I awake torn between a desire to save the world and
  >   > an inclination to savor it.  This makes it hard to plan the day.
  >   >
  >   >
  >   > - E. B. White
  >   >
  >   >
  >   >
  >   > ---------------------------------------------------------------------
  >   > To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
  >   > For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
  > 
  > 

