lucene-general mailing list archives

From Marvin Humphrey <mar...@rectangular.com>
Subject Re: [spatial] Cartesian "Tiers" nomenclature
Date Tue, 29 Dec 2009 23:55:22 GMT
On Tue, Dec 29, 2009 at 12:29:47PM -0800, Marvin Humphrey wrote:

> In Lucyland, we've adopted a tradition of recording "brainlogs"
> while browsing unfamiliar documentation as a form of UI testing -- I'll do one
> of those later.  

OK, here's the brainlog I recorded while trying to figure out how spatial
contrib works.

[ BEGIN BRAINLOG ]

[ surf to contrib-spatial Javadocs for Lucene 3.0 ]

    "Support for filtering based upon geographic location." 

OK, I assume that means we can match a tile and create a posting list for it,
then AND the resulting doc id set against other search results.
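
A minimal sketch of that guess, not of the spatial contrib itself: the names
below (build_tile_postings, filter_by_tiles, the "t1"-style tile ids) are
hypothetical and plain Python stands in for the index.  It only illustrates
"posting list per tile, then AND against other results".

```python
from collections import defaultdict

def build_tile_postings(doc_tiles):
    """Map each tile id to a sorted list of the doc ids indexed under it."""
    postings = defaultdict(list)
    for doc_id in sorted(doc_tiles):
        postings[doc_tiles[doc_id]].append(doc_id)
    return dict(postings)

def filter_by_tiles(postings, matching_tiles, other_results):
    """AND the docs found in the matching tiles against another result set."""
    in_tiles = set()
    for tile in matching_tiles:
        in_tiles.update(postings.get(tile, []))
    return sorted(in_tiles & set(other_results))

# Hypothetical data: doc id -> the tile it falls in.
doc_tiles = {0: "t1", 1: "t2", 2: "t1", 3: "t3", 4: "t2"}
postings = build_tile_postings(doc_tiles)

# Docs 1, 2 and 3 matched the rest of the query; only tiles t1 and t2
# overlap the area of interest.
hits = filter_by_tiles(postings, ["t1", "t2"], other_results=[1, 2, 3])
print(hits)
```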

No sample code.  Looks like pure reference documentation rather than tutorial
style documentation.

[ Click on Package org.apache.lucene.spatial.tier ]

Lots of red text -- guess they're serious about this not being a stable API.

Not clear what to click on next, I'll try DistanceQueryBuilder since Patrick
mentioned that.

[ Click on Class DistanceQueryBuilder ]

Documentation is sparse.  The only real meat is in the method documentation: a
single sentence plus parameter names.

I don't know GeoHashes.  I sort of think I understand tierFieldPrefix. (?)
And what is "needPrecise"?

Hmm, maybe I really need to go find a tutorial somewhere.  Let's try the
wiki...

[ Go to Lucene Java wiki, search for "spatial", get two hits: SpatialLucene,
    SpatialSearch. ]

[ "SpatialLucene" wiki page ]

Hmm, there's a big warning which says "refers to content not yet committed"...
is that true?  Nope, LUCENE-1387 is closed, so this wiki page is out of date.
Pff, whatever...

OK, I see links to Patrick's white paper.  Seems like it will probably be
heavier than I want.

[ "SpatialSearch" wiki page ]

Lots of GeoHash links, should be handy when I try to learn that.  And another
link to Patrick's whitepaper for the cartesian stuff.

I'll try the "full text" search the Wiki search recommends.  

[ Search Lucene Java wiki for "spatial", this time as "full text" ]

Bleah, the Wiki search's performance sucked: "Results 1 - 6 of 6 results out of
about 1005 pages. (9.49 seconds)".  No interesting results.

I'm reluctant to look at a PDF white paper, it will probably be too technical.

[ google "lucene spatial tutorial" ]

Crap, first hit is an article on Hibernate/Lucene integration.  I just want
"how to use Lucene spatial".  

Looks like Lucid's got a webinar from Grant, but I don't feel like sitting
down for an hour, I just want some frikkin' sample code.

[ google "lucene spatial" ]

Blog post at
http://sujitpal.blogspot.com/2008/02/spatial-search-with-lucene.html looks
promising... Crap, it's not using the spatial contrib. :(

OK, there's stuff from Mike McCandless at
<http://www.manning-sandbox.com/thread.jspa?threadID=35203&tstart=0>...
naturally all the indentation is stripped.  :(  I'll look anyway...  OK, I
think I basically follow that despite the sucky formatting, but it's not easy.

Guess I'm really stuck reading a white paper. :(

[ Surf to Patrick's white paper at 
http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html ]

Jeez, that's a lot shorter than I expected.  Formatting's all messed up and I
see some Unicode replacement character glyphs, guess Patrick's not a "web guy"
;) ... But it's probably what I want in terms of content and depth. 

[ read through first section ]

Inclusion, reductionism... sure sure, that's easy enough, it's just query
optimization like I deal with all the time.

But wait, no code samples.  Dammit. :(

[ read through "Boundary Box" ]

Jeez, the "giant cross" approach to finding intersection of lat/lon was
actually part of a formal spatial package in the past?

[ read through "Cartesian Grid" ]

The formatting for this section is really messed up.

OK, *finally* I see that a cartesian tier is in fact a zoom level.  And even
Patrick uses the word "grid" extensively.
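
To make "tier = zoom level" concrete, here is a rough sketch.  It assumes that
tier n cuts the raw lat/lon plane into a 2**n by 2**n grid of equal cells; that
layout and the function name tile_for are illustrative assumptions, not the
projection or box-id scheme that the white paper or the contrib code uses.

```python
def tile_for(lat, lon, tier):
    """Return (col, row) of the grid cell containing a point at a given tier.

    Assumption: tier n splits the lat/lon plane into 2**n x 2**n equal cells.
    """
    cells = 2 ** tier
    # Clamp so that lat == 90 or lon == 180 lands in the last cell.
    col = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    row = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    return col, row

# The same point at increasing zoom levels: each step doubles the grid
# resolution, so the cell gets smaller and the indexes get larger.
for tier in (1, 2, 3, 4):
    print(tier, tile_for(45.0, -90.0, tier))
```

Under this assumption a cell at tier n is exactly covered by four cells at
tier n+1, which is the sense in which a higher tier behaves like a deeper
zoom level.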

Turns out that algorithmically speaking, local Lucene works almost exactly
like I expected it to.

I wonder if it's faster to filter by saving the doc ids to a bitset first and
filtering off of that, or if it's faster just to use an ANDQuery to join the
result set from the matching tiles and the result set for the rest of the
query.
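
The two options can be sketched side by side.  This is plain Python with
hypothetical helper names, not Lucene or Lucy code, and it says nothing about
which one is faster; it only shows that both produce the same doc ids, so the
choice comes down to cost.

```python
def to_bitset(doc_ids):
    """Pack doc ids into an int used as a bitset (bit d set means doc d matches)."""
    bits = 0
    for d in doc_ids:
        bits |= 1 << d
    return bits

def filter_with_bitset(tile_bits, query_hits):
    """Option 1: test each hit from the rest of the query against the bitset."""
    return [d for d in query_hits if (tile_bits >> d) & 1]

def intersect_sorted(a, b):
    """Option 2: AND two sorted doc id lists by walking them in step."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

tile_docs = [0, 1, 2, 4, 7]    # docs in the matching tiles (made-up data)
query_hits = [1, 3, 4, 7, 9]   # docs matching the rest of the query

via_bitset = filter_with_bitset(to_bitset(tile_docs), query_hits)
via_and = intersect_sorted(tile_docs, query_hits)
print(via_bitset, via_and)
```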

[ read through "Box ID's" ]

Do I really need to know any of this?  Box/Tile ID names are arbitrary.  Only
the query builder that figures out which boxes match a given geographical
constraint needs to know.

[ END BRAINLOG ]

Conclusion #1:

I don't spend a lot of time immersed in Java culture so maybe I missed
something, but there seems to be a dearth of high-quality tutorial-style
documentation for spatial contrib.

I'll save conclusion #2 for a separate email.

Marvin Humphrey

