lucene-java-user mailing list archives

From "david.w.smiley@gmail.com" <david.w.smi...@gmail.com>
Subject Re: Lucene Spatial Implementation for Points within Polygon.
Date Wed, 24 Dec 2014 16:13:26 GMT
One problem is the classic x/y, lat/lon mix-up.  WKT is in “x y” order
(longitude first, then latitude), and so are the Spatial4j methods for that
matter.  If you made this mistake consistently, it might still yield correct
results, provided the point data is within -90 and +90 longitude.  Fixing
the order may be all that is needed; otherwise, your code looks like it
should work.
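
To make the axis order concrete, here is a minimal sketch in plain Java (no
Lucene or Spatial4j on the classpath).  The class and method names
(WktAxisOrder, pointWkt, isValidXY) are made up for illustration; the only
things assumed are the WKT "x y" ordering and the usual ranges of -180..180
for longitude and -90..90 for latitude.

```java
public class WktAxisOrder {

    /** Builds a WKT point in "x y" order: longitude first, then latitude. */
    public static String pointWkt(double lon, double lat) {
        return "POINT(" + lon + " " + lat + ")";
    }

    /** True if (x, y) is a legal geodetic pair, with x as longitude and y as latitude. */
    public static boolean isValidXY(double x, double y) {
        return x >= -180 && x <= 180 && y >= -90 && y <= 90;
    }

    public static void main(String[] args) {
        double lat = 19.076, lon = 72.8777; // sample values only

        // Correct order: longitude goes into x, latitude into y.
        System.out.println(pointWkt(lon, lat));

        // Swapped order: latitude lands in x and longitude lands in y.
        // The pair is still in range here because |lon| <= 90 ...
        System.out.println(isValidXY(lat, lon));

        // ... but a longitude beyond 90 is out of range once it is read as a latitude.
        System.out.println(isValidXY(19.076, 95.0));
    }
}
```

This only shows why a consistent swap can go unnoticed for data whose
longitudes stay within -90 and +90; it says nothing about how a particular
Spatial4j version reacts to out-of-range input.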

If you want to construct a point, don’t create WKT; simply call
ctx.makePoint(x,y).  There isn’t a makePolygon, but you can use the
constructor of Spatial4j’s JtsGeometry, which takes a JTS “Geometry”, which
in turn can be constructed from a JTS GeometryFactory.  That avoids
needlessly encoding to a WKT String and then parsing it.

For the accuracy that you clearly want, call setDistErr(0.0) on your
SpatialArgs instance.

What Lucene version are you using?

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Dec 24, 2014 at 5:46 AM, <Ankit.Murarka@ril.com> wrote:
>
> Thanks for the suggestions, David.
> However, I am in a fix. Although I am both indexing and searching using
> JTS, I am still getting very few hits. I am quite sure that the indexed
> points fall inside a lot of the polygons, but the hits do not reflect
> that.
>
> For approx. 8 lakh (800,000) polygons, I get 4.5 lakh (450,000) polygons
> that contain points. For the remaining 3.5 lakh (350,000) I get no hits
> at all. I am providing a small snippet of the code. Please suggest.
>
> I am indexing points as WKT Shape using the following Code.
>
> JtsSpatialContext spatialContext=JtsSpatialContext.GEO;
> SpatialPrefixTree grid=new GeohashPrefixTree(spatialContext,22);
> spatialStrategy=new RecursivePrefixTreeStrategy(grid,"position");
>
> Shape point = spatialContext.readShape("POINT("+lat+" "+lon+")");
> doc.add(new StoredField("FieldName",value));
> for(IndexableField f: spatialStrategy.createIndexableFields(point))
> {
> doc.add(f);
> }
>
> doc.add(new
> StoredField(spatialStrategy.getFieldName(),lat+";"+lon+";"+value));
>
> indexWriter.addDocument(doc);
>
>
> For Searching, since I have polygons, I am using the following code:
>
> JtsSpatialContext spatialContext=JtsSpatialContext.GEO;
> SpatialPrefixTree grid=new GeohashPrefixTree(spatialContext,22);
> spatialStrategy=new RecursivePrefixTreeStrategy(grid,"position");
>
>
> I use a StringBuffer to build polygons like this:
>
> POLYGON((Lat Long,Lat Long pairs))
>
> SpatialArgs args=new
> SpatialArgs(SpatialOperation.Intersects,spatialContext.readShape(StringBuffer.toString()));
> ConstantScoreQuery csq=new
> ConstantScoreQuery(spatialStrategy.makeQuery(args));
>
>
> TopDocs docs=indexSearcher.search(csq,100000);
>
> if(docs.totalHits>0)
> {
> // process data
> }
> else
> {
> System.out.println("NO DATA FOUND");
> }
>
> The problem is that for a large share of the polygons (approx. 50%) I get
> NO DATA FOUND, indicating no hits. I am pretty sure that there are indexed
> lat/long pairs that fall within the supplied polygon, but I am unable to
> get all the hits.
>
> Please help me identify where I am going wrong. For every invalid polygon
> (intersecting boundaries, incomplete rings) I print the exception and
> exclude that polygon. This is not the worry.
>
> The worry is that I am getting very few polygons that actually have points
> inside them.
>
> Please correct me where I am going wrong.
>
>
> -----Original Message-----
> From: david.w.smiley@gmail.com [mailto:david.w.smiley@gmail.com]
> Sent: 22 December 2014 19:19
> To: java-user@lucene.apache.org
> Subject: Re: Lucene Spatial Implementation for Points within Polygon.
>
> Hello.
>
> You have stated the use-case so generically that it’s not clear if you
> should index the polygon set and query by the point set, or the reverse.
> Generally, you should index the set that is known in-advance and then
> query by the other, the set that is generally not known.  Assuming this is
> the case, index the stable set with RecursivePrefixTreeStrategy, *and*, for
> accuracy, if that set is also the polygon set, use SerializedDVStrategy
> *or* simply keep them all in-memory keyed by an identifier (call
> JtsGeometry.index() on each as well) that you check against at runtime.
> If you don’t have enough RAM then you’ll do the former.  If neither set
> seems to be “stable”, you could index either one, but I would choose to
> index the points.  The predicate you should use is INTERSECTS; the others
> are intended for polygon against polygons (basically any non-point shape
> against another non-point shape).
>
> If your scenario is simply that you get a bunch of points and polygons
> all at once, make this computation, and then that’s it (no long-term
> need to query again by the same polygons or points in the future), I
> suggest using JTS directly in-memory, with its PreparedGeometry to
> optimize each polygon, then iterating through your points to see which
> polygons they are in.  You might even use JTS's STRtree to index polygon
> bounding boxes to avoid looping over all polygons.
>
> ~ David Smiley
> Freelance Apache Lucene/Solr Search Consultant/Developer
> http://www.linkedin.com/in/davidwsmiley
>
> On Mon, Dec 22, 2014 at 12:30 AM, <Ankit.Murarka@ril.com> wrote:
> >
> > Hello Team,
> >
> > We are starting off with Lucene Spatial implementation for some of the
> > use cases:
> >
> > A . Given "N" polygons and "M" points, find how many points lie inside
> > each of the polygon.
> >
> > 1st Approach :
> >
> > For A, we indexed Polygons using WKT and using JtsSpatial strategy. I
> > set the Level at 22 . This has resulted in huge number of terms. This
> > was needed as I need the search to be near perfect.
> >
> > For Indexing, I used Point(Supplied as WKT) using Jts again with Level
> > at 22 (Although I think specifying level at query time does not make
> > much difference).
> >
> > For this, we used "CONTAINS".  Output is coming but I am not sure if
> > I am doing it the right way. Need suggestion.
> >
> > I am having following confusion:
> >
> > a.       Will CONTAINS and IS WITHIN both work in the same way for the
> > given scenario. I am ruling OUT INTERSECTS as that scenario is not
> > appropriate.
> >
> > b.      Second, are we missing something  in getting the correct output.
> >
> >
> > 2nd Approach : (Reversed)
> >
> > Indexed POINTS in WKT format.
> > Passed Polygons in WKT using JTS as query and fired as INTERSECTS and
> > WITHIN.
> >
> > In second approach, we are getting more output than the 1st approach.
> >
> > However, we are still not sure which is the best way to tackle this
> > problem. Please suggest.
> >
> > "Confidentiality Warning: This message and any attachments are
> > intended only for the use of the intended recipient(s).
> > are confidential and may be privileged. If you are not the intended
> > recipient. you are hereby notified that any review. re-transmission.
> > conversion to hard copy. copying. circulation or other use of this
> > message and any attachments is strictly prohibited. If you are not the
> > intended recipient. please notify the sender immediately by return
> > email.
> > and delete this message and any attachments from your system.
> >
> > Virus Warning: Although the company has taken reasonable precautions
> > to ensure no viruses are present in this email.
> > The company cannot accept responsibility for any loss or damage
> > arising from the use of this email or attachment."
