lucenenet-user mailing list archives

From Anthony Rodriguez <arodrig...@spark.net>
Subject lucene.net 3.0.3 indexing spatial too slow
Date Tue, 19 Mar 2013 01:10:39 GMT
I have recently upgraded my search code from lucene.net 2.9.4 to 3.0.3. I noticed that the spatial
packages changed, and I have updated my code accordingly. One drawback of the upgrade is much
slower indexing. Through a process of elimination, I have narrowed the slowdown down to the new
spatial code that indexes the lat/long coordinates:
public void AddLocation(double lat, double lng)
{
    try
    {
        string latLongKey = lat.ToString() + "," + lng.ToString();
        AbstractField[] shapeFields = null;
        Shape shape = null;
        if (HasSpatialShapes(latLongKey))
        {
            // reuse the shape already built for these coordinates
            shape = SpatialShapes[latLongKey];
        }
        else
        {
            if (this.Strategy is BBoxStrategy)
            {
                // BBoxStrategy indexes rectangles, so the point becomes a zero-area rectangle
                shape = Context.MakeRectangle(DistanceUtils.NormLonDEG(lng), DistanceUtils.NormLonDEG(lng),
                    DistanceUtils.NormLatDEG(lat), DistanceUtils.NormLatDEG(lat));
            }
            else
            {
                shape = Context.MakePoint(DistanceUtils.NormLonDEG(lng), DistanceUtils.NormLatDEG(lat));
            }

            AddSpatialShapes(latLongKey, shape);
        }

        shapeFields = Strategy.CreateIndexableFields(shape);
        //Potentially more than one shape in this field is supported by some
        // strategies; see the javadocs of the SpatialStrategy impl to see.
        foreach (AbstractField f in shapeFields)
        {
            _document.Add(f);
        }
        //add lat long values to index too
        _document.Add(GetField("latitude", NumericUtils.DoubleToPrefixCoded(lat), Field.Index.NOT_ANALYZED,
            Field.Store.YES, 0f, false));
        _document.Add(GetField("longitude", NumericUtils.DoubleToPrefixCoded(lng), Field.Index.NOT_ANALYZED,
            Field.Store.YES, 0f, false));
    }
    catch (Exception e)
    {
        RollingFileLogger.Instance.LogException(ServiceConstants.SERVICE_INDEXER_CONST,
            "Document", string.Format("AddLocation({0},{1})", lat.ToString(), lng.ToString()), e, null);
        throw; // rethrow without resetting the stack trace ("throw e;" would reset it)
    }
}
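The shape cache in AddLocation is keyed on lat.ToString() + "," + lng.ToString(). If the indexer ever runs under a culture that uses a comma as the decimal separator, that key becomes ambiguous: (1, 5.2) and (1.5, 2) both format as "1,5,2". A minimal alternative, assuming C# 7 value tuples are available, is to key the cache on the numeric pair itself. CoordinateCache and GetOrAdd are hypothetical names, not part of Lucene.Net, and this sketch only tidies the cache; it does not by itself reduce the cost of CreateIndexableFields.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not Lucene.Net API): memoizes one value per (lat, lng) pair,
// using the numeric pair as the key instead of a culture-dependent string.
public sealed class CoordinateCache<T>
{
    private readonly Dictionary<(double Lat, double Lng), T> _cache =
        new Dictionary<(double Lat, double Lng), T>();
    private readonly Func<double, double, T> _factory;

    public CoordinateCache(Func<double, double, T> factory)
    {
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
    }

    // Number of distinct coordinate pairs cached so far.
    public int Count => _cache.Count;

    // Returns the cached value for (lat, lng), creating it with the factory on first use.
    public T GetOrAdd(double lat, double lng)
    {
        var key = (lat, lng);
        if (!_cache.TryGetValue(key, out T value))
        {
            value = _factory(lat, lng);
            _cache[key] = value;
        }
        return value;
    }
}
```

In the code above it might be used as new CoordinateCache<Shape>((lat, lng) => Context.MakePoint(DistanceUtils.NormLonDEG(lng), DistanceUtils.NormLatDEG(lat))), assuming Context is accessible where the cache is built. Like the original dictionary, this cache is not thread-safe and grows without bound.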

With 2.9.4, I was able to index about 300,000 rows of data with lat/lng points in about 11
minutes. With the new spatial package it takes upwards of 5 hours (I killed the test before
it finished, so I don't have an exact timing). Here is the spatial context/strategy
I am using:

public static SpatialContext SpatialContext
{
    get
    {
        // double-checked lazy initialization of the shared context
        if (null == _spatialContext)
        {
            lock (_lockObject)
            {
                if (null == _spatialContext) _spatialContext = SpatialContext.GEO;
            }
        }
        return _spatialContext;
    }
}

public static SpatialStrategy SpatialStrategy
{
    get
    {
        if (null == _spatialStrategy)
        {
            lock (_lockObject)
            {
                if (null == _spatialStrategy)
                {
                    // maximum number of geohash levels (one character per level) in the prefix tree
                    int maxLength = 9;
                    GeohashPrefixTree geohashPrefixTree = new GeohashPrefixTree(SpatialContext, maxLength);
                    _spatialStrategy = new RecursivePrefixTreeStrategy(geohashPrefixTree, "geoField");
                }
            }
        }
        return _spatialStrategy;
    }
}
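To make the maxLength = 9 setting concrete, here is a small, self-contained sketch of standard geohash encoding. As far as we understand the prefix-tree strategies, indexing a single point yields roughly one term per tree level, each a longer geohash prefix of the point; the real GeohashPrefixTree/RecursivePrefixTreeStrategy may emit extra variants (for example a leaf marker), so treat this as an approximation. GeohashSketch, Encode and PrefixTerms are hypothetical names, not Lucene.Net API.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustrative geohash encoder; this is NOT Lucene.Net's GeohashPrefixTree implementation.
public static class GeohashSketch
{
    // Standard geohash base-32 alphabet (omits a, i, l, o).
    private const string Base32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encodes a point as a geohash of the given length (each character carries 5 bits).
    public static string Encode(double lat, double lng, int length)
    {
        if (length <= 0) throw new ArgumentOutOfRangeException(nameof(length));
        double latMin = -90, latMax = 90, lngMin = -180, lngMax = 180;
        var sb = new StringBuilder(length);
        bool lngBit = true; // bits alternate between the halves, starting with longitude
        int bitCount = 0, value = 0;
        while (sb.Length < length)
        {
            if (lngBit)
            {
                double mid = (lngMin + lngMax) / 2;
                if (lng >= mid) { value = (value << 1) | 1; lngMin = mid; }
                else { value <<= 1; lngMax = mid; }
            }
            else
            {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latMin = mid; }
                else { value <<= 1; latMax = mid; }
            }
            lngBit = !lngBit;
            if (++bitCount == 5)
            {
                sb.Append(Base32[value]); // every 5 bits become one character
                bitCount = 0;
                value = 0;
            }
        }
        return sb.ToString();
    }

    // One prefix per level 1..maxLevels: roughly the cells containing the point.
    public static List<string> PrefixTerms(double lat, double lng, int maxLevels)
    {
        string full = Encode(lat, lng, maxLevels);
        var terms = new List<string>(maxLevels);
        for (int level = 1; level <= maxLevels; level++)
            terms.Add(full.Substring(0, level));
        return terms;
    }
}
```

For example, PrefixTerms(42.6, -5.6, 5) returns "e", "ez", "ezs", "ezs4", "ezs42"; with maxLevels = 9 a point maps to nine such prefixes, and lowering it to 4 keeps only the first four.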

Is there something I am doing wrong in my indexing approach? I cache the shapes created from
the lat/lng points, since I don't need a new shape for the same coordinates. It appears to be
the CreateIndexableFields() method that takes the most time during indexing. I've tried to
cache the fields generated by this method for reuse, but I can't create a new instance of the
TokenStream from a cached field to use in a new Document (in lucene.net 3.0.3 the constructor
for TokenStream is protected). I've lowered maxLevels (the maxLength value passed to
GeohashPrefixTree) to 4, but I haven't seen an improvement in indexing times. Any feedback
would be greatly appreciated.
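For scale, 300,000 rows in 11 minutes is about 455 rows per second, while 300,000 rows in 5 hours would be about 17 rows per second, so the slowdown is at least roughly 27-fold. One way to confirm that CreateIndexableFields() dominates is to time it in isolation against the rest of AddLocation. The helper below is a hypothetical sketch using only System.Diagnostics.Stopwatch; StepTimer, Measure and RowsPerSecond are illustrative names, not Lucene.Net API.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: times repeated calls to one step so its per-call cost
// can be compared with the other steps of the indexing loop.
public static class StepTimer
{
    // Returns the mean wall-clock time per call over the given number of iterations.
    public static TimeSpan Measure(Action step, int iterations)
    {
        if (step == null) throw new ArgumentNullException(nameof(step));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));
        step(); // one untimed warm-up call so JIT compilation is not counted
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) step();
        sw.Stop();
        return TimeSpan.FromTicks(sw.Elapsed.Ticks / iterations);
    }

    // Converts a total elapsed time for a batch into throughput.
    public static double RowsPerSecond(int rows, TimeSpan elapsed) => rows / elapsed.TotalSeconds;
}
```

A possible use, assuming a Shape instance is at hand: StepTimer.Measure(() => Strategy.CreateIndexableFields(shape), 1000), compared with the same measurement around the rest of AddLocation.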

________________________________
Anthony Rodriguez
Senior Software Developer

Spark Networks<http://www.spark.net> | Igniting Relationships(r)
8383 Wilshire Blvd. Suite 800 | Beverly Hills, CA 90211
p. 323 658 3000 ext. 8021 | f. 866 945 5209
________________________________
