lucene-solr-user mailing list archives

From "David Smiley (@MITRE.org)" <DSMI...@mitre.org>
Subject Re: custom field type plugin
Date Tue, 23 Jul 2013 17:47:31 GMT
Oh cool!  I'm glad it at least seemed to work.  Can you post your
configuration of the field type and report what "maxLevels" value Solr's
logs show for this field?  It is logged the first time you use the field
type.

Maybe there isn't a limit under 10B after all.  Some quick'n'dirty
calculations I just did indicate there shouldn't be a problem, but real-world
usage will be a better proof.  Indexing probably won't be terribly slow, but
queries could get pretty slow if the amount of indexed data is really high.
I'd love to hear how it works out for you.  Your use case would benefit a
lot from an improved prefix tree implementation.
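
For a rough sense of scale, here is a minimal sketch of that kind of
back-of-envelope calculation. It assumes a quad tree halves the cell width
in each dimension at every level and that whole-number precision is wanted.
The function name levels_needed is hypothetical, and this is not necessarily
how Solr derives "maxLevels" for a field type.

```python
import math


def levels_needed(span: float, precision: float = 1.0) -> int:
    """Smallest number of halvings that shrinks a cell of width `span`
    down to at most `precision`.

    Assumption: each tree level halves the cell width per dimension,
    starting from a single root cell that covers the whole span.
    """
    if span <= 0 or precision <= 0:
        raise ValueError("span and precision must be positive")
    return max(0, math.ceil(math.log2(span / precision)))


print(levels_needed(10_000_000_000))  # 34 (whole 10B range)
print(levels_needed(250_000_000))     # 28 (a single ~250M base pair chromosome)
```

Under these assumptions the tree depth grows only logarithmically with the
range, so going from 250 million to 10 billion adds a handful of levels.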

I don't gather how a 3rd dimension would play into this.  Support for
multi-dimensional spatial is on the drawing board.

~ David


Kevin Stone wrote
> What are the dangers of trying to use a range of 10 billion? Simply a
> slower index time? Or will I get inaccurate results?
> I have tried it on a very small sample of documents, and it seemed to
> work. I could spend some time this week trying to get a more robust (and
> accurate) dataset loaded to play around with. The reason for the 10
> billion is to support being able to query for a region on a chromosome.
> 
> A user might want to know what genes overlap a point on a specific
> chromosome. Unless I can use 3 dimensional coordinates (which gave an
> error when I tried it), I'll need to multiply the coordinates by some
> offset for each chromosome to be able to normalise the data (at both index
> and query time). The largest chromosome (chr 1) has almost 250,000,000
> base pairs. I could probably squeeze the rest a bit smaller, but I'd
> rather use one size for all chromosomes, since we have more than just
> human data to deal with. It would get quite messy otherwise.
> 
> 
> On 7/22/13 11:50 AM, "David Smiley (@MITRE.org)" <DSMILEY@> wrote:
> 
>>Like Hoss said, you're going to have to solve this using
>>http://wiki.apache.org/solr/SpatialForTimeDurations
>>Using PointType is *not* going to work because your durations are
>>multi-valued per document.
>>
>>It would be useful to create a custom field type that wraps the capability
>>outlined on the wiki to make it easier to use without requiring the user
>>to think spatially.
>>
>>You mentioned that these numeric ranges extend upwards of 10 billion or
>>so.  Unfortunately, the current "prefix tree" implementation under the hood
>>for non-geodetic spatial, the QuadTree, is unlikely to scale to numbers
>>that big.  I don't know where the boundary is, but I doubt 10B.  You could
>>try and see what happens.  I'm working (very slowly on very little spare
>>time) on improving the PrefixTree implementations to scale to such large
>>numbers; I hope something will be available this fall.
>>
>>~ David Smiley
>>
>>
>>Kevin Stone wrote
>>> I have a particular use case that I think might require a custom field
>>> type, however I am having trouble getting the plugin to work.
>>> My use case has to do with genetics data, and we are running into
>>> several situations where we need to be able to query multiple regions of
>>> a chromosome (or gene, or other object types). All that really boils
>>> down to is being able to give a number, e.g. 10234, and return documents
>>> that have regions containing the number. So you'd have a document with a
>>> list like ["10000:16090","400:8000","40123:43564"], and it should come
>>> back because 10234 falls within "10000:16090". If there is a better or
>>> easier way to do this please speak up. I'd rather not have to use a
>>> "join" on another index, because 1) it's more complex to set up, and
>>> 2) we might need to join against something else and you can only do one
>>> join at a time.
>>>
>>> Anyway... I tried creating a field type similar to a PointType just to
>>> see if I could get one working. I added the following jars to get it to
>>> compile:
>>>
>>> apache-solr-core-4.0.0, lucene-core-4.0.0, lucene-queries-4.0.0,
>>> apache-solr-solrj-4.0.0.
>>> I am running solr 4.0.0 on jetty, and put my jar file in a sharedLib
>>> folder, and specified it in my solr.xml (I have multiple cores).
>>>
>>> After starting up solr, I got the line that it picked up the jar:
>>> INFO: Adding 'file:/blah/blah/lib/CustomPlugins.jar' to classloader
>>>
>>> But I get this error about it not being able to find the
>>> AbstractSubTypeFieldType class.
>>> Here is the first bit of the trace:
>>>
>>> SEVERE: null:java.lang.NoClassDefFoundError:
>>> org/apache/solr/schema/AbstractSubTypeFieldType
>>> at java.lang.ClassLoader.defineClass1(Native Method)
>>> at java.lang.ClassLoader.defineClass(ClassLoader.java:791)
>>> at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>> at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
>>> at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>> ...etc...
>>>
>>>
>>> Any hints as to what I did wrong? I can provide source code, or a fuller
>>> stack trace, config settings, etc.
>>>
>>> Also, I did try to unpack the solr.war, stick my jar in WEB-INF/lib,
>>> then repack. However, when I did that, I get a NoClassDefFoundError for
>>> my plugin itself.
>>>
>>>
>>> Thanks,
>>> Kevin
>>>
>>> The information in this email, including attachments, may be
>>> confidential and is intended solely for the addressee(s). If you believe
>>> you received this email by mistake, please notify the sender by return
>>> email as soon as possible.
>>
>>
>>
>>
>>
>>-----
>> Author:
>>http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
>>--
>>View this message in context:
>>http://lucene.472066.n3.nabble.com/custom-field-type-plugin-tp4079086p4079494.html
>>Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
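
To make the quoted idea concrete, here is a minimal sketch in plain Python
(no Solr involved) of the two steps discussed above: shifting each
chromosome's coordinates by a fixed offset, and treating every
"start:end" region as a 2D point (x=start, y=end) so that "which regions
contain q?" becomes "which points lie in the box x <= q and y >= q?".
The stride of one billion and all function names are hypothetical choices
for illustration; the actual field type and query syntax are whatever the
SpatialForTimeDurations wiki page describes.

```python
from typing import Iterable, Tuple

# Assumption: every chromosome gets one fixed-width slot on a shared number
# line. One billion comfortably exceeds the ~250,000,000 base pairs of chr 1.
STRIDE = 1_000_000_000


def to_global(chrom_index: int, position: int, stride: int = STRIDE) -> int:
    """Map (chromosome, position) onto the shared number line."""
    if chrom_index < 0 or not 0 <= position < stride:
        raise ValueError("position must fit inside one chromosome slot")
    return chrom_index * stride + position


def as_point(chrom_index: int, region: str) -> Tuple[int, int]:
    """Turn 'start:end' into the point (x=start, y=end) in global coordinates."""
    start, end = (int(part) for part in region.split(":"))
    if start > end:
        raise ValueError("region start must not exceed its end")
    return to_global(chrom_index, start), to_global(chrom_index, end)


def in_query_box(point: Tuple[int, int], q: int) -> bool:
    """True when the point lies in the box x <= q and y >= q,
    which is the same as saying the original range contains q."""
    x, y = point
    return x <= q <= y


def document_matches(regions: Iterable[Tuple[int, str]],
                     chrom_index: int, position: int) -> bool:
    """True if any (chromosome, 'start:end') region contains the position."""
    q = to_global(chrom_index, position)
    return any(in_query_box(as_point(c, r), q) for c, r in regions)


# The example list from the thread; the chromosome indexes 0 and 1 are made up.
regions = [(0, "10000:16090"), (0, "400:8000"), (1, "40123:43564")]
print(document_matches(regions, 0, 10234))  # True
print(document_matches(regions, 1, 10234))  # False
print(document_matches(regions, 1, 41000))  # True
```

The same offset has to be applied at both index and query time, as noted in
the quoted message. With this stride, ten chromosome slots span the 10
billion range under discussion.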





-----
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/custom-field-type-plugin-tp4079086p4079822.html
Sent from the Solr - User mailing list archive at Nabble.com.
