lucene-general mailing list archives

From Leighton Hargreaves <Leighton.Hargrea...@Amtech.co.uk>
Subject RE: Can lucene documents have several thousand attributes each?
Date Fri, 30 May 2014 14:23:03 GMT
Aha!  Yes, I think a separate index, with a 'join' is probably the best solution.  It makes
a lot of sense.
One more question:

Is it possible to create a single lucene query which would refer to this separate 'join' index,
and to my main index?  I don't want to have to execute multiple queries and merge the results,
as this would be inefficient for pagination etc.

Thanks for all the insights... 


-----Original Message-----
From: hadfield.marc@gmail.com [mailto:hadfield.marc@gmail.com] On Behalf Of Marc Hadfield
Sent: 23 May 2014 13:42
To: general@lucene.apache.org
Subject: Re: Can lucene documents have several thousand attributes each?

You may be able to leverage Faceting for more complex cases ( http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-files/userguide.html),
however it sounds like you could just create a set of Lucene documents with 3 main fields:
object-id-1, distance, object-id-2
and then query this as needed with constraints on the distance.  you would be "joining" this
index to another index (your object index) by object-id.
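
A minimal sketch of that layout, in plain Python rather than the Lucene API, may make the two steps clearer. Everything here is an assumption made for illustration: the PairDoc class, the helper names, the sample ids and the choice to store each pair only once (so both id fields are checked). In Lucene the first step would be a query on the pair index and the second a lookup by id in the object index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairDoc:
    """One document in the hypothetical pair index."""
    object_id_1: str
    object_id_2: str
    distance: float  # calculated externally by the geometry engine


def neighbours_within(pair_docs, object_id, max_distance):
    """Return ids of objects within max_distance of object_id, nearest first."""
    hits = []
    for doc in pair_docs:
        if doc.distance > max_distance:
            continue
        # Each pair is stored once, so the object may be on either side.
        if doc.object_id_1 == object_id:
            hits.append((doc.distance, doc.object_id_2))
        elif doc.object_id_2 == object_id:
            hits.append((doc.distance, doc.object_id_1))
    return [other_id for _, other_id in sorted(hits)]


def join_to_objects(object_index, ids):
    """The 'join': look up each neighbour id in the main object index."""
    return [object_index[i] for i in ids if i in object_index]


# Sample data (made up).
pair_docs = [
    PairDoc("a", "b", 1.5),
    PairDoc("a", "c", 7.0),
    PairDoc("b", "c", 2.0),
]
object_index = {
    "a": {"id": "a", "name": "beam"},
    "b": {"id": "b", "name": "column"},
    "c": {"id": "c", "name": "slab"},
}

near_a = neighbours_within(pair_docs, "a", 5.0)
objects_near_a = join_to_objects(object_index, near_a)
```

With N objects the pair index holds up to N*(N-1)/2 small documents, instead of each object document carrying N-1 distance fields.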



On Fri, May 23, 2014 at 4:29 AM, Leighton Hargreaves < Leighton.Hargreaves@amtech.co.uk>
wrote:

> Thanks for the responses, I didn't even realise there was a spatial 
> feature.  The distances I need to search for, though, are the minimum 
> distances between arbitrarily complex 3D geometry (the geometry itself 
> wouldn't be represented in lucene, only metadata about it).  So I want 
> to calculate these minimum distances within my own geometry engine, 
> and then pass the calculated distances into lucene/solr.
>
> So really my question is, what is the best way to represent values 
> which relate to 2 documents, so that I can search for documents 'in
> relation to' another document?  (In this case the relation is an 
> externally-calculated distance.)
>
>
>
> -----Original Message-----
> From: Ted Dunning [mailto:ted.dunning@gmail.com]
> Sent: 21 May 2014 22:19
> To: general@lucene.apache.org
> Subject: Re: Can lucene documents have several thousand attributes each?
>
> Also, you can use 2D projections with AND to limit the number of 
> documents you need to compute distances on.
>
>
> On Wed, May 21, 2014 at 10:29 AM, david.w.smiley@gmail.com < 
> david.w.smiley@gmail.com> wrote:
>
> > Hi Leighton,
> >
> > I’m assuming you’re going about it this way, instead of using the
> > Lucene/Solr spatial feature, because it’s not a 2D distance?  Solr
> > actually supports n-dimensional Euclidean distance calculation with
> > this function query (aka ValueSource):
> >
> > dist(2, x,y,z,0,0,0): Euclidean distance between (0,0,0) and (x,y,z) 
> > for each document
> >
> >
> > On Wed, May 21, 2014 at 12:30 PM, Leighton Hargreaves < 
> > Leighton.Hargreaves@amtech.co.uk> wrote:
> >
> > > Hello Lucene project.
> > >
> > > I'm in the process of evaluating lucene for a project where we 
> > > will need to search a large set of 3D objects by various 
> > > attributes.  In many ways, lucene's functionality seems perfect.
> > >
> > > But one thing I'm not sure of: we need to find the set of objects 
> > > that are within a given distance of any given object.
> > >
> > > One solution would be to add a numeric field to each 3D object, for 
> > > each other 3D object, with a name such as
> > > 'distance_to_<other_object_id_1>'.  This would allow us to find
> > > objects within a given distance of a given object with a query like
> > > 'distance_to_<object_id>:[* TO <max_distance>]'.
> > >
> > > But this would mean each 3D object would have several thousand
> > > attributes, one for every other 3D object.  Would this be a
> > > prohibitively expensive way to do it?
> > >
> > > Another solution would be to handle the spatial aspect within my 
> > > own software, i.e. filter lucene's results according to distance.  
> > > But I worry that this would negatively affect performance by 
> > > causing the set of results returned to my code to be large, prior
> > > to filtering by my own software.
> > >
> > > I apologise if the question is confusing or badly explained, I'm 
> > > just asking in case it turns out to be a standard class of problem 
> > > with good existing solutions.
> > >
> > > Regards,
> > >
> > > Leighton Hargreaves
> > >
> > >
> >
>
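
For reference, the two point-based ideas quoted above can be sketched in plain Python. This is only an illustration under stated assumptions: it treats each object as a single point (a centroid, say), which is not the minimum distance between complex geometry that the original question needs; the function names and sample points are made up; and the per-axis range check is one reading of the "projections with AND" suggestion, not something confirmed in the thread. The euclidean function computes the same 2-norm distance that dist(2, x,y,z,0,0,0) describes.

```python
import math


def euclidean(p, q):
    """2-norm distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def in_box(p, centre, radius):
    """Cheap prefilter: every axis must be within radius of the centre.

    This is the AND of one range constraint per axis. It never rejects a
    point that is truly within radius, but it can let through points in
    the corners of the box.
    """
    return all(abs(a - c) <= radius for a, c in zip(p, centre))


def within(points, centre, radius):
    """Ids of points within radius of centre, prefiltered by the box test."""
    candidates = {k: p for k, p in points.items() if in_box(p, centre, radius)}
    # Exact distance is only computed for the survivors of the prefilter.
    return sorted(k for k, p in candidates.items()
                  if euclidean(p, centre) <= radius)


# Sample points (made up).
points = {
    "a": (1.0, 2.0, 2.0),  # distance 3 from the origin
    "b": (3.0, 3.0, 3.0),  # inside the box of radius 5, but too far away
    "c": (6.0, 0.0, 0.0),  # rejected by the box test
}

result = within(points, (0.0, 0.0, 0.0), 5.0)
```

In an index, the box test would correspond to range clauses on the coordinate fields, so the exact distance is computed for far fewer documents.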