lucene-java-user mailing list archives

From Felipe Hummel <felipehum...@gmail.com>
Subject Re: using lucene to find neighbouring points in an n-dimensional space
Date Thu, 27 Oct 2011 19:42:37 GMT
For the indexing part, you can 'insert' each term multiple times (term-weight
times) by constructing the document String manually. That is not very typical;
you would normally feed Lucene the original documents for it to parse
and index.
The query processing could be done much as you described.
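
To make the indexing idea concrete, here is a minimal sketch of building such a
document String. It assumes the weights are small non-negative integers and that
the field is later tokenized on whitespace; the class and method names are
hypothetical and not part of Lucene. Keep in mind that Lucene's default scoring
dampens raw term frequency, so the resulting scores will not be linear in the
weights.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a Lucene class): turns <term-id:term-weight> pairs
// into a plain String in which each term id appears term-weight times.
final class WeightedDocText {

    static String build(Map<String, Integer> weights) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            // Repeat the term id once per unit of weight, separated by spaces.
            for (int i = 0; i < e.getValue(); i++) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(e.getKey());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // doc1 from the original question: 1245:15 3490:20 8856:20
        Map<String, Integer> doc1 = new LinkedHashMap<>();
        doc1.put("1245", 15);
        doc1.put("3490", 20);
        doc1.put("8856", 20);
        System.out.println(build(doc1));
    }
}
```

The resulting String would then be indexed as the content of a single analyzed
field, the same way as any other document text.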

Just be sure that you really want to use Lucene for this. If you already
have the term vectors, maybe you could just implement the closest-neighbours
calculation yourself: compare your target document with every other document
in the dataset and rank by similarity.
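
A rough sketch of that brute-force approach follows. It assumes cosine
similarity as the measure (the thread does not fix one) and sparse vectors
stored as maps from dimension id to weight; all names here are hypothetical.
With fewer than 40 non-zero entries per vector, each comparison is cheap, but
the scan is still linear in the number of documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: exhaustive nearest-neighbour search over sparse vectors.
final class BruteForceNeighbours {

    // Cosine similarity of two sparse vectors (dimension id -> weight).
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        // Iterate over the smaller map and look entries up in the larger one.
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = (small == a) ? b : a;
        double dot = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double w = large.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<Integer, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    // Returns the indices of the k documents most similar to the target,
    // ordered from most to least similar.
    static List<Integer> nearest(Map<Integer, Double> target,
                                 List<Map<Integer, Double>> docs, int k) {
        double[] sims = new double[docs.size()];
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            sims[i] = cosine(target, docs.get(i));
            ids.add(i);
        }
        ids.sort(Comparator.comparingDouble((Integer i) -> -sims[i]));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    public static void main(String[] args) {
        Map<Integer, Double> doc1 = new HashMap<>();
        doc1.put(1245, 15.0);
        doc1.put(3490, 20.0);
        doc1.put(8856, 20.0);
        List<Map<Integer, Double>> docs = new ArrayList<>();
        docs.add(doc1);
        System.out.println(nearest(doc1, docs, 1));
    }
}
```

If the dataset is too large for a full scan per query, that is the point where
an inverted index such as Lucene starts to pay off.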


Felipe Hummel


On Sun, Oct 23, 2011 at 9:33 PM, prasenjit mukherjee
<prasen.bea@gmail.com> wrote:

> Any pointers/suggestions on my approach ?
>
>
> On 10/22/11, prasenjit mukherjee <prasen.bea@gmail.com> wrote:
> > My use case is the following:
> > Given an n-dimensional vector (positive coordinates only), find its
> > closest neighbours. I would like to try this out with Lucene's default
> > ranking. Here is what a typical document will look like:
> > <term-id:term-weight> (or <dimension-id:dimension-weight>, same thing)
> >
> > doc1 = 1245:15 3490:20 8856:20 etc.
> >
> > As the above example shows, the number of dimensions is high (~50K)
> > and the vectors are short (< 40).
> >
> > I am thinking of constructing a BooleanQuery in the following way
> > (using doc1 as the query):
> >
> > BooleanQuery bq = new BooleanQuery();
> > bq.add(new TermQuery(new Term("field", "1245")), BooleanClause.Occur.SHOULD);
> > bq.add(new TermQuery(new Term("field", "3490")), BooleanClause.Occur.SHOULD);
> > bq.add(new TermQuery(new Term("field", "8856")), BooleanClause.Occur.SHOULD);
> >
> > The problem is: how do I pass the dimension value (15, 20, 20, etc.)
> > in the TermQuery?
> >
> > One solution is to add as many TermQueries as the dimension value,
> > but I was wondering if there is a better way to pass the
> > dimension weight. I can probably do the same during indexing, as
> > latency is not an issue at indexing time.
> >
> > Any help is greatly appreciated.
> >
> > -Thanks,
> > Prasenjit
> >
>
