lucene-java-user mailing list archives

From prasenjit mukherjee <prasen....@gmail.com>
Subject Re: using lucene to find neighbouring points in an n-dimensional space
Date Fri, 28 Oct 2011 03:43:31 GMT
Thanks for responding.

On Fri, Oct 28, 2011 at 1:12 AM, Felipe Hummel <felipehummel@gmail.com> wrote:
> For the indexing part, you can 'insert' the term multiple times (term-weight
> times), constructing the document String manually. That is not very typical;
> you would normally feed Lucene the original documents for it to parse
> and index.
> The query processing could be done similarly to what you described.
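
For reference, the term-repetition idea quoted above could be sketched roughly as follows. This is plain Java with no Lucene dependency; the class and method names (WeightedDocBuilder, expand) are made up for illustration, and it assumes integer weights and a whitespace-style analyzer at indexing time so that each repetition counts as one occurrence of the term.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Lucene.
class WeightedDocBuilder {

    // Expands a map such as {"1245": 15, "3490": 20} into a
    // whitespace-separated string in which each term id is repeated
    // as many times as its (integer) weight.
    static String expand(Map<String, Integer> weights) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(e.getKey());
            }
        }
        return sb.toString();
    }

    // Small convenience for building the doc1 example from this thread.
    static Map<String, Integer> doc1() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("1245", 15);
        m.put("3490", 20);
        m.put("8856", 20);
        return m;
    }
}
```

The resulting string would then go into the indexed field in place of the original text. Note that Lucene's default similarity dampens term frequency, so a weight of 20 would probably not count exactly twice as much as a weight of 10.
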
>
> Just make sure that you really want to use Lucene for this. If you already
> have the term vectors, maybe you could just implement the closest-neighbours
> calculation yourself: compare your target document with every other one in
> the dataset and rank by similarity.
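
For comparison, the brute-force alternative quoted above might look something like the sketch below. It is plain Java, assumes cosine similarity as the measure (the thread does not fix one), and stores each sparse vector as a map from dimension id to weight. The names BruteForceNeighbours, dot, cosine and nearest are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical brute-force nearest-neighbour search over sparse vectors.
class BruteForceNeighbours {

    // Dot product of two sparse vectors; iterates over the smaller map.
    static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
        Map<Integer, Double> small = a.size() <= b.size() ? a : b;
        Map<Integer, Double> large = (small == a) ? b : a;
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : small.entrySet()) {
            Double w = large.get(e.getKey());
            if (w != null) {
                sum += e.getValue() * w;
            }
        }
        return sum;
    }

    // Cosine similarity; defined as 0 when either vector is all zeros.
    static double cosine(Map<Integer, Double> a, Map<Integer, Double> b) {
        double na = Math.sqrt(dot(a, a));
        double nb = Math.sqrt(dot(b, b));
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot(a, b) / (na * nb);
    }

    // Returns the indices of the k documents most similar to the query,
    // most similar first. Cost is one comparison per document.
    static List<Integer> nearest(Map<Integer, Double> query,
                                 List<Map<Integer, Double>> docs, int k) {
        double[] scores = new double[docs.size()];
        List<Integer> idx = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            scores[i] = cosine(query, docs.get(i));
            idx.add(i);
        }
        idx.sort((x, y) -> Double.compare(scores[y], scores[x]));
        return new ArrayList<>(idx.subList(0, Math.min(k, idx.size())));
    }
}
```

With fewer than 40 non-zero entries per vector each comparison is cheap, but the scan is linear in the number of documents, which is where an inverted index would be expected to help.
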

My main incentive for using Lucene/Solr is that it already does this
in a much more scalable way.
I am assuming there is not much overhead with this approach.

-Thanks,
Prasenjit

>
>
> Felipe Hummel
>
>
> On Sun, Oct 23, 2011 at 9:33 PM, prasenjit mukherjee
> <prasen.bea@gmail.com> wrote:
>
>> Any pointers/suggestions on my approach ?
>>
>>
>> On 10/22/11, prasenjit mukherjee <prasen.bea@gmail.com> wrote:
>> > My use case is the following :
>> > Given an n-dimensional vector ( only +ve quadrants/points ), find its
>> > closest neighbours. I would like to try this out with lucene's default
>> > ranking. Here is what a typical document will look like :
>> > <term-id:term-weight> ( or <dimension-id:dimension-weight>, same thing )
>> >
>> > doc1 = 1245:15 3490:20 8856:20 etc.
>> >
>> > As reflected in the above example, the number of dimensions is high ( ~
>> > 50K ) and the length of each vector is small ( < 40 ).
>> >
>> > I am thinking of constructing a  BooleanQuery in the following way (
>> > for doc1 as Query ) :
>> >
>> > BooleanQuery bq = new BooleanQuery();
>> > bq.add (new TermQuery(new Term("field", "1245") ),
>> > BooleanClause.Occur.SHOULD ) ;
>> > bq.add (new TermQuery(new Term("field", "3490") ),
>> > BooleanClause.Occur.SHOULD ) ;
>> > bq.add (new TermQuery(new Term("field", "8856") ),
>> > BooleanClause.Occur.SHOULD ) ;
>> >
>> > The problem is how do I pass the dimension-value ( 15, 20, 20 etc. )
>> > in the TermQuery.
>> >
>> > One solution is to pass as many TermQueries as the dimension value,
>> > but I was wondering if there is a better way to pass the
>> > dimension-weight. I can probably do the same during indexing, as
>> > latency is not an issue at indexing time.
>> >
>> > Any help is greatly appreciated.
>> >
>> > -Thanks,
>> > Prasenjit
>> >
>>
>> --
>> Sent from my mobile device
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>
