From: Felipe Hummel
Date: Thu, 27 Oct 2011 15:42:37 -0400
Subject: Re: using lucene to find neighbouring points in an n-dimensional space
To: java-user@lucene.apache.org

For the indexing part, you can 'insert' each term multiple times
(term-weight times) by constructing the document String manually. That is
not very typical; you would normally feed Lucene the original documents
and let it parse and index them.

Query processing could be done much as you described.

Just be sure that you really want to use Lucene for this. If you already
have the term vectors, maybe you could just implement the closest-neighbours
calculation yourself: compare your target document with every other
document in the dataset and rank by similarity.

Felipe Hummel

On Sun, Oct 23, 2011 at 9:33 PM, prasenjit mukherjee wrote:

> Any pointers/suggestions on my approach?
>
> On 10/22/11, prasenjit mukherjee wrote:
> > My use case is the following:
> > Given an n-dimensional vector ( only +ve quadrants/points ), find its
> > closest neighbours. I would like to try this out with Lucene's default
> > ranking. Here is what a typical document will look like:
> > ( or same thing
> > )
> >
> > doc1 = 1245:15 3490:20 8856:20 etc.
> >
> > As reflected in the above example, the number of dimensions is high
> > ( ~50K ) and the length of the vectors is small ( < 40 ).
> >
> > I am thinking of constructing a BooleanQuery in the following way
> > ( for doc1 as the query ):
> >
> > BooleanQuery bq = new BooleanQuery();
> > bq.add(new TermQuery(new Term("field", "1245")), BooleanClause.Occur.SHOULD);
> > bq.add(new TermQuery(new Term("field", "3490")), BooleanClause.Occur.SHOULD);
> > bq.add(new TermQuery(new Term("field", "8856")), BooleanClause.Occur.SHOULD);
> >
> > The problem is how to pass the dimension value ( 15, 20, 20 etc. )
> > in the TermQuery.
> >
> > One solution is to add as many TermQueries as the dimension value,
> > but I was wondering whether there is a better way to pass the
> > dimension weight. I can probably do the same during indexing, as
> > latency is not an issue at indexing time.
> >
> > Any help is greatly appreciated.
> >
> > -Thanks,
> > Prasenjit
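
The two suggestions in the reply above (repeating each term weight-many times when building the document String, and ranking every vector against the target directly) could be sketched roughly as follows. This is only an illustration under assumptions: the class SparseNeighbours, its method names, the sample vectors other than doc1, and the choice of cosine similarity are all hypothetical and do not come from the thread or from Lucene. It uses plain Java collections and no Lucene calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; none of these names are part of Lucene.
class SparseNeighbours {

    // Build a sparse vector from (dimension, weight) pairs, keeping insertion order.
    static Map<Integer, Integer> vec(int... pairs) {
        Map<Integer, Integer> v = new LinkedHashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            v.put(pairs[i], pairs[i + 1]);
        }
        return v;
    }

    // Repeat each dimension id `weight` times, separated by spaces, so that an
    // indexer would see the weight as a term frequency.
    // For the insertion-ordered vector {1245=2, 3490=1} this yields "1245 1245 3490".
    static String toDocumentString(Map<Integer, Integer> vector) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Integer> e : vector.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(e.getKey());
            }
        }
        return sb.toString();
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<Integer, Integer> v) {
        double sum = 0.0;
        for (int x : v.values()) sum += (double) x * x;
        return Math.sqrt(sum);
    }

    // Cosine similarity (an assumed choice of measure); 0 if either vector has zero length.
    static double cosine(Map<Integer, Integer> a, Map<Integer, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<Integer, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) dot += (double) e.getValue() * other;
        }
        double na = norm(a), nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Brute force: score every document against the target and return the ids
    // of the k most similar ones, best first.
    static List<String> nearest(Map<Integer, Integer> target,
                                Map<String, Map<Integer, Integer>> docs, int k) {
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Map<Integer, Integer>> e : docs.entrySet()) {
            score.put(e.getKey(), cosine(target, e.getValue()));
        }
        List<String> ids = new ArrayList<>(docs.keySet());
        ids.sort((x, y) -> Double.compare(score.get(y), score.get(x)));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    public static void main(String[] args) {
        // doc1 is taken from the original question; doc2 and doc3 are made up.
        Map<Integer, Integer> doc1 = vec(1245, 15, 3490, 20, 8856, 20);
        Map<String, Map<Integer, Integer>> docs = new LinkedHashMap<>();
        docs.put("doc2", vec(1245, 10, 3490, 25));
        docs.put("doc3", vec(7001, 30));
        System.out.println(toDocumentString(vec(1245, 2, 3490, 1)));
        System.out.println(nearest(doc1, docs, 2));
    }
}
```

With vectors of fewer than 40 non-zero entries each, one comparison is cheap, so a linear scan may be acceptable for moderate dataset sizes; whether it is fast enough depends on how many documents there are. On the Lucene side, the 3.x API current at the time of this thread also has Query.setBoost(float), which might be a lighter way to weight a clause than repeating it, though the resulting scores would need checking against what is wanted.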