Return-Path: Delivered-To: apmail-lucene-java-user-archive@www.apache.org Received: (qmail 81677 invoked from network); 2 Nov 2005 16:04:37 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur.apache.org with SMTP; 2 Nov 2005 16:04:37 -0000 Received: (qmail 18747 invoked by uid 500); 2 Nov 2005 16:04:30 -0000 Delivered-To: apmail-lucene-java-user-archive@lucene.apache.org Received: (qmail 18714 invoked by uid 500); 2 Nov 2005 16:04:30 -0000 Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: java-user@lucene.apache.org Delivered-To: mailing list java-user@lucene.apache.org Received: (qmail 18703 invoked by uid 99); 2 Nov 2005 16:04:30 -0000 Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 02 Nov 2005 08:04:30 -0800 X-ASF-Spam-Status: No, hits=0.0 required=10.0 tests= X-Spam-Check-By: apache.org Received-SPF: pass (asf.osuosl.org: local policy) Received: from [128.230.18.5] (HELO mailbox.syr.edu) (128.230.18.5) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 02 Nov 2005 08:04:25 -0800 Received: from [128.230.84.109] (istcnlpd9hb3r31.syr.edu [128.230.84.109]) by mailbox.syr.edu (8.12.10/8.12.10) with ESMTP id jA2G45Yv001602 for ; Wed, 2 Nov 2005 11:04:06 -0500 (EST) Message-ID: <4368E375.2070105@syr.edu> Date: Wed, 02 Nov 2005 11:04:05 -0500 From: Grant Ingersoll User-Agent: Mozilla Thunderbird 1.0.6 (Windows/20050716) X-Accept-Language: en-us, en MIME-Version: 1.0 To: java-user@lucene.apache.org Subject: Re: Creating document fields by providing termvector directly (bypassing the analyzing/tokenizing stage) References: <200511021310.18686.rj@last.fm> In-Reply-To: <200511021310.18686.rj@last.fm> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N Not sure if this is feasible, but is there someway you could use a "fake" analyzer that you constructed using your hashtable/termvector and then have it output the tokens directly from the hashtable via the TokenStream? Maybe you would have to pass in an empty/dummy string to the field constructor. At least you wouldn't have to create the string of your term vectors Just a thought, Grant Richard Jones wrote: >Hi, >I'm using lucene (which rocks, btw ;) behind the scenes at www.last.fm for >various things, and i've run into a situation that seems somewhat inelegant >regarding populating fields which i already know the termvector for. > >I'm creating a document for each user (last.fm tracks music taste for people), >with a field that depicts a users favourite 500 artists. Each artist is >represented by an integer, here's a simple example with 3 artists: > >If i've listened to Radiohead (id 1) 10 times, Coldplay (id 2) 5 times and >Beck (id 3) 2 times, the field would look like this "1 1 1 1 1 1 1 1 1 1 2 2 >2 2 2 3 3" > >I use this index for quickly finding "top fans" of an artist or combination of >artists, comparing peoples music taste and other things on the fly. > >The issue is that i already have the termvecor (radiohead=10, coldplay=5, >beck=2) handy as a hashtable, and i've found myself building up a string of >numbers separated by spaces as shown above, then feeding this into lucene (i >store the termvec of the field in lucene). Is there a way i could pass a >termvector directly to lucene to cut out the ugly "turn it into a string and >let lucene parse it" step? basically i want to provide the termvector for a >field when inserting a new document, rather than let lucene build it by >analyzing a string. > >This does feel like a rather perverted use of lucene i suppose.. It's faster >and less hassle than other methods i've tried to date though. > >Regards, >RJ > > > > > > > >--------------------------------------------------------------------- >To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org >For additional commands, e-mail: java-user-help@lucene.apache.org > > > > -- ------------------------------------------------------------------- Grant Ingersoll Sr. Software Engineer Center for Natural Language Processing Syracuse University School of Information Studies 337 Hinds Hall Syracuse, NY 13244 http://www.cnlp.org Voice: 315-443-5484 Fax: 315-443-6886 --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org For additional commands, e-mail: java-user-help@lucene.apache.org