From: "Praveen Peddi"
To: "Lucene Users List"
Subject: Re: Sorting and tokenization
Date: Thu, 1 Jul 2004 10:35:15 -0400

The solution you suggested is exactly as I expected and I already thought about implementing it.
But the problem is memory inefficiency. Sometimes titles are huge. And with i18n, a title can be in Japanese, Chinese, or any other language that takes more memory than English.

OK, how about taking the first token of the title and using it just for the sake of sorting? Does anyone see any problem with it? This solution saves at least some memory compared to the other solution.

Praveen

----- Original Message -----
From: "John Moylan"
To: "Lucene Users List"
Sent: Thursday, July 01, 2004 10:24 AM
Subject: Re: Sorting and tokenization

> Hi,
>
> You just need to have another title field that is not tokenized - for
> sorting purposes.
>
> Best,
> John
>
> On Thu, 2004-07-01 at 15:15, Praveen Peddi wrote:
> > Hello all,
> > Now that Lucene 1.4 RC3 has sorting functionality built in, I am adding sorting to our search. Before posting any question to this mailing list, I went through most of the responses on this list related to sorting. I found that I cannot tokenize the fields that I want to sort on.
> >
> > Let's take the example I have.
> > I use Lucene 1.3 final for searching. Sorting is in fact a very important feature in our application. But we found that Lucene does not support it out of the box, so we had to implement sorting by score and doc id programmatically, which is kind of useless for us. So I thought Lucene's new sorting feature would suit us best now. But unfortunately, the field called "title" is currently tokenized. This is done on purpose, because users want to search for partial matches (or rather, search on multiple words of the title). So if we make it untokenized, we may lose an important piece of functionality.
> >
> > My question is: is there any way I can sort the objects by title while keeping title tokenized?
> >
> > Thanks in advance.
> >
> > Praveen
> >
> >
> > **************************************************************
> > Praveen Peddi
> > Sr Software Engg, Context Media, Inc.
> > email:ppeddi@contextmedia.com
> > Tel: 401.854.3475
> > Fax: 401.861.3596
> > web: http://www.contextmedia.com
> > **************************************************************
> > Context Media- "The Leader in Enterprise Content Integration"
> --
> John Moylan
> ----------------------
> ePublishing
> Radio Telefis Eireann,
> Montrose House,
> Donnybrook,
> Dublin 4,
> Eire
> t:+353 1 2083564
> e:john.moylan@rte.ie

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
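
To make the trade-off in the first-token idea concrete, here is a minimal sketch in plain Java with no Lucene dependency. It only compares two ways of deriving a compact sort key that could then be stored in a separate untokenized field, as John suggested. The class name `SortKeyDemo`, the helpers `firstTokenKey` and `prefixKey`, and the 16-character limit are all hypothetical, and "token" is assumed here to mean a whitespace-delimited word, which may not match what a given analyzer produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class SortKeyDemo {

    // Hypothetical helper: lower-cased first whitespace-delimited word of the title.
    static String firstTokenKey(String title) {
        String trimmed = title.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        return trimmed.split("\\s+")[0].toLowerCase(Locale.ROOT);
    }

    // Hypothetical helper: lower-cased prefix of at most maxCodePoints code points.
    // Counting code points avoids cutting a surrogate pair in half.
    static String prefixKey(String title, int maxCodePoints) {
        String trimmed = title.trim();
        int count = trimmed.codePointCount(0, trimmed.length());
        int end = trimmed.offsetByCodePoints(0, Math.min(maxCodePoints, count));
        return trimmed.substring(0, end).toLowerCase(Locale.ROOT);
    }

    // Returns a sorted copy; List.sort is stable, so equal keys keep input order.
    static List<String> sortBy(List<String> titles, Comparator<String> byKey) {
        List<String> copy = new ArrayList<>(titles);
        copy.sort(byKey);
        return copy;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "Annual Report 2004",
                "Annual Budget 2004",
                "Zebra Facts",
                "Annual Audit 2004");

        // All three "Annual ..." titles share the key "annual", so their
        // relative order is not decided by the key at all.
        System.out.println(sortBy(titles, Comparator.comparing(SortKeyDemo::firstTokenKey)));

        // A bounded prefix still caps memory per title but distinguishes them.
        System.out.println(sortBy(titles, Comparator.comparing(t -> prefixKey(t, 16))));
    }
}
```

The point of the sketch is that a first-token key cannot order titles that begin with the same word, whereas a bounded prefix keeps the key size capped and still separates most titles. A title written without spaces would also yield a first "token" equal to the whole title under this whitespace assumption, so the memory saving is not guaranteed there.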