From: "Zhao, Xin"
Subject: Re: controlled vocabulary
Date: Fri, 25 Aug 2006 11:20:21 -0400
Message-ID: <20a601c6c859$fb03c290$963181a2@win.ad.jhu.edu>
References: <16ee01c6c790$2f7b36e0$963181a2@win.ad.jhu.edu> <207101c6c851$d24e4990$963181a2@win.ad.jhu.edu> <44EF0E04.50308@masterfile.com>

Hi Rupinder,

My understanding is that Field.Index.NO_NORMS disables both index-time
boosting and field length normalization. But I do need index-time
boosting to store the score of each MeSH term. Have I missed anything?

Thank you very much for your help,
Xin

----- Original Message -----
From: "Rupinder Singh Mazara"
Sent: Friday, August 25, 2006 10:49 AM
Subject: Re: controlled vocabulary

> Hi Xin,
>
> Take a look at this: you can add multiple fields with the name "mesh".
>
> for (int i = 0; i < meshList.size(); i++) {
>     meshTerm = meshList.get(i);
>     // one "mesh" field per term, holding the term's concept id
>     document.add(new Field("mesh", meshTerm.semanticWebConceptId,
>                            Field.Store.YES, Field.Index.NO_NORMS));
> }
>
> When querying this index, create an analyzer that interprets the text
> string and generates the IDs that correspond to the MeSH terms in the
> semantic web.
>
> Zhao, Xin wrote:
>> Hi,
>> Thank you for your reply. I had thought about the first two solutions
>> before. If we use one doc for each MeSH term, that would be 26 docs for
>> each item digested (we actually need the top 25 MeSH terms generated).
>> Would it be a problem to have that many documents? If we use field
>> names like "mesh_1", "mesh_2", ..., then at search time we will have to
>> loop over those fields for every one of the query terms (there will be
>> more than 20-30 terms on average, since we are using the semantic web
>> to implement concept search). Do you think that would hurt
>> performance badly?
>> Regards,
>> Xin
>>
>>
>> ----- Original Message ----- From: "Dedian Guo"
>> To: "Zhao, Xin"
>> Sent: Thursday, August 24, 2006 4:22 PM
>> Subject: Re: controlled library
>>
>>
>>> In my solution, you can use one doc for each MeSH term, or use
>>> different keywords such as "mesh_1" ... "mesh_10" for your top 10
>>> terms. Or you can group your MeSH terms into one string and add that
>>> to a field, which requires a simple string parser for the grouped
>>> string when you want to read the terms.
>>>
>>> Not sure if that works or answers your question...
>>>
>>> On 8/24/06, Zhao, Xin wrote:
>>>>
>>>> Hi,
>>>> I have a design question. Here is what we are trying to do for
>>>> indexing: we designed an indexing tool to generate standard MeSH
>>>> terms from medical citations, and then use Lucene to save the terms
>>>> and citations for future search. The information we need to save is:
>>>> a) the exact MeSH terms (top 10)
>>>> b) the score for each term
>>>> so the code is like
>>>> -----------------------------------
>>>> for the top 10 MeSH Terms
>>>>     myField=Field.Keyword("mesh", mesh.toLowerCase());
>>>>     myField.setBoost(score);
>>>>     doc.add(myField);
>>>> end for
>>>> ------------------------------------
>>>> As you can see, we add all the terms under the same field name,
>>>> "mesh". If I understand correctly, all the fields with the same name
>>>> are eventually saved into one field, with all the scores normalized
>>>> into a single field boost. In this case we wouldn't be able to save
>>>> a separate score per term, so the information is lost. Am I correct?
>>>> Is there any way we could change it? I understand Lucene is for
>>>> keyword search, and what we are trying to do is controlled
>>>> vocabulary search. Is there any other tool we could use?
>>>>
>>>> Thank you,
>>>> Xin
>>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
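
The third option suggested in the quoted thread (grouping the MeSH terms into one string and parsing it back when reading) can be sketched in plain Java. This is only a minimal sketch under stated assumptions, not code from the thread and not part of Lucene: the class name MeshTermCodec, the "term|score" layout, and the ";" separator are all hypothetical, and the sketch presumes that MeSH term strings contain neither "|" nor ";". The packed string would be kept in a stored field, so each term keeps its own score however the index handles norms.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Lucene): packs MeSH terms and their
// scores into one string and unpacks them again.
class MeshTermCodec {
    // Assumed separators; MeSH terms are presumed not to contain them.
    private static final String PAIR_SEP = ";";
    private static final String SCORE_SEP = "|";

    // Encode terms in iteration order as "term|score;term|score;...".
    // Terms are lower-cased, as in the pseudocode quoted in the thread.
    static String encode(Map<String, Float> termScores) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : termScores.entrySet()) {
            if (sb.length() > 0) {
                sb.append(PAIR_SEP);
            }
            sb.append(e.getKey().toLowerCase(Locale.ROOT))
              .append(SCORE_SEP)
              .append(e.getValue());
        }
        return sb.toString();
    }

    // Decode the packed string back into term -> score, preserving order.
    static Map<String, Float> decode(String packed) {
        Map<String, Float> result = new LinkedHashMap<>();
        if (packed == null || packed.isEmpty()) {
            return result;
        }
        for (String pair : packed.split(PAIR_SEP)) {
            // The score follows the last separator in each pair.
            int cut = pair.lastIndexOf(SCORE_SEP);
            result.put(pair.substring(0, cut),
                       Float.parseFloat(pair.substring(cut + 1)));
        }
        return result;
    }
}
```

With this layout, encoding "Neoplasms" with score 0.75 and "Lung Diseases" with score 0.5 gives the string "neoplasms|0.75;lung diseases|0.5". The packed string only makes the scores retrievable; the terms would presumably still need to be indexed separately to be searchable, and the stored scores do not influence ranking by themselves.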