lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Zhao, Xin" <xzh...@jhmi.edu>
Subject Re: controlled vocabulary
Date Fri, 25 Aug 2006 15:20:21 GMT
Hi, Rupinder,
My understanding is Field.Index.NO_NORMS disables  index-time boosting and 
field length normalization at the same time. But I do need index-time 
boosting to store the scoring of each mesh term. Have I missed anything?
Thank you very much for your help,
Xin

----- Original Message ----- 
From: "Rupinder Singh Mazara" <rmazara@masterfile.com>
To: <java-user@lucene.apache.org>
Sent: Friday, August 25, 2006 10:49 AM
Subject: Re: controlled vocabulary


> hi Xin
>
>  this is take a look at this you can add multiple fields with the name 
> mesh
> for ( i=0; i< meshList.size() ; i++ ){
>    meshTerm = meshList.get(i)
>  document.addField( new Field( "mesh", meshTerm.semanticWebConceptId, 
> Field.Store.YES , Field.Index.NO_NORMS  );
> }
>
>  when querying this index, create a analyzer that infers the text string 
> and generates id's that correspond to the mesh term in the semantic web
>
>
>
> Zhao, Xin wrote:
>> Hi,
>> Thank you for your reply. I had thought about the first two solutions 
>> before. If we apply one doc for each MeSH term, it would be 26 docs for 
>> each item digested(we actually need the top 25 MeSH terms generated), 
>> would it be any problem if there are too many documents? If we apply 
>> field name like "mesh_1", "mesh_2"..., when it comes to search, we will 
>> have to generate a loop for each single one of the query terms( there 
>> will be more than 20-30 terms on average, since we are using sematic web 
>> to implement concept search), do you think it would affect the 
>> performance in a very bad way?
>> Regards,
>> Xin
>>
>>
>> ----- Original Message ----- From: "Dedian Guo" <gdedian@gmail.com>
>> To: <java-user@lucene.apache.org>; "Zhao, Xin" <xzhao@jhu.edu>
>> Sent: Thursday, August 24, 2006 4:22 PM
>> Subject: Re: controlled library
>>
>>
>>> in my solution, you can apply one doc for each mesh term, or apply 
>>> different
>>> keyword such as "mesh_1"...."mesh_10" for your top 10 terms...or u can 
>>> group
>>> your mesh terms as one string then add into a field, which requires a 
>>> simple
>>> string parser for the group string when you wanna read the terms...
>>>
>>> not sure if that works or answers your question...
>>>
>>> On 8/24/06, Zhao, Xin <xzhao9@jhmi.edu> wrote:
>>>>
>>>> Hi,
>>>> I have a design question. Here is what we try to do for indexing:
>>>> We designed an indexing tool to generate standard MeSH terms from 
>>>> medical
>>>> citations, and then use Lucene to save the terms and citations for 
>>>> future
>>>> search. The information we need to save are:
>>>> a) the exact mesh terms (top 10)
>>>> b) the score for each term
>>>> so the codings are like
>>>> -----------------------------------
>>>> for the top 10 MeSH Terms
>>>> myField=Field.Keyword("mesh", mesh.toLowerCase());
>>>> myField.setBoost(score);
>>>> doc.add(myFiled);
>>>> end for
>>>> ------------------------------------
>>>> as you could see we generate all the terms under named field "mesh". If 
>>>> I
>>>> understand correctly, all the fields under the same name would
>>>> eventually  save into one field, with all the scores be normalized into
>>>> filed boost. In this case, we wouldn't be able to save separate score, 
>>>> so
>>>> the information is lost. Am I correct? Is there anyway we could change 
>>>> it? I
>>>> understand Lucene is for keyword search, and what we try to do is 
>>>> Controlled
>>>> Vocabulary search, Any other tool we could use?
>>>>
>>>> Thank you,
>>>> Xin
>>>>
>>>>
>>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message