lucene-java-user mailing list archives

From "Zhao, Xin" <>
Subject Re: controlled vocabulary
Date Fri, 25 Aug 2006 14:21:57 GMT
Thank you for your reply. I had thought about the first two solutions
before. If we use one doc for each MeSH term, that would be 26 docs for
each digested item (we actually need the top 25 MeSH terms generated).
Would it be a problem to have that many documents? If we use field names
like "mesh_1", "mesh_2", ..., then at search time we will have to loop
over every field for each single query term (there will be more than
20-30 terms on average, since we are using the semantic web to implement
concept search). Do you think that would hurt performance badly?
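To put the second option in numbers, here is a rough sketch in plain Java (no Lucene calls) that expands each query term across hypothetical numbered fields mesh_1 ... mesh_25 and counts the resulting term clauses. It assumes one clause per (field, term) pair and uses made-up stand-in terms; the clause strings are illustrative only, not necessarily valid QueryParser syntax for untokenized fields. The 1024 figure is Lucene's default BooleanQuery clause limit (BooleanQuery.getMaxClauseCount()), above which adding a clause throws BooleanQuery.TooManyClauses.

```java
import java.util.ArrayList;
import java.util.List;

class MeshClauseCount {
    // Lucene's default BooleanQuery clause limit (see BooleanQuery.getMaxClauseCount()).
    static final int DEFAULT_MAX_CLAUSES = 1024;

    // Expands every query term across the hypothetical fields "mesh_1".."mesh_<numFields>",
    // producing one illustrative "field:term" clause per (field, term) pair.
    static List<String> expand(List<String> queryTerms, int numFields) {
        List<String> clauses = new ArrayList<>();
        for (String term : queryTerms) {
            for (int i = 1; i <= numFields; i++) {
                clauses.add("mesh_" + i + ":" + term);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // Stand-ins for the 20-30 concept-expanded query terms mentioned above.
        List<String> terms = new ArrayList<>();
        for (int t = 0; t < 30; t++) {
            terms.add("term" + t);
        }
        List<String> numbered = expand(terms, 25);
        System.out.println("numbered fields: " + numbered.size() + " clauses"); // 750
        System.out.println("single field:    " + terms.size() + " clauses");    // 30
        System.out.println("within default limit: "
                + (numbered.size() <= DEFAULT_MAX_CLAUSES));                     // true
    }
}
```

With 25 numbered fields, 30 terms already give 750 clauses against 30 for a single "mesh" field, and more than 40 terms (25 × 41 = 1025) would exceed the default limit unless it is raised with BooleanQuery.setMaxClauseCount.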

----- Original Message ----- 
From: "Dedian Guo" <>
To: <>; "Zhao, Xin" <>
Sent: Thursday, August 24, 2006 4:22 PM
Subject: Re: controlled library

> In my solution, you can use one doc for each MeSH term, or use
> different field names such as "mesh_1" ... "mesh_10" for your top 10
> terms. Or you can group your MeSH terms into one string and add it to a
> single field, which requires a simple string parser for the grouped
> string when you want to read the terms back.
> Not sure if that works or answers your question...
> On 8/24/06, Zhao, Xin <> wrote:
>> Hi,
>> I have a design question. Here is what we are trying to do for indexing:
>> We designed an indexing tool that generates standard MeSH terms from
>> medical citations, and we then use Lucene to store the terms and
>> citations for later search. The information we need to store is:
>> a) the exact MeSH terms (top 10)
>> b) the score for each term
>> So the code looks like this:
>> -----------------------------------
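>> // for each of the top 10 MeSH terms
>> // (meshTerms and scores are assumed parallel arrays; Field.Keyword is pre-2.0 Lucene API)
>> for (int i = 0; i < meshTerms.length; i++) {
>>     Field myField = Field.Keyword("mesh", meshTerms[i].toLowerCase());
>>     myField.setBoost(scores[i]);
>>     doc.add(myField);
>> }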
>> ------------------------------------
>> As you can see, we add all the terms under the field name "mesh". If I
>> understand correctly, all fields with the same name are eventually
>> saved into one field, with all the scores combined into a single field
>> boost. In that case we cannot keep a separate score for each term, so
>> that information is lost. Am I correct? Is there any way to change
>> this? I understand Lucene is built for keyword search, while what we
>> are trying to do is controlled vocabulary search. Is there any other
>> tool we could use?
>> Thank you,
>> Xin
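The grouped-string option from the reply above could look roughly like the sketch below. As far as I understand Lucene of this era, boosts on same-named fields are multiplied into one value per field, so a stored copy is where the individual scores would survive. The idea: join the top-N terms and their scores into one string, keep it in a hypothetical stored-only field (say "mesh_scores") next to the searchable "mesh" keyword fields, and parse it back when the document is read. This is plain Java with the Lucene storage calls left out; the class name, separators and sample scores are illustrative assumptions, and it assumes no MeSH heading contains "|".

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class MeshScoreCodec {
    // Hypothetical separators; assumes ENTRY_SEP never occurs inside a MeSH heading.
    private static final String ENTRY_SEP = "|";
    private static final String SCORE_SEP = "=";

    // Joins term/score pairs into one string, e.g. "asthma=0.92|lung diseases=0.71".
    static String encode(Map<String, Float> scores) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : scores.entrySet()) {
            if (sb.length() > 0) {
                sb.append(ENTRY_SEP);
            }
            sb.append(e.getKey()).append(SCORE_SEP).append(e.getValue());
        }
        return sb.toString();
    }

    // Parses the stored string back into an insertion-ordered term -> score map.
    static Map<String, Float> decode(String stored) {
        Map<String, Float> scores = new LinkedHashMap<>();
        if (stored.isEmpty()) {
            return scores;
        }
        for (String entry : stored.split(Pattern.quote(ENTRY_SEP))) {
            // Split on the last "=" so the score is always the trailing part.
            int cut = entry.lastIndexOf(SCORE_SEP);
            scores.put(entry.substring(0, cut), Float.parseFloat(entry.substring(cut + 1)));
        }
        return scores;
    }

    public static void main(String[] args) {
        // Illustrative terms and scores only.
        Map<String, Float> top = new LinkedHashMap<>();
        top.put("asthma", 0.92f);
        top.put("lung diseases", 0.71f);
        String stored = encode(top);
        System.out.println(stored);         // asthma=0.92|lung diseases=0.71
        System.out.println(decode(stored)); // {asthma=0.92, lung diseases=0.71}
    }
}
```

Float.toString and Float.parseFloat round-trip exactly, so the decoded scores equal the ones that were encoded; searching still goes through the per-term "mesh" fields, and this string is only read back for display or re-ranking.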
