lucene-java-user mailing list archives

From Rupinder Singh Mazara <rmaz...@masterfile.com>
Subject Re: controlled vocabulary
Date Fri, 25 Aug 2006 14:49:40 GMT
Hi Xin,

  Take a look at this: you can add multiple fields with the name 
"mesh".
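// Add one "mesh" field per term; Lucene allows repeated field names.
for (int i = 0; i < meshList.size(); i++) {
    meshTerm = meshList.get(i);
    // Store the concept ID and index it untokenized, without norms.
    document.add(new Field("mesh", meshTerm.semanticWebConceptId,
            Field.Store.YES, Field.Index.NO_NORMS));
}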

  When querying this index, create an analyzer that interprets the 
query text and generates the IDs that correspond to the MeSH terms in 
the semantic web.
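
  A minimal sketch of that query-side step, in plain Java with no Lucene 
dependency. The class name ConceptQueryMapper, the lookup table, and the 
concept IDs (C001, C002) are all made up for illustration; a real system 
would get the IDs from the semantic web and would probably wrap this 
logic in an Analyzer or build TermQuery objects from the result. The 
substring matching is deliberately naive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ConceptQueryMapper {
    // Hypothetical table: lower-cased phrase -> concept ID.
    private final Map<String, String> conceptIds = new LinkedHashMap<>();

    public ConceptQueryMapper(Map<String, String> table) {
        for (Map.Entry<String, String> e : table.entrySet()) {
            conceptIds.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
    }

    /** Returns one "mesh:<id>" clause for every known phrase found in the text. */
    public List<String> toMeshClauses(String queryText) {
        String normalized = queryText.toLowerCase(Locale.ROOT);
        List<String> clauses = new ArrayList<>();
        for (Map.Entry<String, String> e : conceptIds.entrySet()) {
            // Naive substring match; a real mapper would tokenize properly.
            if (normalized.contains(e.getKey())) {
                clauses.add("mesh:" + e.getValue());
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        Map<String, String> table = new LinkedHashMap<>();
        table.put("heart attack", "C001"); // made-up ID
        table.put("aspirin", "C002");      // made-up ID
        ConceptQueryMapper mapper = new ConceptQueryMapper(table);
        // prints: mesh:C001 OR mesh:C002
        System.out.println(String.join(" OR ",
                mapper.toMeshClauses("Aspirin after a heart attack")));
    }
}
```

  Because every term lives in the same "mesh" field, each concept ID 
becomes exactly one clause, no matter how many terms a document carries.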

 
 
Zhao, Xin wrote:
> Hi,
> Thank you for your reply. I had thought about the first two solutions 
> before. If we use one doc for each MeSH term, that would be 26 docs 
> for each item digested (we actually need the top 25 MeSH terms 
> generated). Would it be a problem to have that many documents? 
> If we use field names like "mesh_1", "mesh_2"..., then at search 
> time we will have to loop over every field for each of the 
> query terms (there will be more than 20-30 terms on average, since we 
> are using the semantic web to implement concept search). Do you think 
> that would hurt performance badly?
> Regards,
> Xin
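
To put a rough number on the "mesh_1", "mesh_2"... question, here is a 
small plain-Java sketch that expands each query term against every 
numbered field and counts the resulting clauses. The class name 
NumberedFieldQuery and the concept IDs are hypothetical; 25 fields and 
30 terms are taken from the figures mentioned in the message.

```java
import java.util.ArrayList;
import java.util.List;

public class NumberedFieldQuery {
    /** Expands each concept ID into one clause per numbered field. */
    public static List<String> expand(List<String> conceptIds, int fieldCount) {
        List<String> clauses = new ArrayList<>();
        for (String id : conceptIds) {
            for (int i = 1; i <= fieldCount; i++) {
                clauses.add("mesh_" + i + ":" + id);
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        // 30 made-up concept IDs standing in for the expanded query terms.
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 30; i++) {
            ids.add("C" + i);
        }
        // mesh_1 ... mesh_25: 30 terms x 25 fields = 750 clauses.
        System.out.println("numbered fields: " + expand(ids, 25).size());
        // A single repeated "mesh" field needs one clause per term: 30.
        System.out.println("single field:    " + ids.size());
    }
}
```

As far as I know, BooleanQuery's default clause limit is 1024, so the 
numbered-field layout gets close to that ceiling with only 30 terms, 
while a single repeated "mesh" field stays at one clause per term.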
>
>
> ----- Original Message ----- From: "Dedian Guo" <gdedian@gmail.com>
> To: <java-user@lucene.apache.org>; "Zhao, Xin" <xzhao@jhu.edu>
> Sent: Thursday, August 24, 2006 4:22 PM
> Subject: Re: controlled library
>
>
>> In my solution, you can use one doc for each MeSH term, or use 
>> different keyword fields such as "mesh_1"..."mesh_10" for your top 
>> 10 terms. Or you can group your MeSH terms into one string and add 
>> it as a single field, which requires a simple string parser for the 
>> grouped string when you want to read the terms back.
>>
>> Not sure if that works or answers your question...
>>
>> On 8/24/06, Zhao, Xin <xzhao9@jhmi.edu> wrote:
>>>
>>> Hi,
>>> I have a design question. Here is what we are trying to do for 
>>> indexing. We designed an indexing tool to generate standard MeSH 
>>> terms from medical citations, and we then use Lucene to save the 
>>> terms and citations for future search. The information we need to 
>>> save is:
>>> a) the exact MeSH terms (top 10)
>>> b) the score for each term
>>> so the code looks like
>>> -----------------------------------
>>> for the top 10 MeSH Terms
>>> myField=Field.Keyword("mesh", mesh.toLowerCase());
>>> myField.setBoost(score);
>>> doc.add(myField);
>>> end for
>>> ------------------------------------
>>> As you can see, we add all the terms under the same field name, 
>>> "mesh". If I understand correctly, all the fields with the same 
>>> name are eventually saved into one field, with all the scores 
>>> normalized into a single field boost. In that case we would not be 
>>> able to save a separate score for each term, so the information is 
>>> lost. Am I correct? Is there any way we could change it? I 
>>> understand Lucene is for keyword search, and what we are trying to 
>>> do is controlled vocabulary search. Is there any other tool we 
>>> could use?
>>>
>>> Thank you,
>>> Xin
>>>
>>>
>>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>





