lucene-java-user mailing list archives

From henok sahilu <henok_sah...@yahoo.com>
Subject Re: XML results ranking
Date Fri, 16 Jul 2010 08:48:38 GMT
You just have to write a parser that parses each section of the XML document.
Each section is then indexed as a separate informational unit, so the
Lucene ranking algorithm can rank these sections individually.
I can share the code that does this.
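A minimal sketch of the splitting step described above, using only the JDK's DOM parser. It assumes the sections are elements named `section` carrying an `id` attribute (both hypothetical) and stops short of the Lucene calls: each returned `Unit` is what you would turn into its own Lucene `Document`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class SectionSplitter {

    /** One indexable unit: the section's id attribute plus its flattened text. */
    public record Unit(String sectionId, String text) {}

    /** Returns one Unit per element named sectionTag, in document order. */
    public static List<Unit> split(String xml, String sectionTag) {
        Document doc = parse(xml);
        NodeList sections = doc.getElementsByTagName(sectionTag);
        List<Unit> units = new ArrayList<>();
        for (int i = 0; i < sections.getLength(); i++) {
            Element section = (Element) sections.item(i);
            // getTextContent() concatenates the text of all descendant nodes;
            // getAttribute() returns "" when the attribute is missing.
            units.add(new Unit(section.getAttribute("id"), section.getTextContent().trim()));
        }
        return units;
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<book><section id=\"s1\">Intro to indexing</section>"
                + "<section id=\"s2\">Ranking <b>sections</b></section></book>";
        // Each Unit would become a separate Lucene Document (not shown here).
        for (Unit u : split(xml, "section")) {
            System.out.println(u.sectionId() + ": " + u.text());
        }
    }
}
```

If sections can nest, `getElementsByTagName` returns the inner ones too, and their text would also appear inside the parent's unit; a real splitter would need a rule for that case.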
henok 
good day to you 



----- Original Message ----
From: Ian Lea <ian.lea@gmail.com>
To: java-user@lucene.apache.org
Sent: Fri, July 16, 2010 1:40:14 AM
Subject: Re: XML results ranking

Hi


If you google "Lucene xml" you'll find more information, but I'll attempt to
answer your questions below.

> ...
> I wonder whether Lucene:
>
> (1) provides full-text search over content of XML elements ?

Yes.  If you index the content, lucene will let you search over it.

> (2) provides substring search over values of attributes of XML elements ?

Yes, there is wildcard support.  Or use something like n-grams.
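Leading wildcards such as `*ucen*` are disabled by default in the classic `QueryParser` and can be slow, which is why n-grams are often preferred for substring matching. Lucene provides n-gram tokenizers in its analyzers module; this JDK-only sketch just shows which terms an n-gram analysis of an attribute value would produce (the 3 to 4 character range is an arbitrary example):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AttributeNGrams {

    /**
     * Returns every character n-gram of value with length minGram..maxGram.
     * If these grams are indexed as terms, a plain term query for "ucen"
     * matches the attribute value "Lucene" without any wildcard.
     */
    public static List<String> ngrams(String value, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("need 1 <= minGram <= maxGram");
        }
        String v = value.toLowerCase(Locale.ROOT); // case-fold, as an analyzer typically would
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= v.length(); start++) {
                grams.add(v.substring(start, start + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        // Attribute value taken from something like <author name="Lucene"/>
        System.out.println(ngrams("Lucene", 3, 4));
    }
}
```

Query strings longer than `maxGram` would have to be broken into grams the same way at search time; the trade-off is a larger index in exchange for fast substring lookups.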

> (3) scores relevance of matching XML documents ?

Yes.
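For reference, Lucene releases of this era (3.x) ranked hits with a TF-IDF model in `DefaultSimilarity`, whose javadoc documents a practical scoring function of roughly this form. It is reproduced here as a sketch; check the `Similarity` javadoc for your version before relying on the details:

```latex
\begin{aligned}
\mathrm{score}(q,d) &= \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q} \mathrm{tf}(t,d)\cdot\mathrm{idf}(t)^{2}\cdot\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\\
\mathrm{tf}(t,d) &= \sqrt{\mathrm{freq}(t,d)}\\
\mathrm{idf}(t) &= 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}(t)+1}
\end{aligned}
```

The `boost(t)` and `norm(t,d)` factors are where query-time and index-time boosts enter; in the default implementation `norm` also carries a length normalization of $1/\sqrt{\mathrm{numTerms}}$, which favours matches in short fields.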

> (4) allows one to identify (in a matching document) the XML elements with matched
> query terms and then navigate to parent/child nodes in the XML structure ?

Not really.
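Lucene stores flat documents, not the XML tree, so mapping a hit back to its element is left to the application. One workaround (an assumption, not a Lucene feature) is to index each leaf element as its own document and store an XPath-style location string with its text; after a hit, you read the stored path and evaluate it against the original XML with a DOM or `javax.xml.xpath` API to reach parent and child nodes. This sketch only computes those paths:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ElementPaths {

    /** A leaf element's text plus an XPath-style location, e.g. /book[1]/chapter[2]/title[1]. */
    public record Hit(String path, String text) {}

    public static List<Hit> leafPaths(String xml) {
        Element root = parse(xml).getDocumentElement();
        List<Hit> out = new ArrayList<>();
        walk(root, "/" + root.getTagName() + "[1]", out);
        return out;
    }

    private static void walk(Element e, String path, List<Hit> out) {
        Map<String, Integer> seen = new HashMap<>(); // 1-based position per sibling tag name
        boolean hasChildElement = false;
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i) instanceof Element child) {
                hasChildElement = true;
                int pos = seen.merge(child.getTagName(), 1, Integer::sum);
                walk(child, path + "/" + child.getTagName() + "[" + pos + "]", out);
            }
        }
        // Only leaf elements are recorded; text mixed in beside child elements is skipped.
        if (!hasChildElement) {
            String text = e.getTextContent().trim();
            if (!text.isEmpty()) {
                out.add(new Hit(path, text));
            }
        }
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<book><chapter><title>Intro</title></chapter>"
                + "<chapter><title>Scoring</title><para>TF-IDF</para></chapter></book>";
        for (Hit h : leafPaths(xml)) {
            System.out.println(h.path() + "  " + h.text());
        }
    }
}
```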

> (5) provides a way to give more weight to some XML element types during
> relevance scoring ?

Yes.  See boosting.
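Query-time boosting can be written directly in classic `QueryParser` syntax with the `^` operator. This sketch assumes each XML element type was indexed into a field of the same name (`title`, `abstract` and `body` are hypothetical) and builds a query string that weights matches by element type; the result would then be handed to `QueryParser`. The term is assumed to be a single, already escaped word (see `QueryParser.escape`).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class BoostedQuery {

    /**
     * Builds a classic QueryParser-syntax string that searches term in every
     * field, weighting each field clause with the ^ boost operator.
     * Clauses are space-separated, i.e. combined with the parser's default operator.
     */
    public static String build(String term, Map<String, Float> fieldBoosts) {
        StringJoiner clauses = new StringJoiner(" ");
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            clauses.add(e.getKey() + ":" + term + "^" + e.getValue());
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>(); // keeps insertion order
        boosts.put("title", 4.0f);
        boosts.put("abstract", 2.0f);
        boosts.put("body", 1.0f);
        // prints "title:lucene^4.0 abstract:lucene^2.0 body:lucene^1.0"
        System.out.println(build("lucene", boosts));
    }
}
```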


Lucene is a library that doesn't index XML directly, but you can write
code to parse your XML and feed it into lucene, specifying which
fields you want indexed and which stored for later retrieval.
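A JDK-only sketch of that parse-and-map step. The `FieldSpec` and `Policy` records, the element names and the `element@attribute` key convention are all hypothetical; the two flags correspond roughly to the `Field.Index` and `Field.Store` choices passed when constructing a Lucene 3.x `Field`, and the Lucene calls themselves are deliberately left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XmlToFields {

    /** Hypothetical per-field policy: searchable (indexed) and/or retrievable (stored). */
    public record Policy(boolean indexed, boolean stored) {}

    /** Hypothetical stand-in for a Lucene field. */
    public record FieldSpec(String name, String value, boolean indexed, boolean stored) {}

    /** Maps elements (key "tag") and attributes (key "tag@attr") listed in policy to fields. */
    public static List<FieldSpec> extract(String xml, Map<String, Policy> policy) {
        List<FieldSpec> fields = new ArrayList<>();
        NodeList all = parse(xml).getElementsByTagName("*"); // every element, document order
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            Policy p = policy.get(e.getTagName());
            if (p != null) {
                fields.add(new FieldSpec(e.getTagName(), e.getTextContent().trim(), p.indexed(), p.stored()));
            }
            NamedNodeMap attrs = e.getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                Node a = attrs.item(j);
                String key = e.getTagName() + "@" + a.getNodeName();
                Policy ap = policy.get(key);
                if (ap != null) {
                    fields.add(new FieldSpec(key, a.getNodeValue(), ap.indexed(), ap.stored()));
                }
            }
        }
        return fields;
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Map<String, Policy> policy = Map.of(
                "title", new Policy(true, true),          // searchable and shown in results
                "body", new Policy(true, false),          // searchable only
                "article@url", new Policy(false, true));  // shown in results only
        String xml = "<article url=\"http://example.com/a\">"
                + "<title>XML ranking</title><body>Boost titles.</body></article>";
        extract(xml, policy).forEach(System.out::println);
    }
}
```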


--
Ian.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


      
