lucene-dev mailing list archives

From "Steven A Rowe" <>
Subject RE: How to Query for Documents' Anchor Text?
Date Wed, 13 Aug 2008 15:57:30 GMT
Hi dealmaker,

The java-dev mailing list is devoted to discussion of the *development* of Lucene.  In the
future, please use the java-user mailing list for questions about *using* Lucene.

If by "anchor text" you mean the text inside HTML <a href="...">anchor text</a> elements,
then you must make sure that you index this text in its own field. AFAIK, Lucene doesn't
have the native capability to pull it out of HTML, so you must write this functionality yourself.
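
As a rough illustration of the "write it yourself" part, here is a minimal sketch that pulls
anchor text out of an HTML string with a regular expression, using only the JDK. The class
and method names are made up for this example, and a regex is only an approximation: for
real-world HTML you would want a proper HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene.
public class AnchorTextExtractor {
    // Matches <a ...>text</a>, case-insensitively, across line breaks.
    private static final Pattern ANCHOR = Pattern.compile(
        "<a\\b[^>]*>(.*?)</a\\s*>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Matches any tag nested inside the anchor, e.g. <b> or <img ...>.
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    /** Returns the text of every <a> element in the given HTML, in document order. */
    public static List<String> extract(String html) {
        List<String> result = new ArrayList<String>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            // Drop nested tags, then collapse runs of whitespace.
            String text = TAG.matcher(m.group(1)).replaceAll(" ")
                             .replaceAll("\\s+", " ").trim();
            if (text.length() > 0) {
                result.add(text);
            }
        }
        return result;
    }
}
```

The extracted strings can then be joined (or added one by one) into a dedicated field of
the document being indexed.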

To pull out terms from the "anchor text" field once you have a set of documents you want to
look at (e.g. as a result of a search), use Lucene's Term Vectors feature.

At index time, use the Field constructor that takes in a Field.TermVector specifier:
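
A minimal sketch of what this could look like, assuming the Lucene 2.3-era API
(Field.Index.TOKENIZED was renamed to ANALYZED in 2.4, and later releases changed this API
further). The field names "url" and "anchor" and the helper class are hypothetical, and the
anchor text is assumed to have been extracted from the HTML already:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Hypothetical helper; assumes the Lucene 2.3 jar is on the classpath.
public class AnchorDocs {
    /** Builds a Document whose "anchor" field stores term vectors. */
    public static Document build(String url, String anchorText) {
        Document doc = new Document();
        // Stored, untokenized key so the document can be identified later.
        doc.add(new Field("url", url, Field.Store.YES, Field.Index.UN_TOKENIZED));
        // Tokenized anchor text. Field.TermVector.YES records each term and its
        // per-document frequency (no positions or offsets).
        doc.add(new Field("anchor", anchorText,
                          Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
        return doc;
    }
}
```

The resulting Document is then passed to IndexWriter.addDocument() as usual.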


Once you have created an index with Term Vectors, you will be able to access any document's
terms along with their frequencies, using IndexReader.getTermFreqVector():
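
A minimal sketch of reading the vector back, again assuming the Lucene 2.x API (these
classes were replaced in later major versions). The field name "anchor" and the helper
class are hypothetical; docId is the internal document number taken from a search hit:

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;

// Hypothetical helper; assumes the Lucene 2.x jar is on the classpath.
public class AnchorTerms {
    /** Prints each term of the "anchor" field, with its frequency, for one document. */
    public static void print(IndexReader reader, int docId) throws IOException {
        TermFreqVector tfv = reader.getTermFreqVector(docId, "anchor");
        if (tfv == null) {
            return; // no term vector was stored for this field in this document
        }
        String[] terms = tfv.getTerms();
        int[] freqs = tfv.getTermFrequencies(); // parallel to terms
        for (int i = 0; i < terms.length; i++) {
            System.out.println(terms[i] + ": " + freqs[i]);
        }
    }
}
```

Note that the terms come back as the analyzer produced them at index time (for example,
lowercased), not as the original anchor strings.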



On 08/12/2008 at 8:10 PM, dealmaker wrote:
> Hi,
>   I know that there is already an anchor text feature in lucene that
> indexes/queries all anchor text that leads to a document I want.
> But it is not what I am looking for.  I want to index/query for all
> available anchor text in all documents or a subset of documents. Is there
> already some kind of plugin or parser that does this?
> e.g. I write a query: "wep wireless card", it should return
> all the anchor text in all the documents that are related to wep,
> wireless and card.
> Thanks.