lucene-java-user mailing list archives

From Kelvin Tan <kelvin-li...@relevanz.com>
Subject Re: Document lazy-loading WAS [Re: Fast access to a random page of the search results.]
Date Tue, 08 Mar 2005 17:36:38 GMT
Hey Mark, thanks for the code sample. I did look into this, but for a book's title field,
for example:

"to be able" != "able to be"
and
"java programmer" != "programmer (java)"  - the tokenizer will remove the parentheses

So in my use case at least, a field value isn't simply an array of its terms: word order
and punctuation matter...
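
To make the point concrete, here is a minimal sketch in plain Java (no Lucene involved). The termBag helper is hypothetical: it only imitates what an analyzer plus a term vector leaves behind, namely the distinct terms in sorted order, under the assumption that the tokenizer lowercases and drops punctuation. It is not Lucene's actual tokenization.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.TreeSet;

public class TermBagDemo {
    // Hypothetical stand-in for "analyze, then read back the term vector":
    // lowercase, split on anything that is not a letter or digit, and keep
    // the distinct terms in sorted order. Word order and punctuation are lost.
    static String[] termBag(String value) {
        TreeSet<String> terms = new TreeSet<String>();
        for (String token : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                terms.add(token);
            }
        }
        return terms.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Different titles, identical bags of terms.
        System.out.println(Arrays.toString(termBag("to be able")));
        System.out.println(Arrays.toString(termBag("able to be")));
        System.out.println(Arrays.toString(termBag("java programmer")));
        System.out.println(Arrays.toString(termBag("programmer (java)")));
    }
}
```

Both pairs of titles come out as the same array, so the original field value can't be rebuilt from the terms alone.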

k

On Tue, 8 Mar 2005 16:04:27 +0000 (GMT), mark harwood wrote:
> Your requirement was clear but I guess my suggested
> solution wasn't.
> Here it is in detail:
>
>
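> import java.io.IOException;
> import java.util.HashMap;
> import java.util.Iterator;
>
> import org.apache.lucene.analysis.Analyzer;
> import org.apache.lucene.analysis.WhitespaceAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.index.IndexReader;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.index.TermFreqVector;
> import org.apache.lucene.store.RAMDirectory;
>
> public class CountTest
> {
>     public static void main(String[] args) throws Exception
>     {
>         // Build a small in-memory index with term vectors enabled.
>         RAMDirectory tempDir = new RAMDirectory();
>         Analyzer analyzer = new WhitespaceAnalyzer();
>
>         IndexWriter writer = new IndexWriter(tempDir, analyzer, true);
>         addDoc(writer, "1 1 1", "a");
>         addDoc(writer, "2 1 2", "b 2");
>         addDoc(writer, "3 3 3", "b 3");
>         writer.close();
>
>         // Holds the number of documents that contain a given term.
>         class DocCount
>         {
>             int count = 0;
>             private String term;
>
>             public DocCount(String term)
>             {
>                 this.term = term;
>             }
>
>             public String toString()
>             {
>                 return term + ": " + count;
>             }
>         }
>
>         HashMap aDocTermCounts = new HashMap();
>         IndexReader reader = IndexReader.open(tempDir);
>         for (int i = 0; i < reader.maxDoc(); i++)
>         {
>             TermFreqVector tfv = reader.getTermFreqVector(i, "fieldA");
>             if (tfv == null)
>             {
>                 // No term vector stored for this document/field.
>                 continue;
>             }
>             String[] terms = tfv.getTerms();
>             // Here we use just the list of terms and ignore the frequencies.
>             // int freqs[] = tfv.getTermFrequencies();
>             for (int j = 0; j < terms.length; j++)
>             {
>                 DocCount docCount = (DocCount) aDocTermCounts.get(terms[j]);
>                 if (docCount == null)
>                 {
>                     docCount = new DocCount(terms[j]);
>                     aDocTermCounts.put(terms[j], docCount);
>                 }
>                 // Each term appears once per vector, so this counts documents.
>                 docCount.count++;
>             }
>         }
>
>         // Print one "term: count" line per distinct term.
>         for (Iterator iter = aDocTermCounts.values().iterator(); iter.hasNext();)
>         {
>             DocCount docCount = (DocCount) iter.next();
>             System.out.println(docCount);
>         }
>         reader.close();
>     }
>
>     // Note: fieldB is accepted but not indexed in this sample.
>     static void addDoc(IndexWriter writer, String fieldA, String fieldB)
>         throws IOException
>     {
>         Document doc = new Document();
>         doc.add(new Field("fieldA", fieldA,
>             Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.YES));
>         writer.addDocument(doc);
>     }
> }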
>
>
> --------------------------------------------------------------------
> - To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org




