lucene-java-user mailing list archives

From Uwe Schindler <...@thetaphi.de>
Subject Re: questions about DocValues in 4.0 alpha
Date Mon, 06 Aug 2012 09:47:00 GMT
You have to call getTopReaderContext() on the directory reader; you can then easily loop over
its leaves using leaves(). All docBases are then relative to the directory reader. If you get
the top reader context from the atomic reader itself, it is only relative to itself, which does
not help.
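
To illustrate the arithmetic behind those docBases, here is a minimal sketch in plain Java that makes no Lucene calls. It assumes you have already collected each leaf's maxDoc in leaf order; the class and method names (DocBaseLookup, docBases, segmentFor) are hypothetical and not part of Lucene. Lucene 4.x has a helper of this kind in ReaderUtil (subIndex), but check that it is present in your build before relying on it.

```java
// Hypothetical helper, not a Lucene class: maps a top-level docID to
// (segment index, segment-local docID) using per-segment docBases.
final class DocBaseLookup {

    // docBases[i] is the first top-level docID of segment i,
    // i.e. the sum of maxDoc over all earlier segments.
    public static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = base;
            base += maxDocs[i];
        }
        return bases;
    }

    // Binary search for the last segment whose docBase is <= docId.
    // Taking the last such segment also skips over empty segments.
    public static int segmentFor(int docId, int[] docBases) {
        if (docBases.length == 0 || docId < 0) {
            throw new IllegalArgumentException("no segments or negative docId");
        }
        int lo = 0;
        int hi = docBases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (docBases[mid] <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Two segments with 2 and 1 documents, as in the quoted example below.
        int[] bases = docBases(new int[] {2, 1});
        int docId = 2;
        int seg = segmentFor(docId, bases);
        int localDoc = docId - bases[seg];
        System.out.println("segment=" + seg + " localDoc=" + localDoc);
    }
}
```

With real readers, the docBase of each leaf context would take the place of the computed prefix sums, and the segment-local docID (docId minus docBase) is what you pass to that leaf's DocValues source.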

getSequentialSubReaders() might be made protected before the release anyway.

Uwe



Li Li <fancyerii@gmail.com> wrote:

>Hi everyone,
>    In Lucene 4.0 alpha, I found that DocValues are available and gave
>them a try. I am following the slides at
>http://www.slideshare.net/lucenerevolution/willnauer-simon-doc-values-column-stride-fields-in-lucene
>    I have two questions.
>    1. Are DocValues updatable now?
>
>    2. How can I get the docBase of an AtomicReader?
>        In a Collector, it's easy to get the docBase. But I need to get
>docValues after scoring. I found
>AtomicReader.getTopReaderContext().docBaseInParent
>and subReader.getTopReaderContext().docBase, but neither of them is
>correct.
>        So I have to iterate through all subReaders and use maxDoc()
>to find the right subReader for a docID. Is there a better way to find
>the AtomicReader corresponding to a docID?
>		File d=new File("./testIndex");
>		IndexWriterConfig cfg=new IndexWriterConfig(Version.LUCENE_40, new WhitespaceAnalyzer(Version.LUCENE_40));
>		cfg.setOpenMode(OpenMode.CREATE);
>		Directory dir=FSDirectory.open(d);
>		IndexWriter writer=new IndexWriter(dir,cfg);
>		FieldType titleFieldType=new FieldType();
>		titleFieldType.setStored(true);
>		titleFieldType.setIndexed(true);
>		titleFieldType.setTokenized(true);
>		titleFieldType.setOmitNorms(true);
>		
>		Document doc=new Document();		
>		Field f=new Field("title","a b c",titleFieldType);
>		doc.add(f);
>		
>		FloatDocValuesField dvf=new FloatDocValuesField("pagerank", 0.8f);
>		doc.add(dvf);
>		
>		writer.addDocument(doc);
>		
>		doc=new Document();
>		doc.add(new Field("title","b d",titleFieldType));
>		dvf=new FloatDocValuesField("pagerank", 0.5f);
>		doc.add(dvf);
>		writer.addDocument(doc);
>		
>		writer.commit();
>		
>		doc=new Document();
>		doc.add(new Field("title","a c",titleFieldType));
>		dvf=new FloatDocValuesField("pagerank", 0.5f);
>		doc.add(dvf);
>		writer.addDocument(doc);
>		
>		
>		DirectoryReader reader=DirectoryReader.open(writer, true);
>		IndexSearcher searcher=new IndexSearcher(reader);
>		Query q=new TermQuery(new Term("title","a"));
>		TopDocs topDocs=searcher.search(q, 10);
>		Set<String> fieldsNeedLoaded=new HashSet<String>(1);
>		fieldsNeedLoaded.add("title");
>		@SuppressWarnings("unchecked")
>		List<AtomicReader> subReaders=(List<AtomicReader>) reader.getSequentialSubReaders();
>		Source[] sources=new Source[subReaders.size()];
>		int idx=0;
>		for(AtomicReader subReader:subReaders){
>			sources[idx++]=subReader.docValues("pagerank").getSource();
>		}
>		
>		for(int i=0;i<topDocs.scoreDocs.length;i++){
>			int docId=topDocs.scoreDocs[i].doc;
>			float score=topDocs.scoreDocs[i].score;
>			//get title
>			Document document=searcher.document(docId, fieldsNeedLoaded);
>			System.out.println("title: " +document.get("title")+" score: "+score);
>			idx=-1;
>			int docBase=0;
>			for(AtomicReader subReader:subReaders){
>				idx++;
>				//int docBase=subReader.getTopReaderContext().docBaseInParent;
>				
>				int realDoc=docId-docBase;
>				if(realDoc>=0&&realDoc<subReader.maxDoc()){
>					double pagerank=sources[idx].getFloat(realDoc);
>					System.out.println(pagerank);
>					break;
>				}
>				docBase+=subReader.maxDoc();
>			}
>		}
>	}

--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de