lucene-java-user mailing list archives

From Sachin Kulkarni <kulk...@hawk.iit.edu>
Subject Re: How does Lucene decides which fields have termvectors stored and which not?
Date Tue, 19 Aug 2014 13:39:45 GMT
Hi Kumaran,



The following code is part of the Benchmark utility in Lucene, specifically
SubmissionReport.java


import java.util.Iterator;
import java.util.List;
import org.apache.lucene.benchmark.quality.utils.DocNameExtractor;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.index.IndexableFieldType;
import org.apache.lucene.search.ScoreDoc;

// Excerpt from inside a method: reader, searcher, td, docMap and
// docNameField are defined elsewhere in the class.
Iterator itr = docMap.entrySet().iterator();
int totalNumDocuments = reader.numDocs();
ScoreDoc[] sd = td.scoreDocs;
String sep = " \t ";
DocNameExtractor docext = new DocNameExtractor(docNameField);
for (int i = 0; i < sd.length; i++) {
  System.out.println("i = " + i);
  String docName = docext.docName(searcher, sd[i].doc);
  System.out.println("docName : " + docName + "\t map size " + docMap.size());
  // ***** The Map will help us get the docid and
  int indexedDocID = docMap.get(docName);
  System.out.println("indexed doc id : " + indexedDocID + "\t docname : " + docName);

  // ******** Get the tf-idf data now ******** //
  // getTermVectors returns null when the document has no term vectors.
  Fields fields = reader.getTermVectors(indexedDocID);
  if (fields != null) {
    System.out.println("fields size : " + fields.size());
    // **** Print log output for testing **** //
    // Fields is Iterable<String> over the names of fields that have term vectors.
    for (String fieldName : fields) {
      System.out.println("next field " + fieldName);
    }
  }

  // List every stored field of the document with its term-vector-offsets flag.
  Document doc = reader.document(indexedDocID);
  List<IndexableField> fieldList = doc.getFields();
  for (int j = 0; j < fieldList.size(); j++) {
    System.out.println("list field is : " + fieldList.get(j).name());
    IndexableFieldType fieldType = fieldList.get(j).fieldType();
    System.out.println(" Field storeTermVectorOffsets : "
        + fieldType.storeTermVectorOffsets());
    //System.out.println(" Field stored :" + fieldType.stored());
    //for (FieldInfo.IndexOptions c : fieldType.indexOptions().values())
    //  System.out.println(c);
  }
  // ***************************** //
}


On Tue, Aug 19, 2014 at 2:04 AM, Kumaran Ramasubramanian <kums.134@gmail.com> wrote:

> Hi Sachin Kulkarni,
>
> If possible, please share your code.
>
>
> -
> Kumaran R
>
> On Tue, Aug 19, 2014 at 9:07 AM, Sachin Kulkarni <kulksac@hawk.iit.edu> wrote:
>
> > Hi,
> >
> > I am using Lucene 4.6.0.
> >
> > I store five fields for each document in the index: body, title,
> > docname, docdate and docid.
> >
> > But when I get the fields using IndexReader.getTermVectors(indexedDocID),
> > I only get the docname and body fields. I can retrieve the term vectors
> > for those two fields, but not for the others.
> >
> > I checked whether all five fields are stored using
> > IndexableFieldType.stored(), and all return true. I also checked that all
> > the fields are indexed, and they are, but when I call getTermVectors I
> > still receive only two fields back.
> >
> > Is there any other config setting that I am missing while indexing that
> > is causing this behavior?
> >
> > Thanks to Kumaran and Ian for their answers to my previous questions, but
> > I have not been able to figure this one out yet.
> >
> > Thank you very much.
> >
> > Regards,
> > Sachin
> >
>
