lucene-java-user mailing list archives

From Kumaran Ramasubramanian <kums....@gmail.com>
Subject Re: How does Lucene decides which fields have termvectors stored and which not?
Date Tue, 19 Aug 2014 14:42:51 GMT
Hi Sachin

I want to look at your indexing code. Please share it.

-
Kumaran R





On Tue, Aug 19, 2014 at 7:18 PM, Sachin Kulkarni <kulksac@hawk.iit.edu>
wrote:

> Hi,
>
> Sorry for all the code; my previous message got sent out accidentally.
>
> The following code is part of the Benchmark utility in Lucene, specifically
> SubmissionReport.java.
>
>
> // Here reader is the IndexReader.
>
>
>   Iterator itr = docMap.entrySet().iterator();
>   int totalNumDocuments = reader.numDocs();
>   ScoreDoc sd[] = td.scoreDocs;
>   String sep = " \t ";
>   DocNameExtractor docext = new DocNameExtractor(docNameField);
>   for (int i = 0; i < sd.length; i++) {
>     String docName = docext.docName(searcher, sd[i].doc);
>     // ***** The Map of documents will help us get the docid
>     int indexedDocID = docMap.get(docName);
>     Fields fields = reader.getTermVectors(indexedDocID);
>
>     // ********** The following while loop prints the field names, which
>     // only shows 2 of the 5 fields that I am looking for.
>     // getTermVectors returns null if the document has no term vectors.
>     if (fields != null) {
>       Iterator<String> strItr = fields.iterator();
>       while (strItr.hasNext()) {
>         String fieldName = strItr.next();
>         System.out.println("next field " + fieldName);
>       }
>     }
>     Document DocList = reader.document(indexedDocID);
>     List<IndexableField> field_list = DocList.getFields();
>
>     // ****** The following for loop prints the five fields and their
>     // related information.
>     for (int j = 0; j < field_list.size(); j++) {
>       System.out.println("list field is : " + field_list.get(j).name());
>       IndexableFieldType IFT = field_list.get(j).fieldType();
>       System.out.println(" Field storeTermVectorOffsets : "
>           + IFT.storeTermVectorOffsets());
>       System.out.println(" Field stored :" + IFT.stored());
>     }
>     // ***************************** //
>   }
>
>
>  /**** THE OUTPUT for this section of code is
> fields size : 2
> next field body
> next field docname
>
> list field is : docid
>  Field storeTermVectorOffsets : false
> list field is : docname
>  Field storeTermVectorOffsets : false
> list field is : docdate
>  Field storeTermVectorOffsets : false
> list field is : doctitle
>  Field storeTermVectorOffsets : false
> list field is : body
>  Field storeTermVectorOffsets : false
>
> *******/
>
> Hope this code comes out legible in the email.
>
> Thank you.
>
> Regards,
> Sachin Kulkarni
>
>
> On Tue, Aug 19, 2014 at 8:39 AM, Sachin Kulkarni <kulksac@hawk.iit.edu>
> wrote:
>
> > Hi Kumaran,
> >
> >
> >
> > The following code is part of the Benchmark utility in Lucene,
> > specifically SubmissionReport.java.
> >
> >
> >   Iterator itr = docMap.entrySet().iterator();
> >   int totalNumDocuments = reader.numDocs();
> >   ScoreDoc sd[] = td.scoreDocs;
> >   String sep = " \t ";
> >   DocNameExtractor docext = new DocNameExtractor(docNameField);
> >   for (int i = 0; i < sd.length; i++) {
> >     System.out.println("i = " + i);
> >     String docName = docext.docName(searcher, sd[i].doc);
> >     System.out.println("docName : " + docName + "\t map size "
> >         + docMap.size());
> >     // ***** The Map will help us get the docid
> >     int indexedDocID = docMap.get(docName);
> >     System.out.println("indexed doc id : " + indexedDocID + "\t docname : "
> >         + docName);
> >     // ******** GET THE tf-idf data now ************ //
> >     Fields fields = reader.getTermVectors(indexedDocID);
> >     System.out.println("fields size : " + fields.size());
> >     // **** Print log output for testing **** //
> >     Iterator<String> strItr = fields.iterator();
> >     while (strItr.hasNext()) {
> >       String fieldName = strItr.next();
> >       System.out.println("next field " + fieldName);
> >     }
> >     Document DocList = reader.document(indexedDocID);
> >     List<IndexableField> field_list = DocList.getFields();
> >     for (int j = 0; j < field_list.size(); j++) {
> >       System.out.println("list field is : " + field_list.get(j).name());
> >       IndexableFieldType IFT = field_list.get(j).fieldType();
> >       System.out.println(" Field storeTermVectorOffsets : "
> >           + IFT.storeTermVectorOffsets());
> >       //System.out.println(" Field stored :" + IFT.stored());
> >       //for (FieldInfo.IndexOptions c : IFT.indexOptions().values())
> >       //  System.out.println(c);
> >     }
> >     // ***************************** //
> >   }
> >
> >
> > On Tue, Aug 19, 2014 at 2:04 AM, Kumaran Ramasubramanian <
> > kums.134@gmail.com> wrote:
> >
> >> Hi Sachin Kulkarni,
> >>
> >> If possible, please share your code.
> >>
> >>
> >> -
> >> Kumaran R
> >>
> >>
> >>
> >>
> >>
> >> On Tue, Aug 19, 2014 at 9:07 AM, Sachin Kulkarni <kulksac@hawk.iit.edu>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > I am using Lucene 4.6.0.
> >> >
> >> > I have been storing 5 fields for my documents in the index, namely
> >> > body, title, docname, docdate and docid.
> >> >
> >> > But when I get the fields using IndexReader.getTermVectors(indexedDocID),
> >> > I only get the docname and body fields and can retrieve the term vectors
> >> > for those fields, but not for the others.
> >> >
> >> > I checked whether all five fields are stored using
> >> > IndexableFieldType.stored(), and all return true. I also checked that
> >> > all the fields are indexed, and they are, but when I call
> >> > getTermVectors I still only receive two fields back.
> >> >
> >> > Is there some other config setting I am missing at indexing time that
> >> > is causing this behavior?
> >> >
> >> > Thanks to Kumaran and Ian for their answers to my previous questions,
> >> > but I have not been able to figure out this one yet.
> >> >
> >> > Thank you very much.
> >> >
> >> > Regards,
> >> > Sachin
> >> >
> >>
> >
> >
>
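A minimal sketch of the behavior the thread asks about, assuming Lucene 4.6 with lucene-core and lucene-analyzers-common on the classpath. In Lucene 4.x, term vectors are an opt-in, per-field, index-time option set with FieldType.setStoreTermVectors(true). A field that is only stored and indexed (for example one created with TextField and Field.Store.YES, or a StringField) does not get them, so getTermVectors leaves it out. The class TermVectorDemo, its helper termVectorFields() and the sample values are hypothetical. The field names only mirror the ones in the thread.

```java
import java.io.IOException;
import java.util.Set;
import java.util.TreeSet;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo class (sketch, assumes Lucene 4.6).
public class TermVectorDemo {

  /** Indexes one document and returns the field names getTermVectors(0) reports. */
  static Set<String> termVectorFields() {
    try {
      Directory dir = new RAMDirectory();
      IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_46,
          new StandardAnalyzer(Version.LUCENE_46));

      // Same as TextField.TYPE_STORED (stored, indexed, tokenized),
      // plus term vectors explicitly switched on for this field type.
      FieldType withVectors = new FieldType(TextField.TYPE_STORED);
      withVectors.setStoreTermVectors(true);
      withVectors.freeze();

      try (IndexWriter writer = new IndexWriter(dir, cfg)) {
        Document doc = new Document();
        doc.add(new Field("body", "some body text", withVectors));
        // Stored and indexed, but these types leave term vectors off.
        doc.add(new TextField("doctitle", "a title", Field.Store.YES));
        doc.add(new StringField("docid", "42", Field.Store.YES));
        writer.addDocument(doc);
      } // close() commits the added document

      Set<String> names = new TreeSet<>();
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        Fields fields = reader.getTermVectors(0);
        if (fields != null) { // null when the document has no term vectors at all
          for (String name : fields) {
            names.add(name);
          }
        }
      }
      return names;
    } catch (IOException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(termVectorFields());
  }
}
```

Under these assumptions, running main should print [body]. doctitle and docid are stored and indexed but are absent from the term vectors, because their field types were never given setStoreTermVectors(true).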
