lucene-java-user mailing list archives

From Sachin Kulkarni <kulk...@hawk.iit.edu>
Subject Re: How does Lucene decides which fields have termvectors stored and which not?
Date Tue, 19 Aug 2014 18:59:15 GMT
Hi Kumaran,

I am using the benchmark utility from Lucene and doing the indexing via an
.alg file.
Would you like to see the alg file instead?

Thank you.

Regards,
Sachin


On Tue, Aug 19, 2014 at 9:42 AM, Kumaran Ramasubramanian <kums.134@gmail.com> wrote:

> Hi Sachin
>
>         I would like to look at your indexing code. Please share it.
>
> -
> Kumaran R
>
>
>
>
>
> On Tue, Aug 19, 2014 at 7:18 PM, Sachin Kulkarni <kulksac@hawk.iit.edu>
> wrote:
>
> > Hi,
> >
> > Sorry for all the code; it got sent out accidentally.
> >
> > The following code is part of the Benchmark utility in Lucene,
> > specifically SubmissionReport.java
> >
> >
> > // Here reader is the IndexReader.
> >
> > Iterator itr = docMap.entrySet().iterator();
> > int totalNumDocuments = reader.numDocs();
> > ScoreDoc sd[] = td.scoreDocs;
> > String sep = " \t ";
> > DocNameExtractor docext = new DocNameExtractor(docNameField);
> > for (int i = 0; i < sd.length; i++)
> > {
> >     String docName = docext.docName(searcher, sd[i].doc);
> >     // ***** The Map of documents will help us get the docid
> >     int indexedDocID = docMap.get(docName);
> >     Fields fields = reader.getTermVectors(indexedDocID);
> >     Iterator<String> strItr = fields.iterator();
> >
> >     // ********** The following while loop prints the field names, which
> >     // only show 2 fields out of the 5 that I am looking for.
> >     while (strItr.hasNext())
> >     {
> >         String fieldName = strItr.next();
> >         System.out.println("next field " + fieldName);
> >     }
> >     Document DocList = reader.document(indexedDocID);
> >     List<IndexableField> field_list = DocList.getFields();
> >
> >     // ****** The following for loop prints the five fields and their
> >     // related information.
> >     for (int j = 0; j < field_list.size(); j++)
> >     {
> >         System.out.println("list field is : " + field_list.get(j).name());
> >         IndexableFieldType IFT = field_list.get(j).fieldType();
> >         System.out.println(" Field storeTermVectorOffsets : " +
> >             IFT.storeTermVectorOffsets());
> >         System.out.println(" Field stored :" + IFT.stored());
> >     }
> >     // ***************************** //
> > }
> >
> >
> >  /**** THE OUTPUT for this section of code is
> > fields size : 2
> > next field body
> > next field docname
> >
> > list field is : docid
> >  Field storeTermVectorOffsets : false
> > list field is : docname
> >  Field storeTermVectorOffsets : false
> > list field is : docdate
> >  Field storeTermVectorOffsets : false
> > list field is : doctitle
> >  Field storeTermVectorOffsets : false
> > list field is : body
> >  Field storeTermVectorOffsets : false
> >
> > *******/
> >
> > Hope this code comes out legible in the email.
> >
> > Thank you.
> >
> > Regards,
> > Sachin Kulkarni
> >
> >
> > On Tue, Aug 19, 2014 at 8:39 AM, Sachin Kulkarni <kulksac@hawk.iit.edu>
> > wrote:
> >
> > > Hi Kumaran,
> > >
> > >
> > >
> > > The following code is part of the Benchmark utility in Lucene,
> > > specifically SubmissionReport.java
> > >
> > >
> > > Iterator itr = docMap.entrySet().iterator();
> > > int totalNumDocuments = reader.numDocs();
> > > ScoreDoc sd[] = td.scoreDocs;
> > > String sep = " \t ";
> > > DocNameExtractor docext = new DocNameExtractor(docNameField);
> > > for (int i = 0; i < sd.length; i++)
> > > {
> > >     System.out.println("i = " + i);
> > >     String docName = docext.docName(searcher, sd[i].doc);
> > >     System.out.println("docName : " + docName + "\t map size " +
> > >         docMap.size());
> > >     // ***** The Map will help us get the docid and
> > >     int indexedDocID = docMap.get(docName);
> > >     System.out.println("indexed doc id : " + indexedDocID +
> > >         "\t docname : " + docName);
> > >     // ******** GET THE tf-idf data now ************ //
> > >     Fields fields = reader.getTermVectors(indexedDocID);
> > >     System.out.println("fields size : " + fields.size());
> > >     // **** Print log output for testing **** //
> > >     Iterator<String> strItr = fields.iterator();
> > >     while (strItr.hasNext())
> > >     {
> > >         String fieldName = strItr.next();
> > >         System.out.println("next field " + fieldName);
> > >     }
> > >     Document DocList = reader.document(indexedDocID);
> > >     List<IndexableField> field_list = DocList.getFields();
> > >     for (int j = 0; j < field_list.size(); j++)
> > >     {
> > >         System.out.println("list field is : " + field_list.get(j).name());
> > >         IndexableFieldType IFT = field_list.get(j).fieldType();
> > >         System.out.println(" Field storeTermVectorOffsets : " +
> > >             IFT.storeTermVectorOffsets());
> > >         //System.out.println(" Field stored :" + IFT.stored());
> > >         //for (FieldInfo.IndexOptions c : IFT.indexOptions().values())
> > >         // System.out.println(c);
> > >     }
> > >     // ***************************** //
> > >
> > >
> > > On Tue, Aug 19, 2014 at 2:04 AM, Kumaran Ramasubramanian <
> > > kums.134@gmail.com> wrote:
> > >
> > >> Hi Sachin Kulkarni,
> > >>
> > >>     If possible, please share your code.
> > >>
> > >>
> > >> -
> > >> Kumaran R
> > >>
> > >>
> > >>
> > >>
> > >>
> > >> On Tue, Aug 19, 2014 at 9:07 AM, Sachin Kulkarni <kulksac@hawk.iit.edu>
> > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > I am using Lucene 4.6.0.
> > >> >
> > >> > I have been storing 5 fields for my documents in the index, namely
> > >> > body, title, docname, docdate and docid.
> > >> >
> > >> > But when I get the fields using
> > >> > IndexReader.getTermVectors(indexedDocID) I only get the docname and
> > >> > body fields and can retrieve the term vectors for those fields, but
> > >> > not the others.
> > >> >
> > >> > I check whether all five fields are stored using
> > >> > IndexableFieldType.stored() and all return true. I also check that
> > >> > all the fields are indexed, and they are, but when I call
> > >> > getTermVectors I still receive only two fields back.
> > >> >
> > >> > Is there any other config setting that I am missing while indexing
> > >> > that is causing this behavior?
> > >> >
> > >> > Thanks to Kumaran and Ian for their answers to my previous questions,
> > >> > but I have not been able to figure out the above one yet.
> > >> >
> > >> > Thank you very much.
> > >> >
> > >> > Regards,
> > >> > Sachin
> > >> >
> > >>
> > >
> > >
> >
>
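
On the question at the bottom of the thread: in Lucene 4.x, whether a field is stored, whether it is indexed, and whether it keeps term vectors are separate per-field settings. A true result from stored() therefore says nothing about term vectors, and IndexReader.getTermVectors only lists the fields of a document that were indexed with term vectors enabled. Checking storeTermVectors() rather than storeTermVectorOffsets() would also be the more direct test, since offsets are only a sub-option of term vectors.

Because the indexing here is driven by an .alg file, the relevant switches are probably the benchmark DocMaker properties. The fragment below is only a sketch and not the poster's actual file. It assumes the default DocMaker is in use, and whether each of the five fields picks up these settings depends on how DocMaker builds that field in the Lucene version at hand.

```
# Hypothetical fragment of a benchmark .alg file (assumed, not from the thread).
# Property names are those read by the benchmark DocMaker.

# Keep field values retrievable via reader.document()
doc.stored=true
doc.tokenized=true

# Ask DocMaker to store term vectors, so that
# IndexReader.getTermVectors(docID) can return them
doc.term.vector=true

# Optional sub-options; only meaningful when doc.term.vector=true
doc.term.vector.offsets=true
doc.term.vector.positions=true
```

After changing these properties the index has to be rebuilt, because term vectors are written at indexing time and cannot be added to documents that are already indexed.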
