lucene-java-user mailing list archives

From Sachin Kulkarni <kulk...@hawk.iit.edu>
Subject Re: How does Lucene decides which fields have termvectors stored and which not?
Date Tue, 19 Aug 2014 13:48:10 GMT
Hi,

Sorry for all the code in my previous message; it was sent out accidentally.

The following code is part of the Benchmark utility in Lucene, specifically
SubmissionReport.java


// Here reader is the IndexReader.

Iterator itr = docMap.entrySet().iterator();
int totalNumDocuments = reader.numDocs();
ScoreDoc sd[] = td.scoreDocs;
String sep = " \t ";
DocNameExtractor docext = new DocNameExtractor(docNameField);
for (int i = 0; i < sd.length; i++)
{
    String docName = docext.docName(searcher, sd[i].doc);
    // ***** The Map of documents will help us get the docid
    int indexedDocID = docMap.get(docName);
    Fields fields = reader.getTermVectors(indexedDocID);
    Iterator<String> strItr = fields.iterator();

    // ********** The following while loop prints the field names, which
    // show only 2 of the 5 fields that I am looking for.
    while (strItr.hasNext())
    {
        String fieldName = strItr.next();
        System.out.println("next field " + fieldName);
    }
    Document DocList = reader.document(indexedDocID);
    List<IndexableField> field_list = DocList.getFields();

    // ****** The following for loop prints the five fields and their
    // related information.
    for (int j = 0; j < field_list.size(); j++)
    {
        System.out.println("list field is : " + field_list.get(j).name());
        IndexableFieldType IFT = field_list.get(j).fieldType();
        System.out.println(" Field storeTermVectorOffsets : " +
            IFT.storeTermVectorOffsets());
        System.out.println(" Field stored :" + IFT.stored());
    }
    // ***************************** //
}


 /**** THE OUTPUT for this section of code is
fields size : 2
next field body
next field docname

list field is : docid
 Field storeTermVectorOffsets : false
list field is : docname
 Field storeTermVectorOffsets : false
list field is : docdate
 Field storeTermVectorOffsets : false
list field is : doctitle
 Field storeTermVectorOffsets : false
list field is : body
 Field storeTermVectorOffsets : false

*******/
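
For comparison, here is a minimal sketch of how the term vector flag is set per field at index time, as far as I understand the Lucene 4.6 API. It is not taken from the Benchmark utility: the class name TermVectorDemo, the field contents, and the in-memory directory are made up for illustration, and it assumes the lucene-core and lucene-analyzers-common 4.6 jars are on the classpath. Both fields are stored and indexed, but only one has setStoreTermVectors(true), so only that one should be listed by getTermVectors().

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

// Hypothetical demo class, not part of the Lucene Benchmark utility.
public class TermVectorDemo {

    // Indexes one document and returns the field names that
    // getTermVectors() reports for it.
    public static List<String> fieldsWithTermVectors() throws IOException {
        Directory dir = new RAMDirectory();
        IndexWriterConfig cfg = new IndexWriterConfig(
            Version.LUCENE_46, new StandardAnalyzer(Version.LUCENE_46));
        IndexWriter writer = new IndexWriter(dir, cfg);

        // Stored + indexed, and additionally with term vectors enabled.
        FieldType withVectors = new FieldType(TextField.TYPE_STORED);
        withVectors.setStoreTermVectors(true);

        Document doc = new Document();
        doc.add(new Field("body", "lucene keeps term vectors per field", withVectors));
        // Stored + indexed, but term vectors left at the default (off).
        doc.add(new Field("doctitle", "stored and indexed only", TextField.TYPE_STORED));
        writer.addDocument(doc);
        writer.close();

        IndexReader reader = DirectoryReader.open(dir);
        List<String> names = new ArrayList<String>();
        Fields vectors = reader.getTermVectors(0);
        // getTermVectors() may return null when no field has term vectors.
        if (vectors != null) {
            for (String name : vectors) {
                names.add(name);
            }
        }
        reader.close();
        dir.close();
        return names;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("fields with term vectors: " + fieldsWithTermVectors());
    }
}
```

In this sketch stored() is true for both fields, which would be consistent with stored() and the term vector setting being independent flags.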

Hope this code comes out legible in the email.

Thank you.

Regards,
Sachin Kulkarni


On Tue, Aug 19, 2014 at 8:39 AM, Sachin Kulkarni <kulksac@hawk.iit.edu>
wrote:

> Hi Kumaran,
>
>
>
> The following code is part of the Benchmark utility in Lucene,
> specifically SubmissionReport.java
>
>
> Iterator itr = docMap.entrySet().iterator();
> int totalNumDocuments = reader.numDocs();
> ScoreDoc sd[] = td.scoreDocs;
> String sep = " \t ";
> DocNameExtractor docext = new DocNameExtractor(docNameField);
> for (int i = 0; i < sd.length; i++)
> {
>     System.out.println("i = " + i);
>     String docName = docext.docName(searcher, sd[i].doc);
>     System.out.println("docName : " + docName + "\t map size " +
>         docMap.size());
>     // ***** The Map will help us get the docid and
>     int indexedDocID = docMap.get(docName);
>     System.out.println("indexed doc id : " + indexedDocID + "\t docname : "
>         + docName);
>     // ******** GET THE tf-idf data now ************ //
>     Fields fields = reader.getTermVectors(indexedDocID);
>     System.out.println("fields size : " + fields.size());
>     // **** Print log output for testing **** //
>     Iterator<String> strItr = fields.iterator();
>     while (strItr.hasNext())
>     {
>         String fieldName = strItr.next();
>         System.out.println("next field " + fieldName);
>     }
>     Document DocList = reader.document(indexedDocID);
>     List<IndexableField> field_list = DocList.getFields();
>     for (int j = 0; j < field_list.size(); j++)
>     {
>         System.out.println("list field is : " + field_list.get(j).name());
>         IndexableFieldType IFT = field_list.get(j).fieldType();
>         System.out.println(" Field storeTermVectorOffsets : " +
>             IFT.storeTermVectorOffsets());
>         //System.out.println(" Field stored :" + IFT.stored());
>         //for (FieldInfo.IndexOptions c : IFT.indexOptions().values())
>         // System.out.println(c);
>     }
>     // ***************************** //
>
>
> On Tue, Aug 19, 2014 at 2:04 AM, Kumaran Ramasubramanian <
> kums.134@gmail.com> wrote:
>
>> Hi Sachin Kulkarni,
>>
>>     If possible, please share your code.
>>
>>
>> -
>> Kumaran R
>>
>>
>>
>>
>>
>> On Tue, Aug 19, 2014 at 9:07 AM, Sachin Kulkarni <kulksac@hawk.iit.edu>
>> wrote:
>>
>> > Hi,
>> >
>> > I am using Lucene 4.6.0.
>> >
>> > I have been storing 5 fields for my documents in the index, namely body,
>> > title, docname, docdate and docid.
>> >
>> > But when I get the fields using IndexReader.getTermVectors(indexedDocID),
>> > I only get the docname and body fields and can retrieve the term vectors
>> > for those fields, but not the others.
>> >
>> > I checked whether all five fields are stored using
>> > IndexableFieldType.stored(), and all return true. I also checked that all
>> > the fields are indexed, and they are, but when I call getTermVectors I
>> > still receive only two fields back.
>> >
>> > Is there any other config setting that I am missing while indexing that
>> > is causing this behavior?
>> >
>> > Thanks to Kumaran and Ian for their answers to my previous questions, but
>> > I have not been able to figure out the above one yet.
>> >
>> > Thank you very much.
>> >
>> > Regards,
>> > Sachin
>> >
>>
>
>
