lucene-java-user mailing list archives

From <Bill.Che...@sungard.com>
Subject Error: there are more terms than documents...
Date Thu, 23 Apr 2009 19:25:04 GMT
Hello,

 

I'm getting a strange error when I run a sorted query against a Lucene
(2.2.0) index. The exception is:

 

java.lang.RuntimeException: there are more terms than documents in field "objectId", but it's impossible to sort on tokenized fields
      at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:377)
      at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
      at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:350)
      at org.apache.lucene.search.FieldCacheImpl$11.createValue(FieldCacheImpl.java:461)
      at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
      at org.apache.lucene.search.FieldCacheImpl.getAuto(FieldCacheImpl.java:424)
      at org.apache.lucene.search.FieldSortedHitQueue.comparatorAuto(FieldSortedHitQueue.java:338)
      at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:172)
      at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
      at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:155)
      at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
      at org.apache.lucene.search.TopFieldDocCollector.<init>(TopFieldDocCollector.java:41)
      at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:122)
      at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:74)
      at org.apache.lucene.search.Hits.<init>(Hits.java:61)
      at org.apache.lucene.search.Searcher.search(Searcher.java:55)

 

The strange thing is that I've read the javadoc for the Sort object
where it says:

The fields used to determine sort order must be carefully chosen.
Documents must contain a single term in such a field, and the value of
the term should indicate the document's relative position in a given
sort order. The field must be indexed, but should not be tokenized, and
does not need to be stored (unless you happen to want it back with the
rest of your document data). In other words: 

document.add (new Field ("byNumber", Integer.toString(x),
Field.Store.NO, Field.Index.UN_TOKENIZED));

Therefore when I create my "objectId" field in my document I use the
call:

 

doc.add(new Field("objectId", s.getObjectId(), Field.Store.NO,
Field.Index.UN_TOKENIZED));

 

Note: s.getObjectId() returns a String.

 

After the index is created and I print out a typical document (using the
Document.toString() method) I get this:

 

Document<stored/uncompressed,indexed<id:1146513>
stored/uncompressed,indexed<_hibernate_class:com.mycompany.metadb.orm.Series>
indexed<RestrictionLevel:1>
indexed,tokenized<keywords:com.mycompany.metadbsync.index.SeriesTokenStream@134ab4e>
indexed,tokenized<characteristics:com.mycompany.metadbsync.index.CharacteristicTokenStream@daa825>
indexed<objectId:DF.SES.AA.derek.Public_01>
indexed<Name:Public 01>
indexed<UserID:derek>
indexed<Data Class:Defined Formulas>
indexed<Location:AA>
indexed<Client:SES>
indexed<DIM1:DF>
indexed<DIM2:SES>
indexed<DIM3:AA>
indexed<DIM4:derek>
indexed<DIM5:Public_01>
indexed<Type:Formula>>

 

So it looks like it got created correctly.
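
A toString() dump of one document only shows that document, not how many
distinct terms the whole index holds for the field. As a rough sketch of
the condition the exception message names (more distinct terms than
documents in the field), here is a plain-Java model that does not use
Lucene at all. The class and method names are hypothetical, the second
objectId value is made up for illustration, and splitting on "." is only
an assumed example of how a value could end up as several terms:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of the condition named in the exception message.
// This is NOT Lucene code; SortFieldCheck and its methods are hypothetical.
public class SortFieldCheck {

    // Each inner list holds the terms one document has in the sort field.
    public static int distinctTerms(List<List<String>> termsPerDoc) {
        Set<String> seen = new HashSet<String>();
        for (List<String> terms : termsPerDoc) {
            seen.addAll(terms);
        }
        return seen.size();
    }

    // True when the field holds more distinct terms than there are documents.
    public static boolean moreTermsThanDocs(List<List<String>> termsPerDoc) {
        return distinctTerms(termsPerDoc) > termsPerDoc.size();
    }

    public static void main(String[] args) {
        // One whole value per document: 2 distinct terms, 2 documents.
        List<List<String>> single = Arrays.asList(
            Arrays.asList("DF.SES.AA.derek.Public_01"),
            Arrays.asList("DF.SES.AA.derek.Public_02"));

        // The same values split into pieces: 6 distinct terms, 2 documents.
        List<List<String>> split = Arrays.asList(
            Arrays.asList("DF", "SES", "AA", "derek", "Public_01"),
            Arrays.asList("DF", "SES", "AA", "derek", "Public_02"));

        System.out.println(moreTermsThanDocs(single)); // prints false
        System.out.println(moreTermsThanDocs(split));  // prints true
    }
}
```

In this model the condition can only hold when at least one document
contributes more than one term to the field.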

 

For what it's worth the query call looks like this:

 

Hits hits = seriesIndexSearcher.search(query, new Sort("objectId"));

 

The actual query is a Boolean query with many TermQuery clauses and
sub-clauses.  The term queries run against several of the other fields
shown above, including some of the tokenized fields.

  

Any help appreciated.

 

regards,

 

Bill Chesky

 

PS. As an aside, what does it mean for a field to be stored or not
stored?  In the output above, the 'id' field is stored and 'objectId' is
not, yet Document.toString() displays both of them.  So the objectId
field got "stored" at least in the sense that I understand the term
(otherwise how did it get displayed?), which means I'm obviously missing
something about what "stored" means in the Lucene context.

 

