lucene-dev mailing list archives

From Eric Pugh <ep...@opensourceconnections.com>
Subject Should analysis.jsp honor maxFieldLength
Date Tue, 24 Aug 2010 16:03:17 GMT
Hi all,

I have maxFieldLength set to 10000 in solrconfig.xml, and was playing around with a really large
document (the King James Bible) in analysis.jsp. I hacked analysis.jsp to show me the number
of terms at each filter, along with the headers, without having to turn everything on by
checking the verbose checkbox.

My results, shown in this screenshot: http://img.skitch.com/20100824-t36rq45i2wfimwyd53gwiqebdy.png
seem to confirm that maxFieldLength is NOT honored by analysis.jsp.

But it seems to me that folks using analysis.jsp would expect the process to be exactly like
what happens when a document is indexed. In my specific case, it took me a while to
realize that my indexing results differed from the analysis.jsp results because
indexing only looked at the first 10000 tokens, while analysis looked at all 101561. A horizontal
table of 10,000 cells kind of looks like a horizontal table of 101,561 cells!

Would it make sense to parse the text through the DocInverterPerField in analysis.jsp? Or
maybe just modify the getTokens method in analysis.jsp to only parse maxFieldLength tokens?
I think I can do that by looking up the SolrCore and reading core.getSolrConfig().mainIndexConfig.maxFieldLength.
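
To make the second option concrete, here is a rough sketch of the truncation step on its own. It deliberately does not touch Solr or Lucene classes: TokenLimiter and limitTokens are hypothetical names, the generic Iterator stands in for whatever getTokens actually walks over, and the limit is assumed to be passed in by the caller (for example, the value read from the config lookup mentioned above, which I have not verified). It only approximates what the indexer does by keeping the first maxFieldLength tokens and dropping the rest.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of Solr or Lucene.
public class TokenLimiter {

    /**
     * Collects tokens from the iterator until either the iterator is
     * exhausted or maxFieldLength tokens have been kept, whichever comes
     * first. A limit of zero or less yields an empty list.
     */
    public static <T> List<T> limitTokens(Iterator<T> tokens, int maxFieldLength) {
        List<T> kept = new ArrayList<>();
        // Check the size first so no extra token is pulled once the limit is hit.
        while (kept.size() < maxFieldLength && tokens.hasNext()) {
            kept.add(tokens.next());
        }
        return kept;
    }
}
```

With something like this in place, the analysis page would show the same number of cells as tokens that actually reach the index, assuming the limit is applied at the same point in the chain as it is during indexing.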


Eric





-----------------------------------------------------
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com
Co-Author: Solr 1.4 Enterprise Search Server available from http://www.packtpub.com/solr-1-4-enterprise-search-server
Free/Busy: http://tinyurl.com/eric-cal








