lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: Sort index by size
Date Mon, 19 Nov 2018 15:34:51 GMT
On 11/19/2018 2:31 AM, Srinivas Kashyap wrote:
> I have a solr core with some 20 fields in it. (all are stored and indexed). For an environment,
> the number of documents are around 0.29 million. When I run the full import through DIH, indexing
> is completing successfully. But, it is occupying the disk space of around 5 GB. Is there a
> possibility where I can go and check, which document is consuming more memory? Put in another
> way, can I sort the index based on size?

I am not aware of any way to do that.  Might be one that I don't know 
about, but if there were a way, seems like I would have come across it 
before.

It is not very likely that the large index size is due to a single document or 
a handful of documents.  It is more likely that most documents are 
relatively large.  I could be wrong about that, though.

If you have 290000 documents (which is how I interpreted 0.29 million) 
and the total index size is about 5 GB, then the average size per 
document in the index is about 18 kilobytes.  This is, in my view, pretty 
large.  Typically I think that most documents are 1-2 kilobytes.
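
The arithmetic behind that estimate can be checked with a short sketch. 
It assumes "about 5 GB" means 5 GiB (5 * 1024^3 bytes) and that the 
document count is exactly 290000; the function name is made up for 
illustration and is not part of Solr.

```python
def average_doc_size_kib(index_bytes: int, doc_count: int) -> float:
    """Return the average on-disk index size per document, in KiB."""
    if doc_count <= 0:
        raise ValueError("doc_count must be positive")
    return index_bytes / doc_count / 1024


# Assumption: "about 5 GB" is taken as 5 GiB; the real figure is approximate.
index_bytes = 5 * 1024**3
# 0.29 million documents, as interpreted in the message.
doc_count = 290_000

print(f"{average_doc_size_kib(index_bytes, doc_count):.1f} KiB per document")
# prints "18.1 KiB per document"
```

Reading 5 GB as 5 * 10^9 bytes instead gives roughly 17 kilobytes, so the 
conclusion is the same either way.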

Can we get your Solr version, a copy of your schema, and exactly what 
Solr returns in search results for a typically sized document?  You'll 
need to use a paste website or a file-sharing website ... if you try to 
attach these things to a message, the mailing list will most likely eat 
them, and we'll never see them. If you need to redact the information in 
search results ... please do it in a way that we can still see the exact 
size of the text -- don't just remove information, replace it with 
information that's the same length.
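
One possible way to do that kind of length-preserving redaction is 
sketched below. This is only an illustration: the function name and the 
"x" filler character are arbitrary choices, and it would be applied by 
hand to the sensitive field values, not to the whole response.

```python
def redact_same_length(text: str, filler: str = "x") -> str:
    """Replace every non-whitespace character with `filler`.

    Whitespace is kept as-is, so the character count and the word
    boundaries of the original text are preserved.
    """
    if len(filler) != 1:
        raise ValueError("filler must be a single character")
    return "".join(ch if ch.isspace() else filler for ch in text)


original = "John Smith, 42"
redacted = redact_same_length(original)
print(redacted)                       # prints "xxxx xxxxxx xx"
print(len(redacted) == len(original)) # prints "True"
```

Note that this preserves the number of characters, not the number of 
bytes: non-ASCII text replaced with an ASCII filler will be smaller once 
encoded as UTF-8.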

Thanks,
Shawn

