lucene-dev mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Memory consumption in trunk for sorting and faceting
Date Sun, 12 Jun 2011 13:39:22 GMT
OK, I know lots of great work has been done to reduce the memory
footprint for sorting and faceting, but what I'm seeing is drastic
enough that I want to see if I'm missing something and to ask what
finer-grained tools people are using to answer the question "How much
more memory efficient is the new way of doing things"?

Setup:

I've indexed 1.9M Wikipedia articles. For each version I fire up a
fresh Solr and fire a relatively insane query at it while monitoring
in JConsole, then force a GC from JConsole and look at the memory used
by Solr. Crude, but I'm trying to get a flavor of what's going on here.


Field       Unique values   Type
id          1,917,727       string
user_sort   62,123          string
text        57,759          text (1.4.1 flavor for all three Solr versions)
user_id     62,122          int

http://localhost:8983/solr/select/?q=*:*&version=2.2&start=0&rows=10&indent=on&sort=user_sort asc, id desc&facet=on&facet.field=text&facet.field=user_id&facet.field=id

Yeah, yeah, yeah, faceting and sorting by a unique ID is silly. But it
*does* stress memory.
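
For anyone who wants to replay the query from a script rather than a browser, here is a minimal sketch that rebuilds the same URL with the sort clause and the repeated facet.field parameters properly encoded. The function names are hypothetical; the host, port, and parameter values are copied from the URL above, and nothing here is Solr-specific beyond those values.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

# Base URL taken from the query above (local Solr on the default port).
BASE_URL = "http://localhost:8983/solr/select/"


def build_query_url(base_url=BASE_URL):
    """Return the sort-plus-facet query URL with all parameters URL-encoded."""
    # A list of pairs (not a dict) so facet.field can appear three times.
    params = [
        ("q", "*:*"),
        ("version", "2.2"),
        ("start", "0"),
        ("rows", "10"),
        ("indent", "on"),
        ("sort", "user_sort asc, id desc"),
        ("facet", "on"),
        ("facet.field", "text"),
        ("facet.field", "user_id"),
        ("facet.field", "id"),
    ]
    return base_url + "?" + urlencode(params)


def parse_back(url):
    """Decode the query string again, to check nothing was lost in encoding."""
    return parse_qs(urlsplit(url).query)


if __name__ == "__main__":
    url = build_query_url()
    print(url)
    print(parse_back(url)["facet.field"])
```

Encoding matters here mostly because of the spaces and comma in the sort clause, which a browser will fix up silently but a script may not.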

Anyway, here are the numbers I'm seeing:

1.4.1  328 M
3.2     328 M
trunk    90 M

And it's even more impressive than that when you consider that 20M or
so is just to get in the door.....
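
To put a number on "more impressive", here is a small sketch of the arithmetic. It uses the figures above and treats the "20M or so" as an exact 20 M baseline, which is an assumption; the function name is hypothetical.

```python
def reduction(before_mb, after_mb, baseline_mb=0.0):
    """Fractional drop in memory after subtracting a fixed baseline from both."""
    before = before_mb - baseline_mb
    after = after_mb - baseline_mb
    if before <= 0:
        raise ValueError("baseline must be smaller than the 'before' figure")
    return 1.0 - after / before


if __name__ == "__main__":
    # Raw figures: 328 M on 1.4.1 and 3.2, 90 M on trunk.
    print(f"raw reduction:       {reduction(328, 90):.1%}")
    # Assumed 20 M baseline removed from both sides.
    print(f"net of 20M baseline: {reduction(328, 90, 20):.1%}")
```

On the raw numbers that is roughly a 73% drop; with the assumed baseline removed it is roughly 77%.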

Is it fair to say that the two big innovations that have reduced the
memory footprint are:
1> going to byte arrays for string storage
2> the FST work?
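
As a rough illustration of why point 1 could matter, the sketch below compares the encoded size of a few terms as UTF-16 (two bytes per character for most text, the way Java strings are held as char arrays) against UTF-8 bytes. This is not Lucene's implementation, the sample terms are made up, and it ignores per-object overhead, so take it only as a back-of-the-envelope comparison.

```python
def encoded_sizes(terms):
    """Return (utf16_bytes, utf8_bytes) totals for the given terms."""
    # utf-16-le is used so no byte-order mark is counted per term.
    utf16 = sum(len(t.encode("utf-16-le")) for t in terms)
    utf8 = sum(len(t.encode("utf-8")) for t in terms)
    return utf16, utf8


if __name__ == "__main__":
    # Hypothetical sample terms; mostly-ASCII text like this halves in size.
    sample = ["wikipedia", "solr", "lucene", "faceting"]
    utf16, utf8 = encoded_sizes(sample)
    print(f"UTF-16: {utf16} bytes, UTF-8: {utf8} bytes")
```

For mostly-ASCII terms the byte form is about half the size; text with many non-ASCII characters would see less benefit.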

Final question. It looks like the FST work is back-ported to the
current 3_x code branch, is that true? Anything else back-ported
there? I'll check that branch out and give it a whirl for kicks.

Thanks,
Erick

A novice programmer gets a program to compile and says "I'm sure it'll
run fine now"
A veteran programmer runs a program for the first time, gets the
expected results and says "I must have done something wrong, that
can't *really* be working".


