lucene-dev mailing list archives

From Chris Hostetter
Subject Re: [jira] Commented: (LUCENE-639) [PATCH] Slight performance improvement for readVInt() of IndexInput
Date Wed, 02 Aug 2006 03:07:02 GMT

: Yonik, I think it is possible to make a reasonable assumption about the
: distribution of the vints. Lucene stores deltas between document IDs
: instead of document IDs, and I guess (no data available) most
: frequencies will be below 128.
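
For reference, the "below 128" threshold comes from the VInt layout: each byte carries 7 data bits, and the high bit signals that another byte follows, so any value from 0 to 127 fits in a single byte. A minimal sketch of that encoding (the class and method names here are made up for illustration, they are not Lucene's API):

```java
import java.io.ByteArrayOutputStream;

public class VIntLengthSketch {
    /**
     * Encodes a value in the VInt layout: 7 data bits per byte, low-order
     * bits first, high bit set on every byte except the last.
     * Hypothetical helper for illustration only.
     */
    static byte[] encodeVInt(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(5);
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value); // final byte, high bit clear
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Values on either side of the 1-, 2- and 3-byte boundaries.
        int[] samples = {0, 1, 127, 128, 16383, 16384};
        for (int v : samples) {
            System.out.println(v + " -> " + encodeVInt(v).length + " byte(s)");
        }
    }
}
```

So if most doc ID deltas and frequencies really are below 128, most reads would touch exactly one byte.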

perhaps the best way to identify what "typical" assumptions can be made
would be to produce a small self-contained app that takes an index
directory on the command line and...

1) generates some stats on term frequencies and distributions
2) runs several benchmarks where it scans the index several times using
the various implementations of readVInt
3) prints out all of the stats it's gathered.

..then put it online, and post a message to java-user asking folks to
please download it, run it against their index on as "calm" a machine as
they can, and then reply with their results.
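
A rough sketch of what the core of such an app might look like. To keep it self-contained it does not open a real index: it fills a byte array with synthetic vints (the 90%-below-128 mix is purely an assumption, which is exactly the number the real tool would have to measure), and it compares a loop-based reader against a hypothetical single-byte fast path that is only a stand-in for the variants under discussion, not the actual patch. All names are made up for illustration.

```java
import java.util.Random;

public class VIntBenchSketch {
    static int pos; // read cursor shared by both readers (single-threaded sketch)

    /** Writes value at buf[at..] in VInt layout; returns the next free offset. */
    static int writeVInt(byte[] buf, int at, int value) {
        while ((value & ~0x7F) != 0) {
            buf[at++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        buf[at++] = (byte) value;
        return at;
    }

    /** Baseline: loop-based decode. */
    static int readVIntLoop(byte[] buf) {
        byte b = buf[pos++];
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = buf[pos++];
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    /** Hypothetical variant: return early when the value fits in one byte. */
    static int readVIntFastPath(byte[] buf) {
        byte b = buf[pos++];
        if (b >= 0) return b; // high bit clear: single-byte value
        int i = b & 0x7F;
        int shift = 7;
        do {
            b = buf[pos++];
            i |= (b & 0x7F) << shift;
            shift += 7;
        } while (b < 0);
        return i;
    }

    /** Decodes count vints from the start of buf and sums them. */
    static long checksum(byte[] buf, int count, boolean fast) {
        pos = 0;
        long sum = 0;
        for (int k = 0; k < count; k++) {
            sum += fast ? readVIntFastPath(buf) : readVIntLoop(buf);
        }
        return sum;
    }

    public static void main(String[] args) {
        int count = 1000000;
        Random rnd = new Random(42);
        byte[] buf = new byte[count * 5]; // worst case: 5 bytes per vint
        int[] lengthHistogram = new int[6];
        int end = 0;
        for (int k = 0; k < count; k++) {
            // Assumed distribution: roughly 90% of values below 128.
            int v = rnd.nextInt(10) == 0 ? rnd.nextInt(1 << 20) : rnd.nextInt(128);
            int next = writeVInt(buf, end, v);
            lengthHistogram[next - end]++;
            end = next;
        }
        for (int len = 1; len <= 5; len++) {
            System.out.println(len + "-byte vints: " + lengthHistogram[len]);
        }
        // Several passes, so later ones run after JIT warm-up.
        for (int pass = 0; pass < 5; pass++) {
            long t0 = System.nanoTime();
            long a = checksum(buf, count, false);
            long t1 = System.nanoTime();
            long b = checksum(buf, count, true);
            long t2 = System.nanoTime();
            System.out.println("pass " + pass + ": loop " + (t1 - t0) / 1000 + " us, fast path "
                    + (t2 - t1) / 1000 + " us, checksums match: " + (a == b));
        }
    }
}
```

The real version would replace the synthetic buffer with data read from the index directory given on the command line, and the timings from something this simple should be treated as a rough indication at best.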

If people know that a potentially significant performance gain is
available, and an easy mechanism is provided for them to give you stats
about their index, I'm sure you'd get more than a few responses.

