lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-1990) Add unsigned packed int impls in oal.util
Date Sat, 27 Feb 2010 11:14:05 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-1990:
---------------------------------------

    Attachment: perf-mkm-20100227.txt

{quote}
bq. Airplane blocking snow drifts!? Where on earth are you anyway?

In Denmark. The guy responsible for clearing the runway did indeed clear the runway. He just
forgot that the plane needs to taxi into the runway in the first place. That made us miss
our connecting flight.
{quote}

Good grief!

{quote}
bq. It's very interesting that align is never a win - I think in that case removing it makes
sense? It'll be a nice simplification.

Well, practically never wins for the machines I tested on and never wins with my implementation.
{quote}

I think we should remove it...

{quote}
bq. Did we ever test performance of the specialized (generated) decoders using switch statements?

I just did a quick hack in order to measure performance and I was very surprised that the
generated switch-based implementations perform so well. They are nearly on par with packed most
of the time and exceed it in some cases. I only tested on 3 machines though. The hack is
in the LUCENE-1990-te20100226c.patch and is called when the performance test is executed.
{quote}

Thanks for testing this!  It is interesting.

I ran the perf test on a CentOS 5.4 machine, java
1.6.0_17-b04 64 bit server, Intel Core 2 Duo E8400 (3 GHz) -- attached
perf-mkm-20100227.txt.  My results also show the switch impl close,
though always a bit behind.

Seems like we should just stick with the non-gen'd packed impl?

bq. Note to self: Switch is not equivalent to a series of if-else, when we're talking performance
and when we switch without omissions in the cases.

Right, if the switch cases are compact, it should compile into a fast jump
table (though it may still do an unnecessary bounds check).
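
To make the jump-table point concrete, here is a minimal sketch, assuming a toy decoder that reads one value out of a single 64-bit block. The class and method names are hypothetical and this is not the generated code from the attached patches. Because the case labels 1..4 are contiguous, javac can emit a tableswitch for decode, whereas an if-else chain would test each width in turn.

```java
/**
 * Illustrative sketch only: hypothetical names, not the generated decoders
 * from LUCENE-1990-te20100226c.patch.
 */
final class SwitchDecodeSketch {

    /**
     * Specialized decode: one case per bit width. The labels are dense
     * (1, 2, 3, 4), so the switch can become a jump table rather than
     * a sequence of comparisons. The caller must keep
     * index * bitsPerValue below 64.
     */
    static long decode(long block, int bitsPerValue, int index) {
        switch (bitsPerValue) {
            case 1:  return (block >>> index) & 0x1L;
            case 2:  return (block >>> (index << 1)) & 0x3L;
            case 3:  return (block >>> (index * 3)) & 0x7L;
            case 4:  return (block >>> (index << 2)) & 0xFL;
            default: throw new IllegalArgumentException(
                         "unsupported bitsPerValue: " + bitsPerValue);
        }
    }

    /** Generic decode: shift and mask are computed from bitsPerValue at run time. */
    static long decodeGeneric(long block, int bitsPerValue, int index) {
        long mask = (1L << bitsPerValue) - 1;
        return (block >>> (index * bitsPerValue)) & mask;
    }

    public static void main(String[] args) {
        long block = 0x0123456789ABCDEFL;
        // Both paths read the same 3-bit value at position 5.
        System.out.println(decode(block, 3, 5) + " " + decodeGeneric(block, 3, 5));
    }
}
```

Whether the specialized path actually beats the generic one depends on the JIT and the machine, which is what the attached perf files measure.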

I think, once we remove aligned, this is ready to commit?  I think we
should land this on flex branch?  (It's using CodecUtil, BytesRef --
I'll merge them when I commit).  Then I can cutover the terms index to
use packed ints.


> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Flex Branch
>            Reporter: Michael McCandless
>            Priority: Minor
>             Fix For: Flex Branch
>
>         Attachments: generated_performance-te20100226.txt, LUCENE-1990-te20100122.patch,
> LUCENE-1990-te20100210.patch, LUCENE-1990-te20100212.patch, LUCENE-1990-te20100223.patch,
> LUCENE-1990-te20100226.patch, LUCENE-1990-te20100226b.patch, LUCENE-1990-te20100226c.patch,
> LUCENE-1990.patch, LUCENE-1990_PerformanceMeasurements20100104.zip, perf-mkm-20100227.txt,
> performance-te20100226.txt
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!
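
For readers following the issue, the get/set/create shape described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation in any attached patch: the class name PackedSketch is hypothetical, the index is an int rather than the long in the proposed interface, and values are stored back to back in a long[] with no alignment.

```java
/**
 * Illustrative sketch only: hypothetical names, not the code from the
 * LUCENE-1990 patches. Stores `count` unsigned values, each using just
 * enough bits to hold maxValue, packed contiguously into a long[].
 */
final class PackedSketch {
    private final long[] blocks;   // values stored back to back, low bits first
    private final int bitsPerValue;
    private final long maxValue;
    private final long mask;
    private final int count;

    PackedSketch(int count, long maxValue) {
        if (count < 0 || maxValue < 0) {
            throw new IllegalArgumentException("count and maxValue must be >= 0");
        }
        this.count = count;
        this.maxValue = maxValue;
        // Bits needed for maxValue; at least 1 so that maxValue == 0 still works.
        this.bitsPerValue = Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
        this.mask = (1L << bitsPerValue) - 1; // bitsPerValue <= 63 here
        long totalBits = (long) count * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
    }

    int bitsPerValue() {
        return bitsPerValue;
    }

    long get(int index) {
        checkIndex(index);
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int bitsInFirst = 64 - shift;
        if (bitsInFirst < bitsPerValue) {
            // The value straddles two longs; pull the high bits from the next one.
            value |= blocks[block + 1] << bitsInFirst;
        }
        return value & mask;
    }

    void set(int index, long value) {
        checkIndex(index);
        if (value < 0 || value > maxValue) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        // Clear the slot, then write the low bits of the value.
        blocks[block] = (blocks[block] & ~(mask << shift)) | (value << shift);
        int bitsInFirst = 64 - shift;
        if (bitsInFirst < bitsPerValue) {
            // Write the remaining high bits into the next long.
            blocks[block + 1] = (blocks[block + 1] & ~(mask >>> bitsInFirst))
                    | (value >>> bitsInFirst);
        }
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
    }

    public static void main(String[] args) {
        PackedSketch p = new PackedSketch(8, 1000);
        p.set(3, 999);
        System.out.println(p.bitsPerValue() + " bits/value, p[3]=" + p.get(3));
    }
}
```

With maxValue = 1000 each entry takes 10 bits instead of 64, which is the kind of RAM saving the description has in mind for the terms dict index.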

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

