lucene-dev mailing list archives

From "Toke Eskildsen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-1990) Add unsigned packed int impls in oal.util
Date Fri, 12 Feb 2010 02:57:28 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Toke Eskildsen updated LUCENE-1990:
-----------------------------------

    Attachment: LUCENE-1990-te20100212.patch

I've read through the comments on LUCENE-1990 and implemented most of what has been suggested.
The attached patch contains implementations for all the variants we've talked about, including
aligned. There's a known bug in persistence for aligned64 (and probably also for aligned32)
that I haven't stomped yet. There's also a clear need for a more elaborate unit-test with
regard to persistence.

Other outstanding issues, as I see them, are whether or not mutable packed arrays should be
requestable (as general purpose data structures) and how the factory for creating a writer
should work. I have added a getMutable-method to the factory and not touched the return type
Reader for the getReader-method. That way read-only users will not be tempted to try and update
the received structure. As for the arguments to the factory, Michael McCandless suggested
that the preferences should be expressed with (packed | aligned32 | aligned64 | auto). As
far as I can see, this should work. However, I've only just reached this conclusion and haven't
had the time to implement it.
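
A minimal sketch of how such a factory could look. This is not the code in the attached patch: the class name PackedFactorySketch, the Preference enum, the method signatures and the bitsRequired helper are hypothetical, and "aligned64" is assumed to mean that a value never crosses a 64-bit block boundary. Only the aligned64 layout is filled in; auto simply resolves to it here.

```java
// Hypothetical sketch, not the code from the attached patch.
final class PackedFactorySketch {

    /** Read-only view: there is no set method, so readers cannot modify the structure. */
    interface Reader {
        long get(int index);
        int size();
    }

    /** Mutable view, only handed out by getMutable. */
    interface Mutable extends Reader {
        void set(int index, long value);
    }

    /** Layout preference, following the (packed | aligned32 | aligned64 | auto) suggestion. */
    enum Preference { PACKED, ALIGNED32, ALIGNED64, AUTO }

    /** Assumed meaning of "aligned64": values never cross a 64-bit block boundary. */
    private static final class Aligned64 implements Mutable {
        private final long[] blocks;
        private final int bitsPerValue;
        private final int valuesPerBlock;
        private final long mask;
        private final int size;

        Aligned64(int size, int bitsPerValue) {
            this.size = size;
            this.bitsPerValue = bitsPerValue;
            this.valuesPerBlock = 64 / bitsPerValue;
            this.mask = bitsPerValue == 64 ? -1L : (1L << bitsPerValue) - 1;
            this.blocks = new long[(size + valuesPerBlock - 1) / valuesPerBlock];
        }

        public long get(int index) {
            int shift = (index % valuesPerBlock) * bitsPerValue;
            return (blocks[index / valuesPerBlock] >>> shift) & mask;
        }

        public void set(int index, long value) {
            int block = index / valuesPerBlock;
            int shift = (index % valuesPerBlock) * bitsPerValue;
            // Clear the old bits for this slot, then write the new value.
            blocks[block] = (blocks[block] & ~(mask << shift)) | ((value & mask) << shift);
        }

        public int size() {
            return size;
        }
    }

    /** Number of bits needed to store values in [0, maxValue]; at least 1. */
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    static Mutable getMutable(int valueCount, long maxValue, Preference preference) {
        switch (preference) {
            case AUTO:      // this sketch resolves AUTO to ALIGNED64
            case ALIGNED64:
                return new Aligned64(valueCount, bitsRequired(maxValue));
            default:
                throw new UnsupportedOperationException(preference + " is not part of this sketch");
        }
    }

    /** Wraps the source so callers cannot cast the result back to Mutable. */
    static Reader getReader(final Mutable source) {
        return new Reader() {
            public long get(int index) { return source.get(index); }
            public int size() { return source.size(); }
        };
    }
}
```

Because getReader returns a wrapper typed as Reader, a read-only caller has no set method to call and cannot downcast to Mutable, which is the separation described above.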

A speed-test has been added and the results from my machine can be seen below. In order for
it to be really usable, it should be tried on other machines too.

I won't touch the code before sometime next week, but I'll keep an eye on LUCENE-1990 comments
until then.

{code}
bitsPerValue  valueCount  getCount  PackedDirectByte  PackedDirectShort  Packed32  PackedAligned32  PackedDirectInt  Packed64  PackedAligned64  PackedDirectLong
           1        1000  10000000               167                141       258              242              172       264              242               183
           1     1000000  10000000               224                232       266              233              246       262              238               338
           1    10000000  10000000               359                469       280              278              508       278              272               551
           3        1000  10000000               168                166       265              241              163       262              243               166
           3     1000000  10000000               227                226       261              251              239       274              249               330
           3    10000000  10000000               406                476       301              304              522       300              308               547
           4        1000  10000000               167                168       266              239              164       285              239               169
           4     1000000  10000000               228                231       294              274              262       291              269               314
           4    10000000  10000000               385                480       308              333              514       331              315               557
           7        1000  10000000               172                174       278              248              162       271              238               177
           7     1000000  10000000               224                236       289              281              272       278              277               345
           7    10000000  10000000               405                473       389              447              516       399              402               553
           8        1000  10000000               192                171       268              242              174       291              240               163
           8     1000000  10000000               226                232       291              284              286       274              265               314
           8    10000000  10000000               381                467       406              428              512       422              419               580

bitsPerValue  valueCount  getCount  PackedDirectShort  Packed32  PackedAligned32  PackedDirectInt  Packed64  PackedAligned64  PackedDirectLong
           9        1000  10000000                166       274              241              170       261              237               163
           9     1000000  10000000                229       299              273              250       284              275               327
           9    10000000  10000000                483       443              477              519       438              455               568
          15        1000  10000000                170       265              239              174       264              235               162
          15     1000000  10000000                232       285              274              240       278              269               339
          15    10000000  10000000                473       518              524              523       519              521               550
          16        1000  10000000                166       263              236              172       264              235               160
          16     1000000  10000000                229       285              278              244       293              272               332
          16    10000000  10000000                470       513              517              509       534              529               548

bitsPerValue  valueCount  getCount  Packed32  PackedAligned32  PackedDirectInt  Packed64  PackedAligned64  PackedDirectLong
          17        1000  10000000       262              255              177       260              234               160
          17     1000000  10000000       290              306              273       304              290               320
          17    10000000  10000000       532              572              533       529              556               551
          28        1000  10000000       269              256              187       267              238               163
          28     1000000  10000000       293              295              253       293              296               312
          28    10000000  10000000       542              567              501       548              567               542
          31        1000  10000000       260              235              177       266              232               158
          31     1000000  10000000       292              294              244       296              297               328
          31    10000000  10000000       552              563              516       562              568               548

bitsPerValue  valueCount  getCount  PackedDirectInt  Packed64  PackedAligned64  PackedDirectLong
          32        1000  10000000              172       263              241               166
          32     1000000  10000000              241       291              297               320
          32    10000000  10000000              519       556              573               546

bitsPerValue  valueCount  getCount  Packed64  PackedAligned64  PackedDirectLong
          33        1000  10000000       264              239               159
          33     1000000  10000000       293              374               319
          33    10000000  10000000       559              595               552
          47        1000  10000000       264              242               164
          47     1000000  10000000       319              369               322
          47    10000000  10000000       577              601               548
          49        1000  10000000       261              243               162
          49     1000000  10000000       323              413               319
          49    10000000  10000000       584              610               551
          63        1000  10000000       269              235               161
          63     1000000  10000000       396              369               313
          63    10000000  10000000       592              596               559
{code}

(Java 1.6.0_15-b03, default settings on a Dell Precision M6500: Intel i7 Q 820 @ 1.73GHz,
8 MB level 3 cache, dual-channel PC 1333 RAM, running Ubuntu Karmic)

> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>         Attachments: LUCENE-1990-te20100122.patch, LUCENE-1990-te20100210.patch, LUCENE-1990-te20100212.patch, LUCENE-1990.patch, LUCENE-1990_PerformanceMeasurements20100104.zip
>
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  EG the terms dict index in
> the standard codec in LUCENE-1458 could substantially reduce its RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is a good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!
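
For illustration, the interface and factory quoted above could be backed by a plain long[] roughly as follows. This is a hand-written sketch, not the autogenerated code the description proposes and not the code in any attached patch. The names Packed64Sketch and Packed64 are hypothetical, and it assumes the fully packed layout in which a value may straddle two adjacent longs.

```java
// Hypothetical sketch of the quoted interface; not the code from any attached patch.
final class Packed64Sketch {

    /** The interface as quoted in the issue description. */
    interface PackedUnsignedLongs {
        long get(long index);
        void set(long index, long value);
    }

    /** Fully packed layout: values are stored back to back and may span two longs. */
    private static final class Packed64 implements PackedUnsignedLongs {
        private final long[] blocks;
        private final int bits;
        private final long mask;

        Packed64(int count, int bits) {
            this.bits = bits;
            this.mask = bits == 64 ? -1L : (1L << bits) - 1;
            long totalBits = (long) count * bits;
            this.blocks = new long[(int) ((totalBits + 63) >>> 6)];
        }

        public long get(long index) {
            long bitPos = index * bits;
            int block = (int) (bitPos >>> 6);
            int offset = (int) (bitPos & 63);
            long value = blocks[block] >>> offset;
            int bitsInFirst = 64 - offset;
            if (bitsInFirst < bits) {
                // The remaining high bits live at the bottom of the next block.
                value |= blocks[block + 1] << bitsInFirst;
            }
            return value & mask;
        }

        public void set(long index, long value) {
            value &= mask;
            long bitPos = index * bits;
            int block = (int) (bitPos >>> 6);
            int offset = (int) (bitPos & 63);
            blocks[block] = (blocks[block] & ~(mask << offset)) | (value << offset);
            int bitsInFirst = 64 - offset;
            if (bitsInFirst < bits) {
                // Write the part that did not fit into the next block.
                blocks[block + 1] = (blocks[block + 1] & ~(mask >>> bitsInFirst))
                        | (value >>> bitsInFirst);
            }
        }
    }

    /** Factory with the quoted signature; bits per value is derived from maxValue. */
    static PackedUnsignedLongs create(int count, long maxValue) {
        int bits = Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
        return new Packed64(count, bits);
    }
}
```

With create(100, 1000) each value takes 10 bits, so the value at index 6 starts at bit 60 and is split across the first two longs; get and set handle that case through the bitsInFirst branch.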

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

