lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Fuad Efendi (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (LUCENE-1990) Add unsigned packed int impls in oal.util
Date Tue, 10 Nov 2009 14:01:30 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775420#action_12775420
] 

Fuad Efendi edited comment on LUCENE-1990 at 11/10/09 2:01 PM:
---------------------------------------------------------------

Specifically for FieldCache, let's see... suppose Field may have 8 different values, and number
of documents is high.

{code}
Value0  0  1  0  0  0  0  0   0  1  0  0  0  0  0 ...  
Value1  1  0  1  0  0  0  0   0  0  0  0  0  0  0 ...  
Value2  0  0  0  1  1  0  0   0  0  0  0  0  0  0 ...  
Value3  0  0  0  0  0  0  0   0  0  0  0  1  0  0 ...  
Value4  0  0  0  0  0  0  1   0  0  0  0  0  0  0 ...  
Value5  0  0  0  0  0  1  0   0  0  0  1  0  1  0 ...  
Value6  0  0  0  0  0  0  0   1  0  1  0  0  0  0 ...  
Value7  0  0  0  0  0  0  0   0  1  0  0  0  0  1 ...  
{code}

- represented as Matrix (or as a Vector); for instance, first row means that Document1 and
Document8 have Value0.

And now, if we go "horizontally" we will end up with 8 arrays of int[]. What if we go "vertically"?
Field could be encoded as 3-bit (8 different values).

CONSTRAINT: specifically for FieldCache, each Column must have the only "1".

And we can end with array of 3-bit values storing position in a column! Size of array is IndexReader.maxDoc().


hope I am reinventing bycycle :)


      was (Author: funtick):
    Specifically for FieldCache, let's see... suppose Field may have 8 different values, and
number of documents is high.

{code}
Value0  0  1  0  0  0  0  0   0  1  0  0  0  0  0 ...  
Value1  1  0  1  0  0  0  0   0  0  0  0  0  0  0 ...  
Value2  0  0  0  1  1  0  0   0  0  0  0  0  0  0 ...  
Value3  0  0  0  0  0  0  0   0  0  0  0 10  0  0 ...  
Value4  0  0  0  0  0  0  1   0  0  0  0  0  0  0 ...  
Value5  0  0  0  0  0  1  0   0  0  0  1  0  1  0 ...  
Value6  0  0  0  0  0  0  0   1  0  1  0  0  0  0 ...  
Value7  0  0  0  0  0  0  0   0  1  0  0  0  0  1 ...  
{code}

- represented as Matrix (or as a Vector); for instance, first row means that Document1 and
Document8 have Value0.

And now, if we go "horizontally" we will end up with 8 arrays of int[]. What if we go "vertically"?
Field could be encoded as 3-bit (8 different values).

CONSTRAINT: specifically for FieldCache, each Column must have the only "1".

And we can end with array of 3-bit values storing position in a column! Size of array is IndexReader.maxDoc().


hope I am reinventing bycycle :)

  
> Add unsigned packed int impls in oal.util
> -----------------------------------------
>
>                 Key: LUCENE-1990
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1990
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Priority: Minor
>
> There are various places in Lucene that could take advantage of an
> efficient packed unsigned int/long impl.  EG the terms dict index in
> the standard codec in LUCENE-1458 could subsantially reduce it's RAM
> usage.  FieldCache.StringIndex could as well.  And I think "load into
> RAM" codecs like the one in TestExternalCodecs could use this too.
> I'm picturing something very basic like:
> {code}
> interface PackedUnsignedLongs  {
>   long get(long index);
>   void set(long index, long value);
> }
> {code}
> Plus maybe an iterator for getting and maybe also for setting.  If it
> helps, most of the usages of this inside Lucene will be "write once"
> so eg the set could make that an assumption/requirement.
> And a factory somewhere:
> {code}
>   PackedUnsignedLongs create(int count, long maxValue);
> {code}
> I think we should simply autogen the code (we can start from the
> autogen code in LUCENE-1410), or, if there is an good existing impl
> that has a compatible license that'd be great.
> I don't have time near-term to do this... so if anyone has the itch,
> please jump!

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message