commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Phil Steitz <phil.ste...@gmail.com>
Subject Re: [Math] Improving combinationIterator?
Date Tue, 27 Aug 2013 06:12:57 GMT
On 8/26/13 4:37 AM, Gilles wrote:
> On Sun, 25 Aug 2013 17:14:37 -0700, Phil Steitz wrote:
>> On 8/25/13 9:59 AM, Gilles wrote:
>>> On Sun, 25 Aug 2013 09:19:41 -0700, Phil Steitz wrote:
>>>> On 8/25/13 8:10 AM, Gilles wrote:
>>>>> On Sat, 24 Aug 2013 21:55:36 -0000, psteitz@apache.org wrote:
>>>>>
>>>>>> [...]
>>>>>
>>>>> I wonder whether the utility should involve the creation of
>>>>> an instance of a bona fide "Combination" class.
>>>>> I.e. instead of a "naked"
>>>>>   Iterator<int[]> combinationIterator(int n, int k)
>>>>> there would be
>>>>>   Iterator<Combination> combinationIterator(int n, int k)
>>>>> where "Combination" would provide an accessor
>>>>>   public int[] toArray() {
>>>>>     // ...
>>>>>   }
>>>>
>>>> I need the "naked arrays" generated very quickly with minimal
>>>> overhead for the K-S use case.
>>>
>>> The only "overhead" would be the creation of an instance for
>>> wrapping the "int[]".
>>> [As allocation is fairly cheap in Java, that should not be a
>>> problem.]
>>>
>>>> My use case uses the arrays
>>>> directly, but others could use the indices to extract objects from
>>>> lists.  For example,  what I thought would be natural would be to
>>>> add
>>>> public static <T> Iterator<List<T>> combinationsIterator(List<T>
>>>> list, int k)
>>>
>>> Would this be the base implementation, from which your case would
>>> be implemented by retrieving "Integer" from the list and
>>> copy/convert
>>> them into an "int[]"?
>>
>> No, the opposite actually.  The base impl is what exists now.  It
>> works efficiently directly on an int[] array to compute the indices
>> within the source list (in my use case, {0, 1, ..., n-1} which is
>> actually canonical).  Then if you want to source subsets of objects
>> from a list of objects, you can compute the int[] array that
>> actually *is* the combination and extract the corresponding elements
>> from the list.
>
> I got it.
>
>>>
>>>>
>>>> What you would get out of this is element lists.  And the
>>>> implementation could postpone copy / creation until the indices
>>>> have
>>>> been selected using the existing method.
>>>
>>> I don't follow ...
>>
>> See above.  I am probably still not explaining clearly what I have
>> in mind.  To me, a "combination" is a subset of the integers {0,
>> ..., n-1}.  Technically it is a set, not a list, but there is a
>> natural order available and enumeration can work easier if you use
>> that order and adopt the convention that you represent combinations
>> as ordered lists of integers.  The LexicographicCombinationIterator
>> that I committed does that, using primitive arrays to represent the
>> lists.  The iterator itself uses a primitive array of int[] to
>> maintain state and create successive iterates.  Use of a single
>> primitive array eliminates all object creation / garbage collection,
>> which is not necessary to just enumerate the combinations.  Once you
>> have a "true" combination represented as an array of int[], you can
>> use it to specify a "combination"  (really sample or subset) of a
>> list of any kind of object.
>
> So, IIUC it would be useful to have a utility like
> -----
>  public static List<T> getSample(List<T> elements,
>                                  int[] combination) {
>    final List<T> samples = new ArrayList<T>(combination.length);
>    for (int index : combination) {
>      samples.add(elements.get(index));
>    }
>  }

Yes, that is what the List version would use.  Not sure how
generically useful it would be, but I can imagine other use cases. 
It might be best to call it filter or subList or something and put
it in MathArrays.  Not sure where it would belong as a generic utility.


> -----
>
>>>
>>>>  I am not yet sold on the
>>>> value of "Combination" as an abstraction, since what you really
>>>> use
>>>> is just the list of selected elements.
>>>
>>> Maybe "Combination" is not worth implementing indeed; but what
>>> about
>>> a "Combinations" class:
>>>
>>> public class Combinations implements Iterable<int[]> {
>>>   private final int n;
>>>   private final int k;
>>>
>>>   public Combinations(int n, int k) {
>>>     // ...
>>>   }
>>>
>>>   public Iterator<int[]> iterator() {
>>>     return new LexicographicCombinationIterator(n, k);
>>>   }
>>>
>>>   public Comparator<int[]> comparator() {
>>>     return new LexicographicCombinationComparator(n, k);
>>>   }
>>>
>>>   private static class LexicographicCombinationIterator implements
>>> Iterator<int[]> {
>>>     // [Your implementation]
>>>   }
>>>
>>>   private static class LexicographicCombinationComparator
>>> implements Comparator<int[]> {
>>>       public int compare(int[], int[]) {
>>>       // ... using e.g. "lexNorm" implementation.
>>>     }
>>>   }
>>> }
>>>
>>> Being "Iterable" allows nice usage in a "for" loop:
>>>
>>> for (int[] combination : new Combinations(4, 2)) {
>>>   // ... etc.
>>> }
>>
>> I thought about that and that is what I meant by the original
>> suggestion that we possibly make the
>> LexicographicCombinationsIterator a public class.  I just don't see
>> the practical use cases for it as a standalone class.
>
> The "for"-loop example above uses the iterator functionality but
> in a much less verbose way (no explicit calls to "hasNext()" and
> "next()").
> This is standard practice for everything that can be handled as a
> collection.
>
> Then, we can also have your iterator factory method:
> -----
> public static Iterator<int[]> combinationsIterator(int n, int k) {
>   return new Combinations(4, 2).iterator();
> }
> -----
>
> Then, we also don't hide some code in "test" where it could be made
> visible and be used elsewhere (cf. "lexNorm").

OK, I see your point.
>
>>  What I have
>> immediate practical need for is just a way to get an Iterator<int[]>
>> based on <n,k>.  As you note above, it is the Iterator that is
>> useful.  I can imagine use cases for the object-list version
>> public static <T> Iterator<List<T>> combinationsIterator(List<T>
>> list, int k ) though.  If we do extract a public class, I would
>> prefer that it be called something like CombinationsIterator.
>
> I think that a design based on the "Iterable" interface is
> preferrable.
> From it you can always get the iterator functionality (by
> definition of
> the interface), while the reverse is not true.

Agreed.
>
>> If we
>> have other iteration strategies that we want to implement, we could
>> make it an abstract parent for LexicographicCombinationsIterator.  I
>> would recommend holding off complicating the design / structure
>> though until we have these though.
>
> Sorry to insist but it is not a complication, it is a
> _simplification_ ;-)
> Instead of a specifically named method ("combinationsIterator"), 
> we rely
> on the standard API ("Iterable") of the Java language.

Yeah, I get it now.  Go for it or open a ticket so we don't forget.

Phil
>
>
> Gilles
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Mime
View raw message