lucene-pylucene-dev mailing list archives

From Andi Vajda
Subject Re: how to instantiate a Set?
Date Sun, 22 Feb 2009 22:47:24 GMT

On Sun, 22 Feb 2009, Bill Janssen wrote:

> I'm probably missing something incredibly obvious here...
> I'm trying to call MoreLikeThis.setStopWords(Set words).  I've got a
> list of stop words in Python, but I can't figure out how to turn that
> into a Java Set.  I tried "lucene.HashSet(set(words))",
> "lucene.HashSet(lucene.ArrayList(JArray("string")(words)))", and so
> forth, without much luck.

PyLucene doesn't wrap the java.util.Arrays class that fills in the Java gap
between arrays and collections. That should be considered an oversight of
mine. I should add it to the JCC invocation in PyLucene's Makefile. Then you
would be able to pass your JArray instance to the Arrays.asList() method to
get a List and finally feed that to a HashSet.

Another alternative is to implement a Python extension of the Java Set
interface. Guess what? That is already possible: the PythonSet class, which
ships with PyLucene, is the extension point for implementing a Java Set in
Python.

I even have such a Python implementation of a Java Set, called JavaSet,
ready here, but it's not currently shipping with PyLucene, another oversight
of mine. I should add it to the distribution.

Until then, here it is below. It takes a Python set instance as constructor
argument and implements the complete Java Set interface. This example also 
illustrates a Python implementation of the Java Iterator interface.

Please let me know if this works for you.
Thanks!



from lucene import PythonSet, PythonIterator, JavaError

class JavaSet(PythonSet):

     def __init__(self, _set):
         super(JavaSet, self).__init__()
         self._set = _set

     def add(self, obj):
         if obj not in self._set:
             self._set.add(obj)
             return True
         return False

     def addAll(self, collection):
         size = len(self._set)
         self._set.update(collection)
         return len(self._set) > size

     def clear(self):
         self._set.clear()

     def contains(self, obj):
         return obj in self._set

     def containsAll(self, collection):
         for obj in collection:
             if obj not in self._set:
                 return False
         return True

     def equals(self, collection):
         if type(self) is type(collection):
             return self._set == collection._set
         return False

     def isEmpty(self):
         return len(self._set) == 0

     def iterator(self):
         class _iterator(PythonIterator):
             def __init__(_self):
                 super(_iterator, _self).__init__()
                 _self._iterator = iter(self._set)
             def hasNext(_self):
                 if hasattr(_self, '_next'):
                     return True
                 try:
                     # look one element ahead and keep it for next()
                     _self._next = _self._iterator.next()
                     return True
                 except StopIteration:
                     return False
             def next(_self):
                 if hasattr(_self, '_next'):
                     next = _self._next
                     del _self._next
                 else:
                     next = _self._iterator.next()
                 return next
         return _iterator()

     def remove(self, obj):
         try:
             self._set.remove(obj)
             return True
         except KeyError:
             return False

     def removeAll(self, collection):
         result = False
         for obj in collection:
             try:
                 self._set.remove(obj)
                 result = True
             except KeyError:
                 pass
         return result

     def retainAll(self, collection):
         result = False
         for obj in list(self._set):
             if obj not in collection:
                 self._set.remove(obj)
                 result = True
         return result

     def size(self):
         return len(self._set)

     def toArray(self):
         return list(self._set)
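
The one-element look-ahead that hasNext() and next() rely on in the iterator
above can be tried without PyLucene installed. What follows is only a sketch
under assumptions: LookAheadIterator and drain are hypothetical names, the
class does not extend PythonIterator, and a sentinel object stands in for the
hasattr()/del bookkeeping used in the listing.

```python
class LookAheadIterator(object):
    """Hypothetical stand-in for the nested _iterator class (no PyLucene)."""

    _MISSING = object()  # sentinel meaning "nothing buffered yet"

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self._buffered = self._MISSING

    def hasNext(self):
        # Fetch one element ahead and hold on to it until next() is called.
        if self._buffered is self._MISSING:
            try:
                self._buffered = next(self._iterator)
            except StopIteration:
                return False
        return True

    def next(self):
        # Hand out the buffered element first, if hasNext() fetched one.
        if self._buffered is not self._MISSING:
            value, self._buffered = self._buffered, self._MISSING
            return value
        return next(self._iterator)


def drain(java_style_iterator):
    """Consume an iterator the way Java code would: hasNext(), then next()."""
    items = []
    while java_style_iterator.hasNext():
        items.append(java_style_iterator.next())
    return items


stop_words = set(["a", "an", "the"])
print(sorted(drain(LookAheadIterator(stop_words))))
```

Calling hasNext() several times in a row must not skip elements, which is why
the fetched element is kept until next() consumes it.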
