lucene-solr-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jason Cater <ja...@ncsmags.com>
Subject Re: Commented: (SOLR-216) Improvements to solr.py
Date Tue, 29 May 2007 19:41:37 GMT
Mike, 

I've had my solr.py in production use internally for about a month now. 
So, as you can imagine, I've worked through a few oddball bugs that 
occasionally pop up.  It seems pretty stable for me.

I'm planning to upload a new file attachment to this issue containing my 
changes, plus fixing the bug reports that were filed against my open 
ticket.  But first, a few quick questions....

I would prefer to have a complete directory structure (i.e., setup.py, 
unit tests, samples, etc) instead of just the solr.py file.  Would 
anyone see a problem with this?

Also, on some of your comments:

>  - list comprehensions solely to perform looped execution are harder to parse and slower
than explicitly writing a for loop
>   

List comprehensions seem to be a matter of contention for some.  
However, it's a battle I'm not interested in fighting, so changed it to 
a for loop.

>  - shadowing builtins is generally a bad idea
>   

Any shadowing of builtins was unintentional.  Did you see specific 
examples?  I run the code through pychecker and pylint to try to avoid 
such cases.

>  - SolrConnection is an old-style class, but Response is new-style 
>   

This was a holdover from the old SolrConnection class I copied from. I'm 
fixing this.

> functionality:
>
>  - why are 'status'/'QTime' returned as floats?
>   

This was just a misunderstanding on my part of what QTime was actually 
returning.  Fixing.

>  - all NamedList's appearing in the output are converted to dicts--this loses information
(in particular, it will be unnecessarily hard for the user to use highlighting/debug data).
 Using the python/json response format would prevent this.  Not returning highlight/debug
data in the standard response format (and yet providing said parameters in the query() method)
seems odd.  Am I missing something?  Oh, they are set as dynamic attributes of Response, I
see.  Definitely needs documentation.
>   

Yes, this needs to be documented.  (Please c.f. to my question about 
allowing a complete directory structure.)

>  - passing fields='' to query() will return all fields, when the desired return is likely
no fields
>   

I've changed the default for fields= to be '*', instead of None or "".  
This way, passing in 'fields=""' will result in 'fl=' being passed to 
the backend.  However, I still don't see the point, as passing both 
'fl=' and 'fl=*' return the exact same set of fields (i.e., "all") on my 
test setup.

>  - it might be better to settle on an api that permits doc/field boosts.  How about using
a tuple as the field name in the field dict?
>
> conn.add_many([{'id': 1, ('field2', 2.33): u"some text"}])
>
> doc boosts could be handled by optionally providing the fielddict as a (<fielddict>,
boost) tuple.
>   

I agree. I was not aware of field boosts at the time. I'll code this change.

> - for 2.5+, a cool addition might be:
>
> if sys.version > 2.5
>    import contextlib   
>    def batched(solrconn):
>           solrconn.begin_batch()
> 	yield solrconn
> 	solrconn.end_batch()
>   batched = contextlib.contextmanager(batched)
>
> Use as:
>
> with batched(solrconn):
>        solrconn.add(...)
>        solrconn.add(...)
>        solrconn.add(...)
>   

Adding...

Mime
View raw message