couchdb-dev mailing list archives

From Robert Dionne <dio...@dionne-associates.com>
Subject Re: multiview on github
Date Mon, 20 Sep 2010 10:19:43 GMT
I see, neat. 

I ask because you might treat disjunction and conjunction differently in terms of whether
you run around the ring or broadcast to all the nodes. For conjunctions you need all to succeed,
so broadcast might fare better, whereas for disjunctions only one need succeed. I suppose it
would depend largely on the number of views and the cost of each computation.
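A minimal Python sketch of the ring-versus-broadcast trade-off. It assumes each view can be reduced to an in-memory set of the doc ids it emits rows for, which is a hypothetical simplification, and the function names are illustrative rather than taken from the multiview code. Walking the ring lets a conjunction stop at its first miss and a disjunction stop at its first hit. Broadcasting asks every view at once and combines the answers afterwards.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical simplification: each view is the set of doc ids it emits rows for.


def ring_and(doc_id, views):
    """Conjunction, ring style: visit views in turn; all() stops at the first miss."""
    return all(doc_id in view for view in views)


def ring_or(doc_id, views):
    """Disjunction, ring style: visit views in turn; any() stops at the first hit."""
    return any(doc_id in view for view in views)


def broadcast(doc_id, views, combine):
    """Broadcast style: ask every view concurrently, then combine with all or any."""
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda view: doc_id in view, views))
    return combine(answers)


views = [{"a", "b", "c"}, {"b", "c"}, {"c", "d"}]
print(ring_and("c", views), ring_or("d", views))  # True True
print(broadcast("b", views, all), broadcast("b", views, any))  # False True
```

Which approach wins depends, as above, on the number of views and the cost of each lookup. The sketch only shows the message pattern, not real performance.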

Anyway, I guess I have mixed feelings about seeing this in core. I see a lot of folks already
struggling to get their arms around working with map/reduce. It would make a good plugin for
advanced users. Actually, the ability to have plugins is almost there now. I have an indexer
that only requires some ini file mods and getting the code on the classpath. I think all that's
needed at this point is:

1. conventions for a plugins directory

2. a way of specifying gen_servers so that they can be supervised

3. some APIs around some of the internals.

I'm oversimplifying it for sure; the devil's in the details, and it's the kind of thing programmers
love to argue about ad nauseam but no one wants to do it (myself included :)

Best,

Bob



On Sep 19, 2010, at 10:22 AM, Norman Barker wrote:

> Bob,
> 
> It is just checking that a given id participates in a view. If it
> makes it around the ring, then it wins and gets streamed to the client;
> adding disjunctions would be fairly simple. Currently the only way I can
> check if an id is in a view is to loop over the results of each view,
> hence each node in the ring is in its own process to keep things
> moving.
> 
> A use case is two views: one that emits datetime (numeric) and another
> view that emits values, e.g. A, B, C ...; the query would then be to
> find all the documents with value A between start time and end time.
> 
> Norman
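Norman's use case can be sketched as the intersection of two view queries. The sketch assumes in-memory view rows shaped like the `{"id": ..., "key": ...}` rows of a CouchDB view response. The documents, keys, and helper names are made up for illustration.

```python
# Hypothetical rows, shaped like CouchDB view response rows.
time_view = [  # map function emits the document's datetime (epoch seconds) as key
    {"id": "doc1", "key": 100},
    {"id": "doc2", "key": 150},
    {"id": "doc3", "key": 300},
]
value_view = [  # map function emits the document's value (A, B, C, ...) as key
    {"id": "doc1", "key": "A"},
    {"id": "doc2", "key": "B"},
    {"id": "doc3", "key": "A"},
]


def ids_in_range(rows, start, end):
    """Doc ids whose key lies in [start, end], like a startkey/endkey query."""
    return {row["id"] for row in rows if start <= row["key"] <= end}


def ids_with_key(rows, key):
    """Doc ids emitted with exactly this key, like a key= query."""
    return {row["id"] for row in rows if row["key"] == key}


def value_between(value, start, end):
    """Conjunction of the two queries: intersect their doc id sets."""
    return ids_in_range(time_view, start, end) & ids_with_key(value_view, value)


print(sorted(value_between("A", 50, 200)))  # ['doc1']
print(sorted(value_between("A", 0, 1000)))  # ['doc1', 'doc3']
```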
> 
> On Sun, Sep 19, 2010 at 5:21 AM, Robert Dionne
> <dionne@dionne-associates.com> wrote:
>> I took another peek at this and I'm curious as to what it's doing. Is it just checking
>> that a given id participates in a view? So if it makes it around the ring it wins? Or is it
>> actually computing the result of passing the doc through all the views?
>> 
>> If the answer is the former, then would disjunction also be something one might want?
>> I'm just curious; I don't have a use case and I forget the original discussion around this.
>> I sort of think of views as a functional mapping from the database to some subset. That's
>> not entirely accurate given there's also a reduce phase. So I could imagine composing views
>> in a functional way, but the same thing can be had with just a different map function that
>> is the composition.
>> 
>> Anyway, if you have a brief description of this, with a use case, it would help.
>> 
>> Cheers,
>> 
>> Bob
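Bob's point that "the same thing can be had with just a different map function that is the composition" can be sketched for Norman's use case. One map function emits a compound `[value, datetime]` key, so a single range query from `["A", start]` to `["A", end]` replaces the intersection. The Python below simulates the sorted view index in memory. It treats CouchDB's array-key collation as Python tuple ordering, which is an assumption that should hold for simple strings and numbers like these. All names and data are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical documents for Norman's use case.
docs = [
    {"_id": "doc1", "value": "A", "datetime": 100},
    {"_id": "doc2", "value": "B", "datetime": 150},
    {"_id": "doc3", "value": "A", "datetime": 300},
]


def composed_map(doc):
    """Stand-in for a JavaScript map like: emit([doc.value, doc.datetime], null)."""
    yield (doc["value"], doc["datetime"])


# The view index: (key, doc id) rows kept sorted by key. Python tuple ordering
# stands in for CouchDB's array collation (an assumption for these simple keys).
index = sorted((key, doc["_id"]) for doc in docs for key in composed_map(doc))
keys = [key for key, _ in index]


def range_query(startkey, endkey):
    """Doc ids with startkey <= key <= endkey, located by binary search."""
    lo = bisect_left(keys, startkey)
    hi = bisect_right(keys, endkey)
    return [doc_id for _, doc_id in index[lo:hi]]


print(range_query(("A", 50), ("A", 200)))  # ['doc1']
print(range_query(("A", 0), ("A", 1000)))  # ['doc1', 'doc3']
```

The catch is that each combination of fields you want to query together needs its own view. That is presumably the flexibility an ad hoc intersection is meant to provide.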
>> 
>> 
>> 
>> 
>> On Sep 17, 2010, at 11:32 PM, Norman Barker wrote:
>> 
>>> Chris, James
>>> 
>>> Thanks for bumping this. We are using this internally at 'scale'
>>> (million+ keys). I want this to work for CouchDB, as we want to give
>>> back to such a great product and support this going forward, so any
>>> suggestions are welcome; we will test them and add them to the local GitHub
>>> account with the aim of getting this into trunk.
>>> 
>>> Norman
>>> 
>>> On Fri, Sep 17, 2010 at 7:00 PM, James Hayton <theboss@purplebulldog.com> wrote:
>>>> I want to use it!  I just haven't gotten around to it.  I was going to try
>>>> and test it out this weekend and if I am able, I will certainly report back
>>>> what I find.
>>>> 
>>>> James
>>>> 
>>>> On Fri, Sep 17, 2010 at 5:55 PM, Chris Anderson <jchris@apache.org> wrote:
>>>> 
>>>>> On Mon, Aug 30, 2010 at 10:58 AM, Norman Barker <norman.barker@gmail.com>
>>>>> wrote:
>>>>>> Bob,
>>>>>> 
>>>>>> I can test, and have been testing, the multiview at this scale; it is OK
>>>>>> (fast enough), but I think being able to test inclusion of a document
>>>>>> id in a view without having to loop would be a considerable speed
>>>>>> improvement. If you have any ideas, let me know.
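One common way to avoid repeating the loop is to pay for a single pass that builds a hash set, then answer each inclusion test with a set lookup. This sketch assumes a view's doc ids fit in memory. The row data and function names are hypothetical, and this is not how the multiview code currently works.

```python
# Hypothetical view rows, shaped like the rows of a CouchDB view response.
rows = [
    {"id": "doc1", "key": 100, "value": None},
    {"id": "doc2", "key": 150, "value": None},
    {"id": "doc3", "key": 300, "value": None},
]


def contains_by_scan(rows, doc_id):
    """The looping approach: may inspect every row, so O(n) per test."""
    return any(row["id"] == doc_id for row in rows)


def build_id_index(rows):
    """One pass over the rows; afterwards each test is an O(1) average set lookup."""
    return frozenset(row["id"] for row in rows)


index = build_id_index(rows)
candidates = ["doc3", "doc4", "doc1"]
print([doc_id for doc_id in candidates if doc_id in index])  # ['doc3', 'doc1']
```

At million+ keys the trade-off is memory. The set holds every id in the view in exchange for average O(1) lookups instead of O(n) scans.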
>>>>>> 
>>>>> 
>>>>> I just want to bump this thread, as I think this is a useful feature.
>>>>> I don't expect to be able to test it in the coming weeks, but if I could
>>>>> I would. Is anyone besides Norman using this? Has anyone used it at
>>>>> scale?
>>>>> 
>>>>> Cheers,
>>>>> Chris
>>>>> 
>>>>>> thanks,
>>>>>> 
>>>>>> Norman
>>>>>> 
>>>>>> On Mon, Aug 30, 2010 at 10:49 AM, Robert Newson <robert.newson@gmail.com> wrote:
>>>>>>> I'm sorry, I've had no time to play with this at scale.
>>>>>>> 
>>>>>>> On Mon, Aug 30, 2010 at 5:35 PM, Norman Barker <norman.barker@gmail.com> wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Are there any more comments on this? If not, can you describe the
>>>>>>>> process (in particular, how to obtain a wiki and JIRA account for
>>>>>>>> CouchDB, which I have been unable to do), and I will start documenting
>>>>>>>> this so we can put it into the trunk.
>>>>>>>> 
>>>>>>>> Bob, were you able to do any more testing with large views? Are there
>>>>>>>> any suggestions on how to speed up the document id inclusion test as
>>>>>>>> described below?
>>>>>>>> 
>>>>>>>> thanks,
>>>>>>>> 
>>>>>>>> Norman
>>>>>>>> 
>>>>>>>> On Mon, Aug 23, 2010 at 9:22 AM, Norman Barker <norman.barker@gmail.com> wrote:
>>>>>>>>> Bob,
>>>>>>>>> 
>>>>>>>>> Thanks for the feedback and for taking a look at the code. Guidelines
>>>>>>>>> on when to use a supervisor with a gen_server within CouchDB would be
>>>>>>>>> appreciated. Currently I have a supervisor and a gen_server, but if
>>>>>>>>> CouchDB has a supervision process I could remove that layer.
>>>>>>>>> 
>>>>>>>>> I think plugins are a great idea; however, intersection of views is such
>>>>>>>>> a common request. Perhaps there needs to be a plugin system, and if a
>>>>>>>>> plugin is rated highly enough it goes into trunk as a core feature.
>>>>>>>>> 
>>>>>>>>> The four-line (or slightly longer) summary is here:
>>>>>>>>> 
>>>>>>>>> http://github.com/normanb/couchdb/raw/trunk/src/couchdb/couch_query_ring.erl
>>>>>>>>> 
>>>>>>>>> %
>>>>>>>>> % send an id from the start list to the next node in the ring; if the
>>>>>>>>> % id is in the adjacent node, then this node sends it to the next ring node
>>>>>>>>> % ....
>>>>>>>>> % if the id gets all round the ring and back to the start node, then it
>>>>>>>>> % has intersected all queries and should be included. The nodes in the
>>>>>>>>> % ring should be sorted in size from small to large for this to be
>>>>>>>>> % effective
>>>>>>>>> %
>>>>>>>>> % In addition send the initial id list round in parallel
>>>>>>>>> 
>>>>>>>>> It really needs some eyes from the core CouchDB coders to see how to
>>>>>>>>> speed up the inclusion testing; looping is bad even if it is done in
>>>>>>>>> parallel.
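The ring described in that comment can be sketched in Python as a pipeline of threads, one per view. Each thread forwards the ids it finds and drops the rest, so any id that comes out of the last node was found in every view. This is an illustration under stated assumptions: views are in-memory sets of doc ids, threads stand in for Erlang processes, and the names are hypothetical. It is not a translation of couch_query_ring.erl, and it does not model sending the initial id list round in parallel.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the id stream


def _node(view, inbox, outbox):
    """One ring node: forward ids that are in this view, drop the rest."""
    while True:
        doc_id = inbox.get()
        if doc_id is _DONE:
            outbox.put(_DONE)  # pass the end marker on so the next node stops too
            return
        if doc_id in view:
            outbox.put(doc_id)


def ring_intersection(views):
    """Ids present in every view, found by passing them node to node."""
    if not views:
        return []
    ring = sorted(views, key=len)  # small to large, as the comment recommends
    start, rest = ring[0], ring[1:]
    # queues[i] feeds node i; the last queue leads back to the start node.
    queues = [queue.Queue() for _ in range(len(rest) + 1)]
    threads = [
        threading.Thread(target=_node, args=(view, queues[i], queues[i + 1]))
        for i, view in enumerate(rest)
    ]
    for thread in threads:
        thread.start()
    for doc_id in sorted(start):  # sorted only to make the output deterministic
        queues[0].put(doc_id)
    queues[0].put(_DONE)
    survivors = []
    while (doc_id := queues[-1].get()) is not _DONE:
        survivors.append(doc_id)  # made it all the way round the ring
    for thread in threads:
        thread.join()
    return survivors


views = [{"a", "b", "c", "d"}, {"b", "c"}, {"b", "c", "d", "x"}]
print(ring_intersection(views))  # ['b', 'c']
```

Sorting the views from small to large keeps the start list short and lets absent ids drop out early. Under CPython's GIL the threads mostly show the message-passing shape rather than a speedup. Each membership test here is a set lookup, not the loop over view results that the thread is trying to eliminate.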
>>>>>>>>> 
>>>>>>>>> Multiview is usable; I am using it with some pretty big mega-views (as
>>>>>>>>> per the Raindrop model). I am also available to add features to this,
>>>>>>>>> as this is a core part of our work and we want to give it to Couch as a
>>>>>>>>> contribution.
>>>>>>>>> 
>>>>>>>>> thanks,
>>>>>>>>> 
>>>>>>>>> Norman
>>>>>>>>> 
>>>>>>>>> On Mon, Aug 23, 2010 at 5:05 AM, Robert Dionne
>>>>>>>>> <dionne@dionne-associates.com> wrote:
>>>>>>>>>> Hi Norman,
>>>>>>>>>> 
>>>>>>>>>> I took a peek at multiview. I haven't followed this too closely on
>>>>>>>>>> the mailing list, but is this *view intersection*? Is there a 5-line
>>>>>>>>>> summary of what this does somewhere?
>>>>>>>>>> 
>>>>>>>>>> I'm curious as to why the daemon needs to be a supervisor; most if
>>>>>>>>>> not all of the other daemons are gen_servers. OTP allows this, but I
>>>>>>>>>> think this is a good area where some CouchDB guidelines on plugins
>>>>>>>>>> would apply.
>>>>>>>>>> 
>>>>>>>>>> It strikes me that views, the use of map/reduce, etc. are one of the
>>>>>>>>>> trickier aspects of using CouchDB, particularly for new users coming
>>>>>>>>>> from the SQL world. People are also reporting issues with the
>>>>>>>>>> performance of views, I guess often because reduce functions get out
>>>>>>>>>> of control.
>>>>>>>>>> 
>>>>>>>>>> I think the project would be better served if features like this
>>>>>>>>>> were available as plugins. I would put GeoCouch in the same category.
>>>>>>>>>> It's very neat and timely (given everyone wants to know where everyone
>>>>>>>>>> else is using their telephone, but without talking other than
>>>>>>>>>> asynchronously), but a server plugin architecture that would allow
>>>>>>>>>> this to be done cleanly should come first.
>>>>>>>>>> 
>>>>>>>>>> This is just my opinion. I'd love to see some of the project
>>>>>>>>>> founders and committers weigh in on this and set some direction.
>>>>>>>>>> 
>>>>>>>>>> Best regards,
>>>>>>>>>> 
>>>>>>>>>> Bob
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Aug 22, 2010, at 5:45 PM, Norman Barker wrote:
>>>>>>>>>> 
>>>>>>>>>>> I would like to take this multiview code and have it added to trunk
>>>>>>>>>>> if possible. What are the next steps?
>>>>>>>>>>> 
>>>>>>>>>>> thanks,
>>>>>>>>>>> 
>>>>>>>>>>> Norman
>>>>>>>>>>> 
>>>>>>>>>>> On Wed, Aug 18, 2010 at 11:44 AM, Norman Barker <norman.barker@gmail.com> wrote:
>>>>>>>>>>>> I have made
>>>>>>>>>>>> 
>>>>>>>>>>>> http://github.com/normanb/couchdb
>>>>>>>>>>>> 
>>>>>>>>>>>> which is a fork of the latest CouchDB trunk with the multiview code
>>>>>>>>>>>> and tests added.
>>>>>>>>>>>> 
>>>>>>>>>>>> If GeoCouch is available, then it can still be used.
>>>>>>>>>>>> 
>>>>>>>>>>>> There are a couple of questions about the multiview on the user/dev
>>>>>>>>>>>> lists, so I will be adding some more test cases later today.
>>>>>>>>>>>> 
>>>>>>>>>>>> thanks,
>>>>>>>>>>>> 
>>>>>>>>>>>> Norman
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Aug 17, 2010 at 9:23 PM, Norman Barker <norman.barker@gmail.com> wrote:
>>>>>>>>>>>>> This is possible. I forked GeoCouch since I use it, but I have already
>>>>>>>>>>>>> separated the GeoCouch dependencies from the trunk.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> I can do this tomorrow, and I'd certainly be interested in any feedback.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Norman
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, Aug 17, 2010 at 7:49 PM, Volker Mische <volker.mische@gmail.com> wrote:
>>>>>>>>>>>>>> On 08/18/2010 03:26 AM, J Chris Anderson wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Aug 16, 2010, at 4:38 PM, Norman Barker wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I have made the changes as recommended, adding a test case,
>>>>>>>>>>>>>>>> multiview.js, and also adding the userCtx to open the db.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I have also forked GeoCouch, and this is available here
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> This patch seems important (especially as people are already asking
>>>>>>>>>>>>>>> for help using it on user@).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> To get it committed, it must either remove the dependency on GeoCouch
>>>>>>>>>>>>>>> or become part of CouchDB when (and if) GeoCouch becomes part of
>>>>>>>>>>>>>>> CouchDB.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Is it possible / useful to make a version that doesn't use GeoCouch?
>>>>>>>>>>>>>>> And then to make the GeoCouch capabilities part of GeoCouch for now?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Chris
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi Norman,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> If the patch is ready for trunk, I'd be happy to move the GeoCouch bits
>>>>>>>>>>>>>> to GeoCouch itself (as GeoCouch isn't ready for trunk yet).
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Lately I haven't been that responsive when it comes to GeoCouch, but
>>>>>>>>>>>>>> that will change (in about a month) after holidays and FOSS4G.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>  Volker
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Chris Anderson
>>>>> http://jchrisa.net
>>>>> http://couch.io
>>>>> 
>>>> 
>> 
>> 

