incubator-blur-dev mailing list archives

From Aaron McCurry <amccu...@gmail.com>
Subject Re: Reducer to outputformat
Date Mon, 20 May 2013 13:14:07 GMT
You could write your own reducer in the new paradigm; the documents within
a row are ordered by record id, so the first one in could be your primedoc
document, if that's what you are after.
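
A minimal sketch of that idea, assuming the reducer sees one row's documents already ordered by record id. The `Doc` type and `markPrimeDoc` method are hypothetical stand-ins, not Blur's API. Each document is emitted as soon as it is read and only the first is flagged as the prime doc, so the row is never buffered.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class PrimeDocSketch {
    // Hypothetical stand-in for an indexed document; not Blur's Record class.
    static final class Doc {
        final String rowId;
        final String recordId;
        final boolean primeDoc;

        Doc(String rowId, String recordId, boolean primeDoc) {
            this.rowId = rowId;
            this.recordId = recordId;
            this.primeDoc = primeDoc;
        }

        @Override
        public String toString() {
            return rowId + "/" + recordId + (primeDoc ? " (prime)" : "");
        }
    }

    // Walks one row's documents (assumed to arrive ordered by record id) and
    // emits each one immediately; only the first is flagged as the prime doc.
    static void markPrimeDoc(Iterator<Doc> row, Consumer<Doc> emit) {
        boolean first = true;
        while (row.hasNext()) {
            Doc d = row.next();
            emit.accept(new Doc(d.rowId, d.recordId, first));
            first = false;
        }
    }

    public static void main(String[] args) {
        List<Doc> row = List.of(new Doc("row1", "rec-a", false),
                                new Doc("row1", "rec-b", false));
        // Prints "row1/rec-a (prime)" then "row1/rec-b".
        markPrimeDoc(row.iterator(), System.out::println);
    }
}
```

This only works if the ordering guarantee actually holds when the reducer runs; the sketch does no sorting of its own.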

That said, I think a better approach would be to implement a secondary sort
in Hadoop to enforce the record id ordering in MapReduce, so you don't have
to buffer the whole row. I could implement that in the mapreduce lib in
Blur; just create an issue and I will give it a try.
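
Roughly, a secondary sort uses a composite map output key of (row id, record id). The sort comparator orders by both fields, while the partitioner and the grouping comparator look only at the row id, so each reduce call receives one whole row with its records already in record-id order. In a real job these would be wired up through Hadoop's `Job.setPartitionerClass`, `Job.setSortComparatorClass` and `Job.setGroupingComparatorClass`. The sketch below only simulates that shuffle in plain Java; `RowRecordKey`, `partition` and `shuffleAndReduce` are hypothetical names, not part of Blur's mapreduce lib.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiConsumer;

class SecondarySortSketch {
    // Hypothetical composite key: (rowId, recordId).
    static final class RowRecordKey {
        final String rowId;
        final String recordId;

        RowRecordKey(String rowId, String recordId) {
            this.rowId = rowId;
            this.recordId = recordId;
        }
    }

    // Sort comparator: row id first, then record id (the "secondary" part).
    static final Comparator<RowRecordKey> SORT =
        Comparator.comparing((RowRecordKey k) -> k.rowId)
                  .thenComparing((RowRecordKey k) -> k.recordId);

    // Grouping comparator: only the row id decides which reduce call a key joins.
    static final Comparator<RowRecordKey> GROUP =
        Comparator.comparing((RowRecordKey k) -> k.rowId);

    // Partitioner: only the row id picks the reducer, so a row is never split.
    static int partition(RowRecordKey key, int numReducers) {
        return (key.rowId.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Simulates one reducer's input: sort by the full key, then hand each
    // row group to the reducer. Hadoop streams the values; a list per group
    // is used here only to keep the simulation short.
    static void shuffleAndReduce(List<RowRecordKey> reducerInput,
                                 BiConsumer<String, List<String>> reduce) {
        List<RowRecordKey> sorted = new ArrayList<>(reducerInput);
        sorted.sort(SORT);
        int start = 0;
        for (int i = 1; i <= sorted.size(); i++) {
            boolean groupEnds = i == sorted.size()
                || GROUP.compare(sorted.get(start), sorted.get(i)) != 0;
            if (groupEnds) {
                List<String> recordIds = new ArrayList<>();
                for (RowRecordKey k : sorted.subList(start, i)) {
                    recordIds.add(k.recordId);
                }
                reduce.accept(sorted.get(start).rowId, recordIds);
                start = i;
            }
        }
    }

    public static void main(String[] args) {
        List<RowRecordKey> input = List.of(
            new RowRecordKey("row2", "b"), new RowRecordKey("row1", "c"),
            new RowRecordKey("row1", "a"), new RowRecordKey("row2", "a"));
        // Each row arrives once, records already in record-id order:
        // row1 -> [a, c]
        // row2 -> [a, b]
        shuffleAndReduce(input, (rowId, recordIds) ->
            System.out.println(rowId + " -> " + recordIds));
    }
}
```

Because the framework does the ordering during the shuffle, the reducer can treat the first value of each group as the prime doc and write everything else straight through.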

Aaron


On Mon, May 20, 2013 at 8:37 AM, Tim Williams <williamstw@gmail.com> wrote:

> In the move to outputformat, I don't see how we get our "last chance"
> to fiddle with the indexed docs like we do today with the reducer
> approach (e.g. documentsToIndex(..)). Is that right? My current usage
> of documentsToIndex is likely flawed in the new "temporary index"
> paradigm anyway because I kinda depend on them being buffered, so I
> reckon I'd have to come up with something different anyway...
>
> --tim
>
