incubator-blur-commits mailing list archives

From: Apache Wiki
Subject: [Blur Wiki] Update of "DataStructure" by AaronMcCurry
Date: Tue, 06 Nov 2012 03:03:33 GMT

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Blur Wiki" for change notification.

The "DataStructure" page has been changed by AaronMcCurry:

New page:
In Blur 0.2 we are going to drop the idea of Rows and Records.  These were just artificial
constructs around Lucene documents, and they only caused confusion.

Blur will follow Lucene’s data structure as closely as possible.

 * A Document contains 1 or more fields. Each field has a String as the name, a byte[] (ByteBuffer in Thrift) as the value, and a Type that defines the data being passed in that field.
 * Document(s) can be added, updated, and deleted in a table.
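
The structure in the list above could be sketched in Java roughly as follows. This is only an illustration, not Blur's actual API: the class names `Field` and `Document`, the `FieldType` enum and its values, and the big-endian int encoding are all assumptions, since the page does not specify them.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical type tags; the page does not list the real Type values.
enum FieldType { STRING, INT }

// One field: a String name, a byte[] value, and a type describing the bytes.
final class Field {
    final String name;
    final byte[] value;
    final FieldType type;

    Field(String name, byte[] value, FieldType type) {
        this.name = name;
        this.value = value.clone(); // defensive copy of the caller's array
        this.type = type;
    }

    // Encodes a String as UTF-8 bytes (an assumed encoding).
    static Field ofString(String name, String s) {
        return new Field(name, s.getBytes(StandardCharsets.UTF_8), FieldType.STRING);
    }

    // Encodes an int as 4 big-endian bytes (an assumed encoding).
    static Field ofInt(String name, int i) {
        byte[] bytes = ByteBuffer.allocate(Integer.BYTES).putInt(i).array();
        return new Field(name, bytes, FieldType.INT);
    }

    String asString() {
        if (type != FieldType.STRING) {
            throw new IllegalStateException(name + " is not a STRING field");
        }
        return new String(value, StandardCharsets.UTF_8);
    }

    int asInt() {
        if (type != FieldType.INT) {
            throw new IllegalStateException(name + " is not an INT field");
        }
        return ByteBuffer.wrap(value).getInt();
    }
}

// A document is a non-empty, read-only list of fields.
final class Document {
    private final List<Field> fields;

    Document(List<Field> fields) {
        if (fields.isEmpty()) {
            throw new IllegalArgumentException("a Document needs at least one field");
        }
        this.fields = Collections.unmodifiableList(new ArrayList<>(fields));
    }

    List<Field> fields() {
        return fields;
    }
}
```

The type tag travels with the raw bytes, so a reader can decode a value without knowing the schema in advance, which seems to be the purpose of the Type described above.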

When documents are added or updated within a single call (Thrift or MapReduce), they will
be guaranteed to exist within the same Lucene segment.  Some specialized queries will rely on
this guarantee.  See the Lucene IndexWriter for more details.

In the MapReduce framework, the ability to group several Documents together for a single add
or update call will be implemented by having the Mappers emit a common output key.  The
key type will likely need to be defined by the end user based on their needs.  The key serves
only to group the Documents; its value won't need to be used.
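
The grouping idea can be illustrated without Hadoop by a small Java simulation of the shuffle step. Everything here is hypothetical: `KeyedDoc` and `BatchGrouper` are made-up names, a String stands in for a full Document, and the String key type is just one choice a user might make.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical mapper output: a user-defined grouping key plus a document payload.
final class KeyedDoc {
    final String key;
    final String doc; // stand-in for a full Document

    KeyedDoc(String key, String doc) {
        this.key = key;
        this.doc = doc;
    }
}

final class BatchGrouper {
    // Mimics the shuffle: collects payloads that share a key, in first-seen key order.
    static Map<String, List<String>> groupByKey(List<KeyedDoc> mapperOutput) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (KeyedDoc kd : mapperOutput) {
            groups.computeIfAbsent(kd.key, k -> new ArrayList<>()).add(kd.doc);
        }
        return groups;
    }

    // Each group would become one add or update call; the key is discarded here.
    static List<List<String>> toBatches(List<KeyedDoc> mapperOutput) {
        return new ArrayList<>(groupByKey(mapperOutput).values());
    }
}
```

Calling `BatchGrouper.toBatches` on mapper output with keys `a`, `b`, `a` yields two batches, one holding both `a` documents and one holding the `b` document. Each batch corresponds to a single call, and the key is dropped once grouping is done.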

Analyzers will allow for more customized indexes to be created.  Currently this functionality
is wrapped up in the TableDescriptor; this will need to change.