hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Hbase/PNUTS" by stack
Date Thu, 11 Oct 2007 23:01:41 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by stack:

The comment on the change is:
Notes on PNUTS

New page:

Here are some notes on a talk given October 5th, 2007 by Utkarsh Srivastava on [http://research.yahoo.com/node/212
PNUTS], a Platform for Nimble Universal Table Storage.

From the [http://coe.berkeley.hosted.webevent.com/cgi-bin/webevent.cgi?cmd=listevent&ncmd=listday&cal=cal2&id=9991&ncals=&de=1&tf=0&sib=1&sb=0&sa=0&ws=1&stz=Default&sort=e,m,t&cat=&swe=1&cf=list&set=1&m=10&d=5&y=2007
seminar description]:

''The PNUTS project is to build a data management service for providing back-end support to
Yahoo!'s web applications. To obtain acceptable latency and throughput while operating at
Yahoo!'s scale, PNUTS uses massive parallelism and distribution---data is partitioned and
replicated over thousands of servers. At the same time, PNUTS provides clean abstractions
for data access that hide all this system complexity from the application programmer...In
contrast to traditional database solutions, PNUTS is a centrally hosted and managed data service.
Such a shared service model frees applications from the burden of having to set up,
maintain and scale their own data store, and also amortizes the operational cost across all
of Yahoo!'s applications.''

Utkarsh, who also works on [http://research.yahoo.com/node/90 PIG], started with an outline
of hurdles a startup faces building applications at web-scale (millions of users, terabytes
of data).  You'll need a replicated, persistent datastore, caching, messaging between all
systems to manage coherency, etc.  You'll then throw away your first implementation.  "If
you have funding left, THEN you can start to work on application logic."  PNUTS wants to make
it so deploying a web-scale application requires nought but "...three guys, a weekend, and
some PHP." (This latter quote is apparently from Mr. Del.icio.us).

For a folksy intro on how most of the big web apps can make do w/ just basic insert, update,
and delete, see the [http://research.yahoo.com/node/212 PNUTS] home page. Utkarsh had his
own spin using flickr and del.icio.us for illustration (not that these apps currently run
on PNUTS).

PNUTS Nuggets:

 * Goals: Low-latency, self-tuning, replicated data service with automated fail-over and recovery.
 * This is not just a research project: it has active participation from Yahoo! infrastructure.
 * No plans to open-source.
 * PNUTS is NOT a multi-media store.  It's more for metadata.  Data is expected to be located
elsewhere.  PNUTS hosts user prefs, pointers to data, etc.
 * ACID guarantees are relaxed for the sake of performance. Some clients may get stale data.
 * The implemented basic relational operators do not allow for ad hoc analysis and bulk processing;
use [http://research.yahoo.com/node/90 PIG] or hadoop instead.
 * They have a PNUTS SQL-like language but it's very basic: no support for joins, aggregations, etc.
 * Two table types: a Hash Table (key/value with put, get, update, delete, etc.) and an Ordered
Table -- ordered by keys -- with typed columns.  Against the Ordered Table you can run all Hash
Table operations plus 'limit (top K)', 'predicates', etc. Currently only the Hash Table has been
implemented.  Even Ordered Tables are key/value; the value has within it 'columns'.
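A minimal sketch of the two table types may help; the class and method names below are invented for illustration (PNUTS isn't open-source, so this only mimics the operations described, not the real API):

```python
# Sketch of the two PNUTS table types described above.  A Hash Table is
# plain key/value; an Ordered Table adds key-ordered range scans with
# predicates and 'limit (top K)'.  All names here are hypothetical.

class HashTable:
    """Key/value store: put, get, delete -- no ordering guarantees."""
    def __init__(self):
        self._rows = {}

    def put(self, key, value):
        self._rows[key] = value

    def get(self, key):
        return self._rows.get(key)

    def delete(self, key):
        self._rows.pop(key, None)


class OrderedTable(HashTable):
    """Adds key-ordered operations.  Values are still opaque rows that
    carry their 'columns' inside, as the notes describe."""
    def scan(self, start_key, end_key, limit=None):
        keys = sorted(k for k in self._rows if start_key <= k <= end_key)
        if limit is not None:
            keys = keys[:limit]
        return [(k, self._rows[k]) for k in keys]
```

With an Ordered Table of fruits, `scan("apple", "tomato", limit=2)` would answer a "top K in range" query that a Hash Table cannot.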
 * A PNUTS install is made up of Regions: e.g. west-coast Region, east-coast Region, east-asian
Region, etc.  Data is replicated between Regions (e.g. east-coast region has an up-to-date
replica of west-coast, etc.).  Within a region, a Tablet Controller load balances tablets
-- "horizontal table fragments" -- over many Tablet Servers.  A tablet holds some subset of
the rows of a table (PNUTS is row-based -- later there may be support for column-based: "PNUTS
horizontally fragments tables.  Later there may be support for vertical fragmentation"). 
Routers -- which are not the same thing as Tablet Controllers -- direct clients to the tablet
server hosting the requested row.  PNUTS makes use of a yahoo "Log Service" to which all edits
are written first before they are committed to the pertinent Tablet Server.  Rows have a Write
Master who administers all of its updates. To keep write latency low, the Write Master for
a row is located close to the client writing: e.g. rows for Japanese users will have their
Write Master in the Japanese Region (Write responsibility can apparently migrate across
Regions to move closer to the updating client).  Recovery -- having a lapsed Region get
up-to-date by copying from another Region -- is probably the most complicated part of the
system, coordinating many Write Masters and reading the log-of-edits from Log Servers.
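The routing step above can be sketched as a toy model, assuming a simple range-partitioned key space (the `Router` class, split keys, and server names are all invented; the real system's interfaces are not public):

```python
import bisect

# Toy model of PNUTS routing as described above: tablets are horizontal
# table fragments, each covering a key range; a Router maps a row key
# to the Tablet Server currently hosting that range.  Names invented.

class Router:
    def __init__(self, split_keys, tablet_servers):
        # split_keys: sorted lower bound of each tablet's key range.
        # tablet_servers[i] hosts the tablet starting at split_keys[i].
        self.split_keys = split_keys
        self.tablet_servers = tablet_servers

    def server_for(self, row_key):
        # Find the last split key <= row_key; that tablet owns the row.
        i = bisect.bisect_right(self.split_keys, row_key) - 1
        return self.tablet_servers[max(i, 0)]
```

A Tablet Controller would rebuild these mappings as it moves tablets between Tablet Servers for load balancing; the Router just consults the current map.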
 * Features: Test and set (write only if the version is the same as that specified, otherwise fail:
i.e. a row-level 'transaction'), "critical reads" (I will for sure see my edits), and "read
latest" (You may see stale data).
 * Talked of 'indexes'.  Indexes are an optional secondary table lazily maintained by the
system, keyed on a column other than the primary table's key.  The example was an ordered
table with fruits for keys and columns of price and description.  Using the primary table
you could ask for "all fruits between apple and tomato" but not "all fruits priced between
$2 and $5 a pound".  You can configure a secondary table keyed by price to do the latter.
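The fruit example can be sketched as below; for simplicity the index here is built eagerly, whereas the talk described lazy maintenance by the system, and the function names are invented:

```python
# Sketch of the secondary-table 'index' described above: a second
# ordered table keyed by a non-key column (price), so range queries on
# price become key-range scans.  Keys pair (price, fruit) so that rows
# with equal prices don't collide.  All names are illustrative.

def build_price_index(primary):
    # primary: fruit -> {'price': ..., 'description': ...}
    return sorted((row["price"], fruit) for fruit, row in primary.items())

def fruits_priced_between(index, low, high):
    # A key-range scan over the secondary table.
    return [fruit for price, fruit in index if low <= price <= high]
```

In the real system the primary write path would enqueue index updates, so the secondary table may briefly lag the primary -- consistent with the relaxed guarantees noted above.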
 * TODO:
  * A separate API for bulk inserts, updates, etc.  The rationale was that proceeding through
bulk data in order means beating up Tablet Servers in series.  Better to have two passes:
one to look at the data and then, in the second, do a parallel upload.
  * Some support for views/joins maintained by the system.  Folks are going to do this in
applications anyway; might as well have support within the system.
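The two-pass bulk load idea above can be sketched as: pass one buckets records by destination tablet, and pass two would then upload each bucket to its Tablet Server in parallel (the partitioning scheme and names here are invented for illustration):

```python
import bisect
from collections import defaultdict

# Sketch of the proposed two-pass bulk load.  Inserting sorted bulk
# data directly would hammer each Tablet Server in turn; instead, pass
# one groups records by the tablet that will own them, so pass two can
# hit all servers concurrently.  All names are hypothetical.

def plan_bulk_load(records, split_keys):
    """Pass one: bucket (key, value) records by owning tablet index."""
    buckets = defaultdict(list)
    for key, value in records:
        tablet = max(bisect.bisect_right(split_keys, key) - 1, 0)
        buckets[tablet].append((key, value))
    return buckets  # pass two: upload each bucket in parallel
```

This mirrors the routing logic a client library would already have, reused for planning instead of serving single rows.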
