hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-hadoop Wiki] Update of "Bigtable&Sawzall" by udanax
Date Mon, 12 Feb 2007 10:08:56 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.

The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Bigtable%26Sawzall

------------------------------------------------------------------------------
- BigTable Overview
+ == BigTable Overview ==
  
- What is a BigTable?  
+ '''What is a BigTable?'''
+ 
  BigTable is a multi-dimensional, sparse map store designed for massive data storage on a
DFS and for easier data analysis and development. It can also be described as a distributed
database that is more economical than traditional large databases and allows faster analysis
of more diverse data. It does not pre-calculate every result, but it stores data in a distributed
way, with a structure that allows distributed computation. 
  
+ 
- Why do we need it?
+ '''Why do we need it?'''
+ 
   * The amount of data is enormous and it grows exponentially. On top of the simple storage
needs, we would like to do some data analysis as well. 
   * We want our DB to be light-weight. We want our DB to adapt to the ever-changing needs
and requirements of new services.
  
- Conclusion : We want to extract more value out of a company’s data by providing more availability
and usability when the company’s needs arise.
+ '''Conclusion''' : We want to extract more value out of a company’s data by providing
more availability and usability when the company’s needs arise.
  
+ 
- A usage example of BigTable – User action log data table for a service 
+ '''A usage example of BigTable – User action log data table for a service'''
+ 
  To help make a business decision, to find a way to meet the need of each customer, or to
find a product or a market that will bring big profits, we group together action logs of users
and create a User Table like the one below.  
  
  row [ user ], attribute columns [ search history, item buying log, post scrap log, Page
Viewing log, User neighborhood (blog), User active part (cafe) ]
@@ -20, +25 @@

  
  [http://mirror.udanax.org/~udanax/rsync1/download/NB_BoardData_006002/Figure1.jpg]
  
-  “Who referred to document A?” “What other documents do they also like?” “What
does a user who actively participates in an online community X like to search?” “Who are
the neighbors of this blog’s author?” “What are the social distances between them?” 
+  `Who referred to document A?` `What other documents do they also like?` `What does a user
who actively participates in an online community X like to search?` `Who are the neighbors
of this blog’s author?` `What are the social distances between them?` 
- By finding out where new markets are being formed by managing and analyzing those user-related
data, we can analyze the evolution of services faster and more economically. 
  
+ By finding out where new markets are being formed by managing and analyzing those user-related
data, we can analyze the evolution of services faster and more economically. 
+ 
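
As a rough illustration of the User Table described above, the following sketch models a sparse table as a plain in-memory Python dictionary. It is not BigTable's actual API: the class `SparseTable`, the column name `page_viewing_log`, and the sample users and documents are all hypothetical, chosen only to show how questions such as `Who referred to document A?` and `What other documents do they also like?` might be answered from per-user log columns.

```python
from collections import defaultdict


class SparseTable:
    """Toy sparse table: row key -> column name -> list of logged values.

    Only cells that were written exist, so a row may hold any subset
    of columns (this is what "sparse" means in this sketch).
    """

    def __init__(self):
        self._rows = defaultdict(dict)

    def append(self, row, column, value):
        # Create the cell on first write, then append the new log entry.
        self._rows[row].setdefault(column, []).append(value)

    def get(self, row, column):
        # Missing rows or columns read as an empty list.
        return self._rows.get(row, {}).get(column, [])

    def rows_with(self, column, value):
        # Scan all rows for those whose column contains the value.
        return sorted(
            row for row, cols in self._rows.items()
            if value in cols.get(column, [])
        )


def who_referred_to(table, doc):
    """Users whose (hypothetical) page viewing log contains doc."""
    return table.rows_with("page_viewing_log", doc)


def also_viewed(table, doc):
    """Other documents viewed by the users who viewed doc."""
    others = set()
    for user in who_referred_to(table, doc):
        others.update(table.get(user, "page_viewing_log"))
    others.discard(doc)
    return sorted(others)


# Made-up sample data, for illustration only.
users = SparseTable()
users.append("alice", "page_viewing_log", "doc_A")
users.append("alice", "page_viewing_log", "doc_B")
users.append("bob", "page_viewing_log", "doc_A")
users.append("bob", "search_history", "hadoop")
users.append("carol", "page_viewing_log", "doc_C")

print(who_referred_to(users, "doc_A"))
print(also_viewed(users, "doc_A"))
```

The full scan in `rows_with` is acceptable for a toy; in a distributed store the same question would presumably be answered by a distributed computation over the rows, which is the motivation the page gives for storing the data this way.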
