hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Hbase/PoweredBy" by stack
Date Mon, 11 May 2009 21:52:32 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by stack:

The comment on the change is:
Removed wikia -- no longer around.

  [http://www.videosurf.com/ VideoSurf] - "The video search engine that has taught computers
to see". We're using HBase to persist various large graphs of data and other statistics. HBase
was a real win for us because it let us store substantially larger datasets without manually
partitioning the data, and its column-oriented nature allowed us to create schemas that were
substantially more efficient for storing and retrieving data.
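The "wide row" pattern behind that entry can be modeled in a few lines: each vertex of a graph becomes one row, and every outgoing edge becomes a separate column qualifier inside an "edges" column family, so an entire adjacency list lives in a single row and grows by adding columns rather than repartitioning. This is only a toy in-memory sketch of the idea; the class and method names (`ColumnStore`, `put`, `get_family`) are illustrative and not VideoSurf's schema or a real HBase client API.

```python
from collections import defaultdict

class ColumnStore:
    """Toy column-oriented store: table[row_key][family][qualifier] = value."""

    def __init__(self):
        self.table = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, qualifier, value):
        # Adding an edge is just adding one more column to the vertex's row.
        self.table[row_key][family][qualifier] = value

    def get_family(self, row_key, family):
        # Read back the whole adjacency list in one row lookup.
        return dict(self.table[row_key][family])

store = ColumnStore()
# All edges of vertex "v1" live in one wide row under the "edges" family.
store.put("v1", "edges", "v2", b"weight=3")
store.put("v1", "edges", "v7", b"weight=1")
print(store.get_family("v1", "edges"))
```

Because columns are cheap and rows can be arbitrarily wide, the dataset scales by adding rows and columns instead of manually sharding the graph.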
- [http://www.wikia.com/wiki/Wikia Wikia] hosts its user and keyword databases on a cluster
of 7 machines.
  [http://www.worldlingo.com/ WorldLingo] - The !WorldLingo Multilingual Archive. We use HBase
to store millions of documents that we scan using Map/Reduce jobs to machine translate them
into all or selected target languages from our set of available machine translation languages.
We currently store 12 million documents but plan to eventually reach the 450 million mark.
HBase allows us to scale out as we need to grow our storage capacity. Combined with Hadoop
to keep the data replicated, and therefore fail-safe, we have the backbone our service can rely
on now and in the future.
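The scan-and-translate workflow described above follows the usual MapReduce shape: the map phase walks the document table and emits a work item for every (document, missing target language) pair, and the reduce phase groups the items by language. This is a minimal in-memory stand-in for that pattern, not WorldLingo's actual jobs; all names (`documents`, `translated`, `TARGETS`) are hypothetical.

```python
from itertools import groupby

# Hypothetical document table: each row records which target languages
# the document has already been translated into.
documents = {
    "doc1": {"text": "hello", "translated": {"de"}},
    "doc2": {"text": "world", "translated": set()},
}
TARGETS = ["de", "fr"]  # assumed set of available target languages

def map_phase(docs):
    # Emit one (language, doc_id) work item per missing translation.
    for doc_id, doc in docs.items():
        for lang in TARGETS:
            if lang not in doc["translated"]:
                yield (lang, doc_id)

def reduce_phase(pairs):
    # Group the emitted items by target language.
    pairs = sorted(pairs)
    return {lang: [d for _, d in grp]
            for lang, grp in groupby(pairs, key=lambda p: p[0])}

work = reduce_phase(map_phase(documents))
print(work)
```

In the real system the map phase would be an HBase table scan feeding Hadoop, so the same logic spreads across the cluster as the archive grows toward the 450 million mark.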
  [http://www.yahoo.com/ Yahoo!] uses HBase to store document fingerprints for detecting near-duplicates.
We have a cluster of a few nodes that runs HDFS, MapReduce, and HBase. The table contains millions
of rows. We use it to query for duplicate documents against real-time traffic.
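The entry does not say which fingerprint scheme is used, but a simhash-style fingerprint is one common technique for near-duplicate detection: each token votes on every bit of a fixed-width hash, and near-identical documents end up with fingerprints that differ in only a few bits. The sketch below illustrates that general idea only; it is not Yahoo!'s implementation.

```python
import hashlib

def simhash(text, bits=64):
    """Simhash-style fingerprint: per-bit votes from each token's hash."""
    vec = [0] * bits
    for token in text.split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        for i in range(bits):
            vec[i] += 1 if (h >> i) & 1 else -1
    # The sign of each vote sum becomes one bit of the fingerprint.
    return sum(1 << i for i in range(bits) if vec[i] > 0)

def hamming(a, b):
    # Number of differing fingerprint bits between two documents.
    return bin(a ^ b).count("1")

a = simhash("the quick brown fox jumps over the lazy dog")
b = simhash("the quick brown fox jumped over the lazy dog")
c = simhash("completely different content about hbase clusters")
print(hamming(a, b), hamming(a, c))
```

Storing such fixed-width fingerprints as HBase row keys or cell values keeps the lookup cheap enough to run against real-time traffic.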
