hadoop-mapreduce-dev mailing list archives

From Steve Loughran <steve.lough...@gmail.com>
Subject Re: Suggestion of Research topic in Hadoop for PhD research
Date Tue, 19 Jun 2012 08:25:19 GMT
On 18 June 2012 18:17, Suresh S <sureshhot@gmail.com> wrote:

> Dear Sir/Madam,
>
>
> I recently joined as a research scholar (PhD). I am interested in doing
> research in cloud computing. Last month I attended a workshop, where I
> learned about Hadoop. I am very interested in doing research on Hadoop.
> Please suggest some topics and problems to work on. Thanks in advance.
> *Regards*
>

Given that you are doing a PhD, you are presumably expected to start by
reading up on the state of the art before diving into the depths of your own work.

For that reason, I'm attaching a .bib file containing papers you may want
to read.

This list is incomplete and biased towards work I was doing last year on
data integrity within Hadoop: it omits most of Lamport's work on distributed
computing and all the classic RDBMS papers, the latter including:
[Chamberlin81] D. Chamberlin et al., *A History and Evaluation of System R*,
1981.
[Codd71] E. F. Codd, *A Database Sublanguage Founded on the Relational
Calculus*, 1971.
[Date84] C. J. Date, *A Critique of the SQL Database Language*, 1984.


Also:

   - everything from the Google, Yahoo!, Amazon and Microsoft Research groups,
   Facebook, etc.
   - the work done in the 1980s and early 1990s on "massively parallel"
   computers. A lot of designs were tried out then, some of which may be
   relevant again.

Regarding working inside Hadoop itself, be aware that:

   - The code is big, complicated and needs testing on large clusters.
   - It's in use in production, which makes people reluctant to accept
   large changes to the core.

There are some tactics to address that, especially if you are looking at
the classic CS-hard problems of scheduling, data placement, etc.:

   - Work on your own scheduler
   - Use the block placement plugin
   - Find other plugin points, or help design one for the specific area you
   want to play in.
   - YARN lets you run completely different applications in a Hadoop
   cluster.
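The plugin-point tactic can be sketched in plain Java. This is not Hadoop's block placement API: `PlacementPolicy`, `Node` and `RackAwarePolicy` are hypothetical names made up for illustration, and the sketch has no Hadoop dependency. The example policy loosely mirrors the rack-aware layout usually described for HDFS's default placement (first replica on the writer's node, second on a different rack, third on another node in that second rack); an experimental policy would simply be another implementation of the same interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical node descriptor: a host name plus the rack it sits in. */
final class Node {
    final String host;
    final String rack;

    Node(String host, String rack) {
        this.host = host;
        this.rack = rack;
    }
}

/** Hypothetical plugin point (not Hadoop's API): pick the nodes that store one block. */
interface PlacementPolicy {
    List<Node> chooseTargets(Node writer, List<Node> cluster, int replicas);
}

/**
 * Rack-aware sketch: replica 1 on the writer, replica 2 on another rack,
 * replica 3 on a different host in the second replica's rack.
 */
final class RackAwarePolicy implements PlacementPolicy {
    @Override
    public List<Node> chooseTargets(Node writer, List<Node> cluster, int replicas) {
        List<Node> chosen = new ArrayList<>();
        if (replicas <= 0) {
            return chosen;
        }
        chosen.add(writer);
        if (replicas >= 2) {
            Node remote = pick(cluster, chosen, n -> !n.rack.equals(writer.rack));
            if (remote != null) {
                chosen.add(remote);
                if (replicas >= 3) {
                    Node sameRack = pick(cluster, chosen, n -> n.rack.equals(remote.rack));
                    if (sameRack != null) {
                        chosen.add(sameRack);
                    }
                }
            }
        }
        // Fill remaining slots (extra replicas, or ones the rack rules could
        // not place) with unused nodes in cluster order.
        for (Node n : cluster) {
            if (chosen.size() >= replicas) {
                break;
            }
            if (!chosen.contains(n)) {
                chosen.add(n);
            }
        }
        return chosen;
    }

    /** First node not already chosen that satisfies the condition, or null. */
    private static Node pick(List<Node> cluster, List<Node> chosen, Predicate<Node> ok) {
        for (Node n : cluster) {
            if (!chosen.contains(n) && ok.test(n)) {
                return n;
            }
        }
        return null;
    }
}
```

For example, with nodes `a1`, `a2` on `rackA` and `b1`, `b2` on `rackB`, a writer on `a1` asking for three replicas gets `[a1, b1, b2]`. Keeping the research logic behind an interface like this is what lets it be swapped in without patching the core.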

Another thing to be aware of is that, because of the R&D money being
invested in the platform, it sometimes changes dramatically, and it is
hard to compete with the efforts of a team of full-time developers. For
example, I've long complained that Hadoop wasn't that good in virtualized
environments, and last week VMware published a patch containing many tens of
thousands of lines of code to address it. Anyone doing a PhD on the same
problem would now be in trouble.

This is why working on a related but higher-level stack such as Asterix or
Stratosphere may be a good approach; another is to pick a specific
application problem and look at implementing it within the Hadoop platform.

Steve


The .bib file, in no particular order:

@Article{ Chen94:raid,
    author = "Peter M. Chen and Edward K. Lee and Garth A. Gibson and Randy
H. Katz and David A. Patterson",
    title = "RAID: High-Performance, Reliable Secondary Storage",
    journal = "ACM Computing Surveys",
    year = "1994",
    volume = "26",
    pages = "145--185"
}

@Misc{ Ghemawat03:gfs,
    author = "Sanjay Ghemawat and Howard Gobioff and Shun-Tak Leung",
    title = "The Google File System",
    year = "2003"
}

@TechReport{ Gray05:diskFailureRates,
    title = "Empirical Measurements of Disk Failure Rates and Error Rates",
    author = "Jim Gray and Catharine van Ingen",
    institution = "Microsoft",
    number = "MSR-TR-2005-166",
    month = dec,
    year = "2005",
    url = "http://research.microsoft.com/apps/pubs/default.aspx?id=64599"
}

@PhDThesis{ fielding:rest,
    author = "Roy Thomas Fielding",
    title = "Architectural Styles and the Design of Network-based Software
Architectures",
    year = 2000,
    school = "University of California, Irvine",
    type = "{Ph.D.} dissertation",
    note = "http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm"
}

@InProceedings{ indiana:rmiperf,
    title = "{Requirements for and Evaluation of {RMI} Protocols for
Scientific Computing}",
    author = "Madhusudhan Govindaraju and others",
    institution = "Department of Computer Science Indiana University",
    url = "http://www.extreme.indiana.edu/xgws/papers/sc00_paper/index.html",
    year = "2000",
    booktitle = "Proceedings Supercomputing 2000",
}

@InProceedings{ indiana:soap-limits,
    title = "Investigating the Limits of {SOAP} Performance for Scientific
Computing",
    author = "Kenneth Chiu and Madhusudhan Govindaraju and Randall Bramley",
    booktitle = "Proceedings of HPDC 2002",
    year = 2002,
    note = "http://www.extreme.indiana.edu/xgws/papers/soap-hpdc2002/soap-hpdc2002.pdf"
}

@TechReport{ paper:RMI,
    institution = "Sun Microsystems",
    title = "{Java Remote Method Invocation - Distributed Computing for
Java}",
    year = 1997,
    author = "{Sun Microsystems}",
    note = "http://java.sun.com/products/jdk/rmi/reference/whitepapers/javarmi.html"
}

@TechReport{ spec:DOM,
    institution = "W3C",
    author = "Vidur Apparao and others",
    year = "1998",
    title = "{Document Object Model (DOM)}",
    note = "http://www.w3.org/DOM/"
}


@TechReport{ ietf:rfc2616,
    title = "{RFC 2616: Hypertext Transfer Protocol -- HTTP/1.1}",
    author = "R. Fielding and J. Gettys and J. Mogul and H. Frystyk and
L. Masinter and P. Leach and T. Berners-Lee",
    institution = "IETF",
    note = "http://ietf.org/rfc/rfc2616.txt",
    year = "1999"
}

@Misc{ harold:xom,
    title = "{What's Wrong with XML APIs (and how to fix them)}",
    year = "2002",
    author = "Elliotte Rusty Harold",
    note = "http://www.cafeconleche.org/XOM/whatswrong/"
}

@Article{ parnas:interfaces,
    author = "David L. Parnas",
    title = "{Use of Abstract Interfaces in the Development of Software for
Embedded Computer Systems}",
    year = "1974"
}


@Book{ vinoski:CORBA,
    title = "{Advanced CORBA(R) Programming with C++}",
    year = "1999",
    author = "Michi Henning and Steve Vinoski",
    publisher = "Addison-Wesley"
}

@Book{ neward:EEJ,
    title = "{Effective Enterprise Java}",
    year = "2004",
    author = "Ted Neward",
    publisher = "Addison-Wesley"
}

@Misc{ pjr:NKonTSS,
    title = "{1060 NetKernel- A new Abstraction for Web-systems}",
    author = "Peter Rodgers",
    year = "2004",
    howpublished = "online",
    note = "http://theserverside.com/articles/content/NetKernel/article.html"
}


@Misc{ MSFT:TransitionsInProgrammingModels,
    title = "{Transitions in Programming Models}",
    author = "Luca Cardelli",
    year = 2003,
    howpublished = "online presentation",
    url = "http://research.microsoft.com/Users/luca/Slides/2003-11-13%20Transitions%20in%20Programming%20Models%20(Lisbon).pdf",
    annote = "New University of Lisbon, November 13, 2003."
}

@Article{ gray:petascale,
    title = "Petascale Computational Systems: Balanced Cyber-Infrastructure
in a Data-Centric World",
    author = "Gordon Bell and Jim Gray and Alex Szalay",
    journal = "IEEE Computer",
    pages = "110--112",
    volume = "39",
    number = "1",
    month = "jan",
    year = "2006",
    url = "http://research.microsoft.com/en-us/um/people/gray/papers/Petascale%20computational%20systems.pdf"
}

@InProceedings{ Pinheiro07:failuretrends,
    author = "Eduardo Pinheiro and Wolf-Dietrich Weber and Luiz Andr{\'e}
Barroso",
    title = "Failure trends in a large disk drive population",
    booktitle = "Proceedings of the 5th USENIX Conference on File and
Storage Technologies",
    year = "2007"
}

@Misc{ Panzer:integrity,
    title = "Data integrity",
    author = "Bernd Panzer-Steindel",
    month = "April",
    year = "2007",
    organization = "CERN",
    url = "http://indico.cern.ch/getFile.py/access?contribId=3&sessionId=0&resId=1&materialId=paper&confId=13797"
}

@Unpublished{ raman11:samplingMR,
    title = "Extending Map-Reduce for Efficient Predicate-Based Sampling",
    author = "Raman Grover and Michael J Carey",
    year = "2011",
    note = "received 2011-08"
}

@Misc{ MR-2026,
    title = "MAPREDUCE-2026: JobTracker.getJobCounters() should not hold
JobTracker lock while calling JobInProgress.getCounters()",
    author = "Scott Chen and others",
    publisher = "ASF",
    organization = "Facebook",
    month = aug,
    year = "2010",
    url = "https://issues.apache.org/jira/browse/MAPREDUCE-2026"
}

@Electronic{ radia11:integrity,
    title = "Data Integrity and Availability in Apache Hadoop HDFS",
    author = "Sanjay Radia",
    organization = "Hortonworks",
    month = "August",
    year = "2011",
    url = "http://www.hortonworks.com/data-integrity-and-availability-in-apache-hadoop-hdfs/"
}

@Article{ Schroeder:2010:ULS:1837915.1837917,
    author = "Bianca Schroeder and Sotirios Damouras and Phillipa Gill",
    title = "Understanding latent sector errors and how to protect against
them",
    journal = "Trans. Storage",
    volume = "6",
    issue = "3",
    month = "September",
    year = "2010",
    issn = "1553-3077",
    pages = "9:1--9:23",
    articleno = "9",
    numpages = "23",
    url = "http://doi.acm.org/10.1145/1837915.1837917",
    doi = "http://doi.acm.org/10.1145/1837915.1837917",
    acmid = "1837917",
    publisher = "ACM",
    address = "New York, NY, USA",
    keywords = "Latent sector errors, data loss, failure data, failure
modeling, field data, parity, redundancy, scrubbing, storage reliability"
}

@Article{ Elerath:2009:HDG:1516046.1516059,
    author = "Jon Elerath",
    title = "Hard-disk drives: the good, the bad, and the ugly",
    journal = "Commun. ACM",
    issue_date = "June 2009",
    volume = "52",
    issue = "6",
    month = "June",
    year = "2009",
    issn = "0001-0782",
    pages = "38--45",
    numpages = "8",
    url = "http://doi.acm.org/10.1145/1516046.1516059",
    doi = "http://doi.acm.org/10.1145/1516046.1516059",
    acmid = "1516059",
    publisher = "ACM",
    address = "New York, NY, USA"
}

@InProceedings{ Nightingale:2011:CCP:1966445.1966477,
    author = "Edmund B. Nightingale and John R. Douceur and Vince Orgovan",
    title = "Cycles, cells and platters: an empirical analysis of hardware
failures on a million consumer PCs",
    booktitle = "Proceedings of the sixth conference on Computer systems",
    series = "EuroSys '11",
    year = "2011",
    isbn = "978-1-4503-0634-8",
    location = "Salzburg, Austria",
    pages = "343--356",
    numpages = "14",
    url = "http://doi.acm.org/10.1145/1966445.1966477",
    doi = "http://doi.acm.org/10.1145/1966445.1966477",
    acmid = "1966477",
    publisher = "ACM",
    address = "New York, NY, USA",
    keywords = "fault tolerance, reliability"
}

@Article{ Jiang:2008:DDC:1416944.1416946,
    author = "Weihang Jiang and Chongfeng Hu and Yuanyuan Zhou and Arkady
Kanevsky",
    title = "Are disks the dominant contributor for storage failures?: A
comprehensive study of storage subsystem failure characteristics",
    journal = "Trans. Storage",
    volume = "4",
    issue = "3",
    month = "November",
    year = "2008",
    issn = "1553-3077",
    pages = "7:1--7:25",
    articleno = "7",
    numpages = "25",
    url = "http://doi.acm.org/10.1145/1416944.1416946",
    doi = "http://doi.acm.org/10.1145/1416944.1416946",
    acmid = "1416946",
    publisher = "ACM",
    address = "New York, NY, USA",
    keywords = "Storage system, disk failures, failure characteristics,
storage subsystem"
}

@InProceedings{ warneke:nephele,
    author = "Daniel Warneke and Odej Kao",
    title = "{Nephele: Efficient Parallel Data Processing in the Cloud}",
    booktitle = "SC-MTAGS",
    year = "2009",
    bibsource = "DBLP, http://dblp.uni-trier.de",
    ee = "http://doi.acm.org/10.1145/1646468.1646476"
}

@InProceedings{ battre:nephelePACTs,
    author = "Dominic Battr{\'e} and Stephan Ewen and Fabian Hueske and
Odej Kao and Volker Markl and Daniel Warneke",
    title = "{Nephele/PACTs: A Programming Model and Execution Framework
for Web-Scale Analytical Processing}",
    booktitle = "SoCC '10: Proceedings of the ACM Symposium on Cloud
Computing 2010",
    year = "2010",
    pages = "119--130",
    address = "New York, NY, USA",
    publisher = "ACM",
    location = "Indianapolis, IN, USA"
}

@Article{ alexandrov:nephelePACTsDemo,
    author = "Alexander Alexandrov and Dominic Battr{\'e} and Stephan Ewen
and Max Heimel and Fabian Hueske and Odej Kao and Volker Markl and Erik
Nijkamp and Daniel Warneke",
    title = "Massively Parallel Data Analysis with PACTs on Nephele",
    journal = "PVLDB",
    volume = "3",
    number = "2",
    year = "2010",
    pages = "1625--1628"
}

@Misc{ asterix:website,
    key = "AST",
    title = "Asterix: A Highly Scalable Parallel Platform for
Semi-structured Data Management and Analysis",
    howpublished = "URL: http://asterix.ics.uci.edu",
    owner = "fhueske"
}

@Misc{ stratosphere:website,
    key = "STR",
    title = "{The Stratosphere Project}",
    howpublished = "URL: http://stratosphere.eu",
    owner = "fhueske"
}

@InProceedings{ Stone:2000:CTC:347059.347561,
    author = "Jonathan Stone and Craig Partridge",
    title = "When the CRC and TCP checksum disagree",
    booktitle = "Proceedings of the conference on Applications,
Technologies, Architectures, and Protocols for Computer Communication",
    series = "SIGCOMM '00",
    year = "2000",
    isbn = "1-58113-223-9",
    location = "Stockholm, Sweden",
    pages = "309--319",
    numpages = "11",
    url = "http://doi.acm.org/10.1145/347059.347561",
    doi = "http://doi.acm.org/10.1145/347059.347561",
    acmid = "347561",
    publisher = "ACM",
    address = "New York, NY, USA"
}

@Book{ Hoelzle:2009:DCI:1643608,
    author = "Urs Hoelzle and Luiz Andr{\'e} Barroso",
    title = "The Datacenter as a Computer: An Introduction to the Design of
Warehouse-Scale Machines",
    year = "2009",
    isbn = "159829556X, 9781598295566",
    edition = "1st",
    publisher = "Morgan and Claypool Publishers"
}

@InProceedings{ Dean:2004:MSD:1251254.1251264,
    author = "Jeffrey Dean and Sanjay Ghemawat",
    title = "MapReduce: simplified data processing on large clusters",
    booktitle = "Proceedings of the 6th conference on Symposium on
Operating Systems Design \& Implementation - Volume 6",
    year = "2004",
    location = "San Francisco, CA",
    pages = "10--10",
    numpages = "1",
    url = "http://dl.acm.org/citation.cfm?id=1251254.1251264",
    acmid = "1251264",
    publisher = "USENIX Association",
    address = "Berkeley, CA, USA"
}

@Article{ Valiant:1990:BMP:79173.79181,
    author = "Leslie G. Valiant",
    title = "A bridging model for parallel computation",
    journal = "Commun. ACM",
    volume = "33",
    issue = "8",
    month = "August",
    year = "1990",
    issn = "0001-0782",
    pages = "103--111",
    numpages = "9",
    url = "http://doi.acm.org/10.1145/79173.79181",
    doi = "http://doi.acm.org/10.1145/79173.79181",
    acmid = "79181",
    publisher = "ACM",
    address = "New York, NY, USA"
}

@InProceedings{ Malewicz:2010:PSL:1807167.1807184,
    author = "Grzegorz Malewicz and Matthew H. Austern and Aart J. C. Bik and
James C. Dehnert and Ilan Horn and Naty Leiser and Grzegorz Czajkowski",
    title = "Pregel: a system for large-scale graph processing",
    booktitle = "Proceedings of the 2010 international conference on
Management of data",
    series = "SIGMOD '10",
    year = "2010",
    isbn = "978-1-4503-0032-2",
    location = "Indianapolis, Indiana, USA",
    pages = "135--146",
    numpages = "12",
    url = "http://doi.acm.org/10.1145/1807167.1807184",
    doi = "http://doi.acm.org/10.1145/1807167.1807184",
    acmid = "1807184",
    publisher = "ACM",
    address = "New York, NY, USA",
    keywords = "distributed computing, graph algorithms"
}

@InProceedings{ ZhangEtAl10-ZFSCorruption,
    title = "{End-to-end Data Integrity for File Systems: A ZFS Case
Study}",
    author = "Yupu Zhang and Abhishek Rajimwale and Andrea C. Arpaci-Dusseau
and Remzi H. Arpaci-Dusseau",
    booktitle = "Proceedings of the 8th Conference on File and Storage
Technologies (FAST '10)",
    month = "February",
    year = "2010",
    address = "San Jose, California"
}

@TechReport{ Slee2007:thrift,
    title = "Thrift: Scalable cross-language services implementation",
    author = "Mark Slee and Aditya Agarwal and Marc Kwiatkowski",
    institution = "Facebook",
    address = "156 University Ave, Palo Alto, CA ",
    month = apr,
    year = "2007",
    url = "http://thrift.apache.org/static/thrift-20070401.pdf"
}

@InProceedings{ Macambira:2010:MPP:1890799.1890806,
    author = "Tiago Alves Macambira and Dorgival Guedes",
    title = "A middleware for parallel processing of large graphs",
    booktitle = "Proceedings of the 8th International Workshop on
Middleware for Grids, Clouds and e-Science",
    series = "MGC '10",
    year = "2010",
    isbn = "978-1-4503-0453-5",
    location = "Bangalore, India",
    pages = "7:1--7:6",
    articleno = "7",
    numpages = "6",
    url = "http://doi.acm.org/10.1145/1890799.1890806",
    doi = "http://doi.acm.org/10.1145/1890799.1890806",
    acmid = "1890806",
    publisher = "ACM",
    address = "New York, NY, USA"
}

@InProceedings{ 5260544,
    author = "D. Molka and D. Hackenberg and R. Sch{\"o}ne and M. S. M{\"u}ller",
    booktitle = "Parallel Architectures and Compilation Techniques, 2009.
PACT '09. 18th International Conference on",
    title = "Memory Performance and Cache Coherency Effects on an Intel
Nehalem Multiprocessor System",
    year = "2009",
    month = "sept.",
    volume = "",
    number = "",
    pages = "261--270",
    keywords = "Intel Nehalem microarchitecture, Intel Nehalem
multiprocessor system, cache coherency effects, cache coherency protocol,
ccNUMA architecture, integrated memory controller, memory hierarchy, memory
performance, memory subsystems, multicore processors, quick path
interconnect, cache storage, microprocessor chips",
    doi = "10.1109/PACT.2009.22",
    ISSN = "1089-795X"
}

@Article{ Schroeder:2007:UDF:1288783.1288785,
    author = "Bianca Schroeder and Garth A. Gibson",
    title = "Understanding disk failure rates: What does an MTTF of
1,000,000 hours mean to you?",
    journal = "Trans. Storage",
    volume = "3",
    issue = "3",
    month = "October",
    year = "2007",
    issn = "1553-3077",
    articleno = "8",
    url = "http://doi.acm.org/10.1145/1288783.1288785",
    doi = "http://doi.acm.org/10.1145/1288783.1288785",
    acmid = "1288785",
    publisher = "ACM",
    address = "New York, NY, USA",
    keywords = "Hard drive replacements, MTTF, annual failure rates, annual
replacement rates, datasheet MTTF, failure correlation, hard drive failure,
infant mortality, storage reliability, time between failure, wear-out"
}

@InProceedings{ Partridge:1995:PCC:217382.217413,
    author = "Craig Partridge and Jim Hughes and Jonathan Stone",
    title = "Performance of checksums and CRCs over real data",
    booktitle = "Proceedings of the conference on Applications,
technologies, architectures, and protocols for computer communication",
    series = "SIGCOMM '95",
    year = "1995",
    isbn = "0-89791-711-1",
    location = "Cambridge, Massachusetts, United States",
    pages = "68--76",
    numpages = "9",
    url = "http://doi.acm.org/10.1145/217382.217413",
    doi = "http://doi.acm.org/10.1145/217382.217413",
    acmid = "217413",
    publisher = "ACM",
    address = "New York, NY, USA"
}

@InProceedings{ Schroeder:2009:DEW:1555349.1555372,
    author = "Bianca Schroeder and Eduardo Pinheiro and Wolf-Dietrich
Weber",
    title = "DRAM errors in the wild: a large-scale field study",
    booktitle = "Proceedings of the eleventh international joint conference
on Measurement and modeling of computer systems",
    series = "SIGMETRICS '09",
    year = "2009",
    isbn = "978-1-60558-511-6",
    location = "Seattle, WA, USA",
    pages = "193--204",
    numpages = "12",
    url = "http://doi.acm.org/10.1145/1555349.1555372",
    doi = "http://doi.acm.org/10.1145/1555349.1555372",
    acmid = "1555372",
    publisher = "ACM",
    address = "New York, NY, USA",
    keywords = "data corruption, dimm, dram, dram reliability, ecc,
empirical study, hard error, large-scale systems, memory, soft error"
}

@InProceedings{ Vishwanath:2010:CCC:1807128.1807161,
    author = "Kashi Venkatesh Vishwanath and Nachiappan Nagappan",
    title = "Characterizing cloud computing hardware reliability",
    booktitle = "Proceedings of the 1st ACM symposium on Cloud computing",
    series = "SoCC '10",
    year = "2010",
    isbn = "978-1-4503-0036-0",
    location = "Indianapolis, Indiana, USA",
    pages = "193--204",
    numpages = "12",
    url = "http://doi.acm.org/10.1145/1807128.1807161",
    doi = "http://doi.acm.org/10.1145/1807128.1807161",
    acmid = "1807161",
    publisher = "ACM",
    address = "New York, NY, USA",
    keywords = "datacenters, failures"
}

@InProceedings{ Ford:2010:AGD:1924943.1924948,
    author = "Daniel Ford and Fran\c{c}ois Labelle and Florentina I.
Popovici and Murray Stokely and Van-Anh Truong and Luiz Barroso and Carrie
Grimes and Sean Quinlan",
    title = "Availability in globally distributed storage systems",
    booktitle = "Proceedings of the 9th USENIX conference on Operating
systems design and implementation",
    series = "OSDI'10",
    year = "2010",
    location = "Vancouver, BC, Canada",
    pages = "1--7",
    numpages = "7",
    url = "http://dl.acm.org/citation.cfm?id=1924943.1924948",
    acmid = "1924948",
    publisher = "USENIX Association",
    address = "Berkeley, CA, USA"
}

@InProceedings{ Gill:2011:UNF:2018436.2018477,
    author = "Phillipa Gill and Navendu Jain and Nachiappan Nagappan",
    title = "Understanding network failures in data centers: measurement,
analysis, and implications",
    booktitle = "Proceedings of the ACM SIGCOMM 2011 conference on SIGCOMM",
    series = "SIGCOMM '11",
    year = "2011",
    isbn = "978-1-4503-0797-0",
    location = "Toronto, Ontario, Canada",
    pages = "350--361",
    numpages = "12",
    url = "http://doi.acm.org/10.1145/2018436.2018477",
    doi = "http://doi.acm.org/10.1145/2018436.2018477",
    acmid = "2018477",
    publisher = "ACM",
    address = "New York, NY, USA",
    keywords = "data centers, network reliability"
}

@InProceedings{ Burrows:2006:CLS:1298455.1298487,
    author = "Mike Burrows",
    title = "The Chubby lock service for loosely-coupled distributed
systems",
    booktitle = "Proceedings of the 7th symposium on Operating systems
design and implementation",
    series = "OSDI '06",
    year = "2006",
    isbn = "1-931971-47-1",
    location = "Seattle, Washington",
    pages = "335--350",
    numpages = "16",
    url = "http://dl.acm.org/citation.cfm?id=1298455.1298487",
    acmid = "1298487",
    publisher = "USENIX Association",
    address = "Berkeley, CA, USA"
}

@Article{ Lamport:1998:PP:279227.279229,
    author = "Leslie Lamport",
    title = "The part-time parliament",
    journal = "ACM Trans. Comput. Syst.",
    volume = "16",
    issue = "2",
    month = "May",
    year = "1998",
    issn = "0734-2071",
    pages = "133--169",
    numpages = "37",
    url = "http://doi.acm.org/10.1145/279227.279229",
    doi = "http://doi.acm.org/10.1145/279227.279229",
    acmid = "279229",
    publisher = "ACM",
    address = "New York, NY, USA",
    keywords = "state machines, three-phase commit, voting"
}

@Unpublished{ ananthanarayanan:locality,
    title = "Disk Locality in Datacenter Computing Considered Irrelevant",
    author = "Ganesh Ananthanarayanan and Ali Ghodsi and Scott Shenker and
Ion Stoica",
    year = "2011",
    url = "http://www.cs.berkeley.edu/~ganesha/disk-irrelevant_hotos2011.pd"
}

@InProceedings{ Schmuck02gpfs:GPFS,
    author = "Frank Schmuck and Roger Haskin",
    title = "GPFS: A Shared-Disk File System for Large Computing Clusters",
    booktitle = "Proceedings of the 2002 Conference on File and Storage
Technologies (FAST)",
    year = "2002",
    pages = "231--244"
}

@InProceedings{ Cutting:1992:SCA:133160.133214,
    author = "Douglass R. Cutting and David R. Karger and Jan O. Pedersen
and John W. Tukey",
    title = "Scatter/Gather: a cluster-based approach to browsing large
document collections",
    booktitle = "Proceedings of the 15th annual international ACM SIGIR
conference on Research and development in information retrieval",
    series = "SIGIR '92",
    year = "1992",
    isbn = "0-89791-523-2",
    location = "Copenhagen, Denmark",
    pages = "318--329",
    numpages = "12",
    url = "http://doi.acm.org/10.1145/133160.133214",
    doi = "http://doi.acm.org/10.1145/133160.133214",
    acmid = "133214",
    publisher = "ACM",
    address = "New York, NY, USA"
}

@Article{ Bloom:1970:STH:362686.362692,
    author = "Burton H. Bloom",
    title = "Space/time trade-offs in hash coding with allowable errors",
    journal = "Commun. ACM",
    issue_date = "July 1970",
    volume = "13",
    issue = "7",
    month = "July",
    year = "1970",
    issn = "0001-0782",
    pages = "422--426",
    numpages = "5",
    url = "http://doi.acm.org/10.1145/362686.362692",
    doi = "http://doi.acm.org/10.1145/362686.362692",
    acmid = "362692",
    publisher = "ACM",
    address = "New York, NY, USA",
    keywords = "hash addressing, hash coding, retrieval efficiency,
retrieval trade-offs, scatter storage, searching, storage efficiency,
storage layout"
}

@Article{ Gelernter:1985:GCL:2363.2433,
    author = "David Gelernter",
    title = "Generative communication in Linda",
    journal = "ACM Trans. Program. Lang. Syst.",
    volume = "7",
    issue = "1",
    month = "January",
    year = "1985",
    issn = "0164-0925",
    pages = "80--112",
    numpages = "33",
    url = "http://doi.acm.org/10.1145/2363.2433",
    doi = "http://doi.acm.org/10.1145/2363.2433",
    acmid = "2433",
    publisher = "ACM",
    address = "New York, NY, USA"
}

@InProceedings{ 2005:DSM:1078024.1078278,
    title = "A Distributed State Monitoring Service for Adaptive
Application Management",
    booktitle = "Proceedings of the 2005 International Conference on
Dependable Systems and Networks",
    series = "DSN '05",
    year = "2005",
    isbn = "0-7695-2282-3",
    pages = "200--205",
    numpages = "6",
    url = "http://dx.doi.org/10.1109/DSN.2005.6",
    doi = "http://dx.doi.org/10.1109/DSN.2005.6",
    acmid = "1078278",
    publisher = "IEEE Computer Society",
    address = "Washington, DC, USA"
}

@Article{ DeCandia:2007:DAH:1323293.1294281,
    author = "Giuseppe DeCandia and Deniz Hastorun and Madan Jampani and
Gunavardhan Kakulapati and Avinash Lakshman and Alex Pilchin and
Swaminathan Sivasubramanian and Peter Vosshall and Werner Vogels",
    title = "Dynamo: {A}mazon's highly available key-value store",
    journal = "SIGOPS Oper. Syst. Rev.",
    issue_date = "December 2007",
    volume = "41",
    issue = "6",
    month = "October",
    year = "2007",
    issn = "0163-5980",
    pages = "205--220",
    numpages = "16",
    url = "http://doi.acm.org/10.1145/1323293.1294281",
    doi = "http://doi.acm.org/10.1145/1323293.1294281",
    acmid = "1294281",
    publisher = "ACM",
    address = "New York, NY, USA",
    keywords = "performance, reliability, scalability"
}

@InProceedings{ Kostakos06instrumentingthe,
    author = {Vassilis Kostakos and Tim Kindberg and Ava Fatah Gen Schiek
and Alan Penn and Dana{\"e} Stanton Fraser and Tim Jones},
    title = "Instrumenting the city: developing methods for observing and
understanding the digital cityscape",
    booktitle = "Proceedings of the 8th International Conference on
Ubiquitous Computing (UbiComp)",
    year = "2006",
    publisher = "Springer"
}

@InProceedings{ Adya:2011:TCN:2043556.2043570,
    author = "Atul Adya and Gregory Cooper and Daniel Myers and Michael
Piatek",
    title = "Thialfi: a client notification service for internet-scale
applications",
    booktitle = "Proceedings of the Twenty-Third ACM Symposium on Operating
Systems Principles",
    series = "SOSP '11",
    year = "2011",
    isbn = "978-1-4503-0977-6",
    location = "Cascais, Portugal",
    pages = "129--142",
    numpages = "14",
    url = "http://doi.acm.org/10.1145/2043556.2043570",
    doi = "http://doi.acm.org/10.1145/2043556.2043570",
    acmid = "2043570",
    publisher = "ACM",
    address = "New York, NY, USA",
    keywords = "distributed systems, scalability"
}

@InProceedings{ Hunt:2010:ZWC:1855840.1855851,
    author = "Patrick Hunt and Mahadev Konar and Flavio P. Junqueira and
Benjamin Reed",
    title = "ZooKeeper: wait-free coordination for internet-scale systems",
    booktitle = "Proceedings of the 2010 USENIX conference on USENIX annual
technical conference",
    series = "USENIXATC'10",
    year = "2010",
    location = "Boston, MA",
    pages = "11--11",
    numpages = "1",
    url = "http://www.usenix.org/event/atc10/tech/full_papers/Hunt.pdf",
    acmid = "1855851",
    publisher = "USENIX Association",
    address = "Berkeley, CA, USA"
}

@InProceedings{ Chen:2011:DIE:2043556.2043562,
    author = "Yanpei Chen and Kiran Srinivasan and Garth Goodson and Randy
Katz",
    title = "Design implications for enterprise storage systems via
multi-dimensional trace analysis",
    booktitle = "Proceedings of the Twenty-Third ACM Symposium on Operating
Systems Principles",
    series = "SOSP '11",
    year = "2011",
    isbn = "978-1-4503-0977-6",
    location = "Cascais, Portugal",
    pages = "43--56",
    numpages = "14",
    url = "http://doi.acm.org/10.1145/2043556.2043562",
    doi = "http://doi.acm.org/10.1145/2043556.2043562",
    acmid = "2043562",
    publisher = "ACM",
    address = "New York, NY, USA"
}

@Article{ Chambers:2010:FEE:1809028.1806638,
    author = "Craig Chambers and Ashish Raniwala and Frances Perry and
Stephen Adams and Robert R. Henry and Robert Bradshaw and Nathan
Weizenbaum",
    title = "FlumeJava: easy, efficient data-parallel pipelines",
    journal = "SIGPLAN Not.",
    volume = "45",
    issue = "6",
    month = "June",
    year = "2010",
    issn = "0362-1340",
    pages = "363--375",
    numpages = "13",
    url = "http://doi.acm.org/10.1145/1809028.1806638",
    doi = "http://doi.acm.org/10.1145/1809028.1806638",
    acmid = "1806638",
    publisher = "ACM",
    address = "New York, NY, USA",
    keywords = "data-parallel programming, java, mapreduce"
}

@InProceedings{ Olston:2008:PLN:1376616.1376726,
    author = "Christopher Olston and Benjamin Reed and Utkarsh Srivastava
and Ravi Kumar and Andrew Tomkins",
    title = "Pig latin: a not-so-foreign language for data processing",
    booktitle = "Proceedings of the 2008 ACM SIGMOD international
conference on Management of data",
    series = "SIGMOD '08",
    year = "2008",
    isbn = "978-1-60558-102-6",
    location = "Vancouver, Canada",
    pages = "1099--1110",
    numpages = "12",
    url = "http://doi.acm.org/10.1145/1376616.1376726",
    doi = "http://doi.acm.org/10.1145/1376616.1376726",
    acmid = "1376726",
    publisher = "ACM",
    address = "New York, NY, USA",
    keywords = "dataflow language, pig latin"
}

@Article{ Lakshman:2010:CDS:1773912.1773922,
    author = "Avinash Lakshman and Prashant Malik",
    title = "Cassandra: a decentralized structured storage system",
    journal = "SIGOPS Oper. Syst. Rev.",
    volume = "44",
    issue = "2",
    month = "April",
    year = "2010",
    issn = "0163-5980",
    pages = "35--40",
    numpages = "6",
    url = "http://doi.acm.org/10.1145/1773912.1773922",
    doi = "http://doi.acm.org/10.1145/1773912.1773922",
    acmid = "1773922",
    publisher = "ACM",
    address = "New York, NY, USA"
}
