ITCS 4010/5010: Cloud Computing for Data Analysis
Course Information
- Instructor: Srinivas Akella, sakella@uncc.edu
- Teaching Assistant: Min Sun, msun@uncc.edu
- Classroom: Woodward Hall 130
- Time: Wed 3:30pm-6:15pm
- Credits: 3
- Office hours: Wed 10-11am, Woodward 410E
This seminar course will introduce the basic principles of cloud
computing for data-intensive applications. It will focus on parallel
computing using Google's MapReduce paradigm on Linux clusters, and
algorithms for large-scale data processing applications in web search,
information retrieval, computational advertising, and scientific data
analysis. Students will present research papers on these topics and
implement programming projects using Hadoop, an open-source
implementation of Google's MapReduce technology.
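For readers new to the paradigm, the following is a minimal sketch of the MapReduce idea using word count as the example. It is plain Java that runs sequentially in a single process and does not use the Hadoop API. The class name WordCountSketch and its methods are hypothetical names chosen for illustration only. A real Hadoop job would distribute the map and reduce steps across a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: not Hadoop code, all names are assumptions.
public class WordCountSketch {

    // Map phase: emit one (word, 1) pair for every word in a line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all counts that were emitted for a single word.
    static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: map every line, group values by key (the "shuffle"), then reduce each group.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "the cloud stores the data",
            "the cluster processes data");
        // prints {cloud=1, cluster=1, data=2, processes=1, stores=1, the=3}
        System.out.println(wordCount(lines));
    }
}
```

The map and reduce functions share no state, which is what lets a framework run many copies of each in parallel on different machines.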
Prerequisites:
Familiarity with Java, Linux, linear algebra, probability, and statistics.
Syllabus
Syllabus (pdf)
Papers
- IR book: Introduction to Information Retrieval
- Optional book: Hadoop: The Definitive Guide (Rough Cuts version)
- MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and
Sanjay Ghemawat, OSDI'04: Sixth Symposium on Operating Systems Design
and Implementation, 2004.
- MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and
Sanjay Ghemawat, Communications of the ACM, Vol. 51, No. 1, pages 107-113, January 2008.
- The Google File System, Sanjay Ghemawat, Howard Gobioff, and Shun-Tak
Leung, Proceedings of the 19th ACM Symposium on Operating Systems
Principles (SOSP 2003), Lake George, NY, October 2003.
- Building Nutch: Open Source Search, Mike Cafarella and Doug Cutting, April 2004, ACM Queue.
- Web Search for a Planet: The Google Cluster Architecture, Luiz Barroso, Jeffrey Dean, and Urs Hoelzle, March-April 2003, IEEE Micro.
- Information Retrieval and Web Search, Amy N. Langville and Carl D. Meyer. The Handbook of Linear Algebra. CRC Press, 2006.
- The Anatomy of a Large-Scale Hypertextual Web Search Engine, Sergey Brin, Lawrence Page. WWW 1998.
- Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching, Andrew McCallum, Kamal Nigam, and Lyle Ungar. In Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2000), 2000.
- Map-Reduce for Machine Learning on Multicore. Cheng-Tao Chu, Sang Kyun Kim, Yi-An Lin, YuanYuan Yu, Gary Bradski, Andrew Y. Ng and Kunle Olukotun. In NIPS 19, 2007.
Homeworks
Programming Assignments
Final Course Project
Class Presentations
Cloud computing in the News
Hadoop resources
Acknowledgments
This course has greatly benefited from course material generously provided by Ed Lazowska, Aaron Kimball, Jimmy Lin, Kamal Nigam, Chris Manning, Prabhakar Raghavan, and Hinrich Schutze. Links to their courses are below.
Last modified: Mon Apr 06 12:19:53 Eastern Daylight Time 2009