ITCS 4010/5010: Cloud Computing for Data Analysis

Course Information

This seminar course introduces the basic principles of cloud computing for data-intensive applications. It focuses on parallel computing with Google's MapReduce paradigm on Linux clusters, and on algorithms for large-scale data processing in web search, information retrieval, computational advertising, and scientific data analysis. Students will present research papers on these topics and complete programming projects using Hadoop, an open-source implementation of Google's MapReduce technology.

Prerequisites: Familiarity with Java, Linux, linear algebra, probability, and statistics.

Syllabus

Syllabus (pdf)

Papers

Homeworks

Programming Assignments

Final Course Project

Class Presentations

Cloud Computing in the News

Hadoop Resources

Acknowledgments

This course has benefited greatly from course material generously provided by Ed Lazowska, Aaron Kimball, Jimmy Lin, Kamal Nigam, Chris Manning, Prabhakar Raghavan, and Hinrich Schütze. Links to their courses are below.
Last modified: Mon Apr 06 12:19:53 Eastern Daylight Time 2009