ITCS 6190-001/8190-001: Cloud Computing for Data Analysis
Fall 2014

Course Information

Have you ever wondered how a search engine works? How online ads are customized for viewers? How Amazon recommends books for you? If you are interested in these and similar questions, then this is the course for you!

This course introduces the basic principles of cloud computing for data-intensive applications. It focuses on parallel computing using Google's MapReduce paradigm on Linux clusters, and on algorithms for large-scale data processing applications in web search, information retrieval, computational advertising, and scientific data analysis. Students will read and present research papers on these topics, and implement programming projects using Hadoop, an open-source implementation of Google's MapReduce technology.

Prerequisites: Familiarity with Java, Linux, algorithms and data structures, linear algebra, probability, and statistics. Students are expected to have good programming skills and a solid mathematical background.

Syllabus

Syllabus (pdf)

Papers

Homeworks

Programming Assignments

Course Project

Books

Acknowledgments

This course has greatly benefited from course material generously provided by Ed Lazowska, Aaron Kimball, Jimmy Lin, Kamal Nigam, Chris Manning, Prabhakar Raghavan, and Hinrich Schütze. Links to their courses are below.
Last modified: Sat Apr 5 15:34:18 EDT 2014