Knowledge Discovery in Databases - KDD

Prerequisites: ITCS6160, full graduate standing or consent of the department.
Textbook: "Introduction to Data Mining", by Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Addison Wesley.

Course Outline


Association Rules (PPT format)

Association Rules (Video prepared by Laurel Powell)

Classification Trees (PPT format)

Classification Trees (Video prepared by Laurel Powell)

Rough Sets (PPT format)

Reducts and Discretization (PPT format)

Reducts (Video prepared by Laurel Powell)

Discretization (Video prepared by Laurel Powell)

LERS/ERID (PPT format)

Mining Incomplete Data (PPT format)

Action Rules (PPT format)

Sample Problems (Midterm Exam) (WORD format)

Midterm Exam (WORD format)

Clustering Methods (PPT format)

TV Trees (PPT format)

Clustering I (Textbook) (PPT format)

Clustering I (Video prepared by Laurel Powell)

Clustering II (Textbook) (PPT format)

Clustering - Sample problems (WORD format)

Evaluation Methods (PPT format)

Anomaly Detection (Textbook) (PPT format)

Temps DB Mining (PPT format)

Patient Safety (PPT format)

Mining Music Database (PPT format)

Net Promoter Score (PPT format)

Sample Problems (WORD format)

Solutions (WORD format)

Sample Problems II (WORD format)

Midterm Exam with Solutions (PDF format)


Sample Problems (Final Exam) (WORD format)

A Study Report is required for ADKD Certificate Program students. It should be submitted to "" at any time before the Final exam.

Project (maximum 4 students in a group):

Here is the link to Fragile States Data: Take the data covering 4 consecutive years and extend them by adding at least 6 new features of your choice. Do not merge these datasets. Use information from the Web to find values for the new features. Use the feature TOTAL as the decision feature, discretized by replacing its numeric values with concepts: Alert (union of Very High Alert, High Alert, Alert), Warning (union of High Warning, Elevated Warning, Warning), Stable (union of Stable, More Stable, Very Stable), Sustainable (union of Sustainable, Very Sustainable). Using WEKA, find the best classifier for your extended datasets (extended by at least 6 new features). Your classifier should have reasonably high precision. Find action rules showing what actions can be recommended to countries labeled Alert to move them to a less fragile class, and check how these actions change from year to year. If your classifier's precision is low, you have to extend your datasets with different attributes.
Submit your project with proper documentation (user manual) to Kasia Tarnowska [] one week before the last class meeting.
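
The discretization of the decision feature described in the project can be sketched in Python as a simple lookup. This is only an illustrative sketch, not part of the required WEKA workflow: it assumes each record already carries its fine-grained category label in a hypothetical column named CATEGORY, and the helper names (CONCEPT_OF, to_concept, discretize) are made up for this example. The grouping itself follows the unions listed in the project description.

```python
# Grouping of fine-grained categories into the four concepts,
# as listed in the project description.
CONCEPT_OF = {
    "Very High Alert": "Alert",
    "High Alert": "Alert",
    "Alert": "Alert",
    "High Warning": "Warning",
    "Elevated Warning": "Warning",
    "Warning": "Warning",
    "Stable": "Stable",
    "More Stable": "Stable",
    "Very Stable": "Stable",
    "Sustainable": "Sustainable",
    "Very Sustainable": "Sustainable",
}


def to_concept(category):
    """Map one fine-grained category label to its concept."""
    # Normalize whitespace and capitalization before the lookup.
    key = " ".join(category.split()).title()
    if key not in CONCEPT_OF:
        raise ValueError("unknown category: %r" % category)
    return CONCEPT_OF[key]


def discretize(rows, source="CATEGORY", target="TOTAL"):
    """Return copies of the rows with the decision feature set to a concept.

    `source` (assumed column holding the fine-grained label) and `target`
    (the decision feature) are hypothetical column names.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        new_row[target] = to_concept(row[source])
        result.append(new_row)
    return result
```

Applied separately to each of the four yearly datasets (which stay unmerged), the output rows can then be exported to a format WEKA reads, with TOTAL as the class attribute.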

Rubrics will be used for grading the Study Report and the Project: Rubrics

Midterm: October 12
Final: 2:00pm-4:30pm, December 14 (Thursday), 2017
Points for MS/PhD students: 30 points Test, 30 points Final, 30 points Project, 10 points Attendance
Points for ADKD students (not in MS/PhD Program): 30 points Test, 30 points Final, 30 points Project + Study Report, 10 points Attendance
Grades: A [90-100], B [80-89], C [65-79]

Instructor:       Zbigniew W. Ras

Location: Woodward Hall 430C
Telephone: 704-687-8574
Office Hours: Thursday: 11:00am-1:00pm

GTA I:       Yuehua Duan
Location: KDD Lab (Woodward Hall 402)
Telephone: 704-687-8546
Office Hours: Wednesday, 9:00am-noon, in Woodward 230

GTA II:       Laurel Powell
Location: KDD Lab (Woodward Hall 402)
Telephone: 704-687-8546
Office Hours: NONE

GTA III:       Katarzyna Tarnowska
Location: KDD Lab (Woodward Hall 402)
Telephone: 704-687-8546
Office Hours: NONE

LISp-Miner (by Jan Rauch)

Rough Set Exploration System (RSES)

Bratko's ORANGE

Random Forests



LERS - Version for PC (Manual) and LERS System (software)

More software for data mining

Repository of large datasets

Medical Data

GMU KDD Software

LERS vs ERID (WORD format)

Extracting Rules from Incomplete Table (WORD format)

Lance & Williams Distance (WORD format)