ITCS/ITIS 6162/8162 (asynchronous)

Knowledge Discovery in Databases - KDD
Spring 2024


Prerequisites: ITCS 6160, full graduate standing, or consent of the department.
Textbook (not required): "Introduction to Data Mining" by Pang-Ning Tan, Michael Steinbach, and Vipin Kumar, Addison-Wesley.


Course Syllabus



Office Hours (January 15 - March 1, March 11 - May 2)

If you have questions about any topic covered in the PPT presentations, video lectures, or sample problems posted on this website, please join me or my TAs during our weekly office hours, held either on ZOOM or in the KDD Lab (Woodward 402).

All ZOOM sessions and meetings in the KDD Lab are listed below:

GTA 1
Sreeja Chevula (e-mail: schevula@uncc.edu)
- Office Hours on ZOOM
Monday: 11:00am-1:00pm; Tuesday (every second week beginning Jan 16)
ZOOM LINK: https://charlotte-edu.zoom.us/j/94603059545
- Office Hours in the KDD Lab (Woodward 402)
Tuesday (every second week beginning Jan 23) 2:00-4:00pm
No office hours on March 4-8 (Spring Recess)

GTA 2
Spandana Dirisala (e-mail: sdirisal@charlotte.edu)
- Office Hours on ZOOM
Wednesday: 2:00-4:00pm
ZOOM LINK: https://charlotte-edu.zoom.us/my/sdirisal
- Office Hours in the KDD Lab (Woodward 402)
Friday: 11:00am-1:00pm
No office hours on March 4-8 (Spring Recess)

GTA 3
Maneesha Reddy Komirelly (e-mail: mkomirel@charlotte.edu)
- Office Hours

From Jan 15 to Jan 31:
Monday - 9am to 11am (online)
Thursday - 9am to 11am (online)
Zoom Link:
https://charlotte-edu.zoom.us/j/94657279937?pwd=V2tRcmgrd1lTL1lxZnZwVzFER0J4QT09
Meeting ID: 946 5727 9937
Passcode: 607449

From Feb 1 to May 2:
Monday - 2pm to 4pm (even weeks: in person in the KDD Lab; odd weeks: online)
Zoom link:
https://charlotte-edu.zoom.us/j/93083906922?pwd=bzBBeU1RQVJWMFVSSE5YYW9BMU5vdz09
Meeting ID: 930 8390 6922
Passcode: 700598

Thursday - 9am to 11am (online)
Zoom link:
https://charlotte-edu.zoom.us/j/95952491196?pwd=OHhsOUNVQzFjcHIvdVMvRkVsclREdz09
Meeting ID: 959 5249 1196
Passcode: 145698

No office hours on March 4-8 (Spring Recess)

Zbigniew Ras
- Office Hours ZOOM Link
https://charlotte-edu.zoom.us/j/93755841946
Thursday: 3:00-5:00pm
(If no one shows up by 3:30pm, I will leave the zoom meeting)
No office hours on March 4-8 (Spring Recess)



Week 1 & 2 (January 10 - 20)
Learning objectives: Classification tree construction using entropy and the Gini index (see [2],[3]); association and representative rules discovery (see [4],[5]); classification rules discovery using LERS (see [6]); computing reducts (using a discernibility matrix or a heuristic strategy based on an attribute selection technique); data discretization; and classification rules construction using discernibility functions for dataset objects (see [7],[8],[9],[10]).
[1] Data Preprocessing
[2] Classification Trees, PDF
[3] Classification Trees (Video by L. Powell)
[4] Association Rules, PDF, Video Lecture Part I, Video Lecture Part II
[5] Association Rules (Video by L. Powell)
[6] LERS, PDF
[7] Granular Computing, PDF, Video Lecture
[8] Reducts and Discretization, PDF
[9] Reducts (Video by L. Powell)
[10] Discretization (Video by L. Powell)
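The entropy and Gini index criteria for classification tree construction can be tried out on a small table. The Python sketch below is only an illustration, not the procedure from the lecture slides; the function names (`entropy`, `gini`, `split_gain`) and the toy table are hypothetical. It measures how much splitting on one attribute reduces the impurity of the decision attribute; the attribute with the largest reduction is the usual candidate for the next tree node.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy (in bits) of a list of decision values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(rows, attr, decision, impurity=entropy):
    """Impurity of the whole table minus the weighted impurity after splitting on attr."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row[decision])
    weighted = sum(len(g) / len(rows) * impurity(g) for g in groups.values())
    return impurity([row[decision] for row in rows]) - weighted


# Hypothetical toy table (not taken from the course materials).
rows = [
    {"outlook": "sunny", "windy": "no", "play": "no"},
    {"outlook": "sunny", "windy": "yes", "play": "no"},
    {"outlook": "rain", "windy": "no", "play": "yes"},
    {"outlook": "rain", "windy": "yes", "play": "yes"},
]

for attr in ("outlook", "windy"):
    print(attr, split_gain(rows, attr, "play"), split_gain(rows, attr, "play", gini))
```

Here splitting on `outlook` separates the two classes completely (gain 1.0 with entropy, 0.5 with Gini), while `windy` gives no gain, so `outlook` would be picked as the root.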

Week 3 & 4 (January 22 - February 3)
Learning objectives: Get familiar with the problems and their solutions presented in [1]. If a problem is not entirely solved, complete the solution. Rules discovery from incomplete datasets using a tolerance relation (see [3]) and the SVM strategy (see [4]). Get familiar with at least two software packages: RSES (see [2]) and Orange or WEKA (see [5]).
[1] Sample Problems
[2] Rough Set Exploration System (RSES), RSES, RS Manual
[3] Mining Incomplete Data PDF, Video Lecture
[4] Support Vector Machine PDF
[5] Bratko's ORANGE & WEKA

Week 5 & 6 (February 5 - February 17)
Learning objectives: Action rules construction methods DEAR 1 and DEAR 2 (see [1]) and a strategy based on action reducts (see [2]). Agglomerative and divisive clustering strategies (see [3],[4],[5],[6]). Get familiar with the problems and their solutions presented in [7],[8]. If a problem is not entirely solved, complete the solution.
[1] Action Rules and Meta-Actions PDF, Video Lecture
[2] Action Rules Extraction Using Action Reducts
[3] Clustering Methods PDF, Video Lecture
[4] Clustering (Video)
[5] Clustering - Sample problems with solutions
[6] Clustering - Sample problems
[7] Sample Problems
[8] Sample Problems for Midterm Exam
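The agglomerative strategy merges the two closest clusters repeatedly until the desired number of clusters remains. The Python sketch below assumes single linkage and updates cluster distances with the Lance & Williams recurrence (see Additional Documents [9]) using the single-linkage coefficients 1/2, 1/2, 0, -1/2; the function name `single_link` and the sample points are hypothetical, and the lecture materials may use other linkage rules.

```python
def single_link(dist, n_clusters):
    """Agglomerative clustering with single linkage.

    dist: symmetric matrix (list of lists) of pairwise object distances.
    Returns the clusters (sorted lists of object indices) once n_clusters remain.
    """
    clusters = {i: [i] for i in range(len(dist))}
    # Distances between current clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}

    while len(clusters) > n_clusters:
        i, j = min(d, key=d.get)             # closest pair of clusters, i < j
        clusters[i].extend(clusters.pop(j))  # merge cluster j into cluster i
        for k in clusters:
            if k == i:
                continue
            d_ki = d[(min(k, i), max(k, i))]
            d_kj = d[(min(k, j), max(k, j))]
            # Lance-Williams update for single linkage; equals min(d_ki, d_kj).
            d[(min(k, i), max(k, i))] = 0.5 * d_ki + 0.5 * d_kj - 0.5 * abs(d_ki - d_kj)
        # Drop every distance that still refers to the merged-away cluster j.
        d = {pair: v for pair, v in d.items() if j not in pair}

    return sorted(sorted(c) for c in clusters.values())


# Hypothetical example: five objects at positions 0, 1, 5, 6, 20 on a line.
points = [0, 1, 5, 6, 20]
dist = [[abs(a - b) for b in points] for a in points]
print(single_link(dist, 3))  # [[0, 1], [2, 3], [4]]
print(single_link(dist, 2))  # [[0, 1, 2, 3], [4]]
```

Swapping in other Lance & Williams coefficients (for example 1/2, 1/2, 0, +1/2 for complete linkage) changes the linkage rule without changing the merge loop.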

Week 7 & 8 (February 19 - March 2)
Learning objectives: The Chase strategy for revealing hidden values in datasets (see [1]), TV-tree construction (see [2]), and construction of new attributes when data sources are distributed, e.g., in edge computing (see [3]). Midterm Exam.
[1] Chase Algorithms PDF, Video Lecture
[2] TV Trees PDF
[3] Query Answering & New Attributes

Midterm (on Canvas): February 29 (Thursday), 6:00-8:30pm

Week 9 & 10 (March 4 - 16)
Learning objectives: The class group project assignment (see [1]) and LISp-Miner, a software package for action rules discovery (see [2]), which you need to learn in order to complete the project.
[1] Project
[2] Lisp Miner, Video Lecture (by Sapna Pareek)

Week 11 & 12 (March 18 - 30)
Learning objectives: Data sanitization method against Chase (see [1]), classifier evaluation strategies (see [2]), and mining distributed data and big data (see [3]). Get familiar with the problems and their solutions presented in [4]. If a problem is not entirely solved, finish the solution.
[1] Data Sanitization PDF Example
[2] Evaluation Methods
[3] Distributed Data and Big Data
[4] Sample Problems
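One common way to evaluate a classifier is to compare its predictions with the known decision values of test objects. The Python sketch below computes accuracy, precision, and recall for one chosen decision value; the function name `evaluate` and the sample labels are hypothetical, and it assumes the predictions were already obtained on held-out data (for example, a test split or one fold of cross-validation).

```python
def evaluate(actual, predicted, positive):
    """Accuracy, precision and recall, treating `positive` as the class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    accuracy = sum(1 for a, p in pairs if a == p) / len(pairs)
    # Share of objects predicted as `positive` that really are `positive`.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Share of truly `positive` objects that the classifier found.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical decision values for five test objects.
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]
print(evaluate(actual, predicted, "yes"))
```

With these labels, three of five predictions are correct (accuracy 0.6), and both precision and recall for "yes" are 2/3.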

Week 13 & 14 (April 1 - 13)
Learning objectives: Applying KDD methods to fine art evaluation (see [1]) and to improving human health (see [2]).
[1] Art Analytics (Paintings), VIDEO (by L. Powell)
[2] Health Analytics, Procedure Graph, VIDEO (by Zbig Ras)

Week 15 & 16 (April 15 - 27)
Learning objectives: Review the sample problems presented in [1]; four of them will be on the final exam. Continue working on the class project so that it is submitted on time.
[1] Sample Problems (Final Exam)


FINAL EXAM on CANVAS

May 3 (Friday), 5:00-7:30pm


Final Exam Solutions


Project
Project and LISp-Miner
Upload the project report and the dataset you created to Canvas, or email them to
Sreeja Chevula [schevula@uncc.edu], Spandana Dirisala [sdirisal@charlotte.edu], and Maneesha Reddy Komirelly [mkomirel@charlotte.edu],
no later than May 6 (Monday), 2024.
Project Rubric (to be used for grading)


Midterm (on Canvas): February 29 (Thursday), 6:00-8:30pm
Final (on Canvas): May 3 (Friday), 5:00-7:30pm

Points: Midterm - 30 points, Final - 30 points, Project - 40 points
Grades: A [90-100], B [80-89], C [65-79].


Instructor: Zbigniew W. Ras
Office: Woodward Hall 430C
Telephone: 704-687-8574
e-mail: ras@uncc.edu
Office Hours on ZOOM: Link
https://charlotte-edu.zoom.us/j/93755841946
Thursday: 3:00-5:00pm
(If nobody shows up by 3:30pm, I will leave the zoom meeting)


GTA 1: Sreeja Chevula

Office: Woodward Hall 402 (KDD Lab)
e-mail: schevula@charlotte.edu


GTA 2: Spandana Dirisala

Office: Woodward Hall 402 (KDD Lab)
e-mail: sdirisal@charlotte.edu


GTA 3: Maneesha Reddy Komirelly

Office: Woodward Hall 402 (KDD Lab)
e-mail: mkomirel@charlotte.edu


Additional Documents

[1] KDD Software

[2] Lisp Miner (+ Action Rules Discovery Module) (by Jan Rauch)

[3] Lisp Miner Manual (by Jan Rauch's student)

[4] SCARI: Action Rules Discovery Package

[5] Rough Sets

[6] Repository of large datasets

[7] LERS vs ERID

[8] Extracting Rules from Incomplete Table

[9] Lance & Williams Distance

[10] Sample Problems for Midterm Exam