ICS491 is a special topics course covering the concepts and skills required for mining massive data sets, with a focus on the practical application of concepts, tools, and techniques in real-world data mining situations. The course teaches students everything they need to know to get going, from selecting appropriate big data platforms, preparing inputs, interpreting outputs, and evaluating results to the algorithmic methods at the heart of successful data mining approaches. This is a writing-intensive course. For more information, please consult the syllabus.
Instructor: Lipyeow Lim. Office: POST 303E. Office hours: Wed 10:30AM - 12:30PM or by appointment. Phone: 808-956-3495. Email: lipyeow at hawaii dot edu.
Examinations: There is no written final exam, but there will be a course project.
Textbook: Data Mining and Analysis: Fundamental Concepts and Algorithms. Mohammed J. Zaki and Wagner Meira, Jr. Cambridge University Press. ISBN-13: 978-0521766333.
Communications: We will be using Slack for communications (falling back on email where necessary). Please post questions there so that the whole class can benefit.
Remote Students: A Skype audio and video link will be available. Classes will be recorded and posted to YouTube.
Late policy: Work submitted after the due date and time will receive zero credit.
Student Conduct: All students are expected to conduct themselves above and beyond the standard set forth in the UH Systemwide Student Conduct Code.
Disability: Any student who feels they may need an accommodation based on the impact of a disability is invited to contact the instructor privately. The instructor will be happy to work with you and the KOKUA Program (Office for Students with Disabilities) to ensure reasonable accommodations in the course. KOKUA can be reached at (808) 956-7511 or (808) 956-7612 (voice/text), or in room 013 of the Queen Liliuokalani Center for Student Services.
Schedule
Week | Date | Topic | Before Class | In Class | After Class |
---|---|---|---|---|---|
1 | Mon Aug 21 | Introduction | Slides | Syllabus | Install Python & Libraries | |
1 | Wed Aug 23 | Introduction - data acquisition | Ch.1.{1-2} | Python Intro | video | HW1 |
2 | Mon Aug 28 | Introduction - linear regression | Ch.1.{3-5} | Python Intro | video |
2 | Wed Aug 30 | Introduction - nearest neighbor | Ch.1 | Slides | video |
3 | Mon Sep 4 | Labor Day Holiday | HW2 | HW1 due | ||
3 | Wed Sep 6 | No F2F Class. CLUSTER Conference. Watch video. Do HW2. | video | ||
4 | Mon Sep 11 | Thinking about Data | Ch.2-3 | Slides | Ex1. Analyzing Numeric Data | video | HW3 | HW2 due |
4 | Wed Sep 13 | Frequent Itemset Analysis | Ch.8 | Slides | Ex2. Analyzing Co-occurring Events | video |
5 | Mon Sep 18 | Frequent Itemset Analysis - FP | Ch.8 | Slides | video | HW3 due Sep 19 |
5 | Wed Sep 20 | Collaborative Filtering | Ex3. Analyzing Movie Ratings Data | video | HW4 | |
6 | Mon Sep 25 | Collaborative Filtering - Alternating Least Squares | ICDM 2008 paper | video | |
6 | Wed Sep 27 | Cluster Analysis - k-means | Ch.13 | Slides | Ex4. Analyzing Clusters | video | HW5 |
7 | Mon Oct 2 | Cluster Analysis - hierarchical | Ch.14 | video | HW4 due. | |
7 | Wed Oct 4 | No F2F class. FutureFocus Conference. Learn about Docker Containers. | A Beginner-Friendly Introduction to Containers, VMs and Docker | Ex5. Containers | HW6 |
8 | Mon Oct 9 | Big Data Platforms - Hadoop | Slides | video | |
8 | Wed Oct 11 | Big Data Platforms - Hadoop | Run the Hadoop Standalone example | Ex6. Hadoop | video | HW7 | HW5 due on Fri Oct 13. |
9 | Mon Oct 16 | Big Data Platforms - Spark | Slides | video | Start thinking about project | HW8 | HW6 due. | |
9 | Wed Oct 18 | Cluster Analysis - density based | Ch.15 | Slides | Project | video |
10 | Mon Oct 23 | Probabilistic Classification - Naive Bayes, Bayesian Networks | Ch.18 | Slides | video |
10 | Wed Oct 25 | Probabilistic Classification - revisit EM, LDA | Ch.13.3 | Latent Dirichlet Allocation | Slides | video | Project Proposal due. |
11 | Mon Oct 30 | Decision Trees & Forests | Ch.19 | Slides | video |
11 | Wed Nov 1 | Data Science @ Booz Allen Hamilton Talk | video | HW7 due | ||
12 | Mon Nov 6 | Dimensionality Reduction | Ch.6-7 | Slides | Slides | video | HW8 due. |
12 | Wed Nov 8 | Linear Discriminant Analysis, SVM, Logistic Regression | Ch.20-21 | Logistic Regression | Slides | Slides | Slides | video |
13 | Mon Nov 13 | Feature Engineering - tf.idf | Slides | Slides | video | |
13 | Wed Nov 15 | Deep Learning, Feature Engineering - neural networks, skip-gram, CBOW | But what is a neural network? | Gradient Descent: how neural networks learn | What is backpropagation? | Neural Networks | video |
14 | Mon Nov 20 | Feature Engineering, Deep Learning - word2vec, audio data, RNN, LSTM, autoencoders | Slides | video | |
14 | Wed Nov 22 | Deep Learning - CNN for image data (by Jonas Krause) | Slides | video | |
15 | Mon Nov 27 | The Dark Side | Weapons of Math Destruction Ch.0 | Weapons of Math Destruction Ch.1 | Discussion on WMD | |
15 | Wed Nov 29 | Visual Analytics (Aberto) | Slides | video | |
16 | Mon Dec 4 | Project | Kyle | Stephanie | Hailing | Ed | ||
16 | Wed Dec 6 | Project | Ayush | Mano | Eric | Ling-chih | Wyatt |
About this site: Modules lists the topics covered. Learning outcomes collects the desired student learning outcomes of all the modules. Readings lists the “passive” learning opportunities, such as reviewing textbook sections, web pages, and screencasts. Experiences lists the “active” learning opportunities, where you must actually demonstrate a capability.