ICS 491 Special Topics: Big Data Analytics
Lipyeow Lim
lipyeow@hawaii.edu
This course covers concepts required for mining massive data sets, with a focus on the practical application of the concepts, tools and techniques in real-world data mining situations. The course teaches the student everything (s)he needs to know to get going, from selecting the appropriate big data platforms, preparing inputs, interpreting outputs, and evaluating results, to the algorithmic methods at the heart of successful data mining approaches.
To teach students:
At the end of this course the successful student should
ICS 321 Data Storage & Retrieval
Data Mining and Analysis: Fundamental Concepts and Algorithms. Mohammed J. Zaki, Wagner Meira, Jr. Cambridge University Press ISBN-13: 978-0521766333
Data Mining: Practical Machine Learning Tools and Techniques. 4th Edition. Ian H. Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal. Morgan Kaufmann Series in Data Management Systems ISBN-13: 978-0128042915 ISBN-10: 0128042915
Weekly Review Writing (40% of the course grade: 20% for technical content, 20% for writing)
Project, including report writing (60% of the course grade, of which 20% is for writing) (Hallmark 1)
Writing is an essential part of the course and constitutes 40% of the course grade. (Hallmark 3)
There will be eight weekly review writing assignments. They will begin in approximately Week 3, after the student has been introduced in the first two weeks to the basic concepts required in the course. Each review must be at least one page long. In it, the student writes about how the data mining concept covered that week applies to a specific real-world problem that (s)he has chosen to focus on for all eight review writing assignments. The instructor and peers will read the reviews and provide feedback and comments on the writing (Hallmark 2). Google Docs (with sharing enabled) will be used for the reviews, and instructor and peer comments can be added to the electronic document using the comment functionality in Google Docs. The student is required to revise the review based on the comments. The eight review writing assignments will constitute at least eight pages of writing (Hallmark 4).
The course project requires the student to select a real-world problem, apply big data processing and data mining techniques to solve the problem, implement the solution, and empirically evaluate the solution (s)he implemented. The student can choose the same problem used in the review writing assignments or a different one. The project will be graded based on a 10-minute oral presentation and an 8-page written report (Hallmark 4). A draft of the report must be submitted by Week 15. After the instructor has provided feedback on the draft, the student must submit a final report by the project due date (Hallmark 2).
Standard classroom policies of the College of Natural Sciences apply. No other special policies apply.
Late policy: Work submitted after the due date and time will receive zero credit.
Examinations: No make-up exams will be given.
Student Conduct: All students are expected to conduct themselves above and beyond the standard set forth in the UH Systemwide Student Conduct Code.
Disability: Any student who feels s/he may need an accommodation based on the impact of a disability is invited to contact the instructor privately. The instructor would be happy to work with you and the KOKUA Program (Office for Students with Disabilities) to ensure reasonable accommodations in the course. KOKUA can be reached at (808) 956-7511 or (808) 956-7612 (voice/text) in room 013 of the Queen Liliuokalani Center for Student Services.
Week 1: Overview of data mining work flow. Linear regression. Nearest Neighbor.
Week 2: Data Preparation & Analysis: Numeric, Categorical and other data types.
Week 3: Frequent Itemset Mining. Review 1 due.
Week 4: Clustering: k-means & hierarchical. Review 2 due.
Week 5: Classification: Probabilistic models. Review 3 due.
Week 6: Overview of big data concepts. Hadoop & Spark platforms. Review 4 due.
Week 7: Graph Pattern Mining. Review 5 due.
Week 8: Spectral & Graph Clustering. Review 6 due.
Week 9: Classification: Decision Trees & Forests.
Week 10: Classification: Support Vector Machines. Review 7 due.
Week 11: Classification: Neural Networks. Review 8 due.
Week 12: Deep learning.
Week 13: Data reduction techniques.
Week 14: Ensemble learning.
Week 15: Presentation of Course Projects. Draft of project report due.
Week 16: Presentation of Course Projects. Final project report due.