Course syllabus for ICS 491, Fall 2017

a. Course alpha and number, and course title.

ICS 491 Special Topics: Big Data Analytics

b. Instructor name and contact information.

Lipyeow Lim

lipyeow@hawaii.edu

c. Course description.

This course covers concepts required for mining massive data sets, with a focus on the practical application of the concepts, tools and techniques in real-world data mining situations. The course teaches the student everything (s)he needs to know to get going, from selecting the appropriate big data platforms, preparing inputs, interpreting outputs and evaluating results, to the algorithmic methods at the heart of successful data mining approaches.

d. Course objectives.

To teach students:

  1. how to conceptualize and design data analytics applications.
  2. how to use big data processing platforms and how they work.
  3. how to use data mining techniques and how they work.
  4. how to prepare and preprocess data for analytical processing.
  5. how to evaluate the suitability and performance of different data mining techniques.
  6. how to write clearly, professionally and effectively.

e. Student Learning Outcomes

At the end of this course, the successful student should

  1. be able to analyze a problem to determine whether and how data mining techniques can be applied.
  2. be able to analyze a problem to determine whether and how big data techniques can be applied.
  3. understand basic data mining techniques.
  4. know how to adapt and extend data mining techniques to massive data sets.
  5. know how to evaluate different data mining solutions.
  6. be able to write clearly, professionally and effectively about their data mining solutions to practical problems.

f. Number of credit hours

3.

g. Prerequisites

ICS 321 Data Storage & Retrieval

h. Textbooks, required readings

  1. Data Mining and Analysis: Fundamental Concepts and Algorithms. Mohammed J. Zaki, Wagner Meira, Jr. Cambridge University Press ISBN-13: 978-0521766333

  2. Data Mining: Practical Machine Learning Tools and Techniques. 4th Edition. Ian H. Witten, Eibe Frank, Mark A. Hall, Christopher J. Pal. Morgan Kaufmann Series in Data Management Systems ISBN-13: 978-0128042915 ISBN-10: 0128042915

i. Grading and Student Evaluation

Writing is an essential part of the course and constitutes 40% of the course grade. (Hallmark 3)

There will be eight weekly review writing assignments. These assignments will begin in approximately Week 3, after the student has been introduced in the first two weeks to the basic concepts required in the course. Each review must be at least one page in length. In each review, the student writes about how the data mining concept covered that week applies to a specific real-world problem that (s)he has chosen to focus on for the eight review writing assignments. The instructor and peers will read the reviews and provide feedback and comments on the writing (Hallmark 2). Google Docs (with sharing enabled) will be used for the reviews, and instructor and peer comments can be added to the electronic document using the comment functionality in Google Docs. The student is required to revise the review based on the comments. The eight review writing assignments will constitute at least eight pages of writing (Hallmark 4).

The course project requires the student to select a real-world problem, apply big data processing and data mining techniques to solve the problem, implement the solution, and empirically evaluate the implemented solution. The student can choose the same problem used in the review writing assignments or a different problem. The project will be graded based on a 10-minute oral presentation and an 8-page written report (Hallmark 4). A draft of the report must be submitted by Week 15. After the instructor has provided feedback on the draft, the student must submit a final report by the project due date (Hallmark 2).

j. Classroom policies

Standard classroom policies of the College of Natural Sciences apply. No other special policies apply.

Late policy: Work submitted past the due date and time will receive zero credit.

Examinations: No make-up exams will be given.

Student Conduct: All students are expected to conduct themselves above and beyond the standard set forth in the UH Systemwide Student Conduct Code.

Disability: Any student who feels s/he may need an accommodation based on the impact of a disability is invited to contact the instructor privately. The instructor would be happy to work with you and the KOKUA Program (Office for Students with Disabilities) to ensure reasonable accommodations in the course. KOKUA can be reached at (808) 956-7511 or (808) 956-7612 (voice/text) in room 013 of the Queen Liliuokalani Center for Student Services.

k. Weekly schedule of topics and readings, including exam dates.

Week 1: Overview of data mining workflow. Linear regression. Nearest neighbor.

Week 2: Data Preparation & Analysis: Numeric, Categorical and other data types.

Week 3: Frequent Itemset Mining. Review 1 due.

Week 4: Clustering: k-means & hierarchical. Review 2 due.

Week 5: Classification: Probabilistic models. Review 3 due.

Week 6: Overview of big data concepts. Hadoop & Spark platforms. Review 4 due.

Week 7: Graph Pattern Mining. Review 5 due.

Week 8: Spectral & Graph Clustering. Review 6 due.

Week 9: Classification: Decision Trees & forests.

Week 10: Classification: Support Vector Machines. Review 7 due.

Week 11: Classification: Neural Networks. Review 8 due.

Week 12: Deep learning.

Week 13: Data reduction techniques.

Week 14: Ensemble learning.

Week 15: Presentation of Course Projects. Draft of project report due.

Week 16: Presentation of Course Projects. Final project report due.