Project: Apply Big Data Analytics to a Real Data Set
Goals
Goal #1: motivate you to dive deep into one particular
domain area and understand the analytics requirements in that
domain
Goal #2: extract useful insights by
applying data mining and analysis techniques to a realistic
application scenario with real data
Goal #3: leverage a parallel/distributed data processing
system to process the real data
Your Tasks
Here are the tasks that you will need to do.
1. Write a project proposal
- Choose a data analysis domain or application
- What kind of analytics are important in that domain?
- Find a data set. See the list of public data
sets.
- Write a project proposal (at least 250 words) that includes the following
information:
- Project Title
- Objectives of the project
- Why the project is interesting/important
- What data set will be used
- What analytics will be used and what insights they could yield
- A timeline with milestones.
2. Work on the project
- Design the analytics that will yield the most important
insights.
- Extract, clean, and transform the relevant features from the
data set
- Implement the analytics in Spark or another analytics
system.
- Run the analysis
- Visualize and interpret the results
- Tweak and iterate
3. Present the project in class
4. Write the project report
- Use the ACM LaTeX style files.
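As a rough illustration of the "extract, clean, transform, run" steps in task 2, here is a minimal single-machine sketch in plain Python. The data, the column names (`city`, `temp_c`), and the helper functions are all hypothetical and only stand in for whatever data set and analytics you choose. In Spark, the same extract/clean/aggregate shape would typically be expressed with DataFrame or RDD transformations instead.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw data: a tiny CSV with one missing and one malformed value.
RAW_CSV = """city,temp_c
Honolulu,27.5
Hilo,24.0
Honolulu,
Hilo,not_a_number
Honolulu,28.5
"""


def extract(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Keep only rows with a numeric temperature, as (city, temp) pairs."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((row["city"], float(row["temp_c"])))
        except (TypeError, ValueError):
            # Missing or non-numeric value: skip the row.
            continue
    return cleaned


def mean_by_key(pairs):
    """Aggregate (key, value) pairs into a per-key mean."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for key, value in pairs:
        totals[key][0] += value
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


rows = extract(RAW_CSV)
pairs = clean(rows)
result = mean_by_key(pairs)
print(result)
```

Keeping each step in its own function makes it easier to tweak one stage (for example, a different cleaning rule) and rerun the analysis without touching the rest.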
Deliverables
- Project proposals are due Wed, Oct 25 at 2359H in Laulima->Discussions.
- Presentations will be held during the last two class meetings.
- Project reports are due during exam week.