End-to-end Data Science Project

In this assignment you will work in teams of two students on an end-to-end data science project. The project should encompass the following stages (not necessarily in strict sequential order):

  1. Data acquisition - finding the data, downloading/scraping it ethically.
  2. Data cleaning and transformation
  3. Identifying the questions to investigate (it does not need to be a prediction task!). Make sure questions are well-motivated.
  4. Analyzing the data, building models
  5. Validating and evaluating the analysis and models
  6. Presenting/communicating the results (answers to the questions identified in Step 3)
  7. Iterate if there are new/more questions.
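As one illustration of the "scraping it ethically" part of Step 1, the sketch below checks a site's robots.txt rules before fetching a page. It is a minimal example using only Python's standard library; the helper name allowed_by_robots, the sample rules in EXAMPLE_ROBOTS, and the example.org URLs are hypothetical and not part of the assignment. Passing a robots.txt check is only one aspect of ethical data collection; a site's terms of use and reasonable request rates still apply.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in ("https://example.org/data/file.csv",
                "https://example.org/private/records.csv"):
        print(url, allowed_by_robots(EXAMPLE_ROBOTS, url))
```

In a real scraper you would download the site's actual robots.txt first (RobotFileParser also offers set_url() and read() for that) rather than using a hard-coded string.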

Project groups will initially be assigned by the instructor; you may switch groups if all members of both groups agree. In exceptional cases, you may be allowed to work alone (i.e., as a single-person group). The project will be graded as a group, and all members of the group will receive the same grade.

Additional constraints on the projects are as follows:

Resources

Deliverables

  1. Project proposal (posted to Slack), due on Nov 1, 2018.
  2. A project report in the form of a Jupyter Notebook (ipynb file) describing your work, due on Dec 12, 2018.
  3. Any additional notebooks/scripts you have used.
  4. A 10-minute video presentation of your work (post it to YouTube as an unlisted video and submit the link to Laulima).
  5. In-class presentation of the video with live Q&A (Dec 4 & 5, 2018).

Project proposals must include

You should organize your project report notebook so that the main narrative comes first, followed by appendixes that give the details. Think of the main narrative as a short 4-6 page article/paper that tells the “story”. You might want to have two different notebooks: a working notebook that contains most of your analysis code and generates the visualizations (output to image files), and a separate notebook that focuses on the narrative.

If you have problems running the processing within Jupyter Notebook, you may run the heavy-lifting Python code outside of Jupyter Notebook, but you must write up your work in a Jupyter Notebook for submission.

Do not include the data set in your submission, but do submit any code/scripts needed to reproduce your analysis.

The 10-minute video presentation should tell a story based on the data analysis. No post-production video editing is required. A single-take screen recording with a decent microphone is all that is needed. You should prepare a couple of slides (ppt, Google Docs, or Prezi) and record using a screen recording tool like QuickTime or Screencast-O-Matic.

Submission Procedure

The project proposal should be posted to Slack under #projects. Note that only one post per project is required.

Submit your files via Laulima->Assignment.

If you have many files, you might want to zip them up into one archive (zip and tgz are accepted; rar is NOT accepted).
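One possible way to build such an archive while leaving the data set out is sketched below, using only Python's standard library. The function name build_submission_zip, the list of file suffixes, and the assumption that your data lives in a folder named "data" are all illustrative choices, not course requirements; adjust them to match your own project layout.

```python
import zipfile
from pathlib import Path

# Assumption: these suffixes cover the notebooks and scripts you want to submit.
CODE_SUFFIXES = {".ipynb", ".py", ".sh", ".md"}


def build_submission_zip(project_dir, archive_path, data_dir_name="data"):
    """Zip notebooks/scripts under project_dir, skipping the data folder.

    Returns the list of archived paths (relative to project_dir).
    """
    project_dir = Path(project_dir)
    included = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(project_dir.rglob("*")):
            rel = path.relative_to(project_dir)
            if not path.is_file():
                continue
            # Skip anything inside the (assumed) data folder.
            if data_dir_name in rel.parts[:-1]:
                continue
            # Skip hidden files and folders such as .ipynb_checkpoints.
            if any(part.startswith(".") for part in rel.parts):
                continue
            # Keep only code and notebook files.
            if path.suffix.lower() not in CODE_SUFFIXES:
                continue
            zf.write(path, rel.as_posix())
            included.append(rel.as_posix())
    return included


if __name__ == "__main__":
    # Hypothetical usage: archive the current folder into submission.zip.
    for name in build_submission_zip(".", "submission.zip"):
        print("added", name)
```

Printing the list of archived files gives you a quick check that every notebook and script is included and that no data files slipped in before you upload.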