Applications in Data Science

(listed as Information Visualization: Applications in Data Science)

Schedule: Mondays 6:00 pm - 9:15 pm

Location: West Village H 110

Dates: Sept 9, 2019 - Dec 2, 2019

Instructor: Kylie Bemis | Office Hours: Wednesdays 1:00 pm - 2:00 pm @ WVH 310G or by appointment

TA: Shivayogi Biradar | Office Hours: Thursdays 3:10 - 5:10 pm @ SL 045

Administration: Questions and homework postings are handled via Piazza | Sign up at

No Required Textbooks

Academic integrity: Be familiar with the university’s academic integrity policy on cheating and plagiarism.


Offers students a capstone opportunity to practice data science skills learned in previous courses. Students practice visualization, data wrangling, and machine learning skills by applying them to semester-long term projects on real-world data. Students may either propose their own projects or choose from a selection of industry options. Emphasis on the overall data science process, including identification of the scientific problem, selection of appropriate machine learning methods, and visualization and communication of results. There will be occasional lectures on special topics such as visualization, communication, and data science ethics.


(subject to change)

Date | Topics | HW
Mon Sep 9 | Introductions |
Mon Sep 16 | Shantam Gupta: Quantiphi project options |
Mon Sep 23 | Practice Proposals / Proposals | Project 1 groups due
Mon Sep 30 | Jan Vitek: MapReduce and Hadoop / Proposals | Proposal 1 due
Mon Oct 7 | Steven Braun: Visualization / Proposals | HW 1 due
Mon Oct 14 | Indigenous Peoples’ Day - no class |
Mon Oct 21 | Projects |
Mon Oct 28 | Sicheng Hao: Causal Inference / In-class work time | HW 2 due, Project 2 groups due
Mon Nov 4 | Projects | Project 1 reports due, Proposal 2 due
Mon Nov 11 | Veterans Day - no class | HW 3 due
Mon Nov 18 | Industry Panel / Projects |
Mon Nov 25 | Projects | HW 4 due
Mon Dec 2 | Shantam Gupta: Quantiphi project evaluations / Projects | Project 1 reviews due, Project 2 reports due Sunday, Dec 8 @ 11:59pm

This schedule is subject to change and will be updated throughout the semester.


Please let me know if you use a different name or pronouns from what appears on the class roster. You may use a preferred name on Piazza and when submitting assignments, but please be consistent and inform the instructors. The Northeastern LGBTQA Center can provide resources for changing your name and gender marker in the Northeastern system.

Please reach out to me early if you have difficulty keeping up with class material or completing assignments for personal reasons. The We Care program at Northeastern University is a resource available to you in times of stress.

All students are expected to abide by the university’s academic integrity policy and respect Northeastern’s commitment to diversity and inclusion.

Northeastern University strictly prohibits discrimination or harassment on the basis of race, color, religion, religious creed, genetic information, sex, gender identity, sexual orientation, age, national origin, ancestry, veteran status, or disability status. Please review Northeastern’s Title IX policy, which protects individuals from sex- or gender-based discrimination, including discrimination based on gender identity. Faculty members are required to report all allegations of sex- or gender-based discrimination to the Title IX coordinator.

Please be kind and respectful to your fellow students regardless of identity or background. Students are expected to respect and use other students’ names and pronouns.


Four small homework assignments will be given to practice various data science skills, including visualization, data wrangling, and machine learning. Each homework is due by email before class on the day it is listed, unless the instructions specify otherwise. Late homework will not be accepted. Extensions may be given individually if requested at least 48 hours before the due date with a reasonable justification. Requests for re-grades must be made in writing no later than 1 week after receiving a grade. (The new grade may be lower than the original grade.)

Discussion of the homework assignments is encouraged, but all code and write-ups must be completed individually, and your submitted work must be your own. Copying other students’ code or writing will not be tolerated and will be considered cheating. Solutions with a very high degree of similarity to those of another past or current student will be considered plagiarism and will be treated accordingly.


This class has no exams.


Students will propose and complete two projects in groups of 2 to 4. A small selection of industry-motivated options will also be available.

The projects may be completed with different groups. The second project may expand on the first project (with the permission of the instructor).

More details will be provided on Piazza.


The grade in this class is out of 1000 points, distributed as follows:

Final grades will follow this scale:

Half-letter grades (‘+’ and ‘-’) will also be given, using cut-offs determined at the end of the semester.

This scale is subject to change at the discretion of the instructor.