Introduction to Data Management and Processing

Note: This is a combined syllabus for two course sections. For a semester schedule, please see the respective Canvas page for each section.

Schedule/Location (Sec 2): Mondays 6:00 pm - 9:15 pm @ WVG 108

Schedule/Location (Sec 4): Tuesdays/Fridays 3:25 pm - 5:05 pm @ SL 035

Dates: Sep 8 - Dec 18, 2021

Instructor: Kylie Bemis (she/her) | Virtual Office Hours: See Piazza

TA: Devarsh Harshad Bhupatkar | Virtual Office Hours: See Piazza

TA: Ari Fleischer | Virtual Office Hours: See Piazza

TA: Sushant Jha | Virtual Office Hours: See Piazza

TA: Juhi Kailash Paliwal | Virtual Office Hours: See Piazza

TA: Chetana Sharma | Virtual Office Hours: See Piazza

TA: Dong Liang | Virtual Office Hours: See Piazza

Piazza: Questions and lecture material are handled via Piazza | Sign up at https://piazza.com/northeastern/fall2021/ds5110sec24

Canvas (Sec 2): Course schedule and assignments are available via Canvas | Log in at https://northeastern.instructure.com/courses/90582

Canvas (Sec 4): Course schedule and assignments are available via Canvas | Log in at https://northeastern.instructure.com/courses/95162

Teams: Office hours are held virtually via Microsoft Teams | Log in at https://teams.northeastern.edu

Required Textbooks: R for Data Science (R4DS) by Wickham and Grolemund

Supplementary Textbooks: Text Mining with R (TMR) by Silge and Robinson, Advanced R (AdvR) by Wickham, R Packages (Rpkg) by Wickham

Academic integrity: Be familiar with the university’s academic integrity policy on cheating and plagiarism.


Overview

Data science is the discipline concerned with transformation, processing, management, and modeling of data for the purpose of extracting knowledge from raw observations. This course discusses the practical issues and techniques for data importing, tidying, transforming, and modeling. Programming is a cross-cutting aspect of the course. Students will gain experience with data science tools through short assignments. The course work includes a term project based on real-world data.


Topics


Policies

General

Please let me know if you use a different name or pronouns from what appears on the class roster. You may use a preferred name on Piazza and when submitting assignments and quizzes, but please be consistent and inform the instructors. The Northeastern LGBTQA Center can provide resources for changing your name and gender marker in the Northeastern system.

Please reach out to me early if you have difficulty keeping up with class material or completing assignments for personal reasons. The We Care program at Northeastern University is a resource available to you in times of stress.

All students are expected to abide by the university’s academic integrity policy and respect Northeastern’s commitment to diversity and inclusion.

Northeastern University strictly prohibits discrimination or harassment on the basis of race, color, religion, religious creed, genetic information, sex, gender identity, sexual orientation, age, national origin, ancestry, or veteran or disability status. Please review Northeastern’s Title IX policy, which protects individuals from sex- or gender-based discrimination, including discrimination based on gender identity. Faculty members are required to report all allegations of sex- or gender-based discrimination to the Title IX coordinator.

Please be kind and respectful to your fellow students regardless of identity or background. Students are expected to respect and use other students’ names and pronouns.

COVID-19

Section 2: This course is taught under the hybrid NUflex model. Students may attend class in person or remotely. Remote students will be able to participate virtually via Zoom and/or Microsoft Teams.

Section 4: This course is taught in person only. Students are expected to attend class in person whenever possible. Recorded lectures from another section may be made available at the discretion of the instructor.

All sections: All course content can be accessed and completed remotely. However, some content and assignments will require synchronous virtual attendance (i.e., during the regularly scheduled class time in the Boston time zone). Classes may be recorded at the discretion of the instructor.

Per current COVID-19 guidelines, you must wear a mask and practice safe social distancing in the classroom. Instructors may remove their mask while teaching when social distancing allows.

Please do not come to class in person if you are experiencing COVID-19 symptoms.


Technology

Piazza

Course administration, including all questions, course materials, and course announcements, will be handled via Piazza.

Please do not email instructors or TAs directly – use Piazza for your questions and queries instead. This allows us to track all course-related correspondence in a single location.

General questions that may be useful to other students should be posted to the whole class. If your question is specific to you, or includes a partial solution, then post it to instructors only.

Please see this Stack Overflow guide for how to ask a good question.

Canvas

Assignments, quizzes, and grading will be handled via Canvas.

All assignments and quizzes will be posted on Canvas, and must be submitted on Canvas by the posted due date. Please do not email completed assignments or quizzes to instructors or TAs, or post them on Piazza.

Zoom

For NUflex sections, classes will be broadcast synchronously via Zoom. Remote students can use Zoom to attend class virtually, without coming to campus. For in-person sections, lectures will not be recorded. Recordings from another section may be made available at the discretion of the instructor.

Microsoft Teams

Virtual office hours will be handled via Microsoft Teams. During scheduled office hours or by appointment, instructors and TAs will be available for live chat or video call on Microsoft Teams.


Homework

Six individual homework assignments are to be completed for this class. Each homework is due online via Canvas on the date scheduled on Canvas.

You may discuss some aspects of the homework with other students, but each assignment should be completed individually, and your submitted work should be your own. Sharing of worked solutions will not be tolerated and will be considered cheating. Plagiarized solutions will receive a zero. Solutions with a very high degree of similarity to those of another past or current student will be considered plagiarism and will be treated accordingly.


Quizzes

There will be two cumulative quizzes during the semester. Both quizzes will be completed online via Canvas on the dates scheduled on Canvas.


Project

There will be a final project completed in small teams. Project teams will propose a data science project using real-world data, culminating in a presentation and written report at the end of the semester.

Project guidelines will be posted on Piazza and discussed in class.


Peer review

Students will be expected to provide oral and written feedback on their classmates’ projects, both on the scientific content and on the effectiveness of their communication.

Rubrics will be posted on Piazza and discussed in class.


Late work and grading

Late assignments will not be accepted. Extensions may be given on a case-by-case basis if requested at least 48 hours in advance of the due date with a reasonable justification.

Petitions for re-grades must be made in writing via Piazza private message no later than one week after receiving the original grade. The petition must clearly explain why a re-grade is justified. The new grade may be lower than the original grade.

Before petitioning the instructor for a re-grade, students should first contact the grader to make sure they understand why they lost points.


Grade scale

The grade in this class is distributed as follows:

Final grades will be assigned according to the following scale:

These scales are subject to change at the discretion of the instructor.