Python Lab: Handling Dirty and Missing Data

Washington, District of Columbia, USA, May 2019

Details

In this meetup, we will learn how to master data cleansing. The first step in data science is manipulating and cleaning data. In fact, most of a data scientist’s time is spent cleaning and “fixing” data. Participants will leave with a Jupyter notebook detailing deduplication, data replacement, missing data imputation, string matching, encoding, scaling, standardization, and more. Walk away with crucial time-saving skills to clean and prepare your data for analysis.

Description

Data scientists can end up doing a wide variety of things across a wide variety of industries, but almost every data science job shares at least one thing in common: data cleaning. The real world is messy, after all, and that means real-world datasets tend to be messy, too. Incomplete entries, inconsistent formatting, and entry errors – these are things you’ll encounter in almost every dataset you work with.
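As a rough illustration of those three problems (incomplete entries, inconsistent formatting, and repeated entries), here is a minimal pandas sketch. The records, column names, and the choice of median imputation are all invented for this example and are not taken from the session notebook.

```python
import pandas as pd

# Hypothetical toy records; names, cities, and ages are made up for illustration.
raw = pd.DataFrame({
    "name": ["  Alice ", "alice", "Bob", "Carol"],
    "city": ["Washington", "washington ", "Arlington", None],
    "age": [34, 34, None, 29],
})

clean = raw.copy()

# Inconsistent formatting: trim whitespace and normalize case in text columns.
for col in ["name", "city"]:
    clean[col] = clean[col].str.strip().str.title()

# Duplicate entries: rows that are identical after normalization are dropped.
clean = clean.drop_duplicates().reset_index(drop=True)

# Incomplete entries: fill missing ages with the column median (one possible
# choice among many) and label missing cities explicitly.
clean["age"] = clean["age"].fillna(clean["age"].median())
clean["city"] = clean["city"].fillna("Unknown")

print(clean)
```

Note that the two "Alice" rows only become recognizable as duplicates after the text columns are normalized, which is why the formatting step runs first.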

In this session, we will discuss some general considerations for missing data. The difference between data found in many tutorials and data in the real world is that real-world data is rarely clean and homogeneous. In particular, many interesting datasets will have some amount of data missing. To make matters even more complicated, different data sources may indicate missing data in different ways.
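To make the last point concrete, the sketch below reads a small CSV in which missing values appear as an empty field, "N/A", "-999", "?", and "unknown". The data and the list of markers are assumptions made up for this example. pandas already treats some strings (such as the empty field and "N/A") as missing by default, and the `na_values` argument of `read_csv` adds further markers.

```python
import io
import pandas as pd

# Hypothetical CSV in which missing values are encoded several different ways.
csv_text = """id,temperature,city
1,21.5,Washington
2,-999,Arlington
3,N/A,unknown
4,,Bethesda
5,?,Washington
"""

# Extra markers to treat as missing, on top of the pandas defaults.
# These particular markers are assumptions for this toy dataset.
missing_markers = ["-999", "unknown", "?"]

df = pd.read_csv(io.StringIO(csv_text), na_values=missing_markers)

# Count missing values per column once every marker is mapped to NaN.
print(df.isna().sum())
```

Passing a plain list applies the markers to every column. When a marker such as -999 could be a legitimate value in some columns, `na_values` also accepts a dictionary mapping column names to their own markers.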

Participants will need the following:

EQUIPMENT:

  • a working laptop
  • your valid government-issued identification (that matches your registered name)

PROGRAMS:

  • Python 3
  • Jupyter Notebook
  • Google Drive/Google Colaboratory

PYTHON PACKAGES:

  • pandas
  • NumPy
  • scikit-learn
  • re
  • os
  • fuzzywuzzy
  • datetime

Instructors for the course are data scientists Sian Lewis (Booz Allen Hamilton) and Valeria Rozenbaum (Thomson Reuters).

IMPORTANT: For building entrance, please bring your government-issued identification. You must RSVP to be allowed entry. There will be no registration at the door.

Join us at the Python for Data Science Lab: Handling Dirty and Missing Data on Thursday, May 30, 2019 at 6:30 pm in Booz Allen Hamilton’s Innovation Center, located at [masked] Street NW, Washington, DC 20005.

This event is intended exclusively for women!

Organizer:

Sian L (event organizer)
Monica (co-organizer)

Source URL: https://www.meetup.com/Women-Who-Code-DC/events/261258995/
