Data Science
Using Data Science Tools in Python
Course Overview
More and more organizations are turning to data science to help guide business decisions. Regardless of industry, the ability to extract knowledge from data is crucial for a modern business to stay competitive. One of the tools at the forefront of data science is the Python® programming language. Python’s robust libraries give data scientists the ability to load, analyze, shape, clean, and visualize data in ways that are easy to use yet powerful. This course will teach you the skills you need to use these key libraries to extract useful insights from data and, as a result, provide great value to the business.
Course Length
Target Audience
This course is designed for students who wish to expand their ability to extract knowledge from business data. The target student for this course understands the principles and benefits of data science and has used basic data-driven tools like Microsoft® Excel® and Structured Query Language (SQL) queries, but wants to take the next steps into more advanced applications of data science.
The target student may therefore be a programmer or data analyst who wants to solve business problems with programming libraries that go beyond the limitations of prepackaged GUI tools and database queries: libraries that give the data scientist finer control over the analysis, manipulation, and presentation of data.
A typical student in this course should have several years of experience with computing technology, along with a proficiency in programming.
Course Prerequisites
To ensure your success in this course, you should have at least a high-level understanding of fundamental data science concepts, including but not limited to: data engineering, data analysis, data storage, data visualization, and statistics. You can obtain this level of knowledge by taking the CertNexus DSBIZ™ (Exam DSZ-110): Data Science for Business Professionals course.
You should also be proficient in programming with Python. You can obtain this level of skill and knowledge by taking the following Logical Operations courses:
- Python® Programming: Introduction
- Python® Programming: Advanced
Course-specific Technical Requirements
Hardware:
For this course, you will need one computer for each student and one for the instructor. Each computer must meet the following minimum hardware requirements:
- 2 gigahertz (GHz) 64-bit (x64) processor that supports the VT-x or AMD-V virtualization instruction set and Second Level Address Translation (SLAT).
- 8 gigabytes (GB) of Random Access Memory (RAM).
- 32 GB of available storage space.
- Monitor capable of a screen resolution of at least 1,024 × 768 pixels, at least a 256-color display, and a video adapter with at least 4 MB of memory.
- Bootable DVD-ROM or USB drive.
- Keyboard and mouse or a compatible pointing device.
- Fast Ethernet (100 Mb/s) or faster adapter, and cabling to connect to the classroom network.
- IP addresses that do not conflict with other portions of your network.
- Internet access (contact your local network administrator).
- (Instructor computer only) A display system to project the instructor’s computer screen.
Software:
Each computer requires the following software:
- Microsoft® Windows® 10 64-bit.
- Oracle® VM VirtualBox version 6.0.10 (VirtualBox-6.0.10-132072-Win.exe).
VirtualBox is distributed with the course data files under version 2 of the GNU General Public License (GPL).
- Anaconda® for Python 3 version 2020.02.
Anaconda is distributed with the course data files under a Berkeley Software Distribution (BSD) license.
Learning Outcomes / Objectives
In this course, you will use various Python tools to load, analyze, manipulate, and visualize business data.
You will:
- Set up a Python data science environment.
- Manage and analyze data with NumPy arrays.
- Manipulate and modify data with NumPy arrays.
- Manage and analyze data with pandas DataFrames.
- Manipulate, modify, and visualize data with pandas DataFrames.
- Visualize data with Matplotlib and Seaborn.
Topic List
Lesson 1: Setting Up a Python Data Science Environment
Topic A: Select Python Data Science Tools
Topic B: Install Python Using Anaconda
Topic C: Set Up an Environment Using Jupyter Notebook
Lesson 2: Managing and Analyzing Data with NumPy
Topic A: Create NumPy Arrays
Topic B: Load and Save NumPy Data
Topic C: Analyze Data in NumPy Arrays
Lesson 3: Transforming Data with NumPy
Topic A: Manipulate Data in NumPy Arrays
Topic B: Modify Data in NumPy Arrays
Lesson 4: Managing and Analyzing Data with pandas
Topic A: Create Series and DataFrames
Topic B: Load and Save pandas Data
Topic C: Analyze Data in DataFrames
Topic D: Slice and Filter Data in DataFrames
Lesson 5: Transforming and Visualizing Data with pandas
Topic A: Manipulate Data in DataFrames
Topic B: Modify Data in DataFrames
Topic C: Plot DataFrame Data
Lesson 6: Visualizing Data with Matplotlib and Seaborn
Topic A: Create and Save Simple Line Plots
Topic B: Create Subplots
Topic C: Create Common Types of Plots
Topic D: Format Plots
Topic E: Streamline Plotting with Seaborn