Data science, machine learning, deep learning, and artificial intelligence are among the most in-demand skills today and can lead to well-paid careers. Harvard University offers free data science and AI courses on the online learning platform edX. This article covers 15 popular data science courses from Harvard University, along with their length, level, and learning outcomes.

## Statistics and R

Length – 4 Weeks

Level – Intermediate

#### You will learn

- Random variables
- Distributions
- Inference: p-values and confidence intervals
- Exploratory data analysis
- Non-parametric statistics
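
To give a feel for the non-parametric inference topic above, here is a minimal sketch of a two-sided permutation test for a difference in group means. The course itself works in R; this sketch uses Python's standard library purely for illustration, and the function name `permutation_p_value` and the measurements are made up.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def permutation_p_value(a, b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(_mean(a) - _mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random by shuffling the pooled data.
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical measurements, invented for this example.
treated = [12.1, 13.4, 11.8, 14.2, 13.9, 12.7]
control = [10.2, 11.1, 9.8, 10.9, 11.5, 10.4]
p_value = permutation_p_value(treated, control)
print(f"permutation p-value: {p_value:.4f}")
```

The test makes no assumption about the shape of the underlying distribution, which is the main appeal of non-parametric methods.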

## Data Science: Linear Regression

Length – 8 Weeks

Level – Introductory

#### You will learn

- How linear regression was originally developed by Galton
- What confounding is and how to detect it
- How to examine the relationships between variables by implementing linear regression in R
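
As a rough illustration of what fitting a linear regression involves, the sketch below computes the least-squares slope and intercept for one predictor. The course implements this in R; the Python version here is only an illustration, and `fit_line` and the height values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for a single predictor: y ~ intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical parent and child heights in inches, invented for this example.
parent_heights = [64.0, 66.0, 68.0, 70.0, 72.0]
child_heights = [66.1, 66.9, 68.2, 69.0, 70.1]
slope, intercept = fit_line(parent_heights, child_heights)
print(f"child height ~ {intercept:.2f} + {slope:.2f} * parent height")
```

A slope below 1 in data like this is the "regression to the mean" pattern that gave the method its name.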

## Data Science: R Basics

Length – 8 Weeks

Level – Introductory

#### You will learn

- Basic R syntax
- Foundational R programming concepts such as data types, vector arithmetic, and indexing
- How to perform operations in R including sorting, data wrangling using dplyr, and making plots

## Data Science: Visualization (using R)

Length – 8 Weeks

Level – Introductory

#### You will learn

- Data visualization principles
- How to communicate data-driven findings
- How to use ggplot2 to create custom plots
- The weaknesses of several widely used plots and why you should avoid them

## Causal Diagrams: Draw Your Assumptions Before Your Conclusions

Length – 9 Weeks

Level – Introductory

#### You will learn

- How to translate expert knowledge into a causal diagram
- How to draw causal diagrams under different assumptions
- How to use causal diagrams to identify common biases
- How to use causal diagrams to guide data analysis

## Principles, Statistical and Computational Tools for Reproducible Data Science

Length – 8 Weeks

Level – Intermediate

#### You will learn

- Concepts, thought patterns, analysis paradigms, and computational and statistical tools that together support data science and reproducible research
- Fundamentals of reproducible science using case studies that illustrate various practices
- Key elements for ensuring data provenance and reproducible experimental design
- Statistical methods for reproducible data analysis
- Computational tools for reproducible data analysis and version control (Git/GitHub, Emacs/RStudio/Spyder), reproducible data (data repositories/Dataverse), reproducible dynamic report generation (R Markdown/R Notebook/Jupyter/Pandoc), and workflows
- How to develop new methods and tools for reproducible research and reporting
- How to write your own reproducible paper

## Statistical Inference and Modeling for High-throughput Experiments

Length – 4 Weeks

Level – Intermediate

#### You will learn

- Organizing high-throughput data
- Multiple comparison problem
- Family-wise error rates
- False Discovery Rate
- Error Rate Control procedures
- Bonferroni Correction
- q-values
- Statistical Modeling
- Hierarchical Models and the basics of Bayesian Statistics
- Exploratory Data Analysis for high-throughput data
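
The multiple comparison topics above can be made concrete with a small sketch that contrasts the Bonferroni correction (which controls the family-wise error rate) with the Benjamini-Hochberg step-up procedure (which controls the false discovery rate). This is an illustrative Python version, not the course's R code; the function names and the p-values are made up.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for false discovery rate control."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that the k-th smallest p-value <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis whose rank is at or below that cutoff.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            reject[idx] = True
    return reject

# Hypothetical p-values from ten tests, invented for this example.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print("Bonferroni rejections:", sum(bonferroni_reject(p_values)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg_reject(p_values)))
```

With these made-up values, Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects two, which shows why false discovery rate control is often preferred when thousands of features are tested at once.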

## Advanced Bioconductor

Length – 5 Weeks

Level – Advanced

#### You will learn

- Static and interactive visualization of genomic data
- Reproducible analysis methods
- Memory-sparing representations of genomic assays
- Working with multiomic cancer experiments
- Targeted interrogation of cloud-scale genomic archives

## Data Science: Capstone

Length – 2 Weeks

Level – Introductory

#### You will learn

- How to apply the knowledge base and skills learned throughout the series to a real-world problem
- How to independently work on a data analysis project

## Introduction to Bioconductor

Length – 5 Weeks

Level – Intermediate

#### You will learn

- What we measure with high-throughput technologies and why
- Introduction to high-throughput technologies
- Next Generation Sequencing
- Microarrays
- Preprocessing and Normalization
- The Bioconductor Genomic Ranges Utilities
- Genomic Annotation

## Data Science: Probability

Length – 8 Weeks

Level – Introductory

#### You will learn

- Important concepts in probability theory including random variables and independence
- How to perform a Monte Carlo simulation
- The meaning of expected values and standard errors and how to compute them in R
- The importance of the Central Limit Theorem
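
The sketch below shows roughly what a Monte Carlo simulation looks like: it repeatedly averages 30 rolls of a fair die and compares the spread of those averages with the standard error predicted by theory. The course does this in R; this Python version is only an illustration, and `simulate_sample_means` and the chosen sample size are assumptions made for the example.

```python
import random
from math import sqrt
from statistics import mean, stdev

def simulate_sample_means(sample_size, n_replications, seed=0):
    """Monte Carlo: repeat the experiment many times and record each sample mean."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_replications):
        rolls = [rng.randint(1, 6) for _ in range(sample_size)]
        means.append(sum(rolls) / sample_size)
    return means

sample_size = 30
means = simulate_sample_means(sample_size, n_replications=5000)

simulated_expected_value = mean(means)
simulated_standard_error = stdev(means)
# A fair die has variance 35/12, so the standard error of the mean is sqrt(35/12) / sqrt(n).
theoretical_standard_error = sqrt(35 / 12) / sqrt(sample_size)

print(f"simulated expected value: {simulated_expected_value:.3f} (theory: 3.5)")
print(f"simulated standard error: {simulated_standard_error:.3f}")
print(f"theoretical standard error: {theoretical_standard_error:.3f}")
```

Plotting a histogram of `means` would show the roughly bell-shaped curve that the Central Limit Theorem predicts, even though a single die roll is uniformly distributed.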

## Data Science: Inference and Modeling

Length – 8 Weeks

Level – Introductory

#### You will learn

- The concepts needed to define estimates and margins of error, including populations, parameters, estimates, and standard errors, in order to make predictions about data
- How to use models to aggregate data from different sources
- The very basics of Bayesian statistics and predictive modeling

## Data Science: Wrangling

Length – 8 Weeks

Level – Introductory

#### You will learn

- Importing data into R from different file formats
- Web scraping
- How to tidy data using the tidyverse to better facilitate analysis
- String processing with regular expressions (regex)
- Wrangling data using dplyr
- How to work with date and time formats
- Text mining

## Data Science: Productivity Tools

Length – 8 Weeks

Level – Introductory

#### You will learn

- How to use Unix/Linux to manage your file system
- How to perform version control with Git
- How to start a repository on GitHub
- How to leverage the many useful features provided by RStudio

## Data Science: Machine Learning

Length – 8 Weeks

Level – Introductory

#### You will learn

- The basics of machine learning
- How to perform cross-validation to avoid overtraining
- Several popular machine learning algorithms
- How to build a recommendation system
- What regularization is and why it is useful
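
To show how cross-validation and regularization fit together, here is a minimal sketch that uses k-fold cross-validation to score a one-feature ridge regression at several penalty values. It is an illustrative Python example, not the course's R code; the helper names (`k_fold_indices`, `fit_ridge_1d`, `cv_mse`), the simulated data, and the penalty grid are all assumptions.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_ridge_1d(x, y, lam):
    """One-feature ridge regression: the penalty lam shrinks the slope toward zero."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / (sxx + lam)
    return slope, mean_y - slope * mean_x

def cv_mse(x, y, lam, k=5):
    """Mean squared prediction error on held-out folds."""
    total_squared_error = 0.0
    for held_out in k_fold_indices(len(x), k):
        held = set(held_out)
        train = [i for i in range(len(x)) if i not in held]
        slope, intercept = fit_ridge_1d(
            [x[i] for i in train], [y[i] for i in train], lam
        )
        total_squared_error += sum(
            (y[i] - (intercept + slope * x[i])) ** 2 for i in held_out
        )
    return total_squared_error / len(x)

# Simulated noisy data around y = 2x + 1, invented for this example.
rng = random.Random(1)
x = [float(i) for i in range(20)]
y = [2 * xi + 1 + rng.gauss(0, 1) for xi in x]

for lam in [0.0, 1.0, 10.0, 100.0]:
    print(f"lambda={lam:>5}: cross-validated MSE = {cv_mse(x, y, lam):.3f}")
```

Each model is scored only on data it did not see during fitting, which is what guards against overtraining when choosing the penalty.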