Chapter 2 Getting Started
There are a few tools you’ll need to install to take full advantage of this book. I’m assuming that most readers will have already had at least a brief introduction to R and/or Python. If you are completely brand new to the topic, I suggest starting with books or online courses designed to teach the fundamental skills
2.1 Interactive and Replicable
This book is intended to provide replicable code for reproducing most examples in R and Python. In R, the primary graphing libraries are ggplot and plotly. In Python, the primary graphing libraries are seaborn and matplotlib.
Throughout this book, you’ll find mini-apps that let you look under the hood to see the exact code used to create an example, like this:
2.2 Developer Environments and Programming Languages
Many data scientists and analysts write their code in an integrated development environment (IDE). An IDE is a software program that brings a suite of useful tools and capabilities to help write code more quickly and easily.
RStudio is an IDE designed for data science applications. It is optimized for R, but also has features for working with other languages such as Python, SQL, Bash, and more.
You can download a free version of RStudio, here: https://www.rstudio.com/products/rstudio/download/.
Another common IDE for Python is Jupyter labs, which is a browser-based platform commonly used for R and Python. As a browser-based application, it can be much faster to get started because you don’t (usually) have to install anything on your computer. You can learn more and start using this platform at https://jupyter.org/.
This book primarily uses the R and Python programming languages. They are both open-source, meaning that they are fully and freely available to anyone and that thousands of people have contributed to their development over time. This has some advantages, such as the rapid pace of feature development and nearly constant updates to work with new technology and research techniques. Nearly every situation or problem you encounter has been solved by somebody who has shared a solution. There are also some disadvantages to such rapid development, such as the possibility that this book may become obsolete much faster than a closed-source application would!
You can download R from https://www.r-project.org/, and you can download Python from https://www.python.org/downloads/.
This book makes heavy use of these packages/libraries for each language:
R
tidyverse
(which is a suite of other packages, such asggplot2
andtidyr
)plotly
Python
pandas
matplotlib
seaborn
2.3 Book Organization
This book is designed to primarily offer practical examples and replicable code to create data visualizations. You’re welcome to jump straight to a chapter or section without reading anything else in the book.