In this tutorial, you'll learn just enough Python to create professional-looking histograms and density plots. Then, in the following exercise, you'll put your new skills to work with a real-world dataset.
We begin by setting up the coding environment.
import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
print("Setup Complete")
We'll work with a dataset of 150 flowers: 50 from each of three iris species (Iris setosa, Iris versicolor, and Iris virginica).
Each row in the dataset corresponds to a different flower. There are four measurements: the sepal length and width, along with the petal length and width. We also keep track of the corresponding species.
# Path of the file to read
iris_filepath = "../input/iris.csv"
# Read the file into a variable iris_data
iris_data = pd.read_csv(iris_filepath, index_col="Id")
# Print the first 5 rows of the data
iris_data.head()
Id | Sepal Length (cm) | Sepal Width (cm) | Petal Length (cm) | Petal Width (cm) | Species |
---|---|---|---|---|---|
1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
Say we would like to create a histogram to see how petal length varies in iris flowers. We can do this with the sns.histplot command.
# Histogram
sns.histplot(iris_data['Petal Length (cm)'])
<AxesSubplot:xlabel='Petal Length (cm)', ylabel='Count'>
In the code cell above, we supplied the command with the column we'd like to plot (in this case, 'Petal Length (cm)').
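To make concrete what a histogram counts, here is a minimal sketch in plain Python. It is not how seaborn is implemented; the helper name histogram_counts, the sample values, and the choice of five equal-width bins are all assumptions made for illustration.

```python
# Hypothetical petal lengths in cm (made up for illustration, not the real dataset)
petal_lengths = [1.3, 1.4, 1.4, 1.5, 4.0, 4.5, 4.7, 5.1, 5.9, 6.0]

def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value belongs to the last bin, so clamp the index
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts

counts = histogram_counts(petal_lengths, n_bins=5)
print(counts)
```

Each entry of counts corresponds to the height of one bar. A plotting command such as sns.histplot does this kind of binning for you and then draws the bars.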
The next type of plot is a kernel density estimate (KDE) plot. In case you're not familiar with KDE plots, you can think of one as a smoothed histogram.
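The "smoothed histogram" idea can be sketched in a few lines of plain Python. This is an illustration only: it assumes a Gaussian kernel and a fixed bandwidth of 0.5, whereas a plotting library would normally pick the bandwidth automatically. The function name gaussian_kde and the sample values are hypothetical.

```python
import math

# Hypothetical petal lengths in cm (made up for illustration, not the real dataset)
petal_lengths = [1.3, 1.4, 1.4, 1.5, 4.0, 4.5, 4.7, 5.1, 5.9, 6.0]

def gaussian_kde(values, x, bandwidth):
    """Estimate the density at x as the average of Gaussian bumps, one centered on each value."""
    norm = 1.0 / (bandwidth * math.sqrt(2 * math.pi))
    total = sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values)
    return norm * total / len(values)

# Evaluate the estimate on an evenly spaced grid from 0.0 to 8.0 cm
grid = [i * 0.5 for i in range(17)]
density = [gaussian_kde(petal_lengths, x, bandwidth=0.5) for x in grid]

# Crude text rendering of the curve: longer bars mean higher estimated density
for x, d in zip(grid, density):
    print(f"{x:4.1f}  {'#' * round(d * 100)}")
```

A larger bandwidth gives a smoother, flatter curve, and a smaller one follows the individual data points more closely.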
To make a KDE plot, we use the sns.kdeplot command. Setting shade=True colors the area below the curve, and data= chooses the column we would like to plot. (Recent seaborn versions deprecate shade in favor of fill=True.)