In this tutorial, you'll learn just enough Python to create professional-looking histograms and density plots. Then, in the following exercise, you'll put your new skills to work with a real-world dataset.

Set up the notebook

We begin by setting up the coding environment.

import pandas as pd
pd.plotting.register_matplotlib_converters()
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
print("Setup Complete")

Select a dataset

We'll work with a dataset of 150 different flowers: 50 each from three different species of iris (Iris setosa, Iris versicolor, and Iris virginica).

https://storage.googleapis.com/kaggle-media/learn/images/RcxYYBA.png

Load and examine the data

Each row in the dataset corresponds to a different flower. There are four measurements: the sepal length and width, along with the petal length and width. We also keep track of the corresponding species.

# Path of the file to read
iris_filepath = "../input/iris.csv"

# Read the file into a variable iris_data
iris_data = pd.read_csv(iris_filepath, index_col="Id")

# Print the first 5 rows of the data
iris_data.head()
    Sepal Length (cm)  Sepal Width (cm)  Petal Length (cm)  Petal Width (cm)      Species
Id
1                 5.1               3.5                1.4               0.2  Iris-setosa
2                 4.9               3.0                1.4               0.2  Iris-setosa
3                 4.7               3.2                1.3               0.2  Iris-setosa
4                 4.6               3.1                1.5               0.2  Iris-setosa
5                 5.0               3.6                1.4               0.2  Iris-setosa
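
To see what index_col="Id" does without needing the file on disk, here is a minimal sketch that reads the same five rows from an in-memory string. The csv_text and sample names are made up for this illustration, and the string is only a stand-in for iris.csv, not the full dataset.

```python
import io

import pandas as pd

# The five rows printed above, written out as CSV text (a stand-in for iris.csv).
csv_text = """Id,Sepal Length (cm),Sepal Width (cm),Petal Length (cm),Petal Width (cm),Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
3,4.7,3.2,1.3,0.2,Iris-setosa
4,4.6,3.1,1.5,0.2,Iris-setosa
5,5.0,3.6,1.4,0.2,Iris-setosa
"""

# index_col="Id" turns the Id column into the row labels instead of a data column.
sample = pd.read_csv(io.StringIO(csv_text), index_col="Id")

print(sample.shape)                      # rows and remaining data columns
print(sample.index.name)                 # name of the row index
print(sample["Species"].value_counts())  # how many flowers per species
```

Because Id becomes the index, it no longer counts as a data column: the frame is left with the four measurements plus Species.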

Histograms

Say we would like to create a histogram to see how petal length varies in iris flowers. We can do this with the sns.histplot command.

# Histogram 
sns.histplot(iris_data['Petal Length (cm)'])
<AxesSubplot:xlabel='Petal Length (cm)', ylabel='Count'>

[Figure: histogram of petal length, with 'Petal Length (cm)' on the x-axis and 'Count' on the y-axis]

In the code cell above, we had to supply the command with the column we'd like to plot (in this case, we chose 'Petal Length (cm)').
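
A histogram splits the range of the data into bins and counts how many values land in each one. The sketch below does that counting by hand on the five petal lengths shown earlier. The histogram_counts helper and the choice of three bins are assumptions for illustration only; this is not how seaborn is implemented, and seaborn chooses the number of bins automatically unless told otherwise.

```python
def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("all values are identical, so no bin width can be computed")
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp the index so the maximum value is counted in the last bin.
        i = min(int((v - low) / width), n_bins - 1)
        counts[i] += 1
    edges = [low + k * width for k in range(n_bins + 1)]
    return edges, counts


# Petal lengths (cm) of the first five flowers in the table above.
petal_lengths = [1.4, 1.4, 1.3, 1.5, 1.4]
edges, counts = histogram_counts(petal_lengths, n_bins=3)

print(edges)
print(counts)
```

The height of each bar in the plot corresponds to one of these counts, so the bar heights always add up to the number of flowers plotted.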

Density plots

The next type of plot is a kernel density estimate (KDE) plot. If you're not familiar with KDE plots, you can think of one as a smoothed histogram.
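
To make the "smoothed histogram" idea concrete, here is a minimal sketch of a Gaussian KDE written by hand: each data point contributes a small bell-shaped bump, and the curve is the average of those bumps. The make_kde helper and the bandwidth of 0.1 are assumptions chosen for illustration; seaborn selects its own bandwidth and does not necessarily compute the curve this way.

```python
import math


def make_kde(values, bandwidth):
    """Return a function that estimates density as an average of Gaussian bumps."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum one Gaussian bump per data point, centered on that point.
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


# Petal lengths (cm) of the first five flowers in the table above.
petal_lengths = [1.4, 1.4, 1.3, 1.5, 1.4]
density = make_kde(petal_lengths, bandwidth=0.1)  # bandwidth is an arbitrary choice

# Approximate the area under the curve between 0.5 and 2.5 with a Riemann sum.
step = 0.001
xs = [0.5 + k * step for k in range(2000)]
area = sum(density(x) * step for x in xs)

print(density(1.4), density(1.0))
print(area)
```

The curve is highest where the data points cluster, and the total area under it is approximately 1, which is why the y-axis of a KDE plot shows density rather than a count.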

To make a KDE plot, we use the sns.kdeplot command. Setting shade=True colors the area below the curve, and data= chooses the column we would like to plot. Note that in seaborn 0.11 and later, shade is deprecated in favor of the equivalent fill=True.