Bar Chart / Stacked Bar Chart

First of all, let’s look at the differences between a bar chart and a histogram. They look very similar, but they are different. Bar charts compare discrete variables, while histograms represent the frequency distribution of continuous variables. In other words, bar charts plot categorical data, while histograms plot numerical data grouped into bins or intervals. In this post, we will focus on data visualization with a bar chart and a stacked bar chart. The charts visualize text data from “Tolstoy’s War and Peace (http://www.gutenberg.org/ebooks/2600)”. If you are interested in the details of the text preprocessing, you can find them in another post, “Analysis of Text: Tolstoy’s War and Peace”.
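To make the distinction concrete, here is a minimal sketch that prepares the input for each chart type. The data is made up for illustration and is not taken from the novel; the names `speakers` and `sentence_lengths` are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data, not taken from the novel.
speakers = pd.Series(["Pierre", "Natasha", "Pierre", "Andrew", "Natasha", "Pierre"])
sentence_lengths = pd.Series([4, 7, 9, 12, 15, 18, 21, 22, 30, 38])

# Bar chart input: one count per discrete category.
category_counts = speakers.value_counts().sort_index()

# Histogram input: numerical values grouped into four equal-width bins.
bin_counts, bin_edges = np.histogram(sentence_lengths.to_numpy(), bins=4, range=(0, 40))

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(10, 4))
category_counts.plot.bar(ax=ax_bar, title="Bar chart (categories)")
ax_hist.hist(sentence_lengths, bins=bin_edges)
ax_hist.set_title("Histogram (bins)")
plt.show()
```

In the bar chart, the order and spacing of the bars carry no numerical meaning. In the histogram, the X axis is a number line and each bar covers an interval of it.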

Bar Chart

The following table, which is a pandas dataframe, and bar chart show the ratio of direct sentences (i.e., quoted sentences) to indirect sentences (i.e., unquoted sentences) for each book.

ratio_direct_book
ratio_direct_book[['book', 'ratio_direct_to_indirect']].plot.bar(x='book')

This bar chart shows the ratio of direct sentences to indirect sentences for each book (categorical data). The earlier books (Books one through eight, except Book six) have a higher ratio than the later books (Books nine through fifteen).
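The dataframe `ratio_direct_book` comes out of the text preprocessing, which is not shown here. As a rough sketch of how such a ratio column could be computed and plotted, assume a dataframe with one row per book and two count columns. The column names `n_direct` and `n_indirect` and all the numbers are hypothetical.

```python
import pandas as pd

# Hypothetical sentence counts per book; the real numbers come from the
# text preprocessing, which is not shown in this post.
counts = pd.DataFrame({
    "book": ["one", "two", "three"],
    "n_direct": [300, 150, 200],    # quoted sentences (made up)
    "n_indirect": [600, 500, 800],  # unquoted sentences (made up)
})

# Ratio of direct to indirect sentences, one value per book.
counts["ratio_direct_to_indirect"] = counts["n_direct"] / counts["n_indirect"]

# Same plotting call as above, applied to the hypothetical data.
ax = counts[["book", "ratio_direct_to_indirect"]].plot.bar(x="book", legend=False)
ax.set_ylabel("direct / indirect")
```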

Stacked Bar Chart

Stacked bar charts are useful for showing a lot of information in one graph: each bar shows a total, and its segments show how that total breaks down into parts.

In the following pandas dataframe, rows are books, columns are chapters, and the values are word counts. We will look at a stacked bar chart visualizing the word counts for each chapter and each book.

pivot_book_ch_count.head()
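How `pivot_book_ch_count` was built is covered by the preprocessing post, not this one. One plausible way to get a table of this shape is to pivot a long-format dataframe with one row per (book, chapter) pair. The sketch below uses made-up numbers and hypothetical column names (`book`, `chapter`, `word_count`).

```python
import pandas as pd

# Hypothetical long-format word counts: one row per (book, chapter) pair.
word_counts = pd.DataFrame({
    "book": [1, 1, 1, 2, 2],
    "chapter": [1, 2, 3, 1, 2],
    "word_count": [1200, 900, 1500, 800, 1100],
})

# Rows become books, columns become chapters, values are word counts.
# A chapter that a book does not have shows up as NaN.
pivot = word_counts.pivot(index="book", columns="chapter", values="word_count")
print(pivot)
```

Books with fewer chapters simply have missing values in the later columns, which is why the bars in the stacked chart have different numbers of segments.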

The word counts of all books and chapters can be shown in a stacked bar chart. In the following chart, the X axis indicates the books, and the Y axis shows the number of words for each book. The legend indicates chapter indices. A colored bar segment in each bar indicates the number of words for each chapter.

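import matplotlib.pyplot as plt

# One bar per book, one stacked segment per chapter; zorder=2 draws the bars above the grid lines.
pivot_book_ch_count.plot(kind='bar', stacked=True, figsize=(10,6), legend=False, zorder=2)
plt.grid()
# Place the chapter legend outside the plot area, to the right.
plt.legend(title='chapter', loc='center left', bbox_to_anchor=(1.0, 0.5))
plt.show()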
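To see what `stacked=True` does, it can help to build the same kind of chart by hand. The following is a sketch with made-up numbers, not the code used for the chart above: each chapter is drawn as its own set of bars, and the `bottom` argument of matplotlib's `Axes.bar` shifts it up by the running total of the previous chapters.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical word counts: rows are books, columns are chapters.
pivot = pd.DataFrame(
    {1: [1200, 800], 2: [900, 1100], 3: [1500, 0]},
    index=pd.Index(["one", "two"], name="book"),
)

fig, ax = plt.subplots(figsize=(6, 4))
bottom = np.zeros(len(pivot))
for chapter in pivot.columns:
    heights = pivot[chapter].to_numpy()
    # Each chapter's segment starts where the previous chapters ended.
    ax.bar(list(pivot.index), heights, bottom=bottom, label=str(chapter), zorder=2)
    bottom = bottom + heights

ax.grid()
ax.legend(title="chapter", loc="center left", bbox_to_anchor=(1.0, 0.5))
plt.show()
```

After the loop, `bottom` holds the total height of each bar, which is the total word count of each book.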

Judging by the total height of the bars, the books set in 1805 and 1812 contain more words than the others, so it seems that the author put more effort into them.
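An impression read off a chart like this can be checked numerically by summing the word counts in each row. A minimal sketch, again with made-up numbers in place of the real `pivot_book_ch_count`:

```python
import pandas as pd

# Hypothetical word counts: rows are books, columns are chapters.
pivot = pd.DataFrame(
    {1: [1200, 800, 1000], 2: [900, 1100, 700], 3: [1500, float("nan"), 600]},
    index=pd.Index(["one", "two", "three"], name="book"),
)

# Total words per book, longest first; missing chapters (NaN) are skipped by sum().
words_per_book = pivot.sum(axis=1).sort_values(ascending=False)
print(words_per_book)
```

Word count is only a rough proxy for effort, but the sorted totals make it easy to see which books stand out.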