Data Science Archives - Things to Know about Machine Learning

Handling Categorical Values

April 1, 2019

In this post, we will look at a common way to deal with categorical values (e.g., small, medium, large). Most machine learning algorithms work with numerical values, so we need … Read More
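The teaser is cut off before it names the technique, so the following is only a minimal sketch of two widely used encodings, ordinal and one-hot, applied to the small/medium/large example. The sample list and variable names are made up for illustration.

```python
# Hypothetical sample column; the category names come from the teaser above.
sizes = ["small", "large", "medium", "small"]

# Ordinal encoding: assumes the categories have a natural order.
order = {"small": 0, "medium": 1, "large": 2}
ordinal = [order[s] for s in sizes]

# One-hot encoding: one binary column per category, with no order implied.
categories = sorted(set(sizes))  # ['large', 'medium', 'small']
one_hot = [[1 if s == c else 0 for c in categories] for s in sizes]

print(ordinal)  # [0, 2, 1, 0]
print(one_hot)  # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Ordinal encoding keeps a single column but tells the model that large > medium > small. One-hot encoding avoids that assumption at the cost of one column per category.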

Data Preprocessing Data Science Machine Learning

How to Deal with Missing Values

April 1, 2019

One of the data cleaning steps is dealing with missing values. It is very common to find missing values in your datasets. To train your model better, you need to … Read More
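As a rough illustration, two common strategies are dropping the missing entries and filling them with the mean of the observed values. The teaser does not say which strategies the post covers, and the column below is invented for the example.

```python
from statistics import mean

# Hypothetical column with missing entries represented as None.
ages = [25.0, None, 31.0, 40.0, None]

# Strategy 1: drop the missing entries.
observed = [a for a in ages if a is not None]

# Strategy 2: fill the missing entries with the mean of the observed values.
fill_value = mean(observed)
imputed = [fill_value if a is None else a for a in ages]

print(observed)  # [25.0, 31.0, 40.0]
print(imputed)   # [25.0, 32.0, 31.0, 40.0, 32.0]
```

Dropping is simple but discards rows. Mean imputation keeps every row but shrinks the variance of the column.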

Data Science Machine Learning Training Models

Underfitting vs. Overfitting

February 19, 2019

We can determine if the performance of a model is poor by looking at prediction errors on the training set and the evaluation set. The model might be too simple … Read More
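One way to picture the comparison of training and evaluation errors is a small rule of thumb. The function name and both thresholds below are assumptions chosen for illustration, not values from the post.

```python
def diagnose(train_error, eval_error, high_error=0.2, max_gap=0.1):
    """Rough heuristic; both thresholds are illustrative assumptions."""
    # High error even on the training set: the model is likely too simple.
    if train_error > high_error:
        return "underfitting"
    # Low training error but much higher evaluation error: poor generalization.
    if eval_error - train_error > max_gap:
        return "overfitting"
    return "reasonable fit"


print(diagnose(0.35, 0.37))  # underfitting
print(diagnose(0.02, 0.30))  # overfitting
print(diagnose(0.08, 0.10))  # reasonable fit
```

In practice the thresholds depend on the task and the error metric, so treat them as placeholders.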

Data Science Statistical Significance Testing

Type I and Type II Errors

February 8, 2019

Two types of error are possible from a hypothesis test: Type I and Type II errors. Type I error, also known as a “false positive”, is the error of rejecting … Read More

Data Science Statistical Significance Testing

What is a Hypothesis Test?

A statistical hypothesis is an assumption about the parameters describing a population (not a sample). This assumption may be true or false. Hypothesis tests (also called significance tests) are to … Read More

Data Science Visualization

Bar Chart / Stacked Bar Chart

January 27, 2019

First of all, let’s take a look at the differences between a bar chart and a histogram. They look very similar, but they are different. Bar charts compare discrete variables while … Read More

Data Preprocessing Data Science Machine Learning

Feature Scaling

January 25, 2019

Feature scaling is one of the most important feature engineering steps for many machine learning algorithms (decision trees do not necessarily need it). Most of the algorithms require similar scales for numerical … Read More
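A minimal sketch of two common scaling methods, min-max scaling and standardization, is shown here. The teaser does not say which methods the post covers, and the feature values are invented.

```python
from statistics import mean, pstdev

# Hypothetical numerical feature.
values = [10.0, 20.0, 30.0, 40.0, 50.0]

# Min-max scaling: map the values into the range [0, 1].
lo, hi = min(values), max(values)
min_max = [(v - lo) / (hi - lo) for v in values]

# Standardization: subtract the mean and divide by the population
# standard deviation, giving mean 0 and standard deviation 1.
mu, sigma = mean(values), pstdev(values)
standardized = [(v - mu) / sigma for v in values]

print(min_max)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print([round(z, 3) for z in standardized])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

Both computations assume the feature is not constant; otherwise the denominator is zero.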

Data Analysis Examples Data Science Machine Learning NLP

Analysis of Text: Tolstoy’s War and Peace

January 25, 2019

In this post, we will look at an example of text analysis using Tolstoy’s War and Peace. You can download a plain text file from http://www.gutenberg.org/ebooks/2600. … Read More

Data Science Machine Learning NLP

A General Approach to Preprocessing Text Data

January 13, 2019

You can find the whole process, from preprocessing to visualization, in another post, Analysis of Text: Tolstoy’s War and Peace, if you are interested in looking at the whole source … Read More
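The teaser does not list the preprocessing steps themselves, so the sketch below only assumes a typical sequence: lowercasing, tokenizing, removing stop words, and counting. The sample sentence and the tiny stop-word set are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical input text and stop-word list.
text = "The prince spoke, and the prince left the room."
stop_words = {"the", "and"}

# Lowercase, then keep runs of letters and apostrophes as tokens.
tokens = re.findall(r"[a-z']+", text.lower())

# Drop stop words and count what remains.
filtered = [t for t in tokens if t not in stop_words]
counts = Counter(filtered)

print(filtered)               # ['prince', 'spoke', 'prince', 'left', 'room']
print(counts.most_common(1))  # [('prince', 2)]
```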

Data Science Visualization

Box Plot

January 11, 2019

A box plot (also called a box-and-whisker plot) is useful for visualizing the distribution of data and finding outliers. This plot displays the five-number summary: minimum, first quartile, median, third … Read More
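The numbers behind a box plot can be computed directly. This sketch assumes Python 3.8+ for `statistics.quantiles` and uses the common 1.5 × IQR rule to flag outliers. That rule, the choice of the `inclusive` quartile method, and the sample data are assumptions, not details from the post.

```python
from statistics import quantiles

# Hypothetical sample with one obvious outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]

# Quartiles; other quartile conventions can give slightly different values.
q1, median, q3 = quantiles(data, n=4, method="inclusive")
five_number = (min(data), q1, median, q3, max(data))

# Flag points more than 1.5 * IQR beyond the quartiles.
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]

print(five_number)  # (1, 3.0, 5.0, 7.0, 30)
print(outliers)     # [30]
```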

Category: Data Science

Handling Categorical Values

How to Deal with Missing Values

Underfitting vs. Overfitting

Type I and Type II Errors

What is a Hypothesis Test?

Bar Chart / Stacked Bar Chart

Feature Scaling

Analysis of Text: Tolstoy’s War and Peace

A General Approach to Preprocessing Text Data

Box Plot