Handling Categorical Values
In this post, we will look at a common way to deal with categorical values (e.g., small, medium, large). Most machine learning algorithms work with numerical values, so we need … Read More
One of the data cleaning processes is dealing with missing values. It is very common to find missing values in your datasets. To train your model better, you need to … Read More
We can determine if the performance of a model is poor by looking at prediction errors on the training set and the evaluation set. The model might be too simple … Read More
Two types of error are possible from a hypothesis test: Type I and Type II errors. Type I error, also known as a “false positive”, is the error of rejecting … Read More
A statistical hypothesis is an assumption about the parameters describing a population (not a sample). This assumption may be true or false. Hypothesis tests (also called significance tests) are to … Read More
First of all, let’s take a look at the differences between a bar chart and a histogram. They look very similar, but they are different. Bar charts compare discrete variables while … Read More
Feature scaling is one of the most important feature engineering steps for many machine learning algorithms (decision trees don’t necessarily need feature scaling). Most of the algorithms require similar scales for numerical … Read More
In this post, we will look at an example of analyzing text data: Tolstoy’s War and Peace. You can download a plain text file from http://www.gutenberg.org/ebooks/2600. … Read More
You can find the whole process, from preprocessing to visualization, in another post, Analysis of Text: Tolstoy’s War and Peace, if you are interested in looking at the whole source … Read More