Tutorial: Doc2Vec and t-SNE
This post is a tutorial on using doc2vec and t-SNE visualization in Python for disease clustering. Of course, the tutorial code can be used for any other types of … Read More
Two types of error are possible from a hypothesis test: Type I and Type II errors. Type I error, also known as a “false positive”, is the error of rejecting … Read More
A statistical hypothesis is an assumption about the parameters describing a population (not a sample). This assumption may be true or false. Hypothesis tests (also called significance tests) are to … Read More
First of all, let’s take a look at the differences between a bar chart and a histogram. They look very similar, but they are different. Bar charts compare discrete variables while … Read More
Feature scaling is one of the most important feature engineering steps for many machine learning algorithms (decision trees don’t necessarily need feature scaling). Most of the algorithms require similar scales for numerical … Read More
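To make the idea of putting features on similar scales concrete, here is a minimal sketch of two common approaches, min-max scaling and standardization, using only the Python standard library. The function names and the sample values are hypothetical and chosen for illustration; they are not taken from the post itself.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift and scale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical features on very different scales.
ages = [20, 30, 40, 50, 60]
incomes = [20000, 35000, 50000, 80000, 120000]

scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)
standardized_incomes = standardize(incomes)
```

After scaling, both features lie in the same range, so a distance-based or gradient-based algorithm would no longer be dominated by the feature with the larger raw values.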
In this post, we will look at an example of analyzing text data: Tolstoy’s War and Peace. You can download a plain text file from http://www.gutenberg.org/ebooks/2600. … Read More
Recurrent Neural Networks (RNNs) are a class of neural networks for modeling sequential data such as stock prices, an audio clip, a DNA sequence, a sequence of video frames, a … Read More
Random forests are an ensemble method (generally bagging) that builds multiple decision trees on random samples drawn with replacement from the training set. They are a supervised learning algorithm and can be … Read More
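The "random samples with replacement" step can be sketched as follows. This is only an illustration of bootstrap sampling, the resampling idea behind bagging, not a random forest implementation; the helper name `bootstrap_sample`, the toy training set, and the seed are assumptions made for the example.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows uniformly at random, with replacement."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]


# Hypothetical training set: ten row identifiers.
training_set = list(range(10))

# A seeded generator keeps the sketch reproducible.
rng = random.Random(0)

# One bootstrap sample per tree; three trees here for illustration.
samples = [bootstrap_sample(training_set, rng) for _ in range(3)]
```

Each sample has the same size as the training set, but some rows typically appear more than once and others not at all, which is what makes the individual trees differ from one another.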
You can find the whole process, from preprocessing to visualization, in another post, Analysis of Text: Tolstoy’s War and Peace, if you are interested in looking at the whole source … Read More