Category Archives: Learn

Decluttering R

Importance of decluttering the R environment

R is a versatile and powerful programming language for many kinds of statistical and data analysis. As with any other tool, how much you get out of R depends largely on how well you know what it can do. Having used R extensively over a long period, we have gathered some tips we think will benefit beginners and seasoned R users alike. Because R is open source, its adoption has grown rapidly, and many users without any programming or computer science background have been able to benefit from it. As a newcomer to programming and scripting languages myself, I have fallen into several common programming and scripting pitfalls. Over time, thanks to plenty of help from experienced colleagues and the wealth of information readily available on the internet, I have learned several good programming practices that I wish I had known sooner. Continue reading Decluttering R

Dispelling illusions using Visualizations

Visualizations are a great data exploration technique. People generally understand and retain visuals more easily than code or text. Apart from giving a good general overview of the data, visualizations provide an intuitive understanding of the dataset's distribution and trends.
Continue reading Dispelling illusions using Visualizations

Tracing Search Behavior using Social Network Analysis

Introduction

The aim of this paper is to study the search behavior of users, based on their Google search query terms, and to find similarities in search behavior among a pool of users. We want to identify the types of searches that are central to other searches. Such searches would ideally lead users on to searches of other kinds, which would make them worthwhile targets for investment in Google Ads. Continue reading Tracing Search Behavior using Social Network Analysis

The implication of R-Squared

The R squared value, also called the coefficient of determination, measures how well the data points fit a regression equation. More specifically, R squared is the proportion of the variance in the dependent variable that is explained by the independent variables in the regression equation. The value of R squared can change when variables are added to or removed from the regression model. For an ordinary least squares fit, adding a variable never lowers R squared, so a higher value does not by itself show that the new variable is useful. R squared values are typically used as a measure of a model's effectiveness. Hence, a high R squared value can be an indicator of a capable model. One rough cutoff is anything above 55%, although what counts as high varies considerably between fields. Continue reading The implication of R-Squared
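To make the definition concrete, here is a minimal sketch. The analyses in these posts are done in R, but this example uses Python with only the standard library. The data points and the helper names `fit_line` and `r_squared` are hypothetical and chosen purely for illustration. The sketch fits a straight line by ordinary least squares. It then computes R squared as 1 - SS_res / SS_tot. Here SS_res is the sum of squared residuals and SS_tot is the total sum of squared deviations of the dependent variable from its mean.

```python
# A minimal sketch (illustrative data, hypothetical helper names):
# fit y = a + b*x by ordinary least squares, then compute R squared.


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(ys, predictions):
    """Fraction of the variance in ys explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in ys)  # total variation around the mean
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [2.0, 4.1, 5.9, 8.2, 9.8]  # made-up, nearly linear data
    a, b = fit_line(xs, ys)
    preds = [a + b * x for x in xs]
    print(f"intercept = {a:.3f}, slope = {b:.3f}, R^2 = {r_squared(ys, preds):.4f}")
```

A value of 1 means the line passes through every point. A value of 0 means the fit does no better than simply predicting the mean of the dependent variable for every observation.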