All posts by Apurv Vispute

Apurv currently works as a statistical analyst for Five Element Analytics, an analytics firm based in NY. He graduated from Hofstra University in 2015 with an MBA in Business Analytics.

Machine Vs. Operational Learning

Today’s economy, shaped by shifting economic cycles and advances in communication, relies increasingly on data science. Big data techniques, a significant component of data science and business intelligence, are used to harness vast amounts of data quickly for analysis. The growing volume and availability of granular data, coupled with powerful analytical tools such as R and SAS, drive organizations toward more accurate predictions, with the prospect of increasing sales and generating organizational efficiencies. These predictions enable efficient supply chains, driving down costs for producers and speeding the delivery of products and services to consumers.

Machine learning, now a widely acclaimed global phenomenon, helps push the boundaries of analysis and decision making. It is only in the past decade that machine learning has become part of the daily conversation. In essence, machine learning is the ability of machines and computer-based systems to learn from data without being explicitly programmed for each task. Deep learning is a subset of machine learning in which algorithms use artificial neural networks to form complex models. Deep learning algorithms can detect patterns that are difficult for people to spot, helping organizations forecast pitfalls in their product demand curves and reveal prospective areas of customer demand not yet recognized by either consumers or producers. Companies such as Netflix use machine learning and deep learning to gauge demand, set price levels effectively, and even create in-house productions based on latent (hidden) viewer demand. It is no coincidence that Netflix has produced shows relating to current events and cultural trends. Netflix’s viewing suggestions are generated according to the user’s preferences and are quite accurate. Many of these preferences would never have been directly communicated by consumers to Netflix.

Machine learning and deep learning help present the larger picture. However, the efficacy of the analysis depends on the ability of the teams involved to derive actionable insights that align with the organization’s goals. Machine learning needs to be paired with operational learning in order to form the right analysis. Operational learning (OL) represents an innate understanding of the industry on two levels. On the macro level, it refers to knowledge of the organization’s product or service offering and of competitors’ offerings, along with awareness of consumers’ needs and of industry trends and standards. On the micro level, OL means familiarity with the organization’s goals, its implementation structure, its work culture, and the individuals involved.

“The intuitive recognition of the instant, thus reality… is the highest act of wisdom.” –  D. T. Suzuki

Often, textbook applications of techniques fail to achieve the desired results, giving data analytics a bad reputation. There is no substitute for human insight and experience, which can loosely be termed wisdom. There needs to be a medley of human intuition and analytical insight in order to attain relevant outcomes. The discussions between individuals and across teams surrounding a data science project are probably as valuable as the analysis itself. The data presents everything as it is and leaves latitude for the interpretations of everyone involved. These discussions help unearth truly latent variables, which are not included in the models because they do not exist directly in the data.

Discovery of latent yet important variables provides an understanding that transcends the initial project scope and leads to insights beyond the team’s initial goals. For example, we undertook a data science initiative to understand user search behavior for one of our clients. The analysis aimed at providing users with promotional links relevant to their search query. We analyzed user-behavior patterns using k-means clustering, a data exploration technique that groups similar observations together, and further analyzed individual interactions using random forests, which are ensembles of decision trees. The project kept expanding in scope, and multiple business units (product development, marketing and finance) became increasingly involved. The project began with the single aim of categorizing search queries, but resulted in a process that increased paid clicks, enhanced user engagement and optimized marketing spend based on consumers’ relative value. The project also fostered communication between teams and increased their understanding of one another’s goals.

Conducting a complete, well-rounded analysis is paramount for relevant and successful outcomes. However, machine learning needs to be paired with operational learning in order to generate value. A balanced implementation enables the mathematical models to be interpreted and applied in the appropriate context and leads to actionable results, which should be the goal of any data science or analytics project. Machine learning projects with a clearly defined business objective from an operations perspective tend to have a high success rate, since they are designed from the very beginning around the most relevant needs of the organization.

There’s No Free Lunch, Stupid

“Tea is an act complete in its simplicity.
When I drink tea, there is only me and the tea.
The rest of the world dissolves” – Thich Nhat Hanh

A picture is worth a thousand words, and numbers have the capacity to summarize a picture with just a few statistics, especially in today’s data-driven world. The right perspective is necessary for the right kind of analysis. It is not just the choice of technique but its implementation that determines the efficacy of the analysis and the relevance of the insight.

Decluttering R

Importance of decluttering the R environment

R is a versatile and powerful programming language that enables the user to perform various types of statistical and data analyses. As with any other tool, R’s potential largely lies in the user’s knowledge of the extent of its capability. Having used R extensively over a period of time, we have some useful tips that we think will benefit the beginner and the seasoned R user alike. Because R is open source, its adoption has grown rapidly. Many users without any programming or computer science background have been able to benefit from it. Being a newcomer to programming and scripting languages myself, I have fallen prey to several programming and scripting pitfalls. Over time, thanks to a great deal of help from experienced colleagues and to the sea of information readily available on the internet, I have learned several points of programming etiquette that I wish I had known sooner.

Dispelling illusions using Visualizations

Visualizations are a great data exploration technique. Our minds are better able to understand and retain visuals than scripts or text. Apart from giving us a good general overview of the data, visualizations give us an intuitive understanding of the distribution of the dataset and its trends.

Tracing Search Behavior using Social Network Analysis

Introduction

The aim of this paper is to study the search behavior of users, based on their Google search query terms, and to find similarities between the search behaviors of a pool of users. We want to identify the types of searches that are central to other searches. These searches would ideally lead to searches of other kinds, and it would be worthwhile to invest in Google ads for searches of this type.
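
One simple way to express "central" searches is out-degree centrality on a directed graph whose nodes are search types and whose edges link a search to the search that followed it. The sketch below uses only the Python standard library and is an assumption about how such a measure could be computed, not the method used in the paper. The search categories and transitions are invented for illustration.

```python
from collections import defaultdict

def out_degree_centrality(transitions):
    """Share of other nodes that each node leads to directly.

    `transitions` is an iterable of (from_search, to_search) pairs.
    """
    successors = defaultdict(set)
    nodes = set()
    for source, target in transitions:
        nodes.update((source, target))
        if source != target:  # ignore repeated searches of the same type
            successors[source].add(target)
    # Normalize by the number of other nodes in the graph.
    denominator = max(len(nodes) - 1, 1)
    return {node: len(successors[node]) / denominator for node in nodes}

# Hypothetical transitions between search types within user sessions.
observed = [
    ("flights", "hotels"),
    ("flights", "car rental"),
    ("flights", "travel insurance"),
    ("hotels", "restaurants"),
    ("car rental", "hotels"),
]

scores = out_degree_centrality(observed)
most_central = max(scores, key=scores.get)
print(most_central, scores[most_central])
```

Under this measure, a search type with a high score tends to be followed by many different kinds of searches, which is the property the introduction describes as desirable for ad placement.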