Modern data analytic tools

A classical approach to supervised classification was to combine and transform the raw measured variables to produce ‘features’, defining a new data space in which the classes were linearly separable. This basic principle has been developed very substantially in the notion of support vector machines, which use some clever mathematics to permit the use of what is effectively an infinite number of features. Early experience suggests that methods based on these ideas produce highly effective classification algorithms.

Time series occupy a special place in data analysis because they are so ubiquitous. As a result of their importance, a wide variety of methods has been developed for analysing them.
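The feature construction described for supervised classification can be illustrated with a small sketch. It is not a support vector machine and does not use the kernel mathematics mentioned above; it only shows, on a made-up four-point data set, how adding one hand-built feature (the product of the two raw variables) can turn classes that no straight line separates into classes that a linear rule separates. The data, the function names and the choice of a simple perceptron as the linear classifier are all assumptions made for illustration.

```python
# Four XOR-style points: the label is +1 when both coordinates share a sign.
# No straight line in the raw (x1, x2) plane separates the two classes.
raw_points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [1, 1, -1, -1]


def add_product_feature(point):
    """Hypothetical feature map: keep x1 and x2, and add their product."""
    x1, x2 = point
    return (x1, x2, x1 * x2)


def linear_score(weights, bias, x):
    """Value of the linear rule w.x + b for one sample."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def train_perceptron(samples, targets, epochs=100):
    """Plain perceptron: nudge the weights whenever a sample is misclassified."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, targets):
            if y * linear_score(weights, bias, x) <= 0:
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # every sample is on the correct side
            break
    return weights, bias


def count_errors(samples, targets, weights, bias):
    """Number of samples not strictly on the correct side of the rule."""
    return sum(1 for x, y in zip(samples, targets)
               if y * linear_score(weights, bias, x) <= 0)


# Linear rule fitted on the raw variables.
raw_weights, raw_bias = train_perceptron(raw_points, labels)
raw_errors = count_errors(raw_points, labels, raw_weights, raw_bias)

# Linear rule fitted on the transformed feature space.
feature_points = [add_product_feature(p) for p in raw_points]
feat_weights, feat_bias = train_perceptron(feature_points, labels)
feature_errors = count_errors(feature_points, labels, feat_weights, feat_bias)

print("errors on raw variables:", raw_errors)
print("errors on constructed features:", feature_errors)
```

On the raw variables the linear rule always misclassifies at least one point, whereas in the feature space the perceptron finds a separating rule. Support vector machines pursue the same goal without computing the features explicitly.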

Statistics and machine learning, the two legs on which modern intelligent data analysis stands, have differences in emphasis. One of these differences is the importance given to the interpretability of a model. For example, in both domains, recursive partitioning or tree methods have been developed. These are essentially predictive models which seek to predict the value of the response variable from one or more explanatory variables. They do this by partitioning the space of explanatory variables into discrete regions, such that a unique predicted value of the response variable is associated with each region. While there is overlap, the statistical development has been more concerned with predictive accuracy, and the machine learning development with methods for rule induction.
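A minimal sketch of recursive partitioning, assuming a single explanatory variable and a squared-error splitting criterion (neither is specified in the text above): the code repeatedly splits the range of the explanatory variable into regions and attaches one predicted value, the mean response, to each region. The data and all function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def sum_squared_error(values):
    """Spread of the responses around their mean within one region."""
    centre = mean(values)
    return sum((v - centre) ** 2 for v in values)


def best_split(xs, ys):
    """Return (cost, threshold) of the best split, or None if xs has one value."""
    best = None
    for threshold in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < threshold]
        right = [y for x, y in zip(xs, ys) if x >= threshold]
        cost = sum_squared_error(left) + sum_squared_error(right)
        if best is None or cost < best[0]:
            best = (cost, threshold)
    return best


def grow_tree(xs, ys, depth=2):
    """Recursively partition the x axis; each leaf stores one predicted value."""
    split = best_split(xs, ys) if depth > 0 else None
    if split is None or split[0] >= sum_squared_error(ys):
        return {"predict": mean(ys)}  # no useful split: this region is a leaf
    threshold = split[1]
    left = [(x, y) for x, y in zip(xs, ys) if x < threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
    return {
        "threshold": threshold,
        "left": grow_tree([x for x, _ in left], [y for _, y in left], depth - 1),
        "right": grow_tree([x for x, _ in right], [y for _, y in right], depth - 1),
    }


def predict(tree, x):
    """Walk down the tree to the region containing x and return its value."""
    while "threshold" in tree:
        tree = tree["left"] if x < tree["threshold"] else tree["right"]
    return tree["predict"]


# Made-up data: the response jumps from 1.0 to 5.0 between x = 3 and x = 4.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
tree = grow_tree(xs, ys)

print(tree)
print(predict(tree, 2.5), predict(tree, 10))
```

Each path from the root to a leaf reads as a rule of the form "if x is below the threshold, predict this value", which is the link to rule induction noted above.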

A rule is a substructure of a model which recognizes a specific pattern in the database and takes some action. From this perspective, such tools for data analysis are very much machine learning tools.

Resampling


Resampling techniques are computationally expensive techniques that reuse the available sample to make statistical inferences. Because of their computational requirements, these techniques were infeasible at the time that most of “classical” statistics was developed. With the availability of ever faster and cheaper computers, their popularity has grown rapidly over the last decade. In this section, we provide a brief introduction to some important resampling techniques.
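As a concrete illustration of reusing the available sample, here is a minimal sketch of the bootstrap, one well-known resampling technique. The text above does not say which techniques it has in mind, so the choice of the bootstrap is an assumption, as are the made-up sample values and the function name `bootstrap_estimates`. The sketch draws many resamples with replacement from the original sample, recomputes the mean on each, and uses the spread of those means as an estimate of the standard error.

```python
import random
import statistics


def bootstrap_estimates(sample, statistic, n_resamples=1000, seed=0):
    """Recompute `statistic` on resamples drawn with replacement from `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    size = len(sample)
    return [statistic(rng.choices(sample, k=size)) for _ in range(n_resamples)]


# Made-up measurements standing in for the available sample.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]

estimates = bootstrap_estimates(sample, statistics.mean)

# The standard deviation of the resampled means estimates the standard error.
standard_error = statistics.stdev(estimates)

print("sample mean:", statistics.mean(sample))
print("bootstrap standard error of the mean:", standard_error)
```

The loop over a thousand resamples is what makes such methods computationally demanding compared with a closed-form formula, which is the cost the paragraph above refers to.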
