Intelligent data analysis
Intelligent Data Analysis provides a forum for the examination of issues related to the research and applications of Artificial Intelligence techniques in data analysis across a variety of disciplines. These techniques include (but are not limited to): all areas of data visualization, data pre-processing (fusion, editing, transformation, filtering, sampling), data engineering, database mining techniques, tools and applications, use of domain knowledge in data analysis, big data applications, evolutionary algorithms, machine learning, neural nets, fuzzy logic, statistical pattern recognition, knowledge filtering, and post-processing. In particular, preference is given to papers that discuss the development of new AI-related data analysis architectures, methodologies, and techniques, and their application to various domains.
Data simply comprise a collection of numerical values recording the magnitudes of various attributes of the objects under study. 'Data analysis' then describes the processing of those data. Of course, one does not set out simply to analyse the data; the analysis is carried out in order to answer some question about the objects the data describe.
Orthogonal to the exploratory/confirmatory distinction, we can also distinguish between descriptive and inferential analyses. A descriptive analysis aims to make a statement about the data set to hand. This might consist of observations on the entirety of a population, with the aim being to answer questions about that population: what is the proportion of females, for example? An inferential analysis, by contrast, aims to draw conclusions with more general validity. Inferential studies are based on samples from some population, and the aim is to make general statements about the broader population, most of which has not been observed. Often it is not possible to observe the whole population.
The sorts of tools required for exploratory and confirmatory analyses differ, as they do for descriptive and inferential analyses. A tool which appears in both may be used in different ways. Take something as basic as the mean of a sample as an illustration. As a description of the sample, it is fixed and exact: it is the value, assuming no errors in the computation. As a value derived in an inferential process, it is an estimate of a parameter of some distribution. The fact that it is based on a sample, that it is an estimate, means that it is not really what we are interested in: in some sense we expect it to be incorrect, subject to change had we drawn a different sample. The single number which emerges from the computational process is therefore only an approximation to the quantity of real interest.
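As a minimal sketch (in Python, with invented sample values), the same mean can be read two ways: descriptively, as an exact summary of the sample to hand, or inferentially, as an estimate of a population parameter that carries uncertainty:

```python
import math
import statistics

# A small sample, standing in for measurements of some attribute.
sample = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 5.0, 4.7]

# As a *description* of the sample, the mean is fixed and exact.
mean = statistics.mean(sample)

# As an *inferential* quantity, it is only an estimate of a population
# parameter, so it carries uncertainty: a standard error and an interval.
se = statistics.stdev(sample) / math.sqrt(len(sample))
lo, hi = mean - 1.96 * se, mean + 1.96 * se  # approximate 95% interval

print(f"sample mean (descriptive, exact):          {mean:.3f}")
print(f"estimate of population mean (inferential): {mean:.3f} +/- {1.96 * se:.3f}")
print(f"approximate 95% confidence interval:       ({lo:.3f}, {hi:.3f})")
```

Had we drawn a different sample, the descriptive value would simply be a different exact number, while the inferential reading expects exactly that variation and quantifies it.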
The nature of data
The discussion here is primarily concerned with numerical data, but other kinds exist; examples include text data and image data. In text data, the basic symbols are words rather than numbers, and they can be combined in more ways than numbers can. Two of the major challenges with text data are search and matching, which have become especially important with the advent of the World Wide Web. Note that the objects of data analysis are really the things which have given rise to the numbers. The numbers are the result of a mapping, by means of measuring instruments, from the world being studied to a convenient representation. The numerical representation is convenient because we can manipulate numbers easily and relatively effortlessly, whereas directly manipulating the world which is the object of study is less convenient. The gradual development of a quantitative view of the world which took place in the 14th and 15th centuries paved the way for this style of representation.
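To make the idea of mapping objects onto a convenient numerical representation concrete, here is an illustrative sketch: it maps two made-up documents to word-count vectors and matches them by cosine similarity, one common way of approaching text search and matching. The texts and the bag-of-words scheme are assumptions for illustration, not prescribed by the discussion above:

```python
import math
from collections import Counter

def to_vector(text: str) -> Counter:
    """Map a document onto numbers: counts of its (lower-cased) words."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Match two documents by the angle between their count vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

doc1 = "intelligent data analysis of large data sets"
doc2 = "analysis of text data"
print(f"similarity: {cosine(to_vector(doc1), to_vector(doc2)):.3f}")
```

Once the words have been mapped to numbers, the full machinery of numerical manipulation becomes available, which is precisely the convenience the representation buys.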
Modern data sets, which are often large and frequently relate to human beings, have the potential to be messy. A priori, one should expect to find missing values, distortions, misrecording, inadequate sampling, and so on. Raw data which do not appear to show any of these problems should immediately arouse suspicion: a very real possibility is that the data were cleaned up before the analyst saw them, which has all sorts of implications. The following illustrates some of these issues.
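As a hypothetical screening sketch along those lines (the records, field names, and plausibility limits are all invented for illustration), missing and misrecorded values can be flagged mechanically before any analysis begins:

```python
# Hypothetical raw records; None marks a missing value.
records = [
    {"age": 34, "income": 52_000},
    {"age": None, "income": 48_000},  # missing value
    {"age": 212, "income": 51_000},   # misrecording: implausible age
    {"age": 29, "income": -5_000},    # distortion: negative income
]

def screen(rows, limits):
    """Flag missing and out-of-range values before any analysis."""
    for i, row in enumerate(rows):
        for field, (low, high) in limits.items():
            value = row.get(field)
            if value is None:
                print(f"row {i}: {field} missing")
            elif not low <= value <= high:
                print(f"row {i}: {field}={value} outside [{low}, {high}]")

screen(records, {"age": (0, 120), "income": (0, 10_000_000)})
```

A raw data set on which such checks flag nothing at all is, as noted above, itself grounds for suspicion.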
Data may be missing for a huge variety of reasons. In particular, however, data may be missing for reasons connected with the values they would have had, had they been recorded. Given that all data are contaminated, special problems can arise if one is seeking small structures in large data sets: in such cases, the distortions due to contamination may be just as large, and just as statistically significant, as the effects being sought.
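A small simulation can show how values that go missing because of what they would have been distort even simple summaries. The population, the missingness rule, and all numbers below are invented for illustration:

```python
import random
import statistics

random.seed(0)

# A synthetic population of true values (e.g., incomes).
population = [random.gauss(50_000, 15_000) for _ in range(100_000)]

def p_missing(x: float) -> float:
    """Probability a value is unrecorded, growing with the value itself."""
    return min(max((x - 50_000) / 60_000, 0.0), 0.9)

# Keep each value only if it escapes value-dependent non-response.
observed = [x for x in population if random.random() > p_missing(x)]

print(f"true mean:     {statistics.mean(population):10.0f}")
print(f"observed mean: {statistics.mean(observed):10.0f}")  # biased downward
```

Because large values are more likely to be unrecorded, the observed mean sits systematically below the true mean, and no amount of extra observed data removes the bias.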