Data Collection

 


Copied from

http://www.deakin.edu.au/~agoodman/sci101/index.php

In this section you will look at the ways in which science makes use of data, and how various forms of data are also used outside science.

On completion of this section you should be able to:

  • explain the importance of data collection, and give examples of who collects data
  • give two reasons why data may be collected without an immediate and specific purpose
  • list reasons why data (particularly in science) may be collected for a specific purpose
  • list the major ethical issues associated with undertaking scientific research.

Contents

Introduction
Data Collection as an end in itself
Establishing the parameters of a system
Establishing benchmark data
Data Collection as part of broader strategy
Propaganda
Belief justification
Market research
Decision support
'Objective' research
Ethical issues

Introduction

How often have you heard someone say 'the fact is…' or 'the facts speak for themselves'? We have an almost religious belief in the importance of facts as immutable, independent, objective pieces of information that tell us something 'real' about the world around us.

Our fascination with 'facts' is persistent and universal. They seem to offer continual reassurance: whatever the foibles of human opinion, some things at least are beyond argument. We all know that up is up, left is left, and the sun rises in the east.

The people who are most consumed by the search for facts - at least in the popular view - are scientists. To most people, science is about the search for 'truth' - which is largely equated with the accumulation of data. This magical material can be organised into useful pieces (facts) from which laws can be constructed. As the aim of science (so the argument goes) is to find the 'laws of nature', everything the scientist measures is data, and every piece of data is potentially important. In its extreme form, this approach sees science as the process of collecting (and sifting, organising and summarising) masses of data. In this scenario data have particular and special significance.

The problem with this view is that it is almost totally wrong. It is true that some scientists collect data (usually resulting from experiments) but many never handle data in the conventional sense. Few, if any, scientists see the accumulation of data as worthwhile in its own right. Most scientists know that data are only useful in the right context; data out of context are at best unhelpful, at worst misleading. Good scientists in particular have an instinct for knowing which data are useful and relevant, and which are not.

Nor does science progress (if indeed science can be said to progress) by the mere accumulation of facts. The popular image of the scientist has not caught up with modern thinking about how science is conducted. Society's conception of scientific method is quite different from that accepted as appropriate (and preferable) by those who study the process of science. In their analysis there is a right way to do science and a wrong way - and the science of popular conception is the wrong way. When you understand the distinction between these approaches, you will see more clearly what role data play in these methods.

Of course, scientists in the traditional fields (physics, chemistry, biology, astronomy and so on) are not the only people who collect data. The modern world - complex, industrialised, bureaucratic - thrives on data of all kinds. We are numbered, analysed and surveyed throughout our lives, and the results are stored and analysed. We have in some senses become a part of the statistics that largely define modern society.

But who actually collects data? All governments do, for reasons both laudable and questionable. Without up-to-date and comprehensive data about the characteristics of the population no government can plan and build the facilities and resources we have come to expect. Commercial organisations collect data to improve their economic prospects by offering the goods or services that potential customers seem to want. Researchers collect data to further their understanding of the workings of our social and economic systems. Physical scientists collect data to further their understanding of how the world functions.

The process of collecting data takes two forms: gathering data that have already been collected by someone else (probably for a different purpose), and creating 'new' data. The latter is a matter of some philosophical importance, and we will return to it shortly.

Data Collection as an end in itself

What motivates people to go through the often complex and costly process of collecting data? Apart from simply collecting information to satisfy a fascination with so-called trivia, two main reasons for collecting data without an immediate and specific purpose are:

  • to establish the parameters of a system
  • to establish benchmark data

Establishing the parameters of a system

When we investigate natural and social systems we often start with no clear idea of how the system functions. In particular, we may have little sense of how the properties of the system's components vary. If we are studying river flows, for example, we may have no idea of the likely range of values to be expected in a particular system. Studies of rivers in similar environments can give us some idea, but their results may not be transferable because of some peculiarity of this location. In this case we need to carry out preliminary studies whose results will define the parameters of the system. These parameters concern the probable extremes of the data we expect to find in the 'real' study, and the likely variability of those data. This knowledge may have a direct impact on the way in which we collect data during the major part of the study.
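
As a rough illustration of what 'defining the parameters' might involve, the short Python sketch below summarises a handful of preliminary river-flow readings by their extremes and variability. The readings, the units and the function name summarise_pilot are all invented for this example; a real preliminary study would choose its own measures.

```python
import statistics

def summarise_pilot(flows):
    """Summarise preliminary measurements by their extremes and variability."""
    mean = statistics.mean(flows)
    stdev = statistics.stdev(flows)  # sample standard deviation
    return {
        "min": min(flows),      # lowest value seen so far
        "max": max(flows),      # highest value seen so far
        "mean": mean,
        "stdev": stdev,
        "cv": stdev / mean,     # coefficient of variation (relative variability)
    }

# Hypothetical pilot readings of river flow, in cubic metres per second
pilot_flows = [12.0, 15.0, 9.0, 30.0, 14.0]
summary = summarise_pilot(pilot_flows)
print(summary)
```

A wide range or a large coefficient of variation in the pilot data would suggest that the main study needs more frequent readings, or instruments that can cope with larger extremes.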

Establishing benchmark data

If we pursue the example of river flow studies, we can also illustrate the way in which data are sometimes collected to establish benchmarks. Regular monitoring of river levels, even if this is not part of a specific study, will help to build a picture of the general behaviour of the system. This will provide valuable comparisons and context when we study the system in more detail. Establishing benchmark data on flow patterns will indicate how 'typical' the data collected are for a particular time period; they will also reveal long-term changes in the system.
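
The two uses of benchmark data mentioned above can be sketched in a few lines of Python. This is only one possible approach, under stated assumptions: the annual river levels are invented, 'typical' is taken to mean 'close to the benchmark mean, measured in standard deviations', and long-term change is estimated with a simple least-squares slope. The function names typicality and trend_per_step are hypothetical.

```python
import statistics

def typicality(benchmark, observed):
    """Number of standard deviations `observed` lies from the benchmark mean."""
    mean = statistics.mean(benchmark)
    stdev = statistics.stdev(benchmark)
    return (observed - mean) / stdev

def trend_per_step(series):
    """Least-squares slope of the series against its index (change per time step)."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = statistics.mean(series)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(series))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

# Hypothetical benchmark record: annual mean river level in metres
annual_levels = [2.0, 2.1, 2.2, 2.3, 2.4]

print(typicality(annual_levels, 3.0))   # a large value suggests an unusual reading
print(trend_per_step(annual_levels))    # a positive slope suggests a rising level
```

A reading several standard deviations from the benchmark mean is a signal to look more closely, not proof that anything has changed; the longer the benchmark record, the more useful both measures become.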

Data Collection as part of broader strategy

Most data are collected for a specific purpose, as part of a broader strategy. We may be surveying how people would react to a political proposition, or the likely sales for a new product. We may be investigating the effect of airborne pollution on vegetation systems, or measuring the mass of a newly-discovered atomic particle. The following are some of the main reasons why people collect data:

Propaganda

Some data are collected for what we might call propaganda purposes: to convince other people of the rightness of your view, or that of a group to which you belong. Most propaganda that involves real data is based on processing and presenting raw data in a way that suits a particular message, rather than on the generation of new data.
This category could also include instances of scientific fraud in which data are falsified or misrepresented to convince a scientist's peer group of the correctness of his or her work.

Belief justification

Many people seek data that support the views that they already hold; this is true of some scientists, although it is rare.

Market research

Enormous amounts of data are collected by commercial organisations about the buying intentions of consumers; these surveys are also widely used in the social and political arenas.

Decision support

In industry and government we have become used to the expectation that decisions will be based on careful analysis of data. For example, before a new port is built, an environmental impact statement would be prepared. Relevant data about the area to be affected are collected and collated; these would then form one of the key foundations of the decision about whether or not to proceed with the development.

'Objective' research

Scientists (and those who aspire to that status) collect data as a critical part of the process of research. You'll see how this process operates in detail - and its implications for the data collection process - in later sections. You'll also examine the popular (but misleading) concept of objectivity.

 Exercise 1.1

Look through a number of issues of different magazines and newspapers for examples of data being used for specific purposes. Try to classify them using the categories defined in this section.

Ethical issues

Most experimenters in physics and chemistry, and many in the earth sciences or biology, deal with inanimate objects or materials. Researchers whose subjects are people or animals, by contrast, must consider how they conduct their research and give attention to the ethical issues it raises.

Sometimes a research project may involve changing the subjects' behaviour or, in some cases, causing them pain or distress. Most research organisations have complex rules on human and animal experimentation. Although a detailed study of such rules is beyond the scope of this unit, you should be aware that such systems exist, and what they deal with.

The American Psychological Association has developed a set of guidelines governing the conduct of research in psychology. Although some are clearly most relevant to psychology, most are applicable to all forms of research, and will give you an impression of the ethical issues involved. They can be summarised as follows:

You must justify the research via an analysis of the balance of costs and benefits.
The scientist's interest alone isn't sufficient justification to carry out research. In order to carry out experiments there have to be benefits that outweigh the costs. Researchers are expected to carry out an analysis and ensure that the research is justified.

You are responsible for your own work, and for your contribution to the whole project.
Scientists must accept individual responsibility for the conduct of the research and, as far as foreseeable, the consequences of that research.

You must obtain informed consent from any subjects.
The concept of informed consent is a major problem when dealing with research into human behaviour or physiology, particularly in research that may have harmful side-effects. Can someone give informed consent, for example, if they are below 18 years of age (or whatever is the legal age of consent)? What about potential subjects who are mentally or physically disabled?

You must ensure that all subjects participate voluntarily.
In psychological research all subjects must participate voluntarily; informed consent must be accompanied by a free decision to participate. On the other hand, it is very difficult to explain complicated research to non-specialists. Nevertheless, the onus is on the researcher to explain the research, not on the volunteer to find out about it.

You must be open and honest in dealing with other researchers and research subjects.
The researcher must be as open and honest as is reasonable. This process, whilst fine in principle, is complicated by considerations such as commercial advantage and professional rivalry.

You must not exploit subjects by changing agreements made with them.
As a researcher you might discover that your experiment shows something that you would like to further investigate, but don't want to tell your subjects about. If you did investigate further, but pretended that you were still doing the experiment that had been agreed to in the first place, this would be a form of exploitation, and would breach the principles of informed consent and voluntary participation.

You must take all reasonable measures to protect subjects physically and psychologically.
The unexpected outcomes of a series of now-famous experiments in the 1950s convinced most psychologists that even voluntary participants can 'get carried away' to the point where they have to be protected from themselves and each other. In these experiments, university psychology students were allowed, using complex behavioural rules, to 'punish' their co-subjects where they breached the rules. What surprised and disturbed the researchers was the number of students who greatly exceeded their 'right' to inflict punishment, and how widespread the process became among the subjects. Eventually the experiments had to be terminated to prevent injury to the subjects.
The researcher must be prepared to intervene, even at the cost of the experiment itself, to protect the subjects.

You must fully explain the research in advance, and 'de-brief' subjects afterwards.
Whilst full explanations before the experiment are essential to gaining informed consent, it is, unfortunately, a common practice for researchers to complete their research without telling the participants anything about the results.

You must give particular weight to possible long-term effects of the research.
Obviously this can be difficult to achieve. It means that, regardless of their strong motives for collecting information, researchers have to give particular emphasis to any potential long-term dangers of the research.

You must maintain confidentiality at all times.
Only certain people conducting the experiment should know the identity of the participants, and any subject should generally not know the identity of other subjects. The key to maintaining confidentiality is that the individual should not be identifiable.
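
One common way of working towards this is to replace names with codes before any data are analysed or shared. The Python sketch below is a minimal illustration only: the participant names, the scores and the function name pseudonymise are invented, and replacing names does not by itself guarantee that an individual cannot be identified from other details in the data.

```python
import secrets

def pseudonymise(names):
    """Return a dictionary mapping each participant name to a random code."""
    key = {}
    for name in names:
        if name not in key:
            code = "P-" + secrets.token_hex(4)
            while code in key.values():  # guard against an unlikely duplicate code
                code = "P-" + secrets.token_hex(4)
            key[name] = code
    return key

# Hypothetical participants and results
raw_results = [("Alice Example", 7), ("Bob Example", 5)]

# The key links names to codes; it should be stored separately and securely
key = pseudonymise(name for name, _ in raw_results)

# The working records carry only the codes, never the names
records = [{"participant": key[name], "score": score}
           for name, score in raw_results]
print(records)
```

Only the few people who need to know participants' identities would have access to the key; everyone else works with the coded records.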


 
