chocolateraisons

Is it dishonest to remove outliers and/or transform data?

Posted on: December 9, 2011

First of all, outliers are data points that are extremely different from the rest of the data in the sample. What qualifies as an outlier varies from data set to data set; however, if a data point is several standard deviations away from the mean, or follows a completely different pattern from the other data points, it is often considered an outlier. When outliers are identified, in many cases they are removed from the data completely and are therefore not reported in the final write-up. This could be considered dishonest because the researchers could be seen as altering the data. However, in quite a few cases outliers are caused by mistakes made by the participants or researchers, so removing them is a way of cleaning the data, because the results being removed are not valid for the research.
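
As a rough illustration of the "several standard deviations away from the mean" rule, here is a minimal Python sketch. The function name `flag_outliers`, the cut-off of 3 standard deviations and the reaction times are all made up for the example; they are not taken from any real study, and a different cut-off could be just as reasonable.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=3.0):
    """Return the values lying more than `threshold` sample standard
    deviations away from the sample mean (hypothetical helper)."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        # Every value is identical, so nothing can be an outlier.
        return []
    return [v for v in values if abs(v - m) / s > threshold]


# Invented reaction times in milliseconds; the last one is wildly different.
reaction_times = [512, 498, 530, 505, 521, 489, 517, 502, 495, 526, 508, 2950]

flagged = flag_outliers(reaction_times)
print(flagged)
```

Note that the extreme value itself pulls the mean and standard deviation towards it, so in small samples a rule like this can miss outliers. That is part of why the choice of rule and cut-off is a judgement call that should be reported.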

Outliers, if they are left in the data, can affect the statistics of the sample. They can alter the mean and variability of the sample, and therefore alter our ability to interpret statistical tests. In many cases outliers are caused by participants misunderstanding the task or by researchers making mistakes, and so it can be argued that the outliers have no bearing on the actual conclusion because they are not valid results. For example, Janine Willis and Alexander Todorov (2006) researched whether time constraints altered people's perceptions of others just from looking at their faces. Outliers in that study could have been caused by participants misreading which personality characteristic they were supposed to be identifying, and as such these outliers would not be valid pieces of data. Since they are not valid data, removing them is not dishonest, as they have no bearing on the research.
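
To show how much a single outlier can alter the mean and variability, here is a small Python sketch that compares the summary statistics with and without one extreme score. The numbers are invented for illustration and the helper name `summarise` is my own; nothing here comes from the Willis and Todorov data.

```python
from statistics import mean, stdev


def summarise(values):
    """Return (mean, sample standard deviation) for a list of numbers."""
    return mean(values), stdev(values)


# Invented scores: eleven typical values plus one extreme value.
typical_scores = [512, 498, 530, 505, 521, 489, 517, 502, 495, 526, 508]
all_scores = typical_scores + [2950]

mean_with, sd_with = summarise(all_scores)
mean_without, sd_without = summarise(typical_scores)

print(f"with outlier:    mean = {mean_with:.1f}, sd = {sd_with:.1f}")
print(f"without outlier: mean = {mean_without:.1f}, sd = {sd_without:.1f}")
```

With the extreme score included, both the mean and the standard deviation are far larger than anything the other eleven scores would suggest, which is why a test based on them becomes hard to interpret.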

However, in some research, especially qualitative studies, outliers can be just as informative as the data that fit the pattern of the majority. Barbara Michener and Marcia J Belcheir did a study looking into the first impressions of freshmen when they first arrived at university. The participants took part in a number of interviews and group meetings so that their first impressions could be accurately recorded. The majority of the participants felt that they had little to no problem adjusting and that the university provided a great deal of help and support, making their transition easier. However, a few students who took part in the study did not find the university very helpful and struggled with the transition. The students who did not find the move easy, and the data that they provided, could be described as outliers, because they do not follow the trend of the majority of the data. If the researchers were to remove those pieces of data it could be considered dishonest, because although they are outliers they are relevant and valid to the research.

When it comes to the removal of outliers from data, whether it could be considered dishonest tends to depend on the researcher's discretion. The removal of outliers is subjective because it depends on the type of research and the design of the study, along with the type of data and the conclusion that the experimenter is looking to form. As such, whether the removal of outliers is dishonest is, in my opinion, as subjective as whether outliers are outliers or not.
