Is it dishonest to remove outliers from data sets?

9 10 2011

OK, so it’s wild card week, and I panicked, looked at the suggested topics and picked this one. So here goes…

Imagine how annoying it must be: you’re collecting data and it’s all fitting your research hypothesis when suddenly, BOOM! You get an extreme datum, an outlier. It’s gonna mess up your previously perfect mean, and maybe even falsify your hypothesis if it’s really extreme. There’d be a tiny niggling part of you that would want to just drop the result and stop it spoiling your otherwise tidy data set. But would this be dishonest? Well, in a way, no.

It is perfectly fine to remove an extreme data point…as long as you have a valid reason.

Outliers can be the result of good intentions; a participant may have misunderstood their instructions and so given you invalid data. Other participants produce outliers because they stop engaging properly with the task. Some tasks seem tedious or go on for too long, and participants get bored and tired and just want the task to be over. In a computer task, this could mean repeatedly pressing the same button without caring whether their responses are valid. On a questionnaire with Likert scales, they might just give the same rating for every question.
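One rough way to spot the “same rating for every question” pattern is to check whether a participant’s Likert answers show (almost) no variation. This is a minimal Python sketch under assumed conditions: responses are stored as a dict mapping a hypothetical participant ID to a list of integer ratings, and anyone whose ratings have a population standard deviation at or below a chosen threshold is flagged. The function name, the threshold and the example ratings are all made up for illustration, and a flag is only a prompt to look more closely, not a valid reason on its own to drop someone’s data.

```python
from statistics import pstdev


def flag_straight_liners(responses, min_sd=0.0):
    """Return IDs of participants whose Likert ratings barely vary.

    responses: dict mapping a (hypothetical) participant ID to a list of ratings.
    min_sd: ratings whose population SD is at or below this value are flagged;
            the default of 0.0 flags only identical answers to every item.
    """
    flagged = []
    for pid, ratings in responses.items():
        # Need at least two items before "no variation" means anything.
        if len(ratings) > 1 and pstdev(ratings) <= min_sd:
            flagged.append(pid)
    return flagged


# Made-up ratings on a 1-5 scale, purely for illustration.
data = {
    "P01": [4, 2, 5, 3, 4, 1],
    "P02": [3, 3, 3, 3, 3, 3],  # same answer to every item
    "P03": [5, 4, 5, 4, 5, 5],
}
print(flag_straight_liners(data))  # ['P02']
```

Raising `min_sd` above zero would also catch people who vary their answers only very slightly, at the cost of flagging some genuine respondents, so where to set it is a judgment call that should be reported.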

Or maybe it’s not the participants who are the cause of the problem. The design of the experiment may not be up to scratch: participants might not understand what they’re supposed to do, or the task may be too difficult for them. Inconsistent data can be a sign that this is happening. One way to check is to ask each participant at the end of the study to explain how they approached the task. That way, if you find out they were doing it wrong, you can drop their data before running your analysis.

If you only spot an outlier during analysis, however, it gives you an opportunity to ask, ‘Why has this happened?’ You can then tweak your design, run the study again and see whether this solves the outlier problem.

Overall, I think it is alright to exclude extreme data, AS LONG AS you give a valid reason for doing so. If you don’t, and you get found out for quietly dropping data that doesn’t agree with your hypothesis, you could end up with a dodgy reputation, and no one would trust your results or conclusions anymore. Although I think dropping anomalous data is acceptable to a certain degree, psychologists should take note of any outliers and make an effort to find out why they occurred.
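Here is a minimal Python sketch of the “note it and give a reason” approach. It assumes a common rule of thumb, flagging values more than k sample standard deviations from the mean, with k = 2 chosen arbitrarily here rather than as any agreed standard. It only lets a point be excluded if a written reason is attached, and it reports the mean both with and without the excluded points so a reader can see what difference the exclusion makes. The function names, the reaction times and the reason text are all hypothetical.

```python
from statistics import mean, stdev


def z_flags(values, k=2.0):
    """Return indices of values more than k sample SDs from the mean (assumed rule of thumb)."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return []  # all values identical: nothing stands out
    return [i for i, v in enumerate(values) if abs(v - m) / s > k]


def exclude_with_reasons(values, reasons):
    """Drop only the indices that come with a written reason.

    reasons: dict mapping index -> non-empty explanation.
    Returns the kept values and a log of what was excluded and why.
    """
    for i, why in reasons.items():
        if not why.strip():
            raise ValueError(f"index {i}: an exclusion needs a reason")
    kept = [v for i, v in enumerate(values) if i not in reasons]
    log = [f"excluded index {i} (value {values[i]}): {why}"
           for i, why in sorted(reasons.items())]
    return kept, log


# Made-up reaction times in milliseconds, purely for illustration.
rts = [410, 395, 430, 405, 420, 415, 400, 1250]
print(z_flags(rts))  # [7]

kept, log = exclude_with_reasons(
    rts, {7: "hypothetical: participant said they pressed keys at random near the end"})
print(f"mean with all data: {mean(rts):.1f}")     # 515.6
print(f"mean after exclusion: {mean(kept):.1f}")  # 410.7
print("\n".join(log))
```

The flag only says a value is unusual; the reason has to come from somewhere else, such as the participant’s own account. Note too that with small samples a single extreme value cannot reach a very large z-score (here 1250 ms scores about 2.47), so the choice of cut-off is itself something worth reporting.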

 


8 responses

14 10 2011
week 3 homework « whataloadofblog

[…] https://scarlettrose23.wordpress.com/2011/10/09/is-it-dishonest-to-remove-outliers-from-data-sets/#co… […]

14 10 2011
forcedtwoblog

I wouldn’t say that it’s not dishonest to remove outliers, but I would say that you shouldn’t rely on justification alone. I can justify anything (I don’t have to believe it, but I can justify it). Say I’m observing people’s attitudes to body image and aiming to find that we care a lot. I could find evidence that body image improves as we get older (http://www.bps.org.uk/news/body-image-improves-we-grow-older) and use it to excuse the older generation from the study (they just don’t care anymore). Would this be right? Probably not, since I want to find that we are a society preoccupied with body image, so this would be cherry-picking in a way. Can we stop any researcher doing the same? I would argue that, to back up common sense, there should be some sort of guidelines in place other than peer review, since reviewers would rely on the researcher giving them the results both before and after removing the outliers. We have learnt about the standard deviation and how to use it to set a sort of range around the mean for justifying outliers, but there is little stopping researchers from cherry-picking. This is easy to do if you rely on a researcher’s justification.

14 10 2011
amw1992

Although you raise a good point about excluding outliers caused by faulty equipment and by human error, such as not fully understanding the instructions given or boredom (which Bornstein et al found to limit conditions), you may want to consider that outliers can also be caused by chance. In my eyes, removing an outlier caused by chance is dishonest, because such outliers are not the result of error; they represent a true data point. With this in mind, removing this kind of outlier may mean you miss the overall effect of your study. John (1995) said that recognising these kinds of outliers is a skill the researcher must acquire, and the decision whether to remove them is left to the researcher’s discretion.

13 10 2011
prpsjj

(I already made this comment but forgot to log in, so it came up anonymously; here it is again.)

I agree with your conclusion (especially the point about how it’s OK to delete outliers if you give a valid reason). However, I also believe there are two kinds of outlier. First, there are outliers that are down to error; these should be deleted because they are not true data and do not accurately represent the population you are trying to test. Second, there are outliers that are true data, where the participant is just extremely different from everyone else in the data set. Here I feel you can’t delete these outliers, because they are true pieces of data and represent part of the population you are testing. So the question is: how do you tell the difference between these two types of outlier?

