What are outliers?
Let’s say our target audience is people aged 20 to 30, both inclusive. We collected responses from various people, along with their ages, and saved them. Later, when we started analyzing the data, we saw that some of the respondents were younger than 20 and some were older than 30. These respondents are not part of our target audience, so we need to remove their responses from our dataset. If we do not, the data may lead to misleading inferences. The respondents who are outside our target audience, and yet whose data is present in the dataset, are called the outliers of the dataset.
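The filtering step described above can be sketched in a few lines of Python. This is only an illustration: the `responses` list, the respondent IDs, and the helper `in_target_range` are made up for the example and are not data from the survey.

```python
# Hypothetical survey responses as (respondent_id, age) pairs.
# These values are invented purely for illustration.
responses = [("r1", 18), ("r2", 20), ("r3", 25), ("r4", 30), ("r5", 34)]

# Target age range from the text; both ends are inclusive.
MIN_AGE, MAX_AGE = 20, 30


def in_target_range(age):
    """Return True if the age falls inside the inclusive target range."""
    return MIN_AGE <= age <= MAX_AGE


# Keep respondents inside the range; set aside the rest.
kept = [r for r in responses if in_target_range(r[1])]
removed = [r for r in responses if not in_target_range(r[1])]

print("kept:", kept)
print("removed:", removed)
```

Keeping the removed responses in a separate list, rather than discarding them silently, makes it easy to check how many respondents fell outside the target range.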
How to detect outliers in a dataset using a box plot?
A box plot shows the spread and skewness of data. It is a standardized way of displaying the five-number summary of the data:
- The minimum
- The first quartile, or 25th percentile
- The median
- The third quartile, or 75th percentile
- The maximum
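As a rough sketch, the five-number summary can be computed with Python's standard `statistics` module. The `ages` list and the function name `five_number_summary` are assumptions made for this example. Quartiles can be computed in several slightly different ways; here the `"inclusive"` method of `statistics.quantiles` is used, and other tools may give slightly different quartile values for the same data.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return the five-number summary of a list of numbers as a dict."""
    # quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
    # The "inclusive" method treats the data as the whole population.
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Hypothetical ages, invented for illustration.
ages = [21, 22, 23, 24, 25, 26, 27, 28, 29]
summary = five_number_summary(ages)
print(summary)
```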
A box plot usually has two parts: a box and a set of whiskers. The box is drawn from the first quartile (25th percentile) to the third quartile (75th percentile), and the horizontal line in the middle of the box denotes the median. The lower whisker denotes the minimum and the upper whisker denotes the maximum. When outliers are plotted separately, the whiskers instead stop at the smallest and largest values that are not flagged as outliers.
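The text does not say how a plotting tool decides which points lie beyond the whiskers. One common convention, assumed here, is the 1.5 × IQR rule: any value more than 1.5 times the interquartile range (Q3 minus Q1) below the first quartile or above the third quartile is flagged, and each whisker ends at the most extreme value that is not flagged. The sketch below applies that rule; the function name `iqr_outliers`, the parameter `k`, and the `ages` data are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Flag outliers with the k * IQR rule (k=1.5 is an assumed convention)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    # Fences: values beyond these limits are treated as outliers.
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr

    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]

    # Each whisker ends at the most extreme value that is still inside the fences.
    return {
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": outliers,
    }


# Hypothetical ages with one unusually large value.
ages = [21, 22, 23, 24, 25, 26, 27, 28, 29, 55]
result = iqr_outliers(ages)
print(result)
```

Note that this rule flags values that are far from the bulk of the data. It is a different criterion from removing respondents outside a fixed target range, although on a given dataset the two may flag some of the same points.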
If there are outliers in the dataset, then the outliers are displayed as dots in a box and whisker plot. For example, let’s …





