# Box plot

The line in the middle of the box is the median of the data, which indicates the typical (central) level of the sample.
The upper and lower edges of the box are the upper and lower quartiles of the data, so the box contains the middle 50% of the data. The height of the box (the interquartile range) therefore reflects, to a certain extent, how much the data fluctuate. A line, called a whisker, extends above and below the box. Sometimes the whiskers end at the maximum and minimum values, and sometimes a few dots “pop out” beyond them. Don’t agonize over this (it is important enough to say three times): if something pops out, just read it as an “outlier”.
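
To make these quantities concrete, here is a minimal sketch that computes the numbers a box plot draws. It assumes a common convention that the text above does not state: each whisker reaches the most extreme point within 1.5 times the box height of the box edge, and anything further out is drawn as a dot. The function name `box_stats` and the sample values are made up for illustration, and quartile conventions differ slightly between tools, so other software may place the box edges marginally differently.

```python
from statistics import quantiles


def box_stats(values, k=1.5):
    """Return the numbers a box plot draws (assumes the k x IQR whisker rule)."""
    data = sorted(values)
    # "inclusive" treats the data as the whole population of interest.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # height of the box
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],    # lowest point that is not an outlier
        "whisker_high": inside[-1],  # highest point that is not an outlier
        "outliers": outliers,        # the dots that "pop out"
    }


sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # made-up values
stats = box_stats(sample)
print(stats)
```

In this sample the value 30 lies far above the box, so it is reported as an outlier and the upper whisker stops at 9 rather than at the maximum.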

The relationship between the box plot and the normal distribution:

The first point to note is that not all data are suitable for box plots. The second and more important point is how a box plot should be used. The answer: draw grouped box plots, split by qualitative variables, for comparison!
Summary
• Box plots are for continuous variables. When reading one, focus on the typical level (the median), the degree of variability, and the outliers.
• When the box is squashed nearly flat, or there are many outliers, try a logarithmic transformation.
• When there is only a single continuous variable and nothing to compare it against, a box plot is not a good fit. A histogram is the more common choice.
• The most effective way to use box plots is for comparison: draw grouped box plots split by one or more qualitative variables.

The main reason some box plots look ugly is that the box is squashed until only a line is left, surrounded by a dazzling number of outliers. There are two common causes. The first is that the sample contains extremely large or extremely small values: they compress the whole box while standing out themselves. The second is that the sample is very small, in which case all sorts of odd situations can arise and the plot ends up painful to look at.
If your box plot looks like this, there are two remedies. First, if all the data values are positive, you can try a logarithmic transformation. The logarithmic transformation is strongly recommended: it is practically a makeover tool of the plotting world, well suited to asymmetric distributions, non-normal distributions, and heteroscedasticity. Figure 3 shows a set of box plots before and after the makeover. If you would rather not transform the data, take the second remedy: do not draw a box plot.
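
The effect of the logarithmic transformation can be checked numerically. The sketch below uses made-up, right-skewed positive values and, as an assumption, the common 1.5 × IQR rule for flagging outliers. It measures how much of the full data range the box occupies and how many points are flagged, before and after taking base-10 logarithms. The helper name `box_share_and_outliers` is hypothetical.

```python
import math
from statistics import quantiles


def box_share_and_outliers(values, k=1.5):
    """Return (box height / full data range, number of flagged outliers)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    flagged = [x for x in values if x < q1 - k * iqr or x > q3 + k * iqr]
    share = iqr / (max(values) - min(values))
    return share, len(flagged)


# Made-up, strictly positive, right-skewed sample (two very large values).
raw = [12, 15, 18, 22, 30, 45, 70, 400, 900]
logged = [math.log10(x) for x in raw]  # only valid because every value is > 0

raw_share, raw_out = box_share_and_outliers(raw)
log_share, log_out = box_share_and_outliers(logged)

print(f"raw:    box covers {raw_share:.0%} of the range, {raw_out} outliers")
print(f"logged: box covers {log_share:.0%} of the range, {log_out} outliers")
```

On this sample the box grows from roughly 6% to roughly 31% of the plotted range after the transformation, and fewer points are flagged, which is the "un-squashing" described above.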

This box plot is a little more complicated and involves three variables. The quantitative variable is tooth growth length, shown on the vertical axis; it is what the boxes display. The first qualitative variable is the dose of vitamin C. Its three levels (0.5 mg, 1 mg, and 2 mg) are laid out along the horizontal axis, so there are three groups of boxes. The second qualitative variable is the food consumed, either vitamin C or orange juice, shown in yellow and orange respectively, so each group contains two boxes.
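
A plot with this layout could be produced along the following lines. This is only a sketch using matplotlib: the measurements are invented for illustration and are not the data behind the figure, and the variable names, the output file name, and the side-by-side placement arithmetic are choices made here, not something taken from the original plot.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen backend so the script runs without a display
import matplotlib.pyplot as plt

# Hypothetical tooth growth lengths, made up for illustration only.
lengths = {
    (0.5, "vitamin C"): [5, 6, 8, 9, 11],
    (0.5, "orange juice"): [8, 10, 13, 16, 20],
    (1.0, "vitamin C"): [14, 15, 17, 18, 20],
    (1.0, "orange juice"): [15, 20, 23, 26, 27],
    (2.0, "vitamin C"): [19, 22, 26, 30, 34],
    (2.0, "orange juice"): [22, 25, 26, 27, 31],
}
doses = [0.5, 1.0, 2.0]
colors = {"vitamin C": "yellow", "orange juice": "orange"}
box_width = 0.35

fig, ax = plt.subplots()
boxes = {}  # (dose, food) -> the drawn box artist
for i, dose in enumerate(doses):
    for j, food in enumerate(colors):
        # Shift the two boxes of a group to the left and right of position i.
        position = i + (j - 0.5) * box_width
        drawn = ax.boxplot(
            lengths[(dose, food)],
            positions=[position],
            widths=box_width * 0.9,
            patch_artist=True,   # filled boxes, so they can be coloured
            manage_ticks=False,  # tick labels are set once, below
        )
        drawn["boxes"][0].set_facecolor(colors[food])
        boxes[(dose, food)] = drawn["boxes"][0]

# One tick per dose group, centred between its two boxes.
ax.set_xticks(range(len(doses)))
ax.set_xticklabels([f"{d:g} mg" for d in doses])
ax.set_xlabel("Dose")
ax.set_ylabel("Tooth growth length")
ax.legend([boxes[(doses[0], food)] for food in colors], list(colors))
fig.savefig("grouped_boxplot.png")
```

The horizontal axis carries the first qualitative variable, colour carries the second, and the vertical axis carries the quantitative one, which is the three-variable layout described above.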

From the figure, these conclusions can be drawn:
• As the dose increases, the typical (median) tooth growth length increases regardless of which food is eaten.
• At doses of 0.5 mg and 1 mg, the median tooth growth length with orange juice is higher than with vitamin C, and the variability is correspondingly greater.
• At a dose of 2 mg, the median tooth growth length is the same for the two foods, while the variability is greater with vitamin C.
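
Conclusions like these are read off by comparing the median line and the box height of each group. The sketch below does the same comparison numerically on invented measurements (not the data behind the figure), chosen only so that the pattern resembles the one described. The name `box_summary` and the dictionary layout are hypothetical.

```python
from statistics import median, quantiles

# Hypothetical tooth growth lengths keyed by (dose in mg, food); made up.
lengths = {
    (0.5, "vitamin C"): [5, 6, 8, 9, 11],
    (0.5, "orange juice"): [8, 10, 13, 16, 20],
    (1.0, "vitamin C"): [14, 15, 17, 18, 20],
    (1.0, "orange juice"): [15, 20, 23, 26, 27],
    (2.0, "vitamin C"): [19, 22, 26, 30, 34],
    (2.0, "orange juice"): [22, 25, 26, 27, 31],
}


def box_summary(values):
    """Median line and box height (interquartile range) for one group."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"median": median(values), "iqr": q3 - q1}


summary = {group: box_summary(values) for group, values in lengths.items()}
for (dose, food), s in summary.items():
    print(f"{dose:g} mg  {food:<12}  median={s['median']}  IQR={s['iqr']}")
```

Comparing rows with the same dose shows the effect of the food, and comparing rows with the same food shows the effect of the dose, which is the comparison a grouped box plot makes visible at a glance.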

