When constructing a histogram, it is common to make all bars the same width. One could also choose to make them all have the same area. These two options have complementary strengths and weaknesses: the equal-width histogram oversmooths in regions of high density and is poor at identifying sharp peaks, while the equal-area histogram oversmooths in regions of low density and so does not identify outliers. We describe a compromise approach that avoids both of these defects. We regard the histogram as an exploratory device rather than as an estimate of a density.
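To make the two extremes concrete, here is a minimal base-R sketch that computes equal-width breaks with `seq()` and equal-area breaks with `quantile()`. It is not the `dhist` algorithm itself, only an illustration of the two options it compromises between. The simulated sample, the seed, and all variable names are illustrative assumptions.

```r
# Illustrative only: contrasts the two extremes, not the compromise dhist uses.
set.seed(1)
x <- rexp(200)   # hypothetical right-skewed sample
nbins <- 8

# Equal-width: bin edges evenly spaced over the data range.
rx <- range(x)
breaks_width <- seq(rx[1], rx[2], length.out = nbins + 1)

# Equal-area: bin edges at evenly spaced quantiles, so each bin
# holds roughly the same number of observations.
breaks_area <- quantile(x, probs = seq(0, 1, length.out = nbins + 1),
                        names = FALSE)

# Tabulate how many observations fall in each bin under both schemes.
counts_width <- table(cut(x, breaks_width, include.lowest = TRUE))
counts_area  <- table(cut(x, breaks_area,  include.lowest = TRUE))
```

Comparing `counts_width` with `counts_area` shows the trade-off: the equal-width bins pile most observations into the first few bars, while the equal-area bins stretch the last bar across the sparse tail.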
Usage
dhist(
x,
a = 5 * iqr(x),
nbins = grDevices::nclass.Sturges(x),
rx = range(x, na.rm = TRUE),
eps = 0.15,
xlab = "x",
plot = TRUE,
lab.spikes = TRUE
)
Arguments
- x
is a numeric vector (the data)
- a
is the scaling factor, default is 5 * IQR
- nbins
is the number of bins, default is assigned by the Sturges method
- rx
is the range covered by the bins, from the left edge of the left-most bin to the right edge of the right-most bin
- eps
is used to set an artificial bound on the minimum width / maximum height of bins, as described in Denby and Mallows (2009), page 24
- xlab
is label for the x axis
- plot
if TRUE, produces the plot; if FALSE, returns the heights, breaks and counts
- lab.spikes
if TRUE, labels the % of data in the spikes
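The default values of `a`, `nbins` and `rx` shown under Usage can be approximated with base R. The sketch below assumes that the helper `iqr()` behaves like `stats::IQR()` with missing values removed; this page does not state that, so treat it as an assumption. The data vector is made up for illustration.

```r
# Hypothetical data, including one missing value and one outlier.
x <- c(2.1, 3.4, 3.9, 4.4, 5.0, 5.2, 6.8, 7.7, 9.3, 25.0, NA)

# Assumed stand-in for the default 5 * iqr(x).
a <- 5 * stats::IQR(x, na.rm = TRUE)

# Sturges' rule: ceiling(log2(n) + 1), computed on the non-missing values.
nbins <- grDevices::nclass.Sturges(x[!is.na(x)])

# Outer edges of the left-most and right-most bins.
rx <- range(x, na.rm = TRUE)
```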
Value
A list with two elements: heights, of length n, and breaks, of length n+1, giving the heights and break points of the histogram bars.
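As a rough sketch of how such a result might be used, the code below works with a hand-made list of the shape described above. The element names `heights` and `breaks` follow this page, the numbers are invented, and `draw_bars()` is a hypothetical helper that is not part of the package.

```r
# Hand-made stand-in for the list returned with plot = FALSE;
# the numbers are invented for illustration.
res <- list(heights = c(0.05, 0.40, 0.90, 0.10),
            breaks  = c(0, 2, 2.5, 3, 10))

n <- length(res$heights)
stopifnot(length(res$breaks) == n + 1)   # n bars need n + 1 edges

widths <- diff(res$breaks)       # bar widths vary from bin to bin
areas  <- res$heights * widths   # area of each bar

# Hypothetical helper: draws the bars on an empty plot with base graphics.
draw_bars <- function(res, xlab = "x") {
  k <- length(res$heights)
  plot(range(res$breaks), c(0, max(res$heights)), type = "n",
       xlab = xlab, ylab = "height")
  rect(res$breaks[-(k + 1)], 0, res$breaks[-1], res$heights)
}
```

Calling `draw_bars(res)` in an interactive session draws one rectangle per bin, with the unequal widths visible directly on the x axis.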