3.  Significance in Scale Space
by F. Godtliebsen, J. S. Marron and P. Chaudhuri.


    "Scale space" is a term from Computer Vision (see Lindeberg (1994) Scale Space Theory in Computer Vision) that means a family of Gaussian kernel smooths indexed by the bandwidth.  This has a number of interesting implications for both the theory and practice of smoothing in statistics, as discussed in Chaudhuri and Marron (1997) [ PDF version (862 KB) | Postscript version (4.64 MB) ].  An application to finding statistically significant structure in univariate smoothing, called SiZer, is given in Chaudhuri and Marron (1999) [ 2,334k postscript file | 366k GNU Zipped (.gz) | 442k Compressed (.Z) ].

Here is an example illustrating the use of SiZer on the income data.  See e.g. Marron and Schmitz (1992) Econometric Theory, 8, 476-488, for more about these data, including another way of seeing that there are two modes.
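
The idea of a family of Gaussian kernel smooths indexed by the bandwidth can be sketched in a few lines.  The following is only a minimal illustration in Python, not the SiZer software: the function names and the toy data are hypothetical, and simply counting bumps on a grid is not a significance test.  It does show why the whole family is worth looking at, since the number of visible modes depends on the bandwidth.

```python
import math

def gaussian_kde(data, grid, h):
    """Gaussian kernel density estimate with bandwidth h, evaluated on grid."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
            for x in grid]

def scale_space(data, grid, bandwidths):
    """The family of smooths: one curve per bandwidth."""
    return {h: gaussian_kde(data, grid, h) for h in bandwidths}

def count_modes(curve):
    """Number of strict interior local maxima of a curve sampled on a grid."""
    return sum(1 for i in range(1, len(curve) - 1)
               if curve[i - 1] < curve[i] > curve[i + 1])

# Toy two-cluster data (hypothetical values, not the income data).
data = [0.8, 1.0, 1.1, 1.2, 2.9, 3.0, 3.0, 3.1]
grid = [0.1 * i for i in range(41)]
family = scale_space(data, grid, [0.3, 3.0])

for h, curve in family.items():
    print("bandwidth", h, "modes", count_modes(curve))
```

At the small bandwidth both clusters appear as separate modes, while at the large bandwidth they are smoothed into a single mode.  SiZer addresses the question that this sketch leaves open, namely which of these features are statistically significant.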

 

    This newer work studies related ideas in the context of 2-d images.  There were two major hurdles.  First, the concept of "slope" is more complicated in 2-d, and thus required new ways of thinking about its "statistical significance".  Second, the higher dimensionality required a new visual paradigm.  This was provided by movies which "morph through the family of smooths", with bandwidth (i.e. scale) represented by time.
 


3a.  Image Analysis

    Full details of Significance in Scale Space (SSS) in the context of image analysis are developed in Godtliebsen, Marron and Chaudhuri (1999a) [ PDF version (701 KB) | Postscript version (3.61 MB) ].  The basis is a family of Gaussian kernel smooths, indexed by the bandwidth, shown as a movie of gray level plots.  This is seen for some gamma camera data in this movie.

    The simplest approach to understanding statistical significance of features in each smooth is based on gradients.  When the gradient is significantly different from 0 (i.e. there is some "statistically significant slope"), an arrow is drawn in the gradient direction.  Here is one frame of this movie, representing one scale, i.e. level of smoothing.

(Caution: this is only a "screen shot", so the buttons don't work.  Click here to see this movie)
The arrows show that the diagonal ridges are "really there", as are the faintly brighter spots at a few locations, in particular the barely bright spot near the center of the image.  This movie version shows how this evolves over the full range of smoothing scales.
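
To make the notion of a "statistically significant slope" concrete, here is a rough sketch in Python of one way such a test could be set up at a single pixel.  It is not the procedure of the paper: it assumes independent Gaussian noise with a known standard deviation sigma, tests each pixel on its own with no adjustment for testing many pixels at once, and uses hypothetical function names and toy images.  Under those assumptions the squared length of the smoothed gradient, divided by its variance, is chi-squared with 2 degrees of freedom when the true gradient is 0, so the cutoff at level alpha is -2 log(alpha).

```python
import math

def significant_gradient(image, row, col, h, sigma, alpha=0.05):
    """Smoothed gradient at (row, col) and a pixelwise significance flag.

    image is a list of rows; the window of radius ceil(3h) must fit inside it.
    Assumes independent Gaussian noise with known standard deviation sigma.
    """
    radius = int(math.ceil(3 * h))
    kernel = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            kernel[(dy, dx)] = math.exp(-(dx * dx + dy * dy) / (2.0 * h * h))
    total = sum(kernel.values())

    gx = gy = sum_sq = 0.0
    for (dy, dx), k in kernel.items():
        # Weights of the Gaussian-derivative smoother in the x and y directions.
        wx = dx * k / (h * h * total)
        wy = dy * k / (h * h * total)
        value = image[row + dy][col + dx]
        gx += wx * value
        gy += wy * value
        sum_sq += wx * wx          # same for wy, by symmetry of the kernel

    variance = sigma ** 2 * sum_sq  # variance of each gradient component
    statistic = (gx * gx + gy * gy) / variance
    threshold = -2.0 * math.log(alpha)   # chi-squared(2) upper quantile
    return gx, gy, statistic > threshold

# Toy images (hypothetical): a ramp rising in the x direction, and a flat image.
ramp = [[2.0 * c for c in range(15)] for _ in range(15)]
flat = [[5.0 for _ in range(15)] for _ in range(15)]

ramp_gx, ramp_gy, ramp_flag = significant_gradient(ramp, 7, 7, h=1.5, sigma=1.0)
flat_gx, flat_gy, flat_flag = significant_gradient(flat, 7, 7, h=1.5, sigma=1.0)
```

On the ramp the estimated gradient points in the positive x direction and is flagged, so an arrow would be drawn there.  On the flat image nothing is flagged.  Repeating this over a grid of pixels and a range of bandwidths h gives the frames of a movie of the kind described above.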
 

    Another approach is based on significant curvature.  See the paper [ PDF version (701 KB) | Postscript version (3.61 MB) ] for details, but the main idea is that colored dots reflect different types of significant curvature, as shown here.

(Caution: this is only a "screen shot", so the buttons don't work.  Click here to see this movie)
These highlight the ridges and valleys in a different way, and again show that the bright spots are "really there".  Since different features show up at different levels of resolution, it is worth looking at the full movie.
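
One common way to distinguish types of curvature, used here as an assumption for illustration (see the paper for the actual definitions and the color coding), is through the signs of the two eigenvalues of the 2 x 2 matrix of second derivatives.  The Python sketch below classifies a point given hypothetical second derivative estimates fxx, fxy, fyy and a hypothetical cutoff q, which stands in for whatever critical value a real significance test would supply.

```python
import math

def hessian_eigenvalues(fxx, fxy, fyy):
    """Eigenvalues (larger, smaller) of the symmetric matrix [[fxx, fxy], [fxy, fyy]]."""
    mean = 0.5 * (fxx + fyy)
    dev = math.sqrt((0.5 * (fxx - fyy)) ** 2 + fxy ** 2)
    return mean + dev, mean - dev

def classify_curvature(fxx, fxy, fyy, q):
    """Label a point by which eigenvalues exceed the cutoff q in absolute value."""
    lam1, lam2 = hessian_eigenvalues(fxx, fxy, fyy)   # lam1 >= lam2
    if lam2 > q:
        return "hole"      # curves upward in every direction
    if lam1 < -q:
        return "peak"      # curves downward in every direction
    if lam1 > q and lam2 < -q:
        return "saddle"    # upward in one direction, downward in the other
    if lam1 > q:
        return "valley"    # upward in one direction, flat in the other
    if lam2 < -q:
        return "ridge"     # downward in one direction, flat in the other
    return "none"          # no curvature beyond the cutoff

labels = [classify_curvature(*d, q=0.5) for d in
          [(-2.0, 0.0, -3.0), (2.0, 0.0, 3.0), (2.0, 0.0, -3.0),
           (0.1, 0.0, -3.0), (3.0, 0.0, 0.1), (0.1, 0.0, 0.1)]]
print(labels)
```

Each label would correspond to one dot color in a display of this kind, with "none" leaving the pixel unmarked.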

    The arrow and dot visualizations can be combined to give:

(Caution: this is only a "screen shot", so the buttons don't work.  Click here to see this movie)
 

    A weakness of the above visualizations is the "raster effect" caused by the symbols lying on a rectangular grid.  This is distinctly not "rotation invariant".  Current thought on the presentation of vector fields of directional data is that a better presentation device is "streamlines", which are continuous lines that follow the gradient direction.

(Caution: this is only a "screen shot", so the buttons don't work.  Click here to see this movie)
This is a different way of seeing the significance of the same features.  Again the movie version is well worthwhile.
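
For readers unfamiliar with streamlines, the following Python sketch traces one by taking small steps of fixed length in the gradient direction.  It is a generic illustration and not the code behind the movies: the function name, the step size and the toy surface are hypothetical, and the rule "stop when the gradient is essentially 0" is a stand-in for stopping where the gradient is no longer statistically significant.

```python
import math

def trace_streamline(gradient, start, step=0.1, n_steps=100, min_norm=1e-8):
    """Follow the gradient direction from start using fixed-length Euler steps."""
    x, y = start
    path = [(x, y)]
    for _ in range(n_steps):
        gx, gy = gradient(x, y)
        norm = math.hypot(gx, gy)
        if norm < min_norm:        # stand-in for "gradient not significant"
            break
        x += step * gx / norm      # move a distance `step` uphill
        y += step * gy / norm
        path.append((x, y))
    return path

# Toy surface f(x, y) = -(x^2 + y^2), a single peak at the origin.
def toy_gradient(x, y):
    return -2.0 * x, -2.0 * y

path = trace_streamline(toy_gradient, (3.0, 4.0))
print(len(path), path[-1])
```

Because the path is not tied to the pixel grid, curves of this kind avoid the raster effect of the arrow and dot displays.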
 

    Many more examples that illustrate the usefulness of this method, and also that test it in various ways, may be found in the paper [ PDF version (701 KB) | Postscript version (3.61 MB) ].

     General purpose Matlab software, which made these movies and can also be easily used on other data sets, is available at  http://www.unc.edu/depts/statistics/postscript/papers/marron/SSS_software/.  The whole collection of files should be downloaded, e.g. to a single directory, because many of them call each other.  The Matlab subroutine conv2.m, in the Signal Processing toolbox, is required.  The main call is to the subroutine sss1.m.  The Matlab command ">> help sss1" gives information about how to use the various versions of SSS.
 
 


3b. Bivariate Density Estimation

    The visualizations developed above have been adapted to density estimation in Godtliebsen, Marron and Chaudhuri (1999b) [ PDF version (451 KB) | Postscript version (2.02 MB) ].

    An example illustrating this is the Melbourne Daily Maximum Temperature Data, analyzed by Hyndman, Bashtannyk and Grunwald (1996) Journal of Computational and Graphical Statistics, 5, 316-336.  Here is a "lag one scatterplot" of the data, where the x-axis represents yesterday's maximum and the y-axis represents today's maximum.

Note there is an apparent ridge along the line y=x, which is consistent with the idea of predicting today's max by using yesterday's max.  Less clear, but graphically presented by Hyndman et al., is a "horizontal arm", near the line y=20.

    Our SSS methodology not only shows that this arm is statistically significant, but also finds another vertical arm.  Both arms have been explained by meteorologists via a continental warm air mass changing places with sea breezes.
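
For readers who want to build the same kind of "lag one scatterplot" from their own series, the construction is just a pairing of each value with its successor.  The Python sketch below uses a hypothetical function name and made-up temperatures, not the Melbourne data.

```python
def lag_one_pairs(series):
    """Pairs (yesterday, today) for a series ordered in time."""
    return list(zip(series[:-1], series[1:]))

# Made-up daily maxima, for illustration only.
temps = [20.1, 25.3, 19.8, 21.0]
pairs = lag_one_pairs(temps)
print(pairs)
```

The first coordinate of each pair goes on the x-axis and the second on the y-axis.  A bivariate density estimate of these points is what the SSS movies below smooth at a range of bandwidths.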

    This can be seen with any of our visual approaches:
             Significant Arrows
             Significant Dots
             Significant Arrows and Dots
             Significant Streamlines
Note that the two arms show up at different times in the movies, i.e. at different levels of resolution, or different amounts of smoothing.  Perhaps the streamline view is best:

(Caution: these are only "screen shots", so the buttons don't work.  Click here to see this movie)

    As for the image version of SSS, more examples that illustrate the usefulness of this method, and also that test it in various ways, may be found in the paper [ PDF version (451 KB) | Postscript version (2.02 MB) ].  For a detailed index of figures and movies in the paper, go here.

    General purpose Matlab software (actually the same subroutine), which made these movies and can also be easily used on other data sets, is available at  http://www.unc.edu/depts/statistics/postscript/papers/marron/SSS_software/.  The whole collection of files should be downloaded, e.g. to a single directory, because many of them call each other.  The Matlab subroutine conv2.m, in the Signal Processing toolbox, is required.  The main call is to the subroutine sss1.m.  The Matlab command ">> help sss1" gives information about how to use the various versions of SSS.
 
 

