Home Page for Statistics-OR 891,

Object Oriented Data Analysis,

Fall 2007

Lecture Notes:

1.    [95%]   Tuesday, Aug. 21:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/8-21-07.ppt  Organizational matters - What is OODA? - Visualization by Projection - Object Space & Feature Space - Curves as Data - Data Representation Issues - PCA visualization

2.    [98%]   Thursday, Aug. 23:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/8-23-07.ppt  Matlab Software - Time Series of Curves - Chemometrics Data - Mortality Data

3.    [85%]   Tuesday, Aug. 28:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/8-28-07.ppt  Gene Cell Cycle Data - Microarrays and HDLSS visualization - DWD bias adjustment – Breast Cancer Data – Start NCI 60 Data

4.    [85%]   Thursday, Aug. 30:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/8-30-07.ppt  NCI 60 Data - DWD Robustness against unbalanced sampling - Linear algebra review  PP: Andrey Shabalin – Microarray Batch Adjustment

5.    [90%]   Tuesday, Sep. 4:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/9-4-07.ppt  Linear algebra review - Multivariate probability review - PCA as an optimization Problem - PCA Mathematics and Graphics  - PCA Redistribution of Energy PP: Travis Gaydos – PCA vs. Smoothness measures

6.    [90%]   Thursday, Sep. 6:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/9-6-07.ppt  Finish PCA Redistribution of Energy  - PCA Data Representation - Alternate PCA Computation & SVD - Primal - Dual PCA – SVD data analysis  PP: Jingdan Zhang – High dimensional texture synthesis

7.    [85%]   Tuesday, Sep. 11:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/9-11-07.ppt  Finish Primal-Dual PCA vs. SVD – Recentering  Network data - Connections between discrete and continuous curve data  PP: Spencer Hays – Extensions of Dynamic Factor Models via SVD and Optimal Smoothness

8.    [95%]   Thursday, Sep. 13:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/9-13-07.ppt  PCA for Corpora Callosa - Fourier Boundary Representation - Medial Representation - Movies for Visualization - Cornea PP: Mihee Lee – Deconvolution

9.    [90%]   Tuesday, Sep. 18:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/9-18-07.ppt  Cornea Data - Robust HDLSS (Spherical) PCA  PP: Daniel Gatti - Interpretation of ANOVA models for microarray data using PCA

10.    [92%]   Thursday, Sep. 20:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/9-20-07.ppt  Elliptical PCA - Clusters & PCA - Mass Flux Data  PP: Seungyeun Lee – PCA for Population Stratification

11.    [30%]   Tuesday, Sep. 25:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/9-25-07.ppt  Smoothing Basics – Bandwidth Selection – SiZer 

12.    [85%]   Thursday, Sep. 27:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/9-27-07.ppt  Revisit Mass Flux Data - SiZer Analysis of Cell Cycle Data - Data Representation  PP: Jui-Hua Hsieh – Visualization in Drug Discovery

13.    [70%]  Tuesday, Oct. 2:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/10-2-07.ppt  M-reps - Bladder Prostate Rectum - Data on manifolds - Mildly Non-Euclidean data - Principal Geodesic Analysis  PP: Seo Young Park – Introduction to LASSO

14.    [95%]   Thursday, Oct. 4:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/10-04-07.ppt  Classification - Fisher Linear Discrimination (Nonparametric & Parametric)  PP: Seonjoo Lee – Regularized PCA

15.    [95%]   Tuesday, Oct. 9:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/10-09-07.ppt  Classical Discrimination – HDLSS Discrimination  PP: Xiaoxiao Liu – Analysis of m-rep data & Jui-Hua Hsieh – Visualization in Drug Discovery

16.    [90%]   Thursday, Oct. 11:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/10-11-07.ppt  Maximal Data Piling – Relation to FLD – Start Embedding and Kernel Spaces 

17.    [95%]   Tuesday, Oct. 16:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/10-16-07.ppt  Support Vector Machines – Distance Weighted Discrimination  PP: Liying Zhang – The application of SVM in QSAR modeling  &  Sungkyu Jung – Visualization of gene expression data via PCA and DWD

Skip.      Thursday, Oct. 18:  Fall Break

18.    [80%]   Tuesday, Oct. 23:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/10-23-07.ppt  Distance Weighted Discrimination – Revisit micro-array data – Face Data – Outcome Data – Simulation Comparison  PP: Changryong Baek – Wavelets

19.    [30%]   Thursday, Oct. 25:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/10-25-07.ppt  HDLSS Asymptotics & Geometric Representation  PP:  Hao Tang – Discriminative Nearest Neighbor approach for classification problems at decision boundary

20.    [10%]   Tuesday, Oct. 30:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/10-30-07.ppt  Revisit NCI 60 data – HDLSS Hypothesis Testing: DiProPerm Test  PP: Xingye Qiao – Unbalanced Classification

Skip.     Thursday, Nov. 1:   Class Canceled, SAMSI workshop

21.    [0%]   Tuesday, Nov. 6:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/11-06-07.ppt  DiProPerm: Particulate Matter Data, Breast Cancer Data – Start Clustering  PP: Ruiwen Zhang – More on LASSO  &  Baowei Xu – PCA in Finance Data

22.    [0%]   Thursday, Nov. 8:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/11-08-07.ppt  Clustering – SigClust: hypothesis testing for clusters  PP: Andrey Shabalin – Biclustering  &  Xin Liu – Modeling Reaction-Time Distribution

23.    [0%]   Tuesday, Nov. 13:  http://stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/11-13-07.ppt  QQ-plots  PP: Feng Liu – DWD Analysis of Memory Test Data  &  Jun Ge – Reliability of principal component analysis

24.    [0%]   Thursday, Nov. 15:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/11-15-07.ppt  SigClust hypothesis testing: Applied to NCI 60 data  PP: Yuying Xie – QTL mapping  &  Dominik Reinhold – Landmark registration for handwriting data

25.    [0%]   Tuesday, Nov. 20:  http://stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/11-20-07.ppt  PP: Ying Yuan – Segmentation from perfusion images in acute stroke  &  Tong-Ying Wu –  Projecting High Dimension Chemistry Space in Low Dimension through Stochastic Proximity Embedding Technique  

Skip.      Thursday, Nov. 22:  Thanksgiving

26.    [0%]   Tuesday, Nov. 27:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/11-27-07.ppt  Independent Component Analysis PP: Wenjie Chen – SigClust Analysis of phycological data  &  Hong Ke – Zooming in on Human Growth

27.    [0%]   Thursday, Nov. 29:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/11-29-07SumanSen.ppt  Suman Sen: Classification for Manifold data  PP: Rima Hajjo – Hierarchical clustering of biological spectra: linking biological activity profiles to molecular structure  &  Xiaofang Cheng – Functional Models for Test Items

28.    [0%]   Tuesday, Dec. 4:  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/12-04-07.ppt  Trees as Data – Strongly Non-Euclidean Data –  Detailed look at Blood Vessel Data  PP: Burcu Aydin – Optimization for Trees as Data  http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/OptimizationOverTrees.pdf  &  Chaeryon Kang – Early Postnatal  Development of Corpus Callosum and Corticospinal White Matter Assessed with Quantitative Tractography

 

Notes:

·       Percentage overlap with material from 2005 is shown in brackets, e.g. [100%]

·       Participant Presentations shown as:  Name – Topic.  

·       Tentative Future Topics shown in gray.

 

 


 

References:

Ahn, J. and Marron, J.S. (2005) Maximal Data Piling in Discrimination, http://midag.cs.unc.edu/pubs/papers/Biometrika_Ahn_submit.pdf (cited 10/9/07)

Ahn, J. (2007) Distance Weighted Discrimination, http://www.stat.uga.edu/~jyahn/DWD/ (cited 10/23/07)

Ahn, J., Marron, J. S., Muller, K. M. and Chi, Y.-Y. (2007) The High Dimension, Low Sample Size Geometric Representation Holds Under Mild Conditions, Biometrika (to appear) (cited 10/25/07)

Aizerman, Braverman and Rozoner (1964) Theoretical foundations of the potential function method in pattern recognition learning, Automation and Remote Control, 15, 821-837 (cited 10/11/07, 10/16/07)

Bickel, P.J. and Levina, E. (2004) Some theory for Fisher's Linear Discriminant function, "naive Bayes", and some alternatives when there are many more variables than observations, Bernoulli, 10, 989-1010 (cited 10/9/07)

Born, M. and Wolf, E. (1980) Principles of Optics: Electromagnetic Theory of Propagation, Interference and Diffraction of Light, Pergamon Press, New York (cited 9/18/07)

Boser, B. E., Guyon, I. and Vapnik, V. (1992) A Training Algorithm for Optimal Margin Classifiers, in Fifth Annual Workshop on Computational Learning Theory, ACM  (cited 10/16/07)

Burges, C. J. C. (1998) A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2, 121-167 (cited 10/16/07)

Bullitt, E. and Aylward, S. (2002). Volume rendering of segmented image objects. IEEE Transactions on Medical Imaging, 21, 998-1002  (cited 12/4/07)

caBIG (2006) Distance Weighted Discrimination (DWD), https://cabig.nci.nih.gov/tools/DWD

Cardoso, J. F. (1989). Source separation using higher order moments In Proc. ICASSP, 2109-2112 (cited 11/27/07)

Chaudhuri, P. and Marron, J. S. (1999) SiZer for exploration of structure in curves, Journal of the American Statistical Association, 94, 807-823 (cited 9/25/07)

Cristianini, N. and Shawe-Taylor, J. (2000) An Introduction to Support Vector Machines, Cambridge University Press (cited 10/17/07)

Devijver, P. A. & Kittler, J. (1982) Pattern Recognition: A Statistical Approach, Prentice Hall, London  (cited 8/21/07)

Diaconis, P. & Freedman, D. (1984) Asymptotics of Graphical Projection Pursuit, Annals of Statistics, 12, 793-815. (cited 11/27/07)

Domingos, P. & Pazzani, M. (1997) On the optimality of the simple Bayesian classifier under zero-one loss, Machine Learning, 29, 103-137 (cited 10/4/07)

Duda, R. O. and Hart P. E. (1973) Pattern classification and scene analysis, Wiley, New York (cited 10/4/07)

Duda, R. O., Hart P. E. and Stork, D. G. (2001) Pattern classification, Wiley, New York (cited 10/4/07)

Fan, J. & Gijbels, I. (1996) Local Polynomial Modelling and Its Applications, Chapman and Hall, London  (cited 8/25/07)

Fisher, R.A. (1936) The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7, 179-188  (cited 10/4/07)

Fisher, N. I. (1983) Graphical Methods in Nonparametric Statistics: A Review and Annotated Bibliography, International Statistical Review, 51, 25-58  (cited 11/13/07)

Fletcher, P.T. (2004) Statistical Variability in Nonlinear Spaces: Application to Shape Analysis and DT-MRI, Ph.D. Thesis, Department of Computer Science, University of North Carolina, http://www.cs.unc.edu/~fletcher/fletcher_thesis.pdf.gz  (cited 10/2/07)

Fletcher, P.T., Joshi, S., Lu, C., Pizer, S.M. (2004) Principal Geodesic Analysis for the Study of Nonlinear Statistics of Shape, IEEE Transactions on Medical Imaging, 23, 995-1005  (cited 10/2/07)

Gabriel, K. R. (1971) The biplot display of matrices with application to principal component analysis, Biometrika, 58, 467  (cited 8/23/07)

Gersho, A. & Gray, R. (1992) Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston  (cited 11/6/07)

Good, I. J. and Gaskins, R. A. (1980) Density estimation and bump-hunting by the penalized likelihood method exemplified by scattering and meteorite data, Journal of the American Statistical Association, 75, 42-73  (cited 9/25/07)

Gower, J. C. (1974) The mediancentre, Applied Statistics, 23, 466-470 (cited 9/18/07)

Hall, P., Marron, J. S. & Neeman, A. (2004) Geometric representation of high dimension, low sample size data, Journal of the Royal Statistical Society Series B, 67, 427-444 (cited 10/25/07)

Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J. and Stahel, W. A. (1986) Robust Statistics: The Approach Based on Influence Functions, Wiley, New York (cited 9/18/07)

Hannig, J., Marron, J. S. and Riedi, R. H. (2001) Zooming statistics: Inference across scales, Journal of the Korean Statistical Society, 30, 327-345

Hartigan, J. A. (1975), Clustering Algorithms, Wiley (cited 11/06/07)

Hsu, C.-W. and Lin, C.-J. (2002) A comparison of methods for multiclass support vector machines, IEEE Transactions on Neural Networks, 13, 415-425 (cited 10/23/07)

Huber, P. (1981) Robust Statistics. Wiley, New York (cited 9/18/07)

Hyvärinen, A. and Oja, E. (1999) Independent Component Analysis: A Tutorial,  http://www.cis.hut.fi/projects/ica (cited 11/27/07)

Hyvärinen, A., Karhunen, J. and Oja, E. (2001) Independent Component Analysis, John Wiley & Sons, New York (cited 11/27/07)

Izenman, A. J. and Sommer, C. J. (1988) Philatelic mixtures and multimodal densities, Journal of the American Statistical Association, 83, 941-953  (cited 9/25/07)

Jones, M.C., Marron, J.S. & Sheather, S.J. (1996) A brief survey of bandwidth selection for density estimation, Journal of the American Statistical Association, 91, 401-407  (cited 9/25/07)

Kaufman, L. and Rousseeuw, P. J. (2005), Finding Groups in Data: An Introduction to Cluster Analysis, Wiley Series in Probability and Statistics (cited 11/06/07)

Lee, T. W. (1998) Independent Component Analysis: Theory and Applications, Kluwer (cited 11/27/07)

Lee, Y., Lin, Y. and Wahba, G. (2004) Multicategory Support Vector Machines, Theory, and Application to the Classification of Microarray Data and Satellite Radiance Data, Journal of the American Statistical Association, 99, 67-81 (cited 10/23/07)

Li, G. and Chen, Z. (1985) Projection pursuit approach to robust dispersion matrices and principal components: primary theory and Monte Carlo, Journal of the American Statistical Association, 80, 759-776 (cited 9/18/07)

Lindeberg, T. (1994) Scale Space Theory in Computer Vision, Kluwer, Boston (cited 9/25/07)

Liu, Y., Hayes, D. N., Nobel, A. & Marron, J. S. (2007) Statistical Significance of Clustering for High Dimension Low Sample Size Data, unpublished manuscript (cited 11/13/07)

Locantore, N., Marron, J. S., Simpson, D. G., Tripoli, N., Zhang, J. T. and Cohen, K. L. (1999) Robust PCA for Functional Data, Test, 8, 1-73 (cited 9/18/07)

Marron, J. S., Todd, M. J. and Ahn, J. (2007) Distance Weighted Discrimination, to appear in the Journal of the American Statistical Association  (cited 9/25/07)

McLachlan, G. J. (2004) Discriminant Analysis and Statistical Pattern Recognition, Wiley-Interscience

MacQueen, J. B. (1967) Some Methods for Classification and Analysis of Multivariate Observations, Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, Berkeley, 1, 281-297 (cited 11/06/07)

Milasevic, P. and Ducharme, J. R. (1987) Uniqueness of the spatial median, Annals of Statistics, 15, 1332-1333 (cited 9/18/07)

Paul, D. (2007) Asymptotics of the leading sample eigenvalues for a spiked covariance model, to appear in Statistica Sinica (cited 10/25/07)

Perou, C. M., Sorlie, T., Eisen, M. B., van de Rijn, M., Jeffrey, S. S., Rees, C. A., Pollack, J. R., Ross, D. T., Johnsen, H., Akslen, L. A., Fluge, O., Pergamenschikov, A., Williams, C., Zhu, S. X., Lonning, P. E., Borresen-Dale, A.-L., Brown, P. O., & Botstein, D. (2000) Molecular Portraits of Human Breast Tumors, Nature, 406, 747-52.

Ramsay, J. O. & Silverman, B. W. (2005) Functional Data Analysis, 2nd Edition, Springer, N.Y. ISBN 0-387-40080-X  (cited 8/21/07)

Ramsay, J. O. & Silverman, B. W. (2002) Applied Functional Data Analysis, Springer, N.Y. ISBN 0-387-95414-7  (cited 8/21/07)

Ramsay, J. O. (2005) Functional Data Analysis Web Site, http://ego.psych.mcgill.ca/misc/fda/  (cited 8/21/07)

Rondonotti, V., Marron, J. S. and Park, C. (2007) SiZer for time series: a new approach to the analysis of trends, Electronic Journal of Statistics, 1, 268-289 (http://dx.doi.org/10.1214/07-EJS006)  (cited 9/25/07)

Rousseeuw, P. J. and Leroy, A. M. (1987) Robust Regression and Outlier Detection, Wiley, New York (cited 9/18/07)

Schölkopf, B., Smola, A. and Müller, K. R. (1998) Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10, 1299-1319 (cited 10/11/07 & 10/16/07)

Schölkopf, B. and Smola, A. (2002) Learning with Kernels, MIT Press (cited 10/16/07)

Schwiegerling, J., Greivenkamp, J. E. and Miller, J. M. (1995) Representation of videokeratoscopic height data with Zernike polynomials, Journal of the Optical Society of America, Series A, 12, 2105-2113 (cited 9/18/07)

Spellman, P. T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O., Botstein, D. and Futcher, B. (1998), “Comprehensive Identification of Cell Cycle-regulated Genes of the Yeast Saccharomyces cerevisiae by Microarray Hybridization”, Molecular Biology of the Cell, 9, 3273-3297  (cited 8/23/07)

Staudte, R. G. and Sheather, S. J. (1990) Robust Estimation and Testing, Wiley, New York (cited 9/18/07)

Toh, K. C. (2007) SDPT3 version 4.0 (beta): a MATLAB software for semidefinite-quadratic-linear programming, http://www.math.nus.edu.sg/%7Emattohkc/sdpt3.html  (cited 10/23/07)

Vapnik, V. N. (1982) Estimation of dependences based on empirical data, Springer (Russian version, 1979) (cited 10/17/07)

Vapnik, V. N. (1995) The nature of statistical learning theory, Springer (cited 10/17/07)

Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics with S, Fourth Edition, Springer, N. Y., ISBN 0-387-95457-0  (cited 8/21/07)

Wand, M. P. & Jones, M. C. (1994) Kernel Smoothing, Chapman & Hall/CRC, ISBN: 0412552701  (cited 9/25/07)

Wang, H. and Marron, J. S. (2007) Object oriented data analysis: sets of trees, Annals of Statistics, 35, 1849-1873  (cited 12/4/07)

Wichers, L., Lee, C., Costa, D., Watkinson, W. and Marron, J. S. (2007) A Functional Data Analysis Approach for Evaluating Temporal Physiologic Responses to Particulate Matter, Technical Report UNC/STOR/07/05, http://stat-or.unc.edu/webspace/webpage/Tech_rep/FDA_ratPMdata07_05.pdf  (cited 10/30/07)

Zhao, X., Marron, J. S. & Wells, M. T. (2004) The Functional Data View of Longitudinal Data, Statistica Sinica, 14, 789-808  (cited 8/23/07)

Zhang, L., (2006), "SVD movies and plots for Singular Value Decomposition and its Visualization", talk available at http://www.unc.edu/~lszhang/research/network/SVDmovie  (cited 9/13/07)

 

 

 


 

 

Course Information:

Instructor:   J. S. Marron, Professor

 

Email:   marron@email.unc.edu   (checked regularly)

 

Office:   Smith 309

 

Phones:

      Office:    919-962-2188

      Home:   919-493-2844

      FAX:   919-962-2188

 

Formal Office Hours:

      Tuesdays 11:00 – 12:00

      Thursdays 2:00 – 3:00

 

Informal Office Hours:      When I am in my office (priority to those with appointments), and by email appointment


Class Meetings:

      T-Th 12:30 - 1:45,    Smith 107

 

 


 

Grading:

·       Based on “Participant Presentation”

·       Length to be decided, but probably 10 minutes

·       Perhaps last 10 to 20 minutes of each class meeting

·       Topic:

o  Could be related work you are doing (have done)

o  Or you can find a paper to present

o  Or I can suggest a paper to present

·       Would like to start next week:    Volunteers?