Home Page for Statistics-OR 891,
Object Oriented Data Analysis,
Fall 2007
Lecture Notes:
1. [95%] Tuesday, Aug. 21: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/8-21-07.ppt Organizational matters - What is OODA? - Visualization by
Projection - Object Space & Feature Space - Curves as Data - Data
Representation Issues - PCA visualization
2. [98%] Thursday, Aug. 23: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/8-23-07.ppt Matlab Software - Time Series of Curves - Chemometrics Data - Mortality Data
3. [85%] Tuesday, Aug. 28: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/8-28-07.ppt Gene Cell Cycle Data - Microarrays and
HDLSS visualization - DWD bias adjustment – Breast Cancer Data – Start NCI 60
Data
4. [85%] Thursday, Aug. 30: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/8-30-07.ppt NCI 60 Data - DWD Robustness against unbalanced
sampling - Linear algebra review PP: Andrey
Shabalin – Microarray Batch Adjustment
5. [90%] Tuesday, Sep. 4: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/9-4-07.ppt Linear algebra review - Multivariate probability
review - PCA as an optimization Problem - PCA Mathematics and Graphics - PCA Redistribution of Energy PP: Travis Gaydos – PCA vs. Smoothness measures
6. [90%] Thursday, Sep. 6: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/9-6-07.ppt Finish PCA Redistribution of Energy - PCA Data Representation - Alternate PCA
Computation & SVD - Primal - Dual PCA – SVD data analysis PP: Jingdan Zhang – High dimensional texture synthesis
7. [85%] Tuesday, Sep. 11: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/9-11-07.ppt Finish Primal-Dual PCA vs. SVD – Recentering – Network data - Connections between
discrete and continuous curve data PP:
Spencer Hays – Extensions of Dynamic Factor Models via SVD and Optimal
Smoothness
8. [95%] Thursday, Sep. 13: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/9-13-07.ppt PCA for Corpora Callosa -
Fourier Boundary Representation - Medial Representation - Movies for
Visualization - Cornea PP: Mihee
Lee – Deconvolution
9. [90%] Tuesday, Sep. 18: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/9-18-07.ppt Cornea Data - Robust HDLSS (Spherical) PCA PP: Daniel Gatti - Interpretation
of ANOVA models for microarray data using PCA
10. [92%] Thursday, Sep. 20: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/9-20-07.ppt Elliptical PCA - Clusters & PCA - Mass Flux Data
PP: Seungyeun Lee – PCA for
Population Stratification
11. [30%] Tuesday, Sep. 25: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/9-25-07.ppt Smoothing Basics – Bandwidth Selection – SiZer
12. [85%] Thursday, Sep. 27: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/9-27-07.ppt Revisit Mass Flux Data - SiZer Analysis of Cell Cycle Data - Data Representation PP: Jui-Hua Hsieh – Visualization
in Drug Discovery
13. [70%] Tuesday, Oct. 2: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/10-2-07.ppt M-reps - Bladder Prostate Rectum - Data on manifolds -
Mildly Non-Euclidean data - Principal Geodesic Analysis PP: Seo Young Park – Introduction to LASSO
14. [95%] Thursday, Oct. 4: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/10-04-07.ppt Classification - Fisher Linear Discrimination
(Nonparametric & Parametric) PP: Seonjoo Lee – Regularized PCA
15. [95%] Tuesday, Oct. 9: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/10-09-07.ppt Classical
Discrimination – HDLSS Discrimination PP: Xiaoxiao Liu – Analysis of m-rep data & Jui-Hua Hsieh – Visualization in Drug Discovery
16. [90%] Thursday, Oct. 11: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/10-11-07.ppt Maximal Data Piling – Relation to FLD – Start
Embedding and Kernel Spaces
17. [95%] Tuesday, Oct. 16: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/10-16-07.ppt Support Vector Machines – Distance Weighted PP: Liying Zhang – The application of
SVM in QSAR modeling & Sungkyu Jung –
Visualization of gene expression data via PCA and DWD
Skip. Thursday, Oct. 18: Fall Break
18. [80%] Tuesday, Oct. 23: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/10-23-07.ppt Distance Weighted Discrimination – Revisit micro-array
data – Face Data – Outcome Data – Simulation Comparison PP: Changryong Baek – Wavelets
19. [30%] Thursday, Oct. 25: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/10-25-07.ppt HDLSS Asymptotics &
Geometric Representation PP: Hao Tang – Discriminative Nearest Neighbor approach for
classification problems at decision boundary
20. [10%] Tuesday, Oct. 30: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/10-30-07.ppt Revisit NCI 60 data – HDLSS Hypothesis Testing: DiProPerm Test PP: Xingye Qiao
– Unbalanced Classification
Skip. Thursday, Nov. 1: Class Canceled, SAMSI workshop
21. [0%] Tuesday, Nov. 6: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/11-06-07.ppt DiProPerm: Particulate Matter Data, Breast Cancer Data – Start
Clustering PP: Ruiwen Zhang – More on LASSO & Baowei Xu – PCA in Finance Data
22. [0%] Thursday, Nov. 8: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/11-08-07.ppt Clustering – SigClust: hypothesis testing for clusters PP: Andrey Shabalin – Biclustering & Xin Liu – Modeling
Reaction-Time Distribution
23. [0%] Tuesday, Nov. 13: http://stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/11-13-07.ppt QQ-plots
PP: Feng Liu – DWD Analysis of Memory Test Data & Jun Ge – Reliability of principal component analysis
24. [0%] Thursday, Nov. 15: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/11-15-07.ppt SigClust hypothesis testing: Applied to NCI 60 data
PP: Yuying Xie – QTL mapping & Dominik Reinhold – Landmark registration for handwriting data
25. [0%] Tuesday, Nov. 20: http://stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/11-20-07.ppt PP: Ying Yuan – Segmentation from perfusion images in acute stroke & Tong-Ying Wu – Projecting High Dimension Chemistry Space in
Low Dimension through Stochastic Proximity Embedding Technique
Skip. Thursday, Nov. 22: Thanksgiving
26. [0%] Tuesday, Nov. 27: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/11-27-07.ppt Independent Component Analysis
PP: Wenjie Chen – SigClust Analysis
of phycological data &
Hong Ke – Zooming in on Human Growth
27. [0%] Thursday, Nov. 29: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/11-29-07SumanSen.ppt Suman Sen: Classification for Manifold data PP: Rima Hajjo – Hierarchical
clustering of biological spectra: linking biological activity profiles to
molecular structure & Xiaofang Cheng –
Functional Models for Test Items
28. [0%] Tuesday, Dec. 4: http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/12-04-07.ppt Trees as Data – Strongly Non-Euclidean Data – Detailed look at
Blood Vessel Data PP: Burcu Aydin – Optimization for Trees as Data http://www.stat-or.unc.edu/webspace/courses/marron/UNCstor891OODA-2007/OptimizationOverTrees.pdf
& Chaeryon Kang – Early Postnatal Development of Corpus Callosum and Corticospinal
White Matter Assessed with Quantitative Tractography
Notes:
· % of overlap with material from 2005 shown in brackets, e.g. [100%]
· Participant Presentations shown as: Name – Topic.
· Tentative Future Topics shown in gray.
References:
Ahn, J., Marron, J. S., Muller, K. M. and Chi, Y.-Y. (2007) The High Dimension, Low Sample Size
Geometric Representation Holds Under Mild Conditions, Biometrika (to appear) (cited 10/25/07)
Aizerman, Braverman and Rozonoer
(1964) Theoretical foundations of the potential function method in pattern
recognition learning, Automation and Remote Control, 15, 821-837 (cited
10/11/07, 10/16/07)
Bickel, P.J. and
Levina, E. (2004) Some theory for Fisher's Linear Discriminant function, "naive Bayes",
and some alternatives when there are many more variables than observations, Bernoulli,
10, 989-1010 (cited 10/9/07)
Born, M. and
Wolf, E. (1980) Principles of Optics: Electromagnetic Theory of Propagation,
Interference and Diffraction of Light, Pergamon Press, New York (cited 9/18/07)
Boser, B. E., Guyon, I.
and Vapnik, V. (1992) A Training Algorithm for
Optimal Margin Classifiers, in Fifth
Annual Workshop on Computational Learning Theory, ACM (cited 10/16/07)
Burges, C. J. C.
(1998) A Tutorial on
Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2, 121-167 (cited 10/16/07)
Bullitt, E. and Aylward, S. (2002). Volume rendering of segmented image
objects. IEEE Transactions on Medical
Imaging, 21, 998-1002
(cited 12/4/07)
Cardoso, J. F. (1989).
Source separation using higher order moments, in Proc. ICASSP, 2109-2112
(cited 11/27/07)
Chaudhuri, P. and Marron, J. S. (1999) SiZer for exploration of structure in curves, Journal of
the American Statistical Association, 94, 807-823 (cited 9/25/07)
Cristianini, N. and Shawe-Taylor, J. (2000) An
Introduction to Support Vector Machines, Cambridge University Press (cited
10/17/07)
Devijver, P. A. & Kittler, J. (1982) Pattern Recognition: A Statistical
Approach, Prentice Hall, London (cited 8/21/07)
Diaconis, P. & Freedman, D. (1984) Asymptotics of
Graphical Projection Pursuit, Annals of Statistics, 12, 793-815. (cited 11/27/07)
Domingos, P. & Pazzani, M. (1997) On the optimality of the simple Bayesian classifier under
zero-one loss. Machine Learning, 29:103–137 (cited 10/4/07)
Duda, R. O. and Hart P. E. (1973) Pattern classification and scene
analysis, Wiley, New York (cited 10/4/07)
Duda, R. O., Hart P. E. and Stork, D. G. (2001) Pattern classification,
Wiley, New York (cited 10/4/07)
Fan, J. & Gijbels, I. (1996) Local
Polynomial Modelling and Its Applications,
Chapman and Hall, London (cited 8/25/07)
Fisher, R.A.
(1936) The Use of Multiple Measurements in Taxonomic
Problems. Annals of Eugenics, 7, 179-188 (cited 10/4/07)
Fisher, N. I.
(1983) Graphical Methods in Nonparametric Statistics: A Review and Annotated
Bibliography, International Statistical Review, 51, 25-58 (cited 11/13/07)
Fletcher, P.T.
(2004) Statistical Variability in
Nonlinear Spaces: Application to Shape Analysis and DT-MRI, Ph.D. Thesis,
Department of Computer Science, University of North Carolina, http://www.cs.unc.edu/~fletcher/fletcher_thesis.pdf.gz (cited 10/2/07)
Fletcher, P.T.,
Joshi, S., Lu, C., Pizer, S.M. (2004) Principal
Geodesic Analysis for the Study of Nonlinear Statistics of Shape, IEEE Transactions on Medical Imaging, 23,
995-1005 (cited 10/2/07)
Gabriel, K. R.
(1971) The biplot display of matrices with
application to principal component analysis, Biometrika,
58, 467 (cited
8/23/07)
Gersho, A. & Gray, R. (1992) Vector
Quantization and Signal Compression, Kluwer
Academic Publishers, Boston
(cited 11/6/07)
Good, I. J. and
Gaskins, R. A. (1980) Density
estimation and bump-hunting by the penalized likelihood method exemplified by
scattering and meteorite data, Journal of the American
Statistical Association, 75, 42-73 (cited 9/25/07)
Gower, J. C.
(1974) The mediancentre, Applied
Statistics, 23, 466-470 (cited 9/18/07)
Hall, P., Marron, J. S. & Neeman, A.
(2005) Geometric representation of high dimension, low sample size data, Journal of the Royal Statistical Society
Series B, 67, 427-444 (cited 10/25/07)
Hampel, F. R., Ronchetti, E. M., Rousseeuw,
P. J. and Stahel, W. A. (1986) Robust Statistics:
the Approach Based on Influence Functions, Wiley,
New York (cited 9/18/07)
Hannig, J., Marron, J. S. and Riedi, R. H. (2001) Zooming statistics: Inference across scales, Journal of the Korean Statistical Society,
30, 327-345.
Hartigan, J. A. (1975), Clustering Algorithms, Wiley
(cited 11/06/07)
Hsu, C.-W. and Lin, C.-J. (2002) A comparison of methods for multiclass support vector
machines, IEEE Transactions on Neural Networks, 13, 415-425 (cited 10/23/07)
Huber, P. (1981)
Robust Statistics. Wiley, New York (cited 9/18/07)
Hyvärinen, A. and Oja, E. (1999) Independent
Component Analysis: A Tutorial, http://www.cis.hut.fi/projects/ica
(cited 11/27/07)
Hyvärinen, A., Karhunen, J. and Oja,
E. (2001) Independent Component Analysis, John Wiley & Sons, New
York (cited 11/27/07)
Izenman, A. J. and Sommer, C. J. (1988) Philatelic
mixtures and multimodal densities, Journal of the American Statistical
Association, 83, 941-953 (cited 9/25/07)
Jones, M.C., Marron, J.S. & Sheather, S.J.
(1996) A brief survey of bandwidth selection for density estimation, Journal of the American Statistical
Association, 91, 401-407
(cited 9/25/07)
Kaufman, L. and Rousseeuw, P. J. (2005), Finding Groups in Data: An Introduction to Cluster
Analysis, Wiley Series in Probability and Statistics (cited 11/06/07)
Lee, T. W.
(1998) Independent Component Analysis: Theory and Applications, Kluwer (cited 11/27/07)
Lee, Y., Lin, Y.
and Wahba, G. (2004) Multicategory
Support Vector Machines, Theory, and Application to the Classification of
Microarray Data and Satellite Radiance Data, Journal of the American
Statistical Association, 99, 67-81 (cited 10/23/07)
Li, G. and Chen,
Z. (1985) Projection pursuit approach to robust dispersion matrices and
principal components: primary theory and Monte
Carlo, Journal of the American Statistical
Association, 80, 759-776 (cited 9/18/07)
Lindeberg, T. (1994) Scale Space Theory in Computer Vision, Kluwer, Boston (cited 9/25/07)
Liu, Y., Hayes,
D. N., Nobel, A. & Marron, J. S. (2007) Statistical Significance of Clustering for
High Dimension Low Sample Size Data, unpublished manuscript (cited
11/13/07)
Locantore, N., Marron, J. S.,
Simpson, D. G., Tripoli, N., Zhang,
J. T. and Cohen, K. L. (1999) Robust PCA for Functional Data, Test, 8,
1-73 (cited 9/18/07)
Marron, J. S., Todd, M. J. and Ahn,
J. (2007) Distance Weighted
Discrimination, to appear in the Journal
of the American Statistical Association (cited
9/25/07)
McLachlan, G. J.
(2004) Discriminant Analysis and Statistical Pattern
Recognition, Wiley-Interscience
MacQueen, J. B. (1967): Some Methods for classification and Analysis of
Multivariate Observations, Proceedings of
5-th Berkeley Symposium on Mathematical Statistics and Probability,
Berkeley, University of California Press, 1:281-297 (cited 11/06/07)
Milasevic, P. and Ducharme, J. R. (1987) Uniqueness of
the spatial median, Annals of Statistics, 15, 1332-1333 (cited 9/18/07)
Paul, D. (2007) Asymptotics of the leading sample eigenvalues
for a spiked covariance model, to appear in Statistica
Sinica (cited 10/25/07)
Perou, C. M., Sorlie, T., Eisen,
M. B., van de Rijn, M., Jeffrey, S. S., Rees, C. A., Pollack, J. R., Ross, D.
T., Johnsen, H., Akslen, L.
A., Fluge, O., Pergamenschikov,
A., Williams, C., Zhu, S. X., Lonning, P. E., Borresen-Dale, A.-L., Brown, P. O., & Botstein, D.
(2000) Molecular Portraits of Human Breast Tumors, Nature, 406, 747-52.
Ramsay, J. O.
& Silverman, B. W. (2005) Functional Data Analysis, 2nd Edition,
Springer, N.Y. ISBN 0-387-40080-X (cited 8/21/07)
Ramsay, J. O.
& Silverman, B. W. (2002) Applied Functional Data Analysis,
Springer, N.Y. ISBN 0-387-95414-7 (cited 8/21/07)
Rondonotti, V., Marron, J. S. and Park, C. (2007) SiZer for time series: a new approach to the analysis of
trends, Electronic Journal of Statistics,
1, 268-289 (http://dx.doi.org/10.1214/07-EJS006) (cited 9/25/07)
Rousseeuw, P. J. and Leroy, A. M. (1987) Robust Regression and Outlier
Detection, Wiley, New York
(cited 9/18/07)
Schölkopf, B., Smola, A. and Müller,
K. R. (1998) Nonlinear component analysis as a kernel eigenvalue
problem, Neural Computation, 10, 1299-1319 (cited 10/11/07 &
10/16/07)
Schölkopf, B. and Smola, A. (2002) Learning with
Kernels, MIT Press (cited 10/16/07)
Schwiegerling, J., Greivenkamp, J. E. and Miller, J. M.
(1995) Representation of videokeratoscopic height
data with Zernike polynomials, Journal of the Optical Society of America,
Series A, 12, 2105-2113 (cited 9/18/07)
Spellman, P. T.,
Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O., Botstein, D. and Futcher, B. (1998), “Comprehensive Identification of Cell
Cycle-regulated Genes of the Yeast Saccharomyces
cerevisiae by Microarray Hybridization”, Molecular
Biology of the Cell, 9, 3273-3297
(cited 8/23/07)
Staudte, R. G. and Sheather, S. J. (1990) Robust Estimation and Testing,
Wiley, New York
(cited 9/18/07)
Toh, K. C. (2007)
SDPT3 version 4.0 (beta) -- a MATLAB software for semidefinite-quadratic-linear programming http://www.math.nus.edu.sg/%7Emattohkc/sdpt3.html (cited 10/23/07)
Vapnik, V. N. (1982) Estimation of
dependences based on empirical data, Springer (Russian version, 1979)
(cited 10/17/07)
Vapnik, V. N. (1995) The nature of statistical learning theory,
Springer (cited 10/17/07)
Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics with S,
Fourth Edition, Springer, N. Y., ISBN 0-387-95457-0 (cited 8/21/07)
Wand, M. P.
& Jones, M. C. (1994) Kernel
Smoothing, Chapman & Hall/CRC, ISBN: 0412552701 (cited 9/25/07)
Wang, H. and Marron, J. S. (2007) Object oriented data analysis: sets of
trees, Annals of Statistics, 35,
1849-1873 (cited 12/4/07)
Wichers, L., Lee, C., Costa, D., Watkinson, W. and Marron, J. S. (2007) A Functional Data Analysis Approach
for Evaluating Temporal Physiologic Responses to Particulate Matter, Technical
Report UNC/STOR/07/05, http://stat-or.unc.edu/webspace/webpage/Tech_rep/FDA_ratPMdata07_05.pdf (cited 10/30/07)
Zhao, X., Marron, J. S. & Wells, M. T. (2004) The Functional Data
View of Longitudinal Data, Statistica Sinica, 14, 789-808 (cited 8/23/07)
Zhang, L. (2006) "SVD movies and plots for Singular Value Decomposition and its
Visualization", talk available at http://www.unc.edu/~lszhang/research/network/SVDmovie (cited 9/13/07)
Course Information:
Instructor: J. S. Marron,
Professor
Email: marron@email.unc.edu (checked regularly)
Office: Smith 309
Phones:
Office: 919-962-2188
Home: 919-493-2844
FAX: 919-962-2188
Formal Office Hours: Tuesdays 11:00 – 12:00, Thursdays 2:00 – 3:00
Informal Office Hours: When I am in my office (priority to those
with appointments), and by email appointment
Class Meetings:
T-Th 12:30 - 1:45,
Smith 107
Grading:
· Based on “Participant Presentation”
· Length to be decided, but probably 10 minutes
· Perhaps last 10 to 20 minutes of each class meeting
· Topic:
o Could be related work you are doing (have done)
o Or you can find a paper to present
o Or I can suggest a paper to present
· Would like to start next week: Volunteers?