Statistics 322

Object Oriented Data Analysis


Combined References



Collected Movies,    PDF files


Marron's Matlab Software


Class Meetings:

Tuesday, August 30    Organizational matters  -  What is OODA?  -  Visualization by Projection  -  Object Space & Feature Space  -  Curves as Data  -  Data Representation Issues  -  PCA visualization

Thursday, September 1    Matlab Software  -  Time Series of Curves  -  Chemometrics Data  -  Mortality Data

Tuesday, September 6   
Gene Cell Cycle Data  -  Microarrays and HDLSS visualization  -  DWD bias adjustment  -  NCI 60 Data

Thursday, September 8    Finish NCI 60 Data  -  Linear Algebra Review  -  Multivariate Probability Review  -  PCA as an optimization Problem  -  PCA Mathematics and Graphics

Tuesday, September 13   
PCA Redistribution of Energy  -  PCA Data Representation  -  Alternate PCA Computation & SVD  -  Primal-Dual PCA  -  Connections between discrete and continuous curve data

Thursday, September 15   
Finished Primal-Dual PCA  vs. SVD  -  PCA for Corpora Callosa  -  Fourier Boundary Representation  -  Medial Representation  -  Movies for Visualization

Tuesday, September 20   
Cornea Data  -  Robust HDLSS (Spherical) PCA

Thursday, September 22    Out of Town

Tuesday, September 27    Elliptical PCA  -  Cluster & PCA  -  Revisit NCI 60 Data  -  Mass Flux Data  -  SiZer

Thursday, September 29   
SiZer  -  Revisit Mass Flux Data  -  SiZer Analysis of Cell Cycle Data  -  Data Representation

Tuesday, October 4
    Euclidean data, not near subspace  -  M-reps  -  Bladder Prostate Rectum  -  Data on manifolds  -  Mildly Non-Euclidean data  -  Trees as Data  -  Strongly Non-Euclidean Data

Thursday, October 6
    Participant Presentations: Lingsong Zhang, Travis Gaydos, Ja-Yeon Jeong, Marcel Prastawa

Tuesday, October 11
    Discrimination  -  Fisher Linear Discrimination (Nonparametric & Parametric)

Thursday, October 13
    Participant Presentations: Martin Styner, Isabelle Corouge, Joshua Stough, Surajit Ray
 

Tuesday, October 18
    Participant Presentations: Myung Hee Lee, Chihoon Lee, Brad Davis,  Peter Lorenzen

Thursday, October 20    Fall Break

Tuesday, October 25   
Class Cancelled

Thursday, October 27    Participant Presentations: Xuxin Liu, Sushant Rewaskar, Alok Shriram, Abhishek Singh

Tuesday, November 1    Sarang Joshi

Thursday, November 3    Participant Presentations: Josh Levy, Jeongyoun Ahn, Fernando Silva, Christine Xu

Tuesday, November 8   
Generalizations of FLD  -  HDLSS Discrimination  -  Maximal Data Piling

Thursday, November 10
    Participant Presentations:  Yufeng Liu, Jiancheng Jiang, Haipeng Shen, Christine Xu

Tuesday, November 15
    Participant Presentations: Hua Yang, Dan Samarov, Jie Zhou, Qiong Han

Thursday, November 17
    Class Cancelled

Tuesday, November 22
    Participant Presentations: Changwon Lin, Xin Fu, Luke Huan, Mihee Lee      

Thursday, November 24    Thanksgiving

Tuesday, November 29    Participant Presentations: Xuanyao He, Miao Xie, Fangfang Wang, Ipek Oguz

Thursday, December 1
   
Embedding and Kernel Spaces  -  Support Vector Machines  -  Distance Weighted Discrimination  -  Revisit micro-array data  -  Face Data

Tuesday, December 6    Participant Presentations: Vangelis Evangelou, Suman Sen, Ping Bai, Eli Broadhurst

Thursday, December 8   
Revisit NCI 60 data  -  HDLSS Hypothesis Testing: DiProPerm Test  -  HDLSS Geometric Representation  -  Independent Component Analysis  -  ICA for checking Gaussianity


Course Information:

Fall Semester, 2005
Class Meetings:  Tuesday-Thursday 12:30 - 1:45, Smith 107
Taught by:  J. S. Marron

Office:    Smith 309
Office Hours:   Thursday, 2:00 - 3:30
Email:     marron@email.unc.edu
Office Telephone:    (919) 962-2188    
Home Telephone:    (919) 493-2844

Class Email Listserv:

Course Description:

Object Oriented Data Analysis is the statistical analysis of populations of complex objects.  In the special case of Functional Data Analysis, these data objects are curves, where standard Euclidean approaches, such as principal components analysis, have been very successful.  Recent developments in medical image analysis motivate the statistical analysis of populations of more complex data objects which are elements of mildly non-Euclidean spaces, such as Lie Groups and Symmetric Spaces, or of strongly non-Euclidean spaces, such as spaces of tree-structured data objects.  These new contexts for Object Oriented Data Analysis create several potentially large new interfaces between mathematics and statistics.  Even in situations where Euclidean analysis makes sense, there are statistical challenges because of the High Dimension Low Sample Size problem, which motivates a new type of asymptotics leading to non-standard mathematical statistics.

The prerequisite is course experience with probability, expectation, variance, covariance, and the multivariate normal distribution, e.g. as in Stat 164 (a number of other courses will work as well).  Most of the fundamental statistical concepts that are needed (e.g. Principal Component Analysis) will be developed during the course.

Course grades will be based on student presentations.  Each presentation will cover either the student's own related work (rather broadly defined) or a recent paper in the area.

Enrollment is encouraged, but auditors are also welcome.


Grading:

Class Discussion: