Home Page for course OR778
School of Operations Research and Industrial Engineering
Cornell University

Links to Lectures (with summary of topics):
 
 

Lecture 12/7/01    Sam Steckley

Lecture 12/5/01   (Revisited Open Problems, new mathematics for heavy tails (with new graphics), solution for log-normal durations lead to long-range dependence, final "big picture" comments)

Lecture 12/3/01   (Large Variable Association, found way to make axes "commensurable" with respect to multiplicative rescaling, but are tail indices right?)

Lecture 11/28/01    {Actually given on 11/30/01}    (Used toy examples and real data to explore variations (e.g. copulas) on Large Variable Association.  Problem:  how to make axes "commensurable")

Lecture 11/26/01    {Actually given on 11/28/01}    (Asymptotic Independence, using SiZer analysis, saw hard to explain dependencies, introduced modification: "Large Variable Association")

Lecture 11/21/01    Jianghong Wang

Lecture 11/19/01    Stacey Tang

Lecture 11/14/01    Jörg Rothenbühler

Lecture 11/12/01    Krishanu Maulik

Lecture 10/31/01    (Cascaded On-Off Process: Quantitative Validation, autocorrelation, summary statistics, visual periodicities, quantiles)

Lecture 10/29/01    (Cascaded On-Off Process: Definition, Parameters Estimation, Visual Impression)

Lecture 10/24/01    (Asymptotic Independence, using SiZer analysis, relationship between Size, Time and "Rate" = Size/Time, Tail index estimation via slope of log-log CCDF, Long Range Dependence vs. ARIMA(1))

Lecture 10/22/01    (For new HTTP Response Size data: Studied Asymptotic Independence, log-log CCDF Tail Index Estimation)

Lecture 10/17/01    (For Simple Model: Investigated "Independence" part of Poisson process starts.  Studied Flow Scatterplots, to investigate "asymptotic independence".  Introduced new HTTP Response Size data, did first version of Scatterplots.)

Lecture 10/15/01    (Zooming spectral analysis, revisited flow duration distributions, from "Residual Life Time Distribution" viewpoint)

Lecture 10/10/01    (In context of heavy tailed durations imply LRD, revisited simple model assumptions, considered modifications for start time process, Weibull process improvement not compelling, Cluster Poisson process gave useful improvement)

Lecture 10/3/01    (In context of heavy tailed durations imply LRD, quick overview of extreme value theory, revisited simple model assumptions, Downey's argument for log normal, "Important" open problem)

Lecture 10/1/01    (In context of heavy tailed durations imply LRD, Mice and Elephants graphic, for "one minute split" flows, introduced simple model, investigated modelling assumptions)

Lecture 9/26/01    (Mice and Elephants View, showing how heavy tailed durations imply LRD, time windows gave truncation - length biasing, constructed simulated versions, careful look at IP, TCP, UDP, ...)

Lecture 9/24/01    (Finished SiZer background, zooming SiZer analysis)

Lecture 9/19/01    (Introduction to Time Series Analysis)

Lecture 9/17/01    (Zooming autocorrelation analysis, Heading toward zooming SiZer analysis, 1st doing SiZer background)

Lecture 9/12/01    (Finished (?) study of heavy tails, began study of "Long Range Dependence", via correlation analysis (sensible?), in context of Heavy tailed durations imply LRD?  TCP connection zooming graphic)

Lecture 9/10/01    (Careful overview of Q-Q analysis, quantile matching, comparison with Complementary CDF analysis)

Lecture 9/5/01    (Detailed Q-Q for tail of Response Size Distributions, Pareto and log normal gave decent fit, how should we think about "heavy tails" in context of Heavy tailed durations imply LRD?)

Lecture 9/3/01    ("Big picture" of Internet traffic, Response Size Distributions: SiZer analysis, Q-Q plots, simulated envelope, suggest possible Pareto fits)
 
 

Other Links:
 

Open Problem List

Summary of References (with links)
 
 
 
 

Data Sets:
(kindly provided by Don Smith, David Ott, Felix Hernandez, and others, from the UNC Computer Science Distributed and Real-Time Systems Group)
 

Response Size Data:  734,814 HTTP Response Sizes (in bytes), gathered around 1998.  (in plain ASCII text format, each line is one response size)
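
As a starting point for working with this file, the following Python sketch reads one response size per line and estimates a tail index from the slope of the log-log complementary CDF, the approach named in the 10/22/01 and 10/24/01 lecture summaries. The function names, the default tail fraction of 10%, and the plain least-squares fit are illustrative assumptions, not the method used in the lectures.

```python
import math


def read_sizes(path):
    """Read one response size (in bytes) per line, skipping blank lines."""
    sizes = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                sizes.append(float(line))
    return sizes


def loglog_ccdf(sizes):
    """Return (log10 x, log10 P(X > x)) pairs for the empirical CCDF.

    The largest observation is dropped because its empirical CCDF is 0.
    Tied values give one point per observation (a simplification).
    """
    xs = sorted(s for s in sizes if s > 0)
    n = len(xs)
    points = []
    for i, x in enumerate(xs[:-1]):
        ccdf = (n - (i + 1)) / n
        points.append((math.log10(x), math.log10(ccdf)))
    return points


def tail_index(sizes, tail_fraction=0.1):
    """Estimate the tail index as minus the least-squares slope of the
    log-log CCDF, fitted over the upper tail_fraction of the points.

    Needs at least two distinct positive sizes in the fitted tail.
    """
    pts = loglog_ccdf(sizes)
    k = max(2, int(len(pts) * tail_fraction))
    tail = pts[-k:]
    m = len(tail)
    mx = sum(p[0] for p in tail) / m
    my = sum(p[1] for p in tail) / m
    sxx = sum((p[0] - mx) ** 2 for p in tail)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in tail)
    return -sxy / sxx
```

The estimate depends strongly on which part of the tail is fitted, so it is worth trying several values of tail_fraction and comparing against a plot of the points from loglog_ccdf.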
 

Updated Response Size Data, gathered in April 2001. These files contain only the responses that are larger than 100,000 bytes  (in plain ASCII text format, each line is one HTTP response, 1st column is size in bytes, 2nd is starting time (sec), 3rd is finishing time (sec), 4th is time required for transmission).  Individual files are for 4 hour blocks, 3 times a day, one set for each day of the week:

Monday morning, 8:00AM-12:00noon: 20010423_800.raw
Monday afternoon, 1:00PM-5:00PM: 20010423_1300.raw
Monday evening, 7:30PM - 11:30PM:    20010423_1930.raw

Tuesday morning, 8:00AM-12:00noon: 20010424_800.raw
Tuesday afternoon, 1:00PM-5:00PM: 20010424_1300.raw
Tuesday evening, 7:30PM - 11:30PM:    20010424_1930.raw

Wednesday morning, 8:00AM-12:00noon: 20010425_800.raw
Wednesday afternoon, 1:00PM-5:00PM: 20010425_1300.raw
Wednesday evening, 7:30PM - 11:30PM:    20010425_1930.raw

Thursday morning, 8:00AM-12:00noon: 20010426_800.raw
Thursday afternoon, 1:00PM-5:00PM: 20010426_1300.raw
Thursday evening, 7:30PM - 11:30PM:     20010426_1930.raw

Friday morning, 8:00AM-12:00noon: 20010420_800.raw
Friday afternoon, 1:00PM-5:00PM: 20010420_1300.raw
Friday evening, 7:30PM - 11:30PM:    20010420_1930.raw

Saturday morning, 8:00AM-12:00noon: 20010421_800.raw
Saturday afternoon, 1:00PM-5:00PM: 20010421_1300.raw
Saturday evening, 7:30PM - 11:30PM:    20010421_1930.raw

Sunday morning, 8:00AM-12:00noon: 20010429_800.raw
Sunday afternoon, 1:00PM-5:00PM: 20010429_1300.raw
Sunday evening, 7:30PM - 11:30PM:    20010429_1930.raw
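
The four-column layout of these files could be read along the following lines. This is a minimal Python sketch that assumes the columns are separated by whitespace, that the 4th column is in seconds, and that malformed lines can simply be skipped. The Response type and the parse_line and read_responses functions are hypothetical names; the rate property follows the "Rate" = Size/Time definition from the 10/24/01 lecture summary.

```python
from typing import NamedTuple


class Response(NamedTuple):
    size: float      # response size in bytes (1st column)
    start: float     # starting time in seconds (2nd column)
    finish: float    # finishing time in seconds (3rd column)
    duration: float  # time required for transmission (4th column, assumed seconds)

    @property
    def rate(self):
        """Size / Time in bytes per second, or None if duration is not positive."""
        return self.size / self.duration if self.duration > 0 else None


def parse_line(line):
    """Parse one whitespace-separated record; return None if it is malformed."""
    fields = line.split()
    if len(fields) < 4:
        return None
    try:
        size, start, finish, duration = (float(v) for v in fields[:4])
    except ValueError:
        return None
    return Response(size, start, finish, duration)


def read_responses(path):
    """Read all well-formed records from one .raw file."""
    responses = []
    with open(path) as f:
        for line in f:
            r = parse_line(line)
            if r is not None:
                responses.append(r)
    return responses
```

For example, read_responses("20010423_800.raw") would return the Monday morning records, assuming that file has been downloaded to the working directory.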

Summary Statistics (Excel Spreadsheet)

Quantiles (Excel Spreadsheet)
 
 
 
 




 
 

Background Material:
 


Statistical Analysis and Modelling
of Internet Traffic Data


Course Meetings:

     Time:   Mon. and Wed. 8:40 - 9:55
     Room:  Rhodes 471
 
 

Course Web Site:

http://www.orie.cornell.edu/~marron/OR778NetworkData/OR778home.html

It may be easier to follow the link from:

http://www.orie.cornell.edu/~marron/
 
 
 


Instructor:   J. S. (Steve) Marron
 

Office:   Rhodes 234
Office Hours:   Mon. 10 - 11,    Tuesday 11 - 12
 

Phone:   (607) 255-9147
Email:   marron@stat.unc.edu
 

Course Email List:  please add yourself,
by sending an email with "subscribe" as the subject,
to:  or778-fa01-l-request@orie.cornell.edu

                    (useful for announcements, such as "notes now posted")
 
 


Course Work / Grading
 

Based on a presentation
 

Presentations:

    -    can be either a paper by others (you choose, or I suggest)

    -    or your own work

    -    let's discuss soon
 
 
 


Course Goals:

    -    Explore Internet Traffic from several viewpoints

    -    Highlight interesting open problems

    -    Promote possible joint research

    -    Maximize understanding by all class members