Class Notes 10/24/01
Last Time:
For new HTTP Response Size data:
- Studied Asymptotic Independence
- Hill Estimation of tail indices
- Used for "power renormalization"
- Considered Box-Cox family of power transformations
- Explored "ratio Hill estimation"
- log-log CCDF Tail Index Estimation
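As a reminder of what Hill estimation computes, here is a minimal sketch on synthetic Pareto data. The function names, the sample size, and the choice k = 2000 are illustrative assumptions, not the settings used in class, and the response-size data itself is not reproduced here.

```python
import math
import random

def hill_estimator(data, k):
    """Hill estimate of the tail index from the k largest observations."""
    x = sorted(data, reverse=True)           # descending order statistics
    if not 1 <= k < len(x):
        raise ValueError("k must satisfy 1 <= k < len(data)")
    if x[k] <= 0:
        raise ValueError("the (k+1)-th largest value must be positive")
    threshold = math.log(x[k])               # log of the (k+1)-th largest value
    mean_log_excess = sum(math.log(v) - threshold for v in x[:k]) / k
    return 1.0 / mean_log_excess

def pareto_sample(alpha, n, seed=0):
    """Synthetic Pareto(alpha) sample on [1, inf) by inverse-CDF sampling."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the power is always well defined.
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

sample = pareto_sample(alpha=1.5, n=20000)   # hypothetical test data
estimate = hill_estimator(sample, k=2000)
print(round(estimate, 2))                    # should be close to 1.5
```

The estimate depends on k; in practice one usually looks at the estimate across a range of k rather than a single value.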
New HTTP Response Data
Data sources: 4-hour blocks of packet headers
"Morning": 8:00-12:00
"Afternoon": 13:00-17:00
"Evening": 19:30-23:30
Gathered at UNC Main Link
During 7 days in April 2001
New Response Size Q-Q Plots
Another view of New Response Size Data:
Extreme Value Tail Index
Recall Intuition:
- Shape parameter $\alpha$ of Pareto (polynomial power)
- Strong relation to Long Range Dependence
- in Mice and Elephants plots (graphic)
- $1 < \alpha < 2$ in Duration Distributions, implies Classical LRD in aggregated time series
- Strong relation to moments:
- for $\alpha \le 1$ have infinite mean
- for $1 < \alpha \le 2$ have finite mean but infinite variance
- for $\alpha > 2$ both mean and variance are finite
- similar for larger $\alpha$ and higher moments
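The moment statements can be checked directly for an exact Pareto distribution. The worked equation below is a sketch under stated assumptions: $\alpha$ denotes the tail index (the symbol is assumed here), and the distribution is normalized so that $P(X > x) = x^{-\alpha}$ for $x \ge 1$.

```latex
% Pareto with P(X > x) = x^{-\alpha}, x >= 1, has density \alpha x^{-\alpha-1}.
\[
E\!\left[X^{k}\right]
  = \int_{1}^{\infty} x^{k}\,\alpha\,x^{-\alpha-1}\,dx
  = \begin{cases}
      \dfrac{\alpha}{\alpha-k}, & k < \alpha,\\[1ex]
      \infty, & k \ge \alpha.
    \end{cases}
\]
% k = 1: the mean is infinite exactly when \alpha \le 1.
% k = 2: the second moment (hence the variance) is infinite exactly when \alpha \le 2.
```

So the $k$-th moment is finite exactly when $k < \alpha$, which gives the three cases for mean and variance and the analogous statement for higher moments.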
New Response Size Q-Q Plots (cont.)
Simple, straightforward Estimation of $\alpha$:
Slope (in absolute value) of CCDF (i.e. 1 - CDF) on log-log scale
Log-log CCDF: graphic
- All 21 time blocks appear as thin blue lines
- Each Individual labeled and highlighted in thick red
- Not very "linear"?
- Suggests classical extreme value theory
hasn't "kicked in" yet???
- Note "shapes" of curves surprisingly constant
- Suggests curvature is not "random phenomenon"!
- Instead something systematic about internet traffic?
- Point worth deeper statistical confirmation??
- Suggests enhancement of current mathematics????
- Friday evening an "extreme point"? (least steep?)
- Many Resp. Sizes near 400 bytes???
(also for Friday Afternoon, nowhere else?)
- Worth plotting data between 0.999 quantile and max???
(1,000 to 7,000 of these for each time block....)
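A log-log CCDF curve of the kind described above can be computed as in the following sketch. It uses synthetic Pareto data, since the response-size traces are not part of these notes; the function names, the true index 1.2, and the "upper 10%" cutoff are illustrative assumptions. A single least-squares slope is fitted only to show the idea of reading a tail index off the plot.

```python
import math
import random

def loglog_ccdf(data):
    """(log10 x, log10 empirical CCDF) pairs, one per distinct positive value."""
    x = sorted(v for v in data if v > 0)
    n = len(x)
    points = []
    for i, v in enumerate(x):
        if i + 1 < n and x[i + 1] == v:
            continue                      # tied value: keep only its last occurrence
        ccdf = (n - i - 1) / n            # fraction of observations strictly above v
        if ccdf > 0:                      # the maximum has CCDF 0 and cannot be logged
            points.append((math.log10(v), math.log10(ccdf)))
    return points

def least_squares_slope(points):
    """Ordinary least-squares slope of y on x."""
    n = len(points)
    mean_x = sum(px for px, _ in points) / n
    mean_y = sum(py for _, py in points) / n
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    return sxy / sxx

rng = random.Random(1)
true_alpha = 1.2                          # hypothetical tail index for the demo
sizes = [(1.0 - rng.random()) ** (-1.0 / true_alpha) for _ in range(20000)]
curve = loglog_ccdf(sizes)
tail = [(lx, ly) for lx, ly in curve if ly <= -1.0]   # upper 10% of the data
slope = least_squares_slope(tail)
print(round(-slope, 2))                   # should be close to 1.2
```

For exact Pareto data the curve is close to a straight line, so one global slope is meaningful; the curvature noted above for the real data is exactly the situation where a single fitted slope is not.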
New Response Size Q-Q Plots (cont.)
Now estimate "tail index" $\alpha$, by studying:
Slopes: graphic
- Simply use difference quotients from log-log CCDF
- Numerical problem: 0 denominators
- Reset to bottom of plot
- Suggest ignoring those
- Could use fancier differentiation (e.g. over bigger range)
- But this "raw data" shows interesting structure
- "Almost always" have $\alpha < 2$ (interesting for LRD)
- But no apparent "tail limit" for $\alpha$?
- So do not satisfy "classical heavy tail definition"?
- But still clearly "intuitively heavy tailed"?
- Worth exploring alternate definitions?
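The difference-quotient slopes, including the zero-denominator problem caused by tied sizes, can be sketched as follows. The function name, the `lag` parameter (a stand-in for "differentiation over a bigger range"), and the toy data are assumptions made for illustration. Zero denominators are returned as `None` so that they can be ignored rather than reset to the bottom of a plot.

```python
import math

def local_slopes(sizes, lag=1):
    """Difference quotients of the log-log empirical CCDF.

    Returns (log10 x, slope) pairs; slope is None where the denominator
    is zero (tied sizes), so callers can drop those points.
    """
    x = sorted(v for v in sizes if v > 0)
    n = len(x)
    # One point per order statistic except the maximum, whose CCDF is 0.
    pts = [(math.log10(x[i]), math.log10((n - 1 - i) / n)) for i in range(n - 1)]
    out = []
    for i in range(len(pts) - lag):
        dx = pts[i + lag][0] - pts[i][0]
        dy = pts[i + lag][1] - pts[i][1]
        out.append((pts[i][0], dy / dx if dx != 0 else None))
    return out

toy_sizes = [100, 200, 200, 400, 800, 1600]      # hypothetical sizes with one tie
slopes = local_slopes(toy_sizes)
usable = [(lx, s) for lx, s in slopes if s is not None]
print(len(slopes), len(usable))
```

Increasing `lag` smooths the raw slopes and makes ties less likely to produce a zero denominator, at the cost of hiding some of the fine structure that the raw version shows.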
New Mathematics for "Heavy Tails"?
Version 1:
For some ,
Open Problem 1: For the simple Model,
with Version 1 tailed Duration Dist'n, is
(i.e. have index
LRD)
Version 2: Reformulate, in terms of: have
Open Problem 2: For the simple model,
with Version 2 tailed Duration Dist'n,
can we still have
(in a suitable sense)?
How do we modify Version 2 to make this happen?
A "really important" open problem (from before)
Can the lognormal lead to
Long Range Dependence?
Simplest (widely held) answer:
No, not for any fixed log-normal
Deeper question:
what if the lognormal changes during the asymptotics?
Who cares?
- Common misperception about "infinite moments"
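For reference, the moments of a fixed lognormal can be written down explicitly. The notation is an assumption: $X = e^{\mu + \sigma Z}$ with $Z$ standard normal, so $\mu$ and $\sigma$ are the mean and standard deviation of $\log X$.

```latex
% Moments of the lognormal X = exp(mu + sigma Z), Z ~ N(0,1):
\[
E\!\left[X^{k}\right]
  = E\!\left[e^{k\mu + k\sigma Z}\right]
  = \exp\!\left(k\mu + \tfrac{1}{2}k^{2}\sigma^{2}\right) < \infty
  \quad\text{for every } k \ge 0 .
\]
% In particular the variance is finite:
\[
\operatorname{Var}(X) = \left(e^{\sigma^{2}} - 1\right) e^{2\mu + \sigma^{2}} .
\]
```

Every moment is finite for any fixed $(\mu, \sigma)$, in contrast with the Pareto case, although the moments grow very quickly in $\sigma$. This is presumably why the question becomes interesting only when the parameters are allowed to change along the asymptotics.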
A precise question: for a sequence of "simple models",
with log-normal (parameters $\mu$ and $\sigma$) durations,
under what conditions (on $\mu$ and $\sigma$) does classical L.R.D.
(in any sense defined in Lecture9-19-01) result, as ????
A "really important" open problem (from before, cont.)
Suggested approaches:
- work with autocorrelation definition of L.R.D.
- since straightforward to compute for simple process
- mode of convergence (of autocorrelation function)?
- pointwise in lag? (no, have uninteresting family of exact answers)
- uniform on compacts?
- over "sliding intervals", ?
General
Current Direction:
QQ analysis, based on new Felix response sizes, towards "shape index curve" idea
shape index curve, towards LRD?
Back Burner, but interesting:
Another approach to why Weibull process is wrong: study what happens when we try the "memoryless property" for Weibull.
More sophisticated study of Cluster distribution in clustered Poisson?
Future Topics:
"Asymptotic Independence" scatterplots
Stationarity? (scale-based?) (paper from Richard)
Heavy tails lead to long-range dependence
Cleveland & Cao and Ramanan asymptotics & mice vs. elephants
How "Poisson - Gaussian"?
New "big trace" analysis?
Conservative Cascades
Cascaded On-Off Model
Shared Fate Visualization
Functional Data Analysis of Traces
"Speculations" from zoom stat talk
Kulkarni's view
Ask for comments (Html lectures, vs. Power-Point?)