Class Notes 9/12/01
Last Time:
- Improved analysis of tail of Response Size Distributions
- Pareto(1.2) gave acceptable (?) fit
- So did Pareto(1.5) ??
- log normal was (surprisingly ?) close but inadequate
- how should we think about "heavy tails"????
- in context of:
Interesting "news" report from UNC
Date: 31 Aug 2001 14:28:23 -0400
From: ITS Change <itschang@hapi.isis.unc.edu>
Newsgroups: unc.support
Subject: [support] MAJOR: Rate limits in place for certain file-sharing applications
At approximately 2:30PM on August 28th, a rate limit policy was instituted on the campus Internet router, limiting traffic from two file-sharing applications (KaZaA/Morpheus and Gnutella) to T1 network capacity for inbound traffic and T1 capacity for outbound traffic. This action was taken to maintain the integrity of the campus network infrastructure and to ensure appropriate bandwidth for those applications that are critical to the education and research mission of the University.
Over the past week, we had noticed that the combined inbound and outbound traffic from the two aforementioned applications represented more than two to three times the traffic from all Web transactions across the campus Internet link. In the case of the South Campus residence halls, we had noticed that the link was nearly saturated with approximately 65% of that traffic representing KaZaA and Gnutella traffic. In those residence halls, a number of students had reported extreme difficulties accessing Web pages necessary for their course work during that period.
Given the need to ensure the availability of critical applications over the network, as well as the mounting costs associated with commercial Internet bandwidth (180 Mb/sec of commercial Internet bandwidth costs between $650,000 and $1,000,000 per year), ATN decided that it was necessary to impose a rate limit on these two applications (KaZaA and Gnutella). We believe it is much more desirable to limit the traffic related to these applications than it is to block it all together.
ATN, through both the Security group and ResNet, has been educating the campus community on the issues associated with copyright and file sharing applications and will continue to do so.
Aside on Response Size Distributions
Interesting graphic by Felix Hernandez, UNC Computer Science
21 log-log CCDF plots of response sizes:
- from 4-hour blocks
- taken at 3 different time periods
- for 7 consecutive days
Notes:
- general shapes similar to before
- surprisingly consistent "kinks"?
- i.e. "tail index" changes in a systematic way
- motivates "tail index curve" idea??
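One way to make the "tail index curve" idea concrete (a sketch under assumptions, not necessarily the construction intended in class): a Pareto(α) tail is a straight line of slope −α on a log-log CCDF plot, so minus the local least-squares slope, fitted on successive windows of the upper tail, gives a tail index that is free to change along the curve, and systematic changes would show up as the "kinks" above. The helper names `loglog_ccdf` and `local_tail_index`, the number of windows, and the restriction to the upper half of the sample are illustrative choices.

```python
import numpy as np


def loglog_ccdf(sizes):
    """Return (log10 x, log10 P(X >= x)) at the sorted sample points."""
    x = np.sort(np.asarray(sizes, dtype=float))
    n = x.size
    # Empirical P(X >= x_(i)) = (n - i + 1) / n for i = 1..n; never zero, so log10 is safe.
    surv = (n - np.arange(n)) / n
    return np.log10(x), np.log10(surv)


def local_tail_index(sizes, n_windows=10):
    """Hypothetical 'tail index curve': minus the least-squares slope of the
    log-log CCDF, fitted separately on consecutive windows of the upper half."""
    lx, ls = loglog_ccdf(sizes)
    half = lx.size // 2
    lx, ls = lx[half:], ls[half:]
    centers, indices = [], []
    for idx in np.array_split(np.arange(lx.size), n_windows):
        slope, _intercept = np.polyfit(lx[idx], ls[idx], 1)
        centers.append(lx[idx].mean())  # window location on the log10 size axis
        indices.append(-slope)          # a Pareto(alpha) tail has slope -alpha
    return np.array(centers), np.array(indices)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # 1 + Lomax(1.5) draws are classical Pareto(1.5) with minimum 1.
    sizes = 1.0 + rng.pareto(1.5, 100_000)
    for c, a in zip(*local_tail_index(sizes)):
        print(f"log10 size ~ {c:5.2f}   local tail index ~ {a:4.2f}")
```

For exact Pareto(1.5) data the curve should hover near 1.5 throughout; response sizes with systematic kinks would instead drift across windows.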
Heavy tails and other fields
Key question:
what range of data is of interest?
Insurance:
Prob. of disasters generally beyond range of data?
Internet: care most about regions where we have data
Finance: ???
Why Care About Heavy Tails (for internet traffic)?
Current Folklore (for aggregated data):
Heavy tailed durations ⇒ Long Range Dependence
Toy Graphics, Exponential Durations
Toy Graphics, Pareto (1.5) Durations
(caused by the “few elephants”, but mice are there, too)
- Mandelbrot (60's)
- Paxson and Floyd (1995)
- Feldman, Gilbert and Willinger (1998)
- Riedi and Willinger (1999)
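The folklore link between heavy tailed durations and long range dependence can be explored with a toy simulation. The sketch below assumes an M/G/∞-style model (not necessarily the one behind the toy graphics): sessions start as a Poisson process, each stays active for a random duration, and we count the sessions active during each bin. With exponential durations the autocorrelation of the counts dies off quickly; with Pareto(1.5) durations it decays much more slowly. The function names, the arrival rate, the binwidth, and the mean-1 scaling of both duration laws are illustrative assumptions.

```python
import numpy as np


def active_counts(draw_durations, rate, n_bins, binwidth, rng):
    """Toy M/G/infinity model: Poisson session starts on [0, n_bins * binwidth),
    each active for a random duration; returns the number of sessions active at
    some point during each bin. Sessions starting before time 0 are ignored."""
    horizon = n_bins * binwidth
    n_sessions = rng.poisson(rate * horizon)
    starts = rng.uniform(0.0, horizon, n_sessions)
    ends = starts + draw_durations(n_sessions)
    first = np.minimum((starts / binwidth).astype(int), n_bins - 1)
    last = np.minimum((ends / binwidth).astype(int), n_bins - 1)
    # Difference array: +1 in the first active bin, -1 just after the last one.
    diff = np.zeros(n_bins + 1)
    np.add.at(diff, first, 1.0)
    np.add.at(diff, last + 1, -1.0)
    return np.cumsum(diff)[:n_bins]


def acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag."""
    d = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(d, d)
    return np.array([np.dot(d[: d.size - h], d[h:]) / denom for h in range(max_lag + 1)])


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    expo = lambda k: rng.exponential(1.0, k)              # mean 1
    pareto = lambda k: (1.0 + rng.pareto(1.5, k)) / 3.0   # Pareto(1.5), mean 1
    for name, draw in [("exponential", expo), ("Pareto(1.5)", pareto)]:
        counts = active_counts(draw, rate=10.0, n_bins=4000, binwidth=0.5, rng=rng)
        print(name, np.round(acf(counts, 40)[[1, 10, 40]], 3))
```

Both duration laws have the same mean, so the difference in the printed autocorrelations comes from the tail of the durations ("few elephants"), not from the overall load.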
Investigation II: Long Range Dependence?
Question 1:
Is it really there?
- Early conceptions: no
(renders classical queueing theory useless?)
- Current thought: yes
- Very recent work (Cleveland et al.): not important
- Motivates a very careful look
Investigation II: Long Range Dependence? (cont.)
Time series 1: Aggregated point process data, 1 million packet arrival times (from 1998), over ~3 minutes
Simple analysis: time series of bin counts
(Caution: different view of data from above Response Sizes)
10,000 bins, ~100 obs’s per bin
Binwidth ~ 0.02 sec
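The bin-count view can be sketched as follows, assuming arrival times in seconds in a NumPy array. `bin_counts` is a hypothetical helper, and the demo uses a simulated Poisson process as a stand-in for the 1998 trace; with binwidth 0.018 s ≈ 180 s / 10,000 it reproduces the bookkeeping above (about 10,000 bins with about 100 arrivals each).

```python
import numpy as np


def bin_counts(arrival_times, binwidth):
    """Counts of arrivals in consecutive bins of width `binwidth`,
    starting at the first arrival."""
    t = np.asarray(arrival_times, dtype=float)
    t = t - t.min()
    n_bins = max(int(np.ceil(t.max() / binwidth)), 1)
    # The last arrival can sit exactly on the right edge; fold it into the final bin.
    idx = np.minimum((t / binwidth).astype(int), n_bins - 1)
    return np.bincount(idx, minlength=n_bins)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n, span = 1_000_000, 180.0                        # ~1 million arrivals, ~3 minutes
    times = np.cumsum(rng.exponential(span / n, n))   # simulated Poisson arrivals
    counts = bin_counts(times, binwidth=0.018)
    print(counts.size, counts.mean())                 # roughly 10,000 bins, ~100 per bin
```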
Classical Time Series Dependence Measure: Autocorrelation
Correlation:
- Measure of “dependence” between variables
- 0 for independent variables
- +1 (-1) for an exact linear relationship with slope > (<) 0
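Autocorrelation of a time series $X_1, \dots, X_n$:
for “lag” $h$, $\rho(h) = \mathrm{corr}(X_t, X_{t+h})$, estimated by
$\hat\rho(h) = \frac{\sum_{t=1}^{n-h} (X_t - \bar X)(X_{t+h} - \bar X)}{\sum_{t=1}^{n} (X_t - \bar X)^2}$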
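The usual sample autocorrelation at lag $h$ (sum of lagged cross-products of deviations from the mean, divided by the total sum of squares) takes only a few lines; `sample_acf` is a hypothetical name used for illustration.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat(h), h = 0..max_lag: lagged cross-products
    of deviations from the mean, over the total sum of squared deviations."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    n = d.size
    denom = np.dot(d, d)
    return np.array([np.dot(d[: n - h], d[h:]) / denom for h in range(max_lag + 1)])


if __name__ == "__main__":
    print(sample_acf([1, 2, 3, 4, 5], 2))  # lag 1: (2 + 0 + 0 + 2) / 10 = 0.4
```

Dividing every lag by the same full-length denominator (rather than by $n - h$ terms) is the conventional choice; it shrinks long-lag estimates slightly toward 0.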
Autocorrelation as a Dependence Measure (cont.)
Is this sensible?
(L2, i.e. 2nd moment, based)
- likely misleading for heavy tailed distributions
- but here heavy tails are "horizontal"
- we are looking "vertically" at bin counts
- Poisson (≈ Gaussian) marginals a useful approximation???
A quick first look:
Investigation II: Long Range Dependence? (cont.)
- View 1: approximate as white noise + AR(1)
- AR(1) coefficient nearly 1 (“unit root”)
- i.e. close to a nonstationary random walk
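For View 1, a model $X_t = Y_t + W_t$, with $Y_t = \phi Y_{t-1} + a_t$ a stationary AR(1) and $W_t$ independent white noise, has (by a standard calculation) $\rho(h) = \phi^h \,\mathrm{Var}(Y) / (\mathrm{Var}(Y) + \sigma_W^2)$ for $h \ge 1$, where $\mathrm{Var}(Y) = \sigma_a^2 / (1 - \phi^2)$. The white noise pulls every nonzero lag down by the same factor, and $\phi$ near 1 makes the remaining decay very slow. The function and parameter names below are assumptions for illustration, not fitted values.

```python
import numpy as np


def wn_plus_ar1_acf(phi, var_innov, var_noise, max_lag):
    """Theoretical ACF of X_t = Y_t + W_t, where Y_t = phi * Y_{t-1} + a_t is a
    stationary AR(1) (|phi| < 1, Var(a_t) = var_innov) and W_t is independent
    white noise with variance var_noise."""
    var_ar = var_innov / (1.0 - phi ** 2)          # stationary variance of Y_t
    lags = np.arange(max_lag + 1)
    rho = phi ** lags * var_ar / (var_ar + var_noise)
    rho[0] = 1.0                                   # white noise only contributes at lag 0
    return rho


if __name__ == "__main__":
    # Near unit root: phi = 0.99 keeps 0.99**100 ~ 0.37 of the AR share at lag 100.
    print(wn_plus_ar1_acf(0.99, var_innov=1.0, var_noise=50.0, max_lag=100)[[1, 10, 100]])
```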
Investigation II: Long Range Dependence? (cont.)
Autocorrelation Plot (cont.)
- View 2: Hurst parameter H ≈ 0.86
Periodogram-based C.I.: (0.82, 1.06)
Based on analysis and graphics by Richard Smith
- H ≈ 0.86 > 0.5 ⇒ Long Range Dependent, “self similar”, …
- Consistent with above “heavy tail” theory
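Richard Smith's exact procedure is not given in these notes, so the following is only a comparable sketch: the Geweke-Porter-Hudak log-periodogram regression, one standard periodogram-based Hurst estimator. For long range dependence with $1/2 < H < 1$ the autocorrelation decays like $\rho(h) \sim c\, h^{2H-2}$, and the spectral density blows up like $\lambda^{1-2H}$ near frequency 0; regressing the log periodogram on the low frequencies estimates $d = H - 1/2$. The name `hurst_gph`, the bandwidth $m = \lfloor \sqrt{n} \rfloor$, and the normal-approximation interval are assumptions.

```python
import numpy as np


def hurst_gph(x, m=None):
    """Log-periodogram (Geweke-Porter-Hudak) estimate of the Hurst parameter,
    H = d + 1/2, with an approximate 95% interval based on the asymptotic
    variance pi^2 / (24 m) of d_hat."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if m is None:
        m = int(np.sqrt(n))                      # number of low frequencies used
    j = np.arange(1, m + 1)
    lam = 2.0 * np.pi * j / n                    # Fourier frequencies
    periodogram = np.abs(np.fft.rfft(x - x.mean())[j]) ** 2 / (2.0 * np.pi * n)
    # log f(lam) ~ const - d * log(4 sin^2(lam / 2)), so the slope below is d.
    regressor = -np.log(4.0 * np.sin(lam / 2.0) ** 2)
    d_hat, _intercept = np.polyfit(regressor, np.log(periodogram), 1)
    half_width = 1.96 * np.pi / np.sqrt(24.0 * m)
    h = d_hat + 0.5
    return h, (h - half_width, h + half_width)


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    print(hurst_gph(rng.standard_normal(16384)))  # white noise: H should be near 0.5
```

Applied to the bin-count series, an estimate well above 0.5 with an interval excluding 0.5 would point the same way as View 2.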
Investigation II: Long Range Dependence? (cont.)
Recent controversy:
Cao, J., Cleveland, W. S., Lin, D. and Sun, D. X. (2001) The effect of statistical multiplexing on internet packet traffic: theory and empirical study. Bell Labs Tech. Report.
- study interarrival times, not bin counts
- fine scale structure is approximately a Poisson process
- Long Range Dependence is there
- but only at larger time scales
- not important for queueing considerations
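Cao et al. work with interarrival times rather than bin counts. A much cruder check than their analysis, sketched here only as an illustration with a hypothetical helper name: if fine-scale arrivals are approximately a Poisson process, the interarrival times are approximately i.i.d. exponential, so their coefficient of variation should be near 1 and their autocorrelations near 0.

```python
import numpy as np


def interarrival_check(arrival_times, max_lag=5):
    """Crude Poisson-process diagnostics for interarrival times: the coefficient
    of variation (1 for exponential gaps) and lag 1..max_lag autocorrelations
    (0 for independent gaps)."""
    gaps = np.diff(np.sort(np.asarray(arrival_times, dtype=float)))
    cv = gaps.std() / gaps.mean()
    d = gaps - gaps.mean()
    denom = np.dot(d, d)
    acf = np.array([np.dot(d[:-h], d[h:]) / denom for h in range(1, max_lag + 1)])
    return cv, acf


if __name__ == "__main__":
    rng = np.random.default_rng(4)
    times = np.cumsum(rng.exponential(1e-4, 100_000))  # simulated Poisson arrivals
    print(interarrival_check(times))
```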
Investigation II: Long Range Dependence? (cont.)
Controversy motivates question:
How does dependence (autocorr.) change across scales?
Approach:
Zooming Autocorrelation 1 (for same 1 mil. packets):
- change binwidth $b$
- so # of bins ≈ (time span) / $b$
- and obs’s / bin ≈ $10^6$ / (# of bins)
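A sketch of the zooming computation, assuming arrival times in seconds: rebin the same arrivals at several binwidths and compute the bin-count autocorrelation at each. `zooming_acf`, the binwidths, and `max_lag` are illustrative assumptions; the notes do not specify the values used.

```python
import numpy as np


def zooming_acf(arrival_times, binwidths, max_lag=50):
    """For each binwidth b: bin counts of the same arrivals (about span / b bins,
    about N * b / span arrivals per bin), then their lag 1..max_lag sample ACF."""
    t = np.asarray(arrival_times, dtype=float)
    t = t - t.min()
    out = {}
    for b in binwidths:
        n_bins = max(int(np.ceil(t.max() / b)), 1)
        idx = np.minimum((t / b).astype(int), n_bins - 1)
        d = np.bincount(idx, minlength=n_bins).astype(float)
        d -= d.mean()
        denom = np.dot(d, d)
        lags = range(1, min(max_lag, n_bins - 1) + 1)
        out[b] = np.array([np.dot(d[:-h], d[h:]) / denom for h in lags])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(5)
    times = np.cumsum(rng.exponential(1e-3, 100_000))  # Poisson stand-in, ~100 s
    for b, r in zooming_acf(times, [0.01, 0.1, 1.0]).items():
        print(f"binwidth {b:5.2f} s: lag-1 ACF {r[0]:+.3f}")
```

On real traffic one would plot each returned curve against lag and compare them; for the Poisson stand-in all curves should stay near 0 up to sampling noise, which grows as the number of bins shrinks.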
Investigation II: Long Range Dependence? (cont.)
- smallest scale nearly uncorr’d (Cleveland)
- Correlation “lifts vertically”???
- gets to long range dependence (folklore)
- for larger scale, sample noise dominates?
Investigation II: Long Range Dependence? (cont.)
Unexpected feature?
- Dependence “lifts vertically”
- Instead of “coming from right”
Time span for lag $k$ at scale (binwidth) $b$ is $k b$.
So “dependence at time scale $\tau$”, as $b$ increases, should appear at lag $k = \tau / b$, i.e. it should move in from the right.
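A worked instance of this bookkeeping: with lag $k$ and binwidth $b$, lag $k$ spans $k b$ seconds, so dependence at time scale $\tau$ shows up at $k = \tau / b$. Taking a hypothetical $\tau = 1$ s, chosen only for illustration:

```latex
k = \frac{\tau}{b}: \qquad
b = 0.02\ \text{s} \Rightarrow k = 50, \qquad
b = 0.2\ \text{s} \Rightarrow k = 5, \qquad
b = 2\ \text{s} \Rightarrow k = 0.5 < 1.
```

So as the binwidth grows by factors of 10, the same dependence should slide leftward from lag 50 to lag 5 and then below lag 1, rather than rising in place.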
Investigation II: Long Range Dependence? (cont.)
Notion of large lump on right (in autocorr.):
consistent with “periodicities”?
Caution 1: periodicities ⇒ large lump, but not clear that large lump ⇒ periodicity
Caution 2: TCP has its own periodicities
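The forward direction of Caution 1 is easy to check: a periodic component produces autocorrelation peaks at multiples of its period. The sketch below (a sinusoid plus Gaussian noise, with period, amplitude, and noise level chosen arbitrarily) only illustrates periodicity ⇒ lump; it says nothing about the converse.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 0..max_lag."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    denom = np.dot(d, d)
    return np.array([np.dot(d[: d.size - h], d[h:]) / denom for h in range(max_lag + 1)])


if __name__ == "__main__":
    rng = np.random.default_rng(6)
    period, n = 40, 2000
    t = np.arange(n)
    x = np.sin(2 * np.pi * t / period) + 0.5 * rng.standard_normal(n)
    r = sample_acf(x, 2 * period)
    # Expect peaks near lags 40 and 80 and a trough near lag 20.
    print(r[[period // 2, period, 2 * period]])
```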