Density Functions
Copyright © Andrew W. Moore Slide 1
Probability Densities in Data Mining
Andrew W. Moore
Professor
School of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/[email protected]
412-268-7599
Copyright © Andrew W. Moore Slide 2
Contents
• Why they are important.
• Notation and fundamentals of continuous PDFs.
• Continuous multivariate PDFs.
• Combining discrete and continuous random variables.
Copyright © Andrew W. Moore Slide 3
Why are they important?
• Real numbers occur in at least 50% of database records
• Can’t always quantize them
• So we need to understand how to describe where they come from
• A great way of saying what’s a reasonable range of values
• A great way of saying how multiple attributes should reasonably co-occur
Copyright © Andrew W. Moore Slide 4
Why are they important?
• Can immediately get us Bayes Classifiers that are sensible with real-valued data
• You’ll need to intimately understand PDFs in order to do kernel methods, clustering with Mixture Models, analysis of variance, time series and many other things
• Will introduce us to linear and non-linear regression
Copyright © Andrew W. Moore Slide 5
A PDF of American Ages in 2000
Copyright © Andrew W. Moore Slide 6
Population of PR by age group

| group | freq | midpoint | freq.rela |
|---|---|---|---|
| 0-4 | 284593 | 2.5 | 0.0737516 |
| 5-9 | 301424 | 7.5 | 0.0781133 |
| 10-14 | 305025 | 12.5 | 0.0790465 |
| 15-19 | 305577 | 17.5 | 0.0791895 |
| 20-24 | 299362 | 22.5 | 0.0775789 |
| 25-29 | 277415 | 27.5 | 0.0718914 |
| 30-34 | 262959 | 32.5 | 0.0681452 |
| 35-39 | 265154 | 37.5 | 0.0687140 |
| 40-44 | 258211 | 42.5 | 0.0669147 |
| 45-49 | 239965 | 47.5 | 0.0621863 |
| 50-54 | 233597 | 52.5 | 0.0605361 |
| 55-59 | 206552 | 57.5 | 0.0535274 |
| 60-64 | 169796 | 62.5 | 0.0440022 |
| 65-69 | 141869 | 67.5 | 0.0367650 |
| 70-74 | 112416 | 72.5 | 0.0291323 |
| 75-79 | 85137 | 77.5 | 0.0220630 |
| 80-84 | 57953 | 82.5 | 0.0150184 |
| 85+ | 51801 | 87.5 | 0.0134241 |
Copyright © Andrew W. Moore Slide 7
[Figure: PDF of the population age in PR in 2000; x-axis pobpr$midpoint (0 to 80), y-axis pobpr$freq.rela (0.02 to 0.08)]
Copyright © Andrew W. Moore Slide 8
A PDF of American Ages in 2000
Let X be a continuous random variable. If p(x) is a Probability Density Function for X then…

$$P(a < X \le b) = \int_{x=a}^{b} p(x)\,dx$$

$$P(30 < \text{Age} \le 50) = \int_{\text{age}=30}^{50} p(\text{age})\,d\text{age} = 0.36$$
Copyright © Andrew W. Moore Slide 9
Properties of PDFs
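$$P(a < X \le b) = \int_{x=a}^{b} p(x)\,dx$$

That means…

$$p(x) = \lim_{h \to 0} \frac{P\left(x - \frac{h}{2} < X \le x + \frac{h}{2}\right)}{h}$$

$$\frac{\partial}{\partial x} P(X \le x) = p(x)$$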
Copyright © Andrew W. Moore Slide 10
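By the mean value theorem for integrals (assuming p is continuous),

$$P\left(x - \frac{h}{2} < X \le x + \frac{h}{2}\right) = \int_{x-h/2}^{x+h/2} p(t)\,dt = \left[\left(x + \frac{h}{2}\right) - \left(x - \frac{h}{2}\right)\right] p(w) = h\,p(w)$$

where $x - h/2 < w < x + h/2$. Then

$$\frac{P\left(x - \frac{h}{2} < X \le x + \frac{h}{2}\right)}{h} = p(w)$$

and p(w) tends to p(x) as h tends to zero.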
Copyright © Andrew W. Moore Slide 11
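$$P(X \le x) = \int_{-\infty}^{x} p(t)\,dt$$

This is the cumulative distribution function. It is a nondecreasing function.

$$\frac{d\,P(X \le x)}{dx} = \lim_{h \to 0} \frac{P(X \le x+h) - P(X \le x)}{h} = \lim_{h \to 0} \frac{1}{h} \int_{x}^{x+h} p(t)\,dt = \lim_{h \to 0} \frac{h\,p(w)}{h} = \lim_{h \to 0} p(w), \quad x < w < x + h$$

Note that p(w) tends to p(x) as h tends to 0. This shows that the derivative of the distribution function is the density function.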
Copyright © Andrew W. Moore Slide 12
Properties of PDFs
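$$P(a < X \le b) = \int_{x=a}^{b} p(x)\,dx$$

$$\frac{\partial}{\partial x} P(X \le x) = p(x)$$

Therefore…

$$\int_{x=-\infty}^{\infty} p(x)\,dx = 1$$

Therefore…

$$\forall x:\ p(x) \ge 0$$

The derivative of a nondecreasing function is greater than or equal to zero.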
Copyright © Andrew W. Moore Slide 13
• What is the meaning of p(x)?
If p(5.31) = 0.06 and p(5.92) = 0.03, then when a value of X is sampled from the distribution, X is twice as likely to fall very close to 5.31 as to fall very close to 5.92.
Copyright © Andrew W. Moore Slide 14
Yet another way to view a PDF
A recipe for sampling a random age.
1. Generate a random dot from the rectangle surrounding the PDF curve. Call the dot (age,d)
2. If d < p(age) stop and return age
3. Else try again: go to Step 1.
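The recipe above is rejection sampling. A minimal Python sketch follows (Python is used here rather than the slides' R). The age curve is only available as a figure, so the sketch assumes the triangular density with w = 1 as a stand-in; `triangular_pdf` and `rejection_sample` are illustrative names, not from the slides.

```python
import random

def triangular_pdf(x, w=1.0):
    """Triangular density on [-w, w] with peak 1/w at x = 0 (stand-in for the age PDF)."""
    return (w - abs(x)) / w**2 if abs(x) <= w else 0.0

def rejection_sample(pdf, lo, hi, p_max, rng):
    """Draw one value from pdf by throwing dots into the rectangle [lo, hi] x [0, p_max]."""
    while True:
        x = rng.uniform(lo, hi)        # step 1: random dot (x, d)
        d = rng.uniform(0.0, p_max)
        if d < pdf(x):                 # step 2: dot is under the curve, return x
            return x
        # step 3: otherwise loop and try again

rng = random.Random(0)
samples = [rejection_sample(triangular_pdf, -1.0, 1.0, 1.0, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)  # should land near 0 and w^2/6 (about 0.167)
```

The rectangle must cover the whole curve: `p_max` has to be at least the maximum of the density, otherwise the peak is under-sampled.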
Copyright © Andrew W. Moore Slide 15
Test your understanding
• True or False:

$$\forall x:\ p(x) \le 1$$

$$\forall x:\ P(X = x) = 0$$
Copyright © Andrew W. Moore Slide 16
Expectations
E[X] = the expected value of random variable X
= the average value we’d see if we took a very large number of random samples of X

$$= \int_{x=-\infty}^{\infty} x\,p(x)\,dx$$
Copyright © Andrew W. Moore Slide 17
Expectations
E[X] = the expected value of random variable X
= the average value we’d see if we took a very large number of random samples of X

$$= \int_{x=-\infty}^{\infty} x\,p(x)\,dx$$

= the first moment of the shape formed by the axes and the blue curve
= the best value to choose if you must guess an unknown person’s age and you’ll be fined the square of your error

E[age] = 35.897
Copyright © Andrew W. Moore Slide 18
Expectation of a function
E[f(X)] = the expected value of f(x) where x is drawn from X’s distribution
= the average value we’d see if we took a very large number of random samples of f(X)

$$= \int_{x=-\infty}^{\infty} f(x)\,p(x)\,dx$$

Note that in general:

$$E[f(X)] \ne f(E[X])$$

$$E[\text{age}^2] = 1786.64 \qquad (E[\text{age}])^2 = 1288.62$$
Copyright © Andrew W. Moore Slide 19
Variance
$\sigma^2$ = Var[X] = the expected squared difference between x and E[X]

$$= \int_{x=-\infty}^{\infty} (x - \mu)^2\,p(x)\,dx$$

= amount you’d expect to lose if you must guess an unknown person’s age and you’ll be fined the square of your error, and assuming you play optimally

$$\text{Var}[\text{age}] = 498.02$$
Copyright © Andrew W. Moore Slide 20
Standard Deviation
$\sigma^2$ = Var[X] = the expected squared difference between x and E[X]

$$= \int_{x=-\infty}^{\infty} (x - \mu)^2\,p(x)\,dx$$

= amount you’d expect to lose if you must guess an unknown person’s age and you’ll be fined the square of your error, and assuming you play optimally

$$\text{Var}[\text{age}] = 498.02$$

$\sigma$ = Standard Deviation = “typical” deviation of X from its mean

$$\sigma = \sqrt{\text{Var}[X]} \qquad \sigma = 22.32$$

$$\sigma^2 = E(X^2) - [E(X)]^2$$
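As a quick check, the shortcut formula can be applied to the age figures quoted on the earlier slides (E[age²] = 1786.64 and (E[age])² = 1288.62); the arithmetic reproduces the variance and standard deviation stated above.

```latex
\sigma^2 = E[\text{age}^2] - (E[\text{age}])^2 = 1786.64 - 1288.62 = 498.02,
\qquad
\sigma = \sqrt{498.02} \approx 22.32
```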
Copyright © Andrew W. Moore Slide 21
Statistics for PR
• E(age) = 35.17
• Var(age) = 501.16
• Std.Dev(age) = 22.38
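These statistics can be approximately reproduced from the PR age-group table by treating each group as concentrated at its midpoint. The Python sketch below makes that grouped-data assumption (the slides do not state how the figures were computed); small differences in the last digit are due to rounding.

```python
import math

# (midpoint, relative frequency) pairs copied from the PR age-group table
rows = [
    (2.5, 0.0737516), (7.5, 0.0781133), (12.5, 0.0790465), (17.5, 0.0791895),
    (22.5, 0.0775789), (27.5, 0.0718914), (32.5, 0.0681452), (37.5, 0.0687140),
    (42.5, 0.0669147), (47.5, 0.0621863), (52.5, 0.0605361), (57.5, 0.0535274),
    (62.5, 0.0440022), (67.5, 0.0367650), (72.5, 0.0291323), (77.5, 0.0220630),
    (82.5, 0.0150184), (87.5, 0.0134241),
]

mean = sum(m * f for m, f in rows)               # E[X] as a weighted sum
second_moment = sum(m * m * f for m, f in rows)  # E[X^2]
var = second_moment - mean ** 2                  # Var[X] = E[X^2] - (E[X])^2
sd = math.sqrt(var)
print(f"E={mean:.2f}  Var={var:.2f}  SD={sd:.2f}")
```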
Copyright © Andrew W. Moore Slide 22
Best-known density functions
Copyright © Andrew W. Moore Slide 23
Best-known density functions

• The uniform or rectangular density
• The triangular density
• The exponential density
• The Gamma and Chi-square densities
• The Beta density
• The Normal or Gaussian density
• The t and F densities
Copyright © Andrew W. Moore Slide 24
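The rectangular distribution

$$p(x) = \begin{cases} \dfrac{1}{w} & \text{if } |x| \le \dfrac{w}{2} \\ 0 & \text{if } |x| > \dfrac{w}{2} \end{cases}$$

$$E[X] = 0 \qquad \text{Var}[X] = \frac{w^2}{12}$$

[Figure: rectangular density of height 1/w on the interval from -w/2 to w/2]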
Copyright © Andrew W. Moore Slide 25
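The triangular distribution

$$p(x) = \begin{cases} \dfrac{w - |x|}{w^2} & \text{if } |x| \le w \\ 0 & \text{if } |x| > w \end{cases}$$

$$E[X] = 0 \qquad \text{Var}[X] = \frac{w^2}{6}$$

[Figure: triangular density with peak 1/w at 0, on the interval from -w to w]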
Copyright © Andrew W. Moore Slide 26
The Exponential distribution
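$$p(x) = \begin{cases} \dfrac{1}{\lambda} e^{-x/\lambda} & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}$$

$$E[X] = \lambda \qquad \text{Var}[X] = \lambda^2$$

[Figure: exponential density 0.1 * exp(-0.1 * x), plotted for x from 0 to 100]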
Copyright © Andrew W. Moore Slide 27
The Standard Normal distribution

$$p(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$$

$$E[X] = 0 \qquad \text{Var}[X] = 1$$
Copyright © Andrew W. Moore Slide 28
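The General Normal distribution

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

$$E[X] = \mu \qquad \text{Var}[X] = \sigma^2$$

[Figure: Gaussian density with μ = 100 and σ = 15]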
Copyright © Andrew W. Moore Slide 29
General Gaussian
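$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

$$E[X] = \mu \qquad \text{Var}[X] = \sigma^2$$

[Figure: Gaussian density with μ = 100 and σ = 15]

Shorthand: We say X ~ N(μ, σ²) to mean “X is distributed as a Gaussian with parameters μ and σ²”.
In the above figure, X ~ N(100, 15²)

Also known as the normal distribution or Bell-shaped curve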
Copyright © Andrew W. Moore Slide 30
The Error Function
Assume X ~ N(0,1)
Define ERF(x) = P(X<x) = Cumulative Distribution of X

$$\text{ERF}(x) = \int_{z=-\infty}^{x} p(z)\,dz = \frac{1}{\sqrt{2\pi}} \int_{z=-\infty}^{x} \exp\left(-\frac{z^2}{2}\right) dz$$
Copyright © Andrew W. Moore Slide 31
Using The Error Function
Assume X ~ N(μ, σ²)

$$P(X < x \mid \mu, \sigma^2) = \text{ERF}\left(\frac{x - \mu}{\sigma}\right)$$
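Note that ERF as defined on these slides is the standard normal cumulative distribution, which differs from the conventional error function `erf` found in numerical libraries. The two are related by P(X < x) = ½(1 + erf(x/√2)). A small Python sketch using the standard library's `math.erf`; the function names are illustrative.

```python
import math

def std_normal_cdf(x):
    """P(X < x) for X ~ N(0, 1); this is what the slides call ERF(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ N(mu, sigma^2), obtained by standardizing x first."""
    return std_normal_cdf((x - mu) / sigma)

# Example with the N(100, 15^2) curve from the slides
print(normal_cdf(100, 100, 15))  # 0.5, by symmetry about the mean
print(normal_cdf(115, 100, 15))  # one standard deviation above the mean
```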
Copyright © Andrew W. Moore Slide 32
The Central Limit Theorem
• If (X1, X2, …, Xn) are i.i.d. continuous random variables
• Then define

$$z = f(x_1, x_2, \ldots, x_n) = \frac{1}{n} \sum_{i=1}^{n} x_i$$

• As n → infinity, p(z) → Gaussian with mean E[Xi] and variance Var[Xi]/n

Somewhat of a justification for assuming Gaussian noise is common
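A small simulation makes the statement concrete. The Python sketch below assumes Uniform(0, 1) draws (any distribution with finite variance would do) and checks that the sample mean z is centred on E[Xi] with variance close to Var[Xi]/n.

```python
import random
import statistics

rng = random.Random(1)
n = 30          # number of i.i.d. draws averaged into each z
trials = 20000  # number of simulated sample means

# X_i ~ Uniform(0, 1), so E[X_i] = 1/2 and Var[X_i] = 1/12
means = [sum(rng.random() for _ in range(n)) / n for _ in range(trials)]

z_mean = statistics.fmean(means)
z_var = statistics.pvariance(means)
print(z_mean)  # close to 0.5
print(z_var)   # close to (1/12) / n
```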
Copyright © Andrew W. Moore Slide 33
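Density function estimators

• Histograms: $\hat{f}(x) = \dfrac{k}{nh}$, where h is the bin width and k is the number of observations in the bin containing x
• K-nearest neighbors: $\hat{f}(x) = \dfrac{k}{2 n d_k}$, where $d_k$ is the distance to the k-th nearest neighbor
• Kernel density estimators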
Copyright © Andrew W. Moore Slide 34
Density function estimation: histogram

[Figure: histogram of x on the density scale; x-axis x (0 to 8), y-axis Density (0.00 to 0.25)]

> x=c( 7.3, 6.8, 7.1, 2.5, 7.9, 6.5, 4.2, 0.5, 5.6, 5.9)
> hist(x,freq=F,main="Estimacion de funcion de densidad-histograma")
> rug(x,col=2)
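The histogram estimate k/(nh) can be worked out by hand for this sample. The Python sketch below assumes bin edges 0, 2, 4, 6, 8 (read off the axis ticks of the figure, so an assumption) and bins that are closed on the right.

```python
sample = [7.3, 6.8, 7.1, 2.5, 7.9, 6.5, 4.2, 0.5, 5.6, 5.9]
edges = [0.0, 2.0, 4.0, 6.0, 8.0]  # assumed bin edges, taken from the axis ticks
h = 2.0                            # bin width
n = len(sample)

# k = number of observations falling in each bin (lo, hi]
counts = [sum(1 for x in sample if lo < x <= hi) for lo, hi in zip(edges, edges[1:])]
# histogram density estimate k / (n h) for each bin
density = [k / (n * h) for k in counts]

print(counts)   # [1, 1, 3, 5]
print(density)  # bar heights; the tallest is 0.25, matching the top of the Density axis
```

The bar areas sum to 1, which is what makes the histogram a density estimate rather than a frequency count.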
Copyright © Andrew W. Moore Slide 35
Density estimation by kNN at 20 points with k = 1, 3, 5, 7

[Figure: four panels, one per value of k, each plotting the estimate fest against x (0 to 8)]
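The univariate kNN estimator k/(2 n d_k) takes only a few lines. The Python sketch below evaluates it at a single point using the 10-point sample from the histogram slide; the function name `knn_density` and the evaluation point 6.0 are illustrative choices, not from the slides.

```python
def knn_density(x, sample, k):
    """kNN density estimate k / (2 n d_k), where d_k is the distance
    from x to its k-th nearest sample point (undefined if d_k is 0)."""
    distances = sorted(abs(x - xi) for xi in sample)
    d_k = distances[k - 1]
    return k / (2 * len(sample) * d_k)

sample = [7.3, 6.8, 7.1, 2.5, 7.9, 6.5, 4.2, 0.5, 5.6, 5.9]
estimate = knn_density(6.0, sample, 3)
print(estimate)  # the third nearest point to 6.0 is 6.5, so d_k = 0.5
```

Small k gives a spiky estimate because d_k can be tiny near a data point; larger k smooths it, which is the pattern the four panels illustrate.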
Copyright © Andrew W. Moore Slide 36
Kernel estimation of a univariate density function.

In the univariate case, the kernel estimator of the density function f(x) is obtained as follows. Let x1, …, xn be a sample of a random variable X with density function f(x), and define the empirical distribution function by

$$F_n(x) = \frac{\#\{\text{obs} \le x\}}{n}$$

which is an estimator of the cumulative distribution function F(x) of X. Since the density function f(x) is the derivative of the distribution function F, a difference approximation to the derivative gives
Copyright © Andrew W. Moore Slide 37
$$\hat{f}(x) = \frac{F_n(x+h) - F_n(x-h)}{2h}$$

where h is a positive value close to zero. This is equivalent to the proportion of points in the interval (x-h, x+h) divided by 2h. The equation above can be written as

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$

where the weight function K is defined by

$$K(z) = \begin{cases} 0 & \text{if } |z| > 1 \\ 1/2 & \text{if } |z| \le 1 \end{cases}$$
Copyright © Andrew W. Moore Slide 38
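Sample: 6, 8, 9, 12, 20, 25, 18, 31

$$\hat{f}(15) = \frac{\text{proportion of points in } (15-h,\ 15+h)}{2h}$$

With h = 4:

$$\hat{f}(15) = \frac{\text{proportion of points in } (11,\ 19)}{8} = \frac{2/8}{8} = \frac{2}{64} = \frac{1}{32}$$

$$\hat{f}(15) = \frac{1}{8 \cdot 4}\left[0 + 0 + 0 + \tfrac{1}{2} + 0 + 0 + \tfrac{1}{2} + 0\right] = \frac{1}{32}$$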
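The worked example can be checked numerically. This Python sketch implements the kernel sum with the uniform weight function and a bandwidth of 4, which is the value implied by the interval (11, 19); the function names are illustrative.

```python
def uniform_kernel(z):
    """Uniform weight function: 1/2 inside [-1, 1], 0 outside."""
    return 0.5 if abs(z) <= 1 else 0.0

def kde(x, sample, h, kernel):
    """Kernel density estimate (1 / (n h)) * sum of K((x - x_i) / h)."""
    return sum(kernel((x - xi) / h) for xi in sample) / (len(sample) * h)

sample = [6, 8, 9, 12, 20, 25, 18, 31]
estimate = kde(15, sample, 4, uniform_kernel)
print(estimate)  # 0.03125, i.e. 1/32: only 12 and 18 fall within 4 of 15
```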
Copyright © Andrew W. Moore Slide 39
This is called the uniform kernel, and h is called the bandwidth, a smoothing parameter that indicates how much each sample point contributes to the estimate at the point x. In general, K and h must satisfy certain regularity conditions, such as:

• $\int_{-\infty}^{\infty} K(z)\,dz = 1$
• K(z) must be bounded and absolutely integrable on $(-\infty, \infty)$
• Usually, but not always, $K(z) \ge 0$ and K is symmetric, so any symmetric density function can be used as a kernel
• $\lim_{n \to \infty} h(n) = 0$
Copyright © Andrew W. Moore Slide 40
Choice of the bandwidth h

$$h = 1.06\, s\, n^{-0.2}$$

where n is the number of data points and s is the sample standard deviation.

$$h = 0.79\,(Q_3 - Q_1)\, n^{-0.2}$$
Copyright © Andrew W. Moore Slide 41
THE GAUSSIAN KERNEL

In this case the kernel is a smoother weight function in which all the points contribute to the estimate of f(x) at x. That is,

$$K(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} z^2\right)$$

$$\hat{f}(x) = \frac{1}{nh\sqrt{2\pi}} \sum_{i=1}^{n} e^{-\frac{(x - x_i)^2}{2h^2}}$$
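A minimal Python sketch of the Gaussian kernel estimator, using the 10-point sample from the histogram slide and the rule-of-thumb bandwidth h = 1.06 s n^(-0.2). The grid limits and step used for the numerical check are arbitrary choices made for this illustration.

```python
import math
import statistics

def gaussian_kde(x, sample, h):
    """Gaussian kernel estimate (1 / (n h sqrt(2 pi))) * sum exp(-(x - x_i)^2 / (2 h^2))."""
    total = sum(math.exp(-((x - xi) ** 2) / (2 * h * h)) for xi in sample)
    return total / (len(sample) * h * math.sqrt(2 * math.pi))

sample = [7.3, 6.8, 7.1, 2.5, 7.9, 6.5, 4.2, 0.5, 5.6, 5.9]
s = statistics.stdev(sample)          # sample standard deviation
h = 1.06 * s * len(sample) ** (-0.2)  # rule-of-thumb bandwidth

# Riemann-sum check that the estimate integrates to about 1 over [-20, 30]
step = 0.01
grid = [-20 + i * step for i in range(5001)]
area = sum(gaussian_kde(t, sample, h) for t in grid) * step
print(h, area)
```

Because every sample point contributes a full Gaussian bump, the estimate is smooth and strictly positive everywhere, unlike the uniform-kernel estimate.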
Copyright © Andrew W. Moore Slide 42
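THE TRIANGULAR KERNEL

$$K(z) = \begin{cases} 1 - |z| & \text{for } |z| < 1 \\ 0 & \text{otherwise} \end{cases}$$

THE "BIWEIGHT" KERNEL

$$K(z) = \begin{cases} \dfrac{15}{16}\,(1 - z^2)^2 & \text{for } |z| < 1 \\ 0 & \text{otherwise} \end{cases}$$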
Copyright © Andrew W. Moore Slide 43
THE EPANECHNIKOV KERNEL

$$K(z) = \begin{cases} \dfrac{3}{4\sqrt{5}} \left(1 - \dfrac{z^2}{5}\right) & \text{for } |z| < \sqrt{5} \\ 0 & \text{otherwise} \end{cases}$$
Copyright © Andrew W. Moore Slide 44
Density estimation at 20 points using a Gaussian kernel with h = .5, ”opt1”, ”opt2”, 4

[Figure: four panels, one per bandwidth, each plotting the estimate fest against x (0 to 8)]
Copyright © Andrew W. Moore Slide 45
Two-dimensional random variables

p(x,y) = probability density of random variables (X,Y) at location (x,y)
Copyright © Andrew W. Moore Slide 46
Two-dimensional density function estimators

• Histograms: $\hat{f}(\mathbf{x}) = \dfrac{k}{nA}$, where A is the area of the bin
• K-nearest neighbors: $\hat{f}(\mathbf{x}) = \dfrac{k}{n A_k}$, where $A_k$ is the area that includes up to the k-th nearest neighbor
• Kernel density estimators
Copyright © Andrew W. Moore Slide 47
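Bivariate kernel estimation

Let $\mathbf{x}_i = (x_i, y_i)$ be the observed values and $\mathbf{t} = (t_1, t_2)$ a point of the plane where the joint density is to be estimated.

$$\hat{f}(\mathbf{t}) = \frac{1}{n h_1 h_2} \sum_{i=1}^{n} K\left(\left[(\mathbf{t} - \mathbf{x}_i)'\, H^{-1} (\mathbf{t} - \mathbf{x}_i)\right]^{1/2}\right), \qquad H = \begin{pmatrix} h_1^2 & 0 \\ 0 & h_2^2 \end{pmatrix}$$

If $h_1 = h_2 = h$:

$$\hat{f}(\mathbf{t}) = \frac{1}{n h^2} \sum_{i=1}^{n} K\left(\frac{\|\mathbf{t} - \mathbf{x}_i\|}{h}\right)$$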
Copyright © Andrew W. Moore Slide 48
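• $(a_1, a_2)\, H^{-1} (a_1, a_2)' = \dfrac{a_1^2}{h_1^2} + \dfrac{a_2^2}{h_2^2}$, which reduces to $\dfrac{\|\mathbf{a}\|^2}{h^2}$ when $h_1 = h_2 = h$,

where

$$H^{-1} = \begin{pmatrix} 1/h_1^2 & 0 \\ 0 & 1/h_2^2 \end{pmatrix}$$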
Copyright © Andrew W. Moore Slide 49
Density estimation: bivariate Gaussian kernel

$$\hat{f}(\mathbf{t}) = \frac{1}{2\pi\, n\, h_1 h_2} \sum_{i=1}^{n} \exp\left(-\frac{(x_i - t_1)^2}{2h_1^2} - \frac{(y_i - t_2)^2}{2h_2^2}\right)$$
Copyright © Andrew W. Moore Slide 50
Joint density estimated by the kernel method

[Figure: perspective plot of the estimated joint density; horizontal axes 10 to 40 and 2000 to 5000, density from 0e+00 to 3e-05]

f1= kde2d(autompg1$V1, autompg1$V5,n=100)
persp(f1$x,f1$y,f1$z)
Copyright © Andrew W. Moore Slide 51
Contour plot of the estimated density

[Figure: contour plot; x-axis mpg (10 to 40), y-axis weight (1500 to 4500)]

contour(f1, levels=c(8e-6,2e-5, 2.8e-5),col=c(2,3,4), xlab="mpg",ylab="weight")
Copyright © Andrew W. Moore Slide 52
In 2 dimensions
Let X,Y be a pair of continuous random variables, and let R be some region of (X,Y) space…
$$P((X,Y) \in R) = \iint_{(x,y) \in R} p(x,y)\,dy\,dx$$
Copyright © Andrew W. Moore Slide 53
In 2 dimensions
Let X,Y be a pair of continuous random variables, and let R be some region of (X,Y) space…
$$P((X,Y) \in R) = \iint_{(x,y) \in R} p(x,y)\,dy\,dx$$

P(20 < mpg < 30 and 2500 < weight < 3000) = volume under the 2-d surface within the red rectangle
Copyright © Andrew W. Moore Slide 54
In 2 dimensions
Let X,Y be a pair of continuous random variables, and let R be some region of (X,Y) space…
$$P((X,Y) \in R) = \iint_{(x,y) \in R} p(x,y)\,dy\,dx$$

P([(mpg-25)/10]² + [(weight-3300)/1500]² < 1) = volume under the 2-d surface within the red oval
Copyright © Andrew W. Moore Slide 55
In 2 dimensions
Let X,Y be a pair of continuous random variables, and let R be some region of (X,Y) space…
$$P((X,Y) \in R) = \iint_{(x,y) \in R} p(x,y)\,dy\,dx$$

Take the special case of region R = “everywhere”. Remember that with probability 1, (X,Y) will be drawn from “somewhere”. So…

$$\int_{x=-\infty}^{\infty} \int_{y=-\infty}^{\infty} p(x,y)\,dy\,dx = 1$$
Copyright © Andrew W. Moore Slide 56
In 2 dimensions
Let X,Y be a pair of continuous random variables, and let R be some region of (X,Y) space…
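$$P((X,Y) \in R) = \iint_{(x,y) \in R} p(x,y)\,dy\,dx$$

$$p(x,y) = \lim_{h \to 0} \frac{P\left(x - \frac{h}{2} < X \le x + \frac{h}{2} \ \wedge\ y - \frac{h}{2} < Y \le y + \frac{h}{2}\right)}{h^2}$$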
Copyright © Andrew W. Moore Slide 57
In m dimensions
Let (X1, X2, …, Xm) be an m-tuple of continuous random variables, and let R be some region of $\mathbb{R}^m$ …

$$P((X_1, X_2, \ldots, X_m) \in R) = \int\!\!\int\!\cdots\!\int_{(x_1, x_2, \ldots, x_m) \in R} p(x_1, x_2, \ldots, x_m)\,dx_m \cdots dx_2\,dx_1$$
Copyright © Andrew W. Moore Slide 58
Independence
If X and Y are independent then knowing the value of X does not help predict the value of Y

$$X \perp Y \ \text{ iff } \ \forall x, y:\ p(x,y) = p(x)\,p(y)$$

mpg, weight NOT independent
Copyright © Andrew W. Moore Slide 59
Independence
If X and Y are independent then knowing the value of X does not help predict the value of Y

$$X \perp Y \ \text{ iff } \ \forall x, y:\ p(x,y) = p(x)\,p(y)$$

the contours say that acceleration and weight are independent
Copyright © Andrew W. Moore Slide 60
Multivariate Expectation

$$\boldsymbol{\mu}_{\mathbf{X}} = E[\mathbf{X}] = \int \mathbf{x}\,p(\mathbf{x})\,d\mathbf{x}$$

E[mpg,weight] = (24.5,2600)
The centroid of the cloud
Copyright © Andrew W. Moore Slide 61
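Multivariate Expectation
> f1= kde2d(autompg1$mpg, autompg1$weight,n=100)
> dx=f1$x[2]-f1$x[1]
> dy=f1$y[2]-f1$y[1]
> dx
[1] 0.379798
> dy
[1] 35.62626
> meanmpg=sum(f1$x*f1$z)*dx*dy
[1] 22.48855
> # note: f1$y*f1$z recycles y along the rows of z (the x direction);
> # sum(t(f1$z)*f1$y)*dx*dy would weight each cell by its y value instead
> meanweight=sum(f1$y*f1$z)*dx*dy
[1] 2848.638
> #estimated mean
> mean(autompg1$weight)
[1] 2977.584
> mean(autompg1$mpg)
[1] 23.44592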
Copyright © Andrew W. Moore Slide 62
Multivariate Expectation

$$E[f(\mathbf{X})] = \int f(\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x}$$
Copyright © Andrew W. Moore Slide 63
Test your understanding
Question: When (if ever) does E[X + Y] = E[X] + E[Y]?
• All the time? Always
• Only when X and Y are independent?
• It can fail even if X and Y are independent?
Copyright © Andrew W. Moore Slide 64
Bivariate Expectation
$$E[f(X,Y)] = \iint f(x,y)\,p(x,y)\,dy\,dx$$

$$E[X] = \iint x\,p(x,y)\,dy\,dx$$

$$E[Y] = \iint y\,p(x,y)\,dy\,dx$$

$$E[X + Y] = \iint (x + y)\,p(x,y)\,dy\,dx$$

$$E[X + Y] = E[X] + E[Y]$$
Copyright © Andrew W. Moore Slide 65
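Bivariate Covariance

$$\sigma_{xy} = \text{Cov}[X,Y] = E[(X - \mu_x)(Y - \mu_y)]$$

$$\sigma_{xx} = \sigma_x^2 = \text{Cov}[X,X] = \text{Var}[X] = E[(X - \mu_x)^2]$$

$$\sigma_{yy} = \sigma_y^2 = \text{Cov}[Y,Y] = \text{Var}[Y] = E[(Y - \mu_y)^2]$$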
Copyright © Andrew W. Moore Slide 66
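Bivariate Covariance

$$\sigma_{xy} = \text{Cov}[X,Y] = E[(X - \mu_x)(Y - \mu_y)]$$

$$\sigma_{xx} = \sigma_x^2 = \text{Cov}[X,X] = \text{Var}[X] = E[(X - \mu_x)^2]$$

$$\sigma_{yy} = \sigma_y^2 = \text{Cov}[Y,Y] = \text{Var}[Y] = E[(Y - \mu_y)^2]$$

Write $\mathbf{X} = \begin{pmatrix} X \\ Y \end{pmatrix}$, then

$$\text{Cov}[\mathbf{X}] = E[(\mathbf{X} - \boldsymbol{\mu}_x)(\mathbf{X} - \boldsymbol{\mu}_x)^T] = \Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}$$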
Copyright © Andrew W. Moore Slide 67
Estimated covariance and standard deviation for mpg and weight

> cov(autompg1[,c(1,5)])
               mpg      weight
mpg       60.91814   -5517.441
weight -5517.44070  721484.709
> sd(autompg1$mpg)
[1] 7.805007
> sd(autompg1$weight)
[1] 849.4026
Copyright © Andrew W. Moore Slide 68
Covariance Intuition
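E[mpg,weight] = (24.5,2600)

[Figure annotations: $\sigma_{\text{mpg}} = 8$ on either side of the mean along the mpg axis, $\sigma_{\text{weight}} = 700$ on either side of the mean along the weight axis]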
Copyright © Andrew W. Moore Slide 69
Covariance Intuition
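E[mpg,weight] = (24.5,2600)

[Figure annotations: $\sigma_{\text{mpg}} = 8$ on either side of the mean along the mpg axis, $\sigma_{\text{weight}} = 700$ on either side of the mean along the weight axis]

Principal Eigenvector of $\Sigma$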
Copyright © Andrew W. Moore Slide 70
Regression Line
$$E(Y \mid X = x) = \mu_y + \frac{\sigma_{xy}}{\sigma_x^2}\,(x - \mu_x)$$

Notice that the regression line passes through $(\mu_x, \mu_y)$
Copyright © Andrew W. Moore Slide 71
Regression Line
> l1=lm(weight~mpg,data=autompg1)
> l1

Call:
lm(formula = weight ~ mpg, data = autompg1)

Coefficients:
(Intercept)          mpg
    5101.11       -90.57

> #slope of regression line
> slope= -5517.44/60.918
[1] -90.571
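The fitted coefficients can be recovered from the covariance matrix and the sample means alone, since the regression line has slope σ_xy/σ_x² and passes through the point of means. A Python sketch using only numbers already printed on the slides:

```python
# Values copied from the R output on the slides (autompg1 data)
cov_xy = -5517.44070  # Cov[mpg, weight]
var_x = 60.91814      # Var[mpg]
mean_x = 23.44592     # mean(mpg)
mean_y = 2977.584     # mean(weight)

slope = cov_xy / var_x               # sigma_xy / sigma_x^2
intercept = mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
print(slope, intercept)  # compare with lm's coefficients -90.57 and 5101.11
```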
Copyright © Andrew W. Moore Slide 72
First principal component
> a=cov(autompg1[,c(1,5)])
> eigen(a)
$values
[1] 721526.90386     18.72329

$vectors
             [,1]        [,2]
[1,] -0.007647317 0.999970759
[2,]  0.999970759 0.007647317

#slope of first principal component

> .99997/-.00764
[1] -130.8861
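For a symmetric 2×2 matrix the eigen-decomposition has a closed form, so the R output can be cross-checked without a linear algebra library. The Python sketch below uses the covariance entries printed earlier; the variable names are illustrative. Working from unrounded values, the slope of the principal direction comes out near -130.76, slightly different from the -130.8861 obtained above from rounded eigenvector entries.

```python
import math

# Covariance matrix [[a, b], [b, c]] of (mpg, weight), from the R output
a, b, c = 60.91814, -5517.44070, 721484.709

# Eigenvalues of a symmetric 2x2 matrix: half_trace +/- radius
half_trace = (a + c) / 2
radius = math.hypot((a - c) / 2, b)
lam_big = half_trace + radius
lam_small = (a * c - b * b) / lam_big  # determinant / lam_big avoids cancellation

# (b, lam_big - a) is an eigenvector for lam_big; its slope is weight per mpg
pc_slope = (lam_big - a) / b
print(lam_big, lam_small, pc_slope)
```

The principal component is steeper than the regression line (about -90.57) because it minimizes perpendicular distances to the line rather than vertical ones.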
Copyright © Andrew W. Moore Slide 73
Covariance Fun Facts
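$$\text{Cov}[\mathbf{X}] = E[(\mathbf{X} - \boldsymbol{\mu}_x)(\mathbf{X} - \boldsymbol{\mu}_x)^T] = \Sigma = \begin{pmatrix} \sigma_x^2 & \sigma_{xy} \\ \sigma_{xy} & \sigma_y^2 \end{pmatrix}$$

• True or False: If $\sigma_{xy} = 0$ then X and Y are independent. False
• True or False: If X and Y are independent then $\sigma_{xy} = 0$. True
• True or False: If $\sigma_{xy} = \sigma_x \sigma_y$ then X and Y are deterministically related. True
• True or False: If X and Y are deterministically related then $\sigma_{xy} = \sigma_x \sigma_y$. False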
How could you prove or disprove these?
Copyright © Andrew W. Moore Slide 74
Test your understanding
Question: When (if ever) does Var[X + Y] = Var[X] + Var[Y]?
• All the time?
• Only when X and Y are independent? True when they are independent; more precisely, it holds exactly when Cov[X,Y] = 0
• It can fail even if X and Y are independent?
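The answer follows from expanding the square inside the expectation. Using the covariance definitions from the earlier slides:

```latex
\begin{aligned}
\mathrm{Var}[X+Y] &= E\big[\big((X-\mu_x) + (Y-\mu_y)\big)^2\big] \\
                  &= E[(X-\mu_x)^2] + E[(Y-\mu_y)^2] + 2\,E[(X-\mu_x)(Y-\mu_y)] \\
                  &= \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{Cov}[X,Y]
\end{aligned}
```

So the variances add exactly when Cov[X,Y] = 0. Independence guarantees this, but zero covariance does not require independence.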
Copyright © Andrew W. Moore Slide 75
Marginal Distributions
$$p(x) = \int_{y=-\infty}^{\infty} p(x,y)\,dy$$
Copyright © Andrew W. Moore Slide 76
Conditional Distributions
p(x | y) = p.d.f. of X when Y = y

p(mpg | weight = 4600)
p(mpg | weight = 3200)
p(mpg | weight = 2000)
Copyright © Andrew W. Moore Slide 77
Conditional Distributions
p(x | y) = p.d.f. of X when Y = y

p(mpg | weight = 4600)

$$p(x \mid y) = \frac{p(x,y)}{p(y)}$$

Why?
Copyright © Andrew W. Moore Slide 78
Independence Revisited
$$X \perp Y \ \text{ iff } \ \forall x, y:\ p(x,y) = p(x)\,p(y)$$

It’s easy to prove that these statements are equivalent…

$$\forall x, y:\ p(x,y) = p(x)\,p(y)$$

$$\forall x, y:\ p(x \mid y) = p(x)$$

$$\forall x, y:\ p(y \mid x) = p(y)$$
Copyright © Andrew W. Moore Slide 79
More useful stuff
$$\int_{x=-\infty}^{\infty} p(x \mid y)\,dx = 1$$

$$p(x \mid y, z) = \frac{p(x, y \mid z)}{p(y \mid z)}$$

Bayes Rule:

$$p(x \mid y) = \frac{p(y \mid x)\,p(x)}{p(y)}$$

(These can all be proved from definitions on previous slides)
Copyright © Andrew W. Moore Slide 80
Mixing discrete and continuous variables
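$$p(x, A = v) = \lim_{h \to 0} \frac{P\left(x - \frac{h}{2} < X \le x + \frac{h}{2} \ \wedge\ A = v\right)}{h}$$

$$\sum_{v=1}^{n_A} \int_{x=-\infty}^{\infty} p(x, A = v)\,dx = 1$$

Bayes Rule:

$$p(x \mid A) = \frac{P(A \mid x)\,p(x)}{P(A)}$$

Bayes Rule:

$$P(A \mid x) = \frac{p(x \mid A)\,P(A)}{p(x)}$$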
Copyright © Andrew W. Moore Slide 81
Estimation of a mixed joint density function

P(education, salary>50k)

[Figure: estimated density fest against x (0 to 15), one curve each for class 1 and class 2]
Copyright © Andrew W. Moore Slide 82
Conditional density estimation of education by class

Estimation of the posterior P(Class/Education)

[Figure: curves for class 1 and class 2 over education levels 1 to 16; y-axis from 0.4 to 1.0]
Copyright © Andrew W. Moore Slide 83
Mixing discrete and continuous variables
P(EduYears,Wealthy)
Copyright © Andrew W. Moore Slide 84
Mixing discrete and continuous variables
P(EduYears,Wealthy)
P(Wealthy| EduYears)
Copyright © Andrew W. Moore Slide 85
Mixing discrete and continuous variables
Renormalized Axes
P(EduYears,Wealthy)
P(Wealthy| EduYears)
P(EduYears|Wealthy)
Copyright © Andrew W. Moore Slide 86
Exercises
• Suppose X and Y are independent real-valued random variables distributed between 0 and 1:
  • What is p[min(X,Y)]?
  • What is E[min(X,Y)]?
• Prove that E[X] is the value u that minimizes E[(X-u)²]
• What is the value u that minimizes E[|X-u|]?