AlphaGo in Depth
by Mark Chang
Overview
•  AI in Game Playing
•  Machine Learning and Deep Learning
•  Reinforcement Learning
•  AlphaGo's Methods
AI in Game Playing
•  AI in Game Playing
– Adversarial Search – MiniMax
– Monte Carlo Tree Search
– Multi-Armed Bandit Problem
Game Playing
Adversarial Search – MiniMax
[Figure: a game tree whose terminal positions are scored 1 (win), 0 (draw) or -1 (loss) from the maximizing player's point of view]

Adversarial Search – MiniMax
[Figure: the same tree with the scores backed up: the maximizing player takes the largest child value and the minimizing player the smallest, giving the value of the root]
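The backed-up values can be computed with a short recursion. Below is a minimal sketch (not tied to any particular game); it assumes a hypothetical state object exposing is_terminal(), score(), legal_moves() and play(move):

```python
def minimax(state, maximizing):
    """Return the game-theoretic value of `state` (+1 win, 0 draw, -1 loss).

    `state` is assumed to expose is_terminal(), score(), legal_moves()
    and play(move) -> successor state; these names are hypothetical.
    """
    if state.is_terminal():
        return state.score()          # +1, 0 or -1 from the maximizer's view
    values = (minimax(state.play(m), not maximizing)
              for m in state.legal_moves())
    # The maximizing player takes the largest child value,
    # the minimizing player the smallest.
    return max(values) if maximizing else min(values)
```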
Monte Carlo Tree Search
[Figure: a shallow tree is searched near the root (tree search); from its leaves, random playouts are run to the end of the game (Monte Carlo search)]
Monte Carlo Tree Search
•  Tree Search + Monte Carlo Method
– Selection
– Expansion
– Simulation
– Back-Propagation
[Figure: a search tree whose nodes are labeled "white wins / total", e.g. a root of 3/5 with children 2/3 and 1/2]
Selection
[Figure: starting from the root (3/5), the most promising child is chosen repeatedly until a leaf is reached]
Expansion
[Figure: a new child node with statistics 0/0 is added under the selected leaf]
Simulation
[Figure: from the new node (0/0), a random playout is run to the end of the game]
Back-Propagation
[Figure: the playout result is propagated back along the selected path; every node on the path updates its "wins / total" counts, e.g. the root goes from 3/5 to 4/6]
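The four phases fit in a few dozen lines. Here is a minimal generic sketch (not AlphaGo's version); it assumes the same hypothetical state object as before plus a winner() method, and for brevity it scores every node from white's point of view:

```python
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []             # expanded successor nodes
        self.wins, self.visits = 0, 0  # the "white wins / total" counters

    def ucb1(self, c=1.4):
        return (self.wins / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mcts(root, n_iterations):
    for _ in range(n_iterations):
        node = root
        # 1. Selection: while every child has been visited, descend
        #    to the child with the highest UCB1 score.
        while node.children and all(ch.visits > 0 for ch in node.children):
            node = max(node.children, key=Node.ucb1)
        # 2. Expansion: create the children of a non-terminal leaf,
        #    then pick an unvisited child to simulate from.
        if not node.children and not node.state.is_terminal():
            node.children = [Node(node.state.play(m), parent=node)
                             for m in node.state.legal_moves()]
        unvisited = [ch for ch in node.children if ch.visits == 0]
        if unvisited:
            node = random.choice(unvisited)
        # 3. Simulation: random playout from this node to the end.
        state = node.state
        while not state.is_terminal():
            state = state.play(random.choice(state.legal_moves()))
        result = state.winner()        # e.g. 1 if white wins, else 0
        # 4. Back-propagation: update the counts along the visited path.
        while node is not None:
            node.visits += 1
            node.wins += result
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits)
```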
Multi-Armed Bandit Problem
•  Exploration vs. Exploitation
[Figure: rows of slot machines with different average payouts (6, 7, 8); the player must balance playing the machine with the best payout observed so far (exploitation) against trying the other machines to learn their payouts (exploration)]
UCB1 algorithm

argmax_i ( \bar{x}_i + \sqrt{ 2 \log n / n_i } )

•  \bar{x}_i : the mean payout for machine i
•  n_i : the number of plays of machine i
•  n : the total number of plays
[Figure: the slot machines again; the UCB1 score adds an exploration bonus that shrinks as a machine is played more often]
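A minimal simulation of UCB1 on three machines (a sketch; the Bernoulli payout probabilities 0.6/0.7/0.8 are made up to echo the figure):

```python
import math
import random

def ucb1_play(payout_means, n_rounds=1000):
    """Minimal UCB1 loop over simulated slot machines.

    `payout_means` are hypothetical Bernoulli payout probabilities;
    the algorithm only sees the observed rewards.
    """
    k = len(payout_means)
    plays = [0] * k           # n_i: number of plays of machine i
    totals = [0.0] * k        # summed payouts, so totals[i]/plays[i] = mean x̄_i
    for n in range(1, n_rounds + 1):
        if n <= k:
            i = n - 1         # play every machine once first
        else:
            i = max(range(k), key=lambda j: totals[j] / plays[j]
                    + math.sqrt(2 * math.log(n) / plays[j]))
        reward = 1.0 if random.random() < payout_means[i] else 0.0
        plays[i] += 1
        totals[i] += reward
    return plays              # most plays concentrate on the best machine

print(ucb1_play([0.6, 0.7, 0.8]))
```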
UCB1 algorithm
The same idea applies to tree search. For the edge from node s taken by action a:

R(s, a) = Q(s, a) + c \sqrt{ \log N(s) / N(s, a) }
a^* = argmax_a R(s, a)

In the example tree: Q(s, a) = 2/3 (the child's mean value), N(s, a) = 3 (the child's visit count), N(s) = 5 (the visit count of s), c = constant.
UCB1 algorithm

R(s, a_1) = 2/3 + 0.5 \sqrt{ \log 5 / 3 } = 1.0329
R(s, a_2) = 1/2 + 0.5 \sqrt{ \log 5 / 2 } = 0.9485

R(s, a_1) > R(s, a_2), so a^* = argmax_a R(s, a) = a_1
(Here a_1 leads to the 2/3 child, a_2 to the 1/2 child, and c = 0.5.)
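A quick check of the arithmetic (natural log):

```python
import math

# R(s, a) = Q(s, a) + c * sqrt(log N(s) / N(s, a)) for the two actions
print(2/3 + 0.5 * math.sqrt(math.log(5) / 3))   # ≈ 1.0329
print(1/2 + 0.5 * math.sqrt(math.log(5) / 2))   # ≈ 0.9485
```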
Machine Learning and Deep Learning
•  Machine Learning and Deep Learning:
– Supervised Machine Learning
– Neural Networks
– Convolutional Neural Networks
– Training Neural Networks
Supervised Machine Learning
[Figure: during training, the machine learning model receives a problem, produces an output, and gets feedback comparing its output with the known solution; after training, the model maps a new problem directly to an output]
Supervised Machine Learning
•  Classification: predict a discrete class, e.g. Class A vs. Class B, and label a new point (Class ?)
•  Regression: predict a continuous score, e.g. from examples with Score = 1 and Score = 3, estimate Score = ?
Neural Networks
A neuron n has inputs x_1, x_2 with weights w_1, w_2 and a bias w_b:

n_in = w_1 x_1 + w_2 x_2 + w_b

The output n_out applies an activation function:
Sigmoid: n_out = 1 / (1 + e^{-n_in})
tanh: n_out = (1 - e^{-2 n_in}) / (1 + e^{-2 n_in})
ReLU: n_out = n_in if n_in > 0, 0 otherwise
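These activations in NumPy, applied to one neuron (a sketch with made-up weights and inputs):

```python
import numpy as np

def sigmoid(n_in):
    return 1.0 / (1.0 + np.exp(-n_in))

def tanh(n_in):   # equivalent to np.tanh
    return (1.0 - np.exp(-2 * n_in)) / (1.0 + np.exp(-2 * n_in))

def relu(n_in):
    return np.where(n_in > 0, n_in, 0.0)

# A neuron: weighted sum of the inputs plus the bias, then an activation.
w, x, w_b = np.array([1.0, -2.0]), np.array([0.5, 0.25]), 0.1
n_in = w @ x + w_b
print(sigmoid(n_in), tanh(n_in), relu(n_in))
```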
Neural Networks
•  AND Gate

x_1  x_2  y
0    0    0
0    1    0
1    0    0
1    1    1

A single sigmoid neuron with weights 20, 20 and bias -30 implements it:

y = 1 / (1 + e^{-(20 x_1 + 20 x_2 - 30)})

The decision boundary 20 x_1 + 20 x_2 - 30 = 0 separates (1,1) (output ≈ 1) from (0,0), (0,1), (1,0) (output ≈ 0).
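Checking the four input pairs numerically:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# y = sigmoid(20*x1 + 20*x2 - 30): close to 1 only for (1, 1).
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(sigmoid(20 * x1 + 20 * x2 - 30), 4))
# prints approximately: 0, 0, 0, 1
```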
Neural Networks
[Figure: a fully-connected network: input layer (x, y, with bias b), hidden layer (n_11, n_12), output layer (n_21, n_22) producing z_1, z_2; each connection carries a weight, e.g. w_{11,x}, w_{11,y}, w_{11,b} into n_11 and w_{21,11}, w_{21,12}, w_{21,b} into n_21]
Neural Networks
•  XOR Gate
A single neuron cannot separate XOR, but two hidden neurons and one output neuron can (see the sketch below):
– n_1: weights 20, 20, bias -30 (fires only for input (1,1))
– n_2: weights 20, 20, bias -10 (fires for any input containing a 1)
– y: weights -20 (from n_1), 20 (from n_2), bias -10

x_1  x_2  n_1  n_2  y
0    0    0    0    0
0    1    0    1    1
1    0    0    1    1
1    1    1    1    0
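The composed network in code, verifying the truth table:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def xor(x1, x2):
    n1 = sigmoid(20 * x1 + 20 * x2 - 30)     # hidden neuron: AND-like
    n2 = sigmoid(20 * x1 + 20 * x2 - 10)     # hidden neuron: OR-like
    return sigmoid(-20 * n1 + 20 * n2 - 10)  # output: n2 AND (NOT n1)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xor(x1, x2), 4))     # approximately 0, 1, 1, 0
```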
Multi-Class Classification
•  SoftMax

n_{1,out} = e^{n_{1,in}} / (e^{n_{1,in}} + e^{n_{2,in}} + e^{n_{3,in}})
n_{2,out} = e^{n_{2,in}} / (e^{n_{1,in}} + e^{n_{2,in}} + e^{n_{3,in}})
n_{3,out} = e^{n_{3,in}} / (e^{n_{1,in}} + e^{n_{2,in}} + e^{n_{3,in}})
Multi-Class Classification
•  SoftMax

n_{1,out} = e^{n_{1,in}} / (e^{n_{1,in}} + e^{n_{2,in}} + e^{n_{3,in}})

n_{1,out} ≈ 1 when n_{1,in} ≫ n_{2,in} and n_{1,in} ≫ n_{3,in}
n_{1,out} ≈ 0 when n_{1,in} ≪ n_{2,in} or n_{1,in} ≪ n_{3,in}
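A softmax in NumPy; subtracting the max first is a standard trick for numerical stability:

```python
import numpy as np

def softmax(n_in):
    e = np.exp(n_in - np.max(n_in))   # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([5.0, 1.0, 1.0])))  # ≈ [0.96, 0.02, 0.02]: n_1,in dominates
print(softmax(np.array([1.0, 5.0, 5.0])))  # n_1,out ≈ 0 when n_1,in is much smaller
```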
Multi-Class Classification
•  One-Hot Encoding:
– Class 1: (n_1, n_2, n_3) = (1, 0, 0)
– Class 2: (n_1, n_2, n_3) = (0, 1, 0)
– Class 3: (n_1, n_2, n_3) = (0, 0, 1)
Convolutional Neural Networks
[Figure: neurons arranged in a 3-D volume with width, height and depth; the same weights are shared across spatial positions (shared weights)]
Convolutional Neural Networks
[Figure: each neuron of a convolutional layer connects only to a small receptive field of the input layer, and each neuron of the next convolutional layer to a receptive field of the previous one]
Convolutional Neural Networks
[Figure: the filters in convolutional layers slide across the input image and produce filter responses (feature maps)]
Training Neural Networks
•  One-Hot Encoding:
The board is encoded as binary feature planes (input) and the next move as a one-hot plane (output):

Player's stones (input):
0 0 0 0 0
0 0 0 0 0
0 0 1 0 0
0 0 0 0 0
0 0 0 0 0

Opponent's stones (input):
0 0 0 0 0
0 0 1 0 0
0 0 0 1 0
0 0 1 0 0
0 0 0 0 0

Empty positions (input):
1 1 1 1 1
1 1 0 1 1
1 1 0 0 1
1 1 0 1 1
1 1 1 1 1

Next position (output):
0 0 0 0 0
0 0 0 0 0
0 1 0 0 0
0 0 0 0 0
0 0 0 0 0
Training Neural Networks
Forward propagation: the input planes s (player's stones, opponent's stones, empty positions, as above) pass through the input layer, convolutional layers and output layer, producing a probability p_w(a|s) for each move, e.g.:

Outputs p_w(a|s):
0  0  0  0  0
0 .5  0  0  0
0 .3  0  0  0
0 .2  0  0  0
0  0  0  0  0
Training Neural Networks
Backward propagation: compare the output p_w(a|s) with the golden next position a_i and update the weights against the gradient of the cost function -log(p_w(a_i|s)):

Golden:
0 0 0 0 0
0 0 0 0 0
0 1 0 0 0
0 0 0 0 0
0 0 0 0 0

w ← w - η ∂(-log p_w(a_i|s)) / ∂w
Cost Function
The cost -log(p_w(a_i|s)) is large when p_w(a_i|s) ≈ 0 and near 0 when p_w(a_i|s) ≈ 1, so minimizing it drives the probability of the golden action toward 1.
Gradient Descent

w ← w - η ∂(-log p_w(a_i|s)) / ∂w

η is the learning rate. The gradient ∂(-log p_w(a_i|s))/∂w points uphill on the cost surface, so stepping against it decreases the cost.
Gradient Descent
[Figure: the effect of the updates on p_w(a_i|s) for a state s: where the golden a_i is 1, the probability is pushed toward 1; where the golden a_i is 0, it is pushed toward 0]
Gradient Descent
[Figure: gradient descent stepping down the cost surface toward a minimum]
Backward Propagation
For a weight w_21 into neuron n_2 (input n_2(in), output n_2(out)) and cost function J, the chain rule expands the gradient:

∂J/∂w_21 = ∂J/∂n_2(out) · ∂n_2(out)/∂n_2(in) · ∂n_2(in)/∂w_21

so the update w_21 ← w_21 - η ∂J/∂w_21 becomes:

w_21 ← w_21 - η ∂J/∂n_2(out) · ∂n_2(out)/∂n_2(in) · ∂n_2(in)/∂w_21
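A tiny numeric version of this chain rule, for one sigmoid neuron with a squared-error cost (a sketch with made-up numbers):

```python
import math

# One sigmoid neuron n2 with a single weight w21 and input n1_out.
w21, n1_out, target = 0.5, 0.8, 1.0

n2_in = w21 * n1_out
n2_out = 1.0 / (1.0 + math.exp(-n2_in))
J = 0.5 * (n2_out - target) ** 2          # cost function J

# Chain rule: dJ/dw21 = dJ/dn2_out * dn2_out/dn2_in * dn2_in/dw21
dJ_dn2_out = n2_out - target
dn2_out_dn2_in = n2_out * (1.0 - n2_out)  # derivative of the sigmoid
dn2_in_dw21 = n1_out
dJ_dw21 = dJ_dn2_out * dn2_out_dn2_in * dn2_in_dw21

eta = 0.1
w21 = w21 - eta * dJ_dw21                 # gradient-descent update
print(J, dJ_dw21, w21)
```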
Reinforcement Learning
•  Reinforcement Learning:
– Policy & Value
– Policy Gradient Method
Reinforcement Learning
[Figure: in Go, the reward (feedback) arrives only at the end of the game: white wins or black wins]
Reinforcement Learning
The agent observes a state S_t from the environment, takes an action A_t, and receives a reward (feedback) R_t.
•  Feedback is delayed.
•  No supervisor, only a reward signal.
•  Rules of the game are unknown.
•  Agent's actions affect the subsequent state.
Policy
•  The behavior of an agent
Stochastic policy: in state s, actions are drawn from a distribution, e.g. π(a_1 | s) = 0.5, π(a_2 | s) = 0.5.
Deterministic policy: state s maps to a single action: π(s) = a.
Value
•  The expected long-term reward
State-value function v_π(s): the expected reward r obtained by starting from state s and following policy π to the end.
Action-value function q_π(s, a): the expected reward r obtained by taking action a in state s and then following policy π to the end.
Policy Gradient Method
•  REINFORCE
– REward Increment = Nonnegative Factor × Offset Reinforcement × Characteristic Eligibility

w ← w + α (r - b) ∂log π(a|s) / ∂w

where w are the weights of the policy function, α is the learning rate, r the reward, and b the baseline (usually = 0).
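A single REINFORCE update for a two-action softmax policy (a sketch; the reward signal is made up). For a softmax over per-action scores, ∂log π(a|s)/∂w = onehot(a) - π:

```python
import numpy as np

# A softmax policy over two actions with weights w (one score per action).
w = np.zeros(2)

def pi(w):
    e = np.exp(w - w.max())
    return e / e.sum()

# One REINFORCE update: w <- w + alpha * (r - b) * d log pi(a|s) / dw
alpha, b = 0.1, 0.0
a = np.random.choice(2, p=pi(w))   # sample an action from the policy
r = 1.0 if a == 0 else -1.0        # made-up reward signal
grad_log_pi = np.eye(2)[a] - pi(w)
w = w + alpha * (r - b) * grad_log_pi
print(pi(w))   # the probability of rewarded actions increases
```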
Grid World Example
[Figure: a 4 x 4 grid world; the agent starts at the initial position and moves one cell per action; one terminal cell has reward = 1 and another has reward = -1]
Policy Networks
The state s (the agent's position) is one-hot encoded:

0 0 0 0
0 1 0 0
0 0 0 0
0 0 0 0

Neural networks map s to probabilities of actions π(a|s); an action is sampled from π(a|s) and executed.
Initialization
The neural network is initialized with random weights, which defines an initial π(a|s) for every state s.
Forward Propagation
The current state s is fed through the neural network to obtain π(a|s), and the next action is sampled from it.
Forward Propagation
The sampled actions move the agent through the grid until it reaches the terminal with reward r = 1.
Backward Propagation
Every state-action pair visited in the episode is updated with the reward:

w ← w + α (r - b) ∂log π(a|s) / ∂w

With r = 1, the term r · ∂log π(a|s)/∂w increases the probability of each action that was taken along the path.
Next Iteration
The updated network defines a new π(a|s) for the next episode.
Forward Propagation
The next episode is played the same way, sampling from the updated π(a|s); this time the agent reaches the terminal with reward r = -1.
Backward Propagation
The same update is applied with r = -1:

w ← w + α (r - b) ∂log π(a|s) / ∂w

The negative reward decreases the probability of each action taken in this episode.
Next Iteration …
After Several Iterations …
[Figure: the learned policy steers the agent toward the reward = 1 terminal]
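The whole loop in code, as a minimal sketch of REINFORCE on a grid world. The layout is assumed (start at the top-left, a +1 terminal at the bottom-right and a -1 terminal at the bottom-left; the slide's exact layout may differ), and a plain weight table stands in for the neural network:

```python
import numpy as np

# 4x4 grid world: cells 0..15, start at cell 0.
N_STATES = 16
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
TERMINALS = {15: 1.0, 12: -1.0}                # assumed terminal cells
rng = np.random.default_rng(0)
W = np.zeros((N_STATES, 4))                    # one score per (state, action)

def policy(s):   # softmax over the 4 action scores of state s
    e = np.exp(W[s] - W[s].max())
    return e / e.sum()

def step(s, a):  # move one cell, staying inside the grid
    r, c = divmod(s, 4)
    dr, dc = ACTIONS[a]
    r, c = min(max(r + dr, 0), 3), min(max(c + dc, 0), 3)
    return r * 4 + c

alpha = 0.1
for episode in range(2000):
    s, trajectory = 0, []
    while s not in TERMINALS and len(trajectory) < 50:
        a = rng.choice(4, p=policy(s))
        trajectory.append((s, a))
        s = step(s, a)
    r = TERMINALS.get(s, 0.0)      # reward only at the end of the episode
    # REINFORCE: w <- w + alpha * r * d log pi(a|s) / dw for every step.
    for s_t, a_t in trajectory:
        grad = -policy(s_t)
        grad[a_t] += 1.0           # onehot(a_t) - pi(s_t)
        W[s_t] += alpha * r * grad

print(np.round(policy(0), 2))      # after several iterations, the policy
                                   # at the start prefers moves toward +1
```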
AlphaGo's Methods
•  Training:
– Supervised learning: Classification
– Reinforcement learning
– Supervised learning: Regression
•  Searching:
– Searching with policy and value networks
– Distributed search
Training
•  Human expert data → (classification) → rollout policy p_π and SL policy network p_σ
•  SL policy network p_σ → (initialize weights) → RL policy network p_ρ, trained by policy gradient on self-play data
•  RL policy network p_ρ → (generate data) → self-play data → (regression) → value network v_θ
Supervised Learning: Classification
•  Human expert data: KGS dataset, 160,000 games, 29.4 million positions
•  Rollout policy p_π: linear-softmax network (faster but less accurate)
•  SL policy network p_σ: 13-layer convolutional neural network, trained on 50 GPUs for 3 weeks; accuracy: 57.0%
Input/Output Data
Input: the current board, encoded as binary feature planes, e.g.

Player's stones:
0 0 0
0 1 0
0 0 0

Opponent's stones:
0 1 0
0 0 1
0 1 0

Empty positions:
1 0 1
1 0 0
1 0 1

Output: the next position, one-hot encoded:
0 0 0
1 0 0
0 0 0

Feature planes:
•  Stone color: 3 planes (player, opponent, empty)
•  Liberty: 8 planes (1~8 liberties)
•  All features: stone color, liberty, turns since, capture size, self-atari size, ladder capture, ladder escape, sensibleness. Total: 48 planes.
Symmetries
Each training position is expanded into 8 symmetric variants: the input, its rotations by 90, 180 and 270 degrees, and the vertical reflection of each of the four.
SL Policy Network
•  Input: size 19x19, 48 planes
•  First layer: Conv + ReLU, kernel size 5x5, k filters
•  2nd to 12th layers: Conv + ReLU, kernel size 3x3, k filters
•  13th layer: kernel size 1x1, 1 filter, Softmax
•  k = 192
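The same stack as a PyTorch sketch (not the original implementation; zero-padding is assumed so that every layer keeps the 19x19 board):

```python
import torch
import torch.nn as nn

k = 192
layers = [nn.Conv2d(48, k, kernel_size=5, padding=2), nn.ReLU()]   # first layer
for _ in range(11):                                                # 2nd to 12th layers
    layers += [nn.Conv2d(k, k, kernel_size=3, padding=1), nn.ReLU()]
layers += [nn.Conv2d(k, 1, kernel_size=1)]                         # 13th layer
policy_net = nn.Sequential(*layers)

board = torch.zeros(1, 48, 19, 19)       # a batch of one position, 48 planes
logits = policy_net(board).flatten(1)    # one score per board point
probs = torch.softmax(logits, dim=1)     # p(a|s) over the 361 points
print(probs.shape)                       # torch.Size([1, 361])
```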
Supervised Learning: Classification
The SL policy network p_σ maps an input position s to probabilities of actions p_σ(a|s). Given the golden action a from the human expert data, backward propagation updates the weights with learning rate η:

σ ← σ - η ∂(-log p_σ(a|s)) / ∂σ
Reinforcement Learning
•  RL policy network p_ρ: weights initialized by the SL policy network
•  Trained by policy gradient on self-play data: 10,000 x 128 games, 50 GPUs, 1 day
•  Won 80% of games against the SL network
Reinforcement Learning
1. Initialize the weights ρ = σ from the SL policy network p_σ.
2. Play the RL policy network p_ρ against an opponent sampled from the opponent pool until the game ends, yielding a reward r.
3. Update ρ with the policy gradient method.
4. Add the current p_ρ to the opponent pool and repeat.
Policy Gradient Method
One self-play game gives a trajectory s_1, s_2, ..., s_T with action probabilities p_ρ(a_1|s_1), p_ρ(a_2|s_2), ..., p_ρ(a_T|s_T) and a final reward r(s_T). Backward propagation updates every step, with learning rate α and baseline b:

ρ ← ρ + α Σ_{i=1}^{T} ∂log p_ρ(a_i|s_i)/∂ρ · (r(s_T) - b(s_t))
Supervised Learning: Regression
The value network v_θ is a 15-layer convolutional neural network whose weights are initialized from the policy network. The RL policy network p_ρ generates the self-play data: 30 million positions, trained on 50 GPUs for 1 week; MSE: 0.226.
Value Network
•  Input: size 19x19, 48 planes, plus 1 plane for the current color
•  1st ~ 13th layers: the same as the policy network
•  14th layer: fully-connected, 256 ReLU units
•  15th layer: fully-connected, 1 tanh unit
Input/Output Data
1. Randomly sample an integer U in 1~450.
2. Play moves t = 1 to U-1 with the SL policy network p_σ, take a random action at t = U, then play moves t = U+1 to the end with the RL policy network p_ρ.
3. Generate one training example (s_{U+1}, z_{U+1}): the state s_{U+1} and its value z_{U+1}, the reward r of the finished game.
Supervised Learning: Regression
The value network v_θ maps the input s to an output value v_θ(s). Given the golden value z, backward propagation updates the weights with learning rate η:

θ ← θ + η (z - v_θ(s)) ∂v_θ(s)/∂θ
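The update in code, with a linear model standing in for the value network (a sketch; the features and golden value are made up):

```python
import numpy as np

# One regression update for a linear value model v_theta(s) = theta . s
theta = np.zeros(4)
s = np.array([1.0, 0.5, -0.5, 2.0])   # made-up position features
z = 1.0                               # golden value from a finished game

eta = 0.05
v = theta @ s                         # v_theta(s)
grad_v = s                            # d v_theta(s) / d theta for a linear model
theta = theta + eta * (z - v) * grad_v   # theta <- theta + eta (z - v) dv/dtheta
print(theta @ s)                      # the prediction moves toward z
```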
Searching
•  Selection: descend the tree, maximizing Q + u
•  Expansion: expand a leaf, with priors from p_σ
•  Evaluation: evaluate the leaf with the value network v_θ and with a rollout by p_π that returns a reward r
•  Backup: update the statistics along the visited path
Searching
•  Each edge (s, a) stores a set of statistics:
– Q(s, a): combined mean action value
– P(s, a): prior probability evaluated by p_σ(a|s)
– W_v(s, a): estimated action value by v_θ(s)
– W_r(s, a): estimated action value by p_π(a|s)
– N_v(s, a): counts of evaluations by v_θ(s)
– N_r(s, a): counts of evaluations by p_π(a|s)
Selection
PUCT algorithm: choose action a^* = argmax_a ( Q(s, a) + u(s, a) ), with

u(s, a) = c P(s, a) \sqrt{ Σ_b N_r(s, b) } / (1 + N_r(s, a))

Q(s, a) is the exploitation term; u(s, a) is the exploration term, where Σ_b N_r(s, b) is the visit count of the parent node s, N_r(s, a) is the visit count of edge (s, a), and c sets the level of exploration.
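A sketch of the selection rule (the edge statistics live in a hypothetical per-action dict):

```python
import math

def puct_select(edges, c=5.0):
    """Pick a* = argmax_a Q(s,a) + u(s,a) among the edges of one node.

    `edges` maps each action to a dict with keys 'Q', 'P' and 'Nr'
    (a hypothetical representation of the per-edge statistics).
    """
    total_rollouts = sum(e['Nr'] for e in edges.values())   # Σ_b N_r(s, b)
    def score(a):
        e = edges[a]
        u = c * e['P'] * math.sqrt(total_rollouts) / (1 + e['Nr'])
        return e['Q'] + u           # exploitation + exploration
    return max(edges, key=score)

edges = {'a1': {'Q': 0.66, 'P': 0.40, 'Nr': 3},
         'a2': {'Q': 0.50, 'P': 0.10, 'Nr': 2}}
print(puct_select(edges))
```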
Expansion
1. If the visit count exceeds a threshold, N_r(s, a) > n_thr, insert the node for the successor state s′.
2. For every possible a′, initialize the statistics:
N_v(s′, a′) = N_r(s′, a′) = 0
W_r(s′, a′) = W_v(s′, a′) = 0
P(s′, a′) = p_σ(a′|s′)
Evaluation
1. Evaluate v_θ(s′) with the value network v_θ.
2. Simulate the rest of the game with the rollout policy network p_π; when reaching the terminal s_T, calculate the reward r(s_T).
Backup
Update the statistics of every visited edge (s, a):

N_r(s, a) ← N_r(s, a) + 1
W_r(s, a) ← W_r(s, a) + r(s_T)
N_v(s, a) ← N_v(s, a) + 1
W_v(s, a) ← W_v(s, a) + v_θ(s′)
Q(s, a) = (1 - λ) W_v(s, a)/N_v(s, a) + λ W_r(s, a)/N_r(s, a)

where λ is the interpolation constant.
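The backup in code (a sketch; each edge is a dict matching the statistics above, and λ = 0.5 is used for illustration):

```python
def backup(path, r_sT, v_leaf, lam=0.5):
    """Update the statistics of every visited edge along `path`."""
    for edge in path:
        edge['Nr'] += 1
        edge['Wr'] += r_sT          # rollout result r(s_T)
        edge['Nv'] += 1
        edge['Wv'] += v_leaf        # value-network estimate v_theta(s')
        edge['Q'] = ((1 - lam) * edge['Wv'] / edge['Nv']
                     + lam * edge['Wr'] / edge['Nr'])

edge = {'Nr': 3, 'Wr': 2.0, 'Nv': 3, 'Wv': 1.8, 'Q': 0.0}
backup([edge], r_sT=1.0, v_leaf=0.7, lam=0.5)
print(edge['Q'])   # (1-λ)·Wv/Nv + λ·Wr/Nr
```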
Distributed Search
•  Master CPU: the main search tree
•  176 GPUs: policy and value networks, computing p_σ(a′|s′) and v_θ(s′)
•  1,202 CPUs: rollout policy networks, computing p_π and r(s_T)
Reference
•  Mastering the game of Go with deep neural networks and tree search
– http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html
Further Reading
•  Monte Carlo Tree Search
– https://jeffbradberry.com/posts/2015/09/intro-to-monte-carlo-tree-search/
•  Neural Networks Backward Propagation
– http://cpmarkchang.logdown.com/posts/277349-neural-network-backward-propagation
•  Convolutional Neural Networks
– http://cs231n.github.io/convolutional-networks/
•  Policy Gradient Method: REINFORCE
– https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node37.html
About the Speaker
Mark Chang, F.C.C
•  Email: ckmarkoh at gmail dot com
•  Blog: http://cpmarkchang.logdown.com
•  Github: https://github.com/ckmarkoh
•  Facebook: https://www.facebook.com/ckmarkoh.chang
•  Slideshare: http://www.slideshare.net/ckmarkohchang
•  Linkedin: https://www.linkedin.com/pub/mark-chang/85/25b/847

  26. & N_{r}(s,a) \leftarrow N_{r}(s,a) +1 \\ & W_{r}(s,a) \leftarrow W_{r}(s,a) + r(s_{T}) \\ & N_{v}(s,a) \leftarrow N_{v}(s,a) +1 \\ & W_{v}(s,a) \leftarrow W_{v}(s,a) + v_{\theta}(s') \\ & Q(s,a) = (1-\lambda) \frac{W_{v}(s,a)}{N_{v}(s,a)}+\lambda \frac{W_{r}(s,a)}{N_{r}(s,a)} \\