Q-Learning Explained - A Reinforcement Learning Technique

deeplizard

6 years ago

239,730 views

Comments:

@arindammukherjee391 - 21.03.2025 00:51

brilliant

@poiuwnwang7109 - 15.03.2025 07:34

so nice

@tristanbrown6954 - 04.03.2025 06:05

You’re the GOAT

@davidak_de - 19.07.2024 21:58

Q-Star Lizard Gang 2024

@Ayushsingh-zw3yk - 11.04.2024 06:24

nice explanation deeplizard

@saumyachaturvedi9065 - 05.02.2024 13:47

I guess crickets make sound, so the lizard could take that as input as well when choosing its path

@obensustam3574 - 02.02.2024 13:32

Very good content, I watched the videos in this playlist to prepare for my exam. Thank you 😊

@davidli9872 - 23.11.2023 04:50

Are you here after Reuters' article on OpenAI's Q*?

@DreadFox_official - 24.09.2023 22:12

Hey, I loved your video. Thank you so much

@guineteherve9751 - 26.05.2023 00:41

Your work is simply incredible. Thank you!

@mehershrishtinigam5449 - 08.02.2023 20:23

She has the most annoying voice ever, Jesus Christ.

@TheGroundskeeper - 24.12.2022 17:26

More B-roll video crap without code or mathematics, pandering advanced topics to idiots. This is a waste of human hours.

@tallwaters9708 - 10.03.2022 16:25

I'll tell you what I really don't get: it seems the equation only updates the Q-table based on the current and next state. But the Bellman equation seems to imply that all future states are considered; is there some recursion going on?

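There is indeed a recursion, and it works through bootstrapping: each update looks only one step ahead, but the target contains the next state's own Q-estimate, so repeated updates propagate reward information backward through arbitrarily many states. A minimal sketch on a hypothetical 4-state chain (one action per state; all names and numbers are made up, not from the video):

```python
import numpy as np

# Hypothetical 4-state chain: 0 -> 1 -> 2 -> 3, with reward 1.0 only on
# entering terminal state 3. With one action per state, Q is just a vector.
n_states, alpha, gamma = 4, 0.5, 0.9
Q = np.zeros(n_states)

for sweep in range(50):
    for s in range(n_states - 1):
        reward = 1.0 if s + 1 == n_states - 1 else 0.0
        target = reward + gamma * Q[s + 1]  # bootstrap on next state's estimate
        Q[s] += alpha * (target - Q[s])     # one-step temporal-difference update

# The terminal reward has propagated backward: Q[2] -> 1.0,
# Q[1] -> gamma * 1.0 = 0.9, Q[0] -> gamma^2 = 0.81.
print(np.round(Q, 3))
```

Even though no update ever looks more than one step ahead, Q[0] ends up reflecting the reward three steps away, which is exactly what the Bellman equation promises.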
@yelircaasi - 17.01.2022 21:13

Really nice video, thanks for the clear explanations!

@rosameliacarioni1022 - 11.12.2021 23:23

Thanks so muuuuch!

@EarlWallaceNYC - 24.10.2021 04:52

O' the puns, ... exploit vs explore

@NoNTr1v1aL - 05.10.2021 14:00

Amazing video!

@mateusbalotin7247 - 22.09.2021 23:06

Thank you!

@rursus8354 - 15.08.2021 20:02

Won't a square become empty when the cricket(s) is(are) eaten?

@yashas9974 - 15.07.2021 20:56

Link to the talk that appeared at the end of the video?

@absimaldata - 10.06.2021 20:48

Why are you so clear in explaining? I mean, why do others fail to deliver tutorials with such clarity like you do? I don't know what's wrong with everyone. Omg, you are impressive.

@pututp - 10.06.2021 17:43

I am too stupid to understand the video... My bad.

@patite3103 - 16.05.2021 12:23

Your videos are awesome! Please correct the corresponding quiz, since the answer seems incorrect to me. Could you do a video explaining the first three steps and how the Q-table updates? That would really help in understanding how the update works. Thank you!

@mohammadmohi8561 - 10.05.2021 01:38

u r an AI, so nicely explained all these hard concepts so easily. thank u so much

@cedrichung6820 - 13.04.2021 18:10

How are you so good at explaining😍😍😍😍

@krajkumar6 - 13.03.2021 09:32

Hey @deeplizard,
Many thanks for this video. I'm reading 'Reinforcement Learning: An Introduction, Second Edition' by Richard S. Sutton and Andrew G. Barto, and I'd like to know whether the Q-learning technique described here is the same as the dynamic programming explained in the book?

@sontapaa11jokulainen94 - 22.11.2020 14:52

Is the exploration vs. exploitation part only used during training, or does it also happen when actually using the learned Q-table? Also, can the policy be "take the action with the largest Q-value, and sometimes explore" (i.e., can that be an example of a policy in this case)? Since the policy is just the probability of taking some action in a state, can the policy be written as "take the action with the largest Q-value" (as an example of pure exploitation)?

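On the question above: with an epsilon-greedy scheme, exploration is typically confined to training; once the Q-table has converged, the deployed policy usually just acts greedily. A minimal sketch (function names and Q-values are made up for illustration):

```python
import numpy as np

def epsilon_greedy(q_row, epsilon, rng):
    """Behavior policy during training: explore with probability epsilon."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))  # explore: uniform random action
    return int(np.argmax(q_row))              # exploit: current best action

def greedy(q_row):
    """Deployment policy: always exploit the learned Q-values."""
    return int(np.argmax(q_row))

rng = np.random.default_rng(0)
q_row = np.array([0.1, 0.5, 0.2])  # Q-values for one state's three actions
print(greedy(q_row))               # always picks action 1 here
```

So yes: "take the action with the largest Q-value, and with probability epsilon explore" is itself a perfectly valid (stochastic) policy, and "always take the action with the largest Q-value" is its deterministic special case.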
@arefeshghi - 23.10.2020 01:59

Good balance of exploration and exploitation will bring good results in life too! We are all lizards! :)

@davidkhassias4876 - 19.10.2020 14:52

Can't wait for the upcoming episodes, because this series is amazing! And they have helped me a lot. Thank you so much! <3

@adamhendry945 - 12.09.2020 23:21

PHENOMENAL! Your videos are THE BEST! Can you PLEASE PLEASE PLEASE do a series on Actor-Critic methods!!

@asdfasdfuhf - 24.08.2020 21:29

This was an exciting video, finally, we are getting to the good stuff.

@shashankdhananjaya9923 - 15.07.2020 19:11

Awesome explanation. I like this

@muhammadsohailnisar6600 - 02.07.2020 12:11

Please remove the sound played with the logo at the start of the video. The sound is very unpleasant, especially when one listens on headphones.

@madhesh18 - 09.05.2020 20:10

Really good work

@xiaojiang2610 - 04.05.2020 03:50

Better than my engineering teacher.

@Asmutiwari - 02.05.2020 15:29

This series is so, so informative!! I wish you could make videos on dynamic navigation techniques using DRL

@louerleseigneur4532 - 29.04.2020 02:05

thank you, thank you
hats off

@shoaibalyaan - 24.03.2020 11:39

AMAZING SERIES! Absolutely loved it!

@SugamMaheshwari - 05.03.2020 07:04

Your voice is just amazing 😍😍😍😍😍

@adwaitnaik4003 - 14.02.2020 09:59

The channel name is creepy, but the explanation is amazing...

@namitaa - 14.02.2020 07:23

you saved my life bro

@michaelscott8572 - 12.02.2020 18:53

Thanks for the good explanation and all your work. A little hint, if I may: don't explain the terms using the same words: exploitation and exploration

@TheOfficialJeppezon - 09.02.2020 12:30

You say that Q-learning tries to find the best policy. However, I thought Q-learning is an off-policy algorithm. I also have trouble understanding the on/off-policy concept.

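A sketch of why Q-learning is called off-policy: the update target evaluates the greedy policy at the next state (via the max over actions), even when the epsilon-greedy behavior policy actually explores there; SARSA is shown for contrast as the on-policy variant. All numbers below are made up for illustration:

```python
import numpy as np

gamma = 0.9
Q = np.array([[0.0, 0.0],
              [1.0, 3.0]])  # made-up Q[s, a] for 2 states x 2 actions

reward = 0.5
s_next = 1
a_next_taken = 0  # suppose the epsilon-greedy behavior policy explored here

# Q-learning (off-policy): the target uses the GREEDY action at s_next,
# ignoring what the behavior policy actually did.
q_learning_target = reward + gamma * np.max(Q[s_next])     # 0.5 + 0.9 * 3.0

# SARSA (on-policy): the target uses the action actually taken at s_next.
sarsa_target = reward + gamma * Q[s_next, a_next_taken]    # 0.5 + 0.9 * 1.0

print(q_learning_target, sarsa_target)
```

Because the target policy (greedy) differs from the behavior policy (epsilon-greedy), Q-learning can learn about the optimal greedy policy while still exploring.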
@MohdDanish-bh1ok - 16.12.2019 23:22

Luv u babe.

@neogarciagarcia443 - 16.12.2019 21:01

Exploration of reinforcement learning is going fine!

@iAndrewMontanai - 04.11.2019 04:00

What should I do in the case of continuous tasks? Like in Flappy Bird (if it's continuous, but anyway), I guess the Q-table would be infinite here, or would just have a big fixed size to save memory. Can you give some recommendations or an explanation, please? I want to start an implementation, but I don't know how the Q-table should look in this case and how to interact with it correctly (and I hope there will be no other surprises lol)

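One common answer to the question above: if the observations are continuous but low-dimensional, you can discretize each dimension into a fixed number of bins so the Q-table stays finite; for large or genuinely continuous spaces, the usual next step is to replace the table with a function approximator, as in deep Q-networks. A minimal discretization sketch with made-up Flappy-Bird-style features, assuming observations normalized to [0, 1]:

```python
import numpy as np

n_bins = 10     # buckets per observation dimension (a tuning choice)
n_actions = 2   # e.g. flap / don't flap

# Bin edges for each (assumed normalized) feature; 9 edges -> 10 buckets.
height_edges = np.linspace(0.0, 1.0, n_bins - 1)
dist_edges = np.linspace(0.0, 1.0, n_bins - 1)

def discretize(height, dist):
    """Map a continuous observation to a discrete (row, col) Q-table index."""
    return (int(np.digitize(height, height_edges)),
            int(np.digitize(dist, dist_edges)))

# The Q-table is now finite: 10 x 10 discrete states, 2 actions each.
Q = np.zeros((n_bins, n_bins, n_actions))

state = discretize(0.37, 0.82)  # a sample continuous observation
print(state, Q[state])
```

If the binning gets too coarse, or the number of dimensions makes the table explode, that is exactly the point where DQN-style function approximation takes over from the table.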