Comments:
brilliant
so nice
You’re the GOAT
Q-Star Lizard Gang 2024
nice explanation deeplizard
I guess crickets make sound, so the lizard could take that as input as well when choosing its path
Very good content, I watched the videos in this playlist to prepare for my exam. Thank you 😊
Are you here after Reuters' article on OpenAI's Q*?
Hey, I loved your video. Thank you so much
Your work is simply incredible. Thank you!
she has the most annoying voice ever jesus christ
more B roll video crap without code or mathematics, pandering advanced topics to idiots. this is a waste of human hours.
I'll tell you what I really don't get: it seems the equation only updates the Q-table based on the current and next state. But the Bellman equation seems to imply that all future states are considered. Is there some recursion going on?
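The recursion is in the Bellman equation itself, and the one-step update inherits it by bootstrapping. Using the standard formulation (nothing beyond what the series covers), the Bellman optimality equation is already recursive,

    q_*(s,a) = \mathbb{E}\left[ R_{t+1} + \gamma \max_{a'} q_*(S_{t+1}, a') \right],

and the Q-learning update only pulls the current estimate toward a one-step sample of that right-hand side:

    Q(s,a) \leftarrow Q(s,a) + \alpha \left[ R_{t+1} + \gamma \max_{a'} Q(s',a') - Q(s,a) \right].

Because \max_{a'} Q(s',a') is itself an estimate that later updates keep improving, reward information propagates backwards through the table over many visits, so all future states are accounted for indirectly rather than in a single update.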
Really nice video, thanks for the clear explanations!
Thanks so muuuuch !
O' the puns, ... exploit vs explore
Amazing video!
Thank you!
Won't a square become empty once the cricket(s) on it are eaten?
Link to the talk that appeared at the end of the video?
Why are you so clear in your explanations? I mean, why do others fail to deliver tutorials with the clarity you do? I don't know what's wrong with everyone. Omg, you are impressive.
I am too stupid to understand the video... My bad...
Your videos are awesome! Please correct the corresponding quiz, since the answer looks incorrect to me. Could you do a video explaining the first three steps and how the Q-table updates? This would really help with understanding how the update works. Thank you!
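To sketch what those first few updates look like, here is a minimal Python illustration; it is not the lizard/cricket environment from the video, and the states, rewards, learning rate, and discount are made-up example values.

import numpy as np

n_states, n_actions = 4, 2                  # tiny made-up environment
q_table = np.zeros((n_states, n_actions))   # all Q-values start at zero
alpha, gamma = 0.7, 0.99                    # example learning rate and discount

# three hypothetical (state, action, reward, next_state) steps
transitions = [(0, 1, -1, 1), (1, 1, 10, 2), (2, 0, -1, 3)]

for step, (s, a, r, s_next) in enumerate(transitions, start=1):
    # one-step Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
    target = r + gamma * np.max(q_table[s_next])
    q_table[s, a] += alpha * (target - q_table[s, a])
    print(f"Q-table after step {step}:\n{q_table}\n")

Printing the table after each step shows exactly which single cell changes and by how much, which usually makes the update rule click.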
u r an AI, so nicely explained all these hard concepts so easily. thank u so much
How are you so good at explaining😍😍😍😍
Hey @deeplizard,
Many thanks for this video. I'm reading 'Reinforcement Learning: An Introduction, Second Edition' by Richard S. Sutton and Andrew G. Barto, and I'd like to know whether the Q-learning technique described here is the same as the dynamic programming explained in the book?
Is the exploration vs exploitation part only used during training, or does it also happen when actually using the learned Q-table? And can the policy be "take the action with the largest Q-value, and sometimes explore" (i.e., can that be an example of a policy here)? Since the policy is just the probability of taking some action in a state, can a policy simply be written as "take the action with the largest Q-value" (as an example of pure exploitation)?
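Roughly yes: that rule is usually called an epsilon-greedy policy. A minimal sketch, where the epsilon value and the Q-table layout are assumptions rather than anything taken from the video:

import numpy as np

rng = np.random.default_rng()

def choose_action(q_table, state, epsilon=0.1):
    # explore with probability epsilon, otherwise exploit the current estimates
    if rng.random() < epsilon:
        return int(rng.integers(q_table.shape[1]))   # random action
    return int(np.argmax(q_table[state]))            # greedy action

# usage: action = choose_action(q_table, current_state)

Exploration like this is normally only used during training, often with epsilon decayed over time; once you only use the learned table, you typically set epsilon to 0, so the deployed policy really is just "take the action with the largest Q-value".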
Good balance of exploration and exploitation will bring good results in life too! We are all lizards! :)
Can't wait for coming episodes, because this series is amazing! And they/you helped me a lot. Thank you so much! <3
PHENOMENAL! Your videos are THE BEST! Can you PLEASE PLEASE PLEASE do a series on Actor-Critic methods!!
This was an exciting video, finally, we are getting to the good stuff.
Awesome explanation. I like this
please remove the sound played with the logo at the start of the video. the sound is very bad, especially when one listens to it on headphones.
Really good work
Better than my engineering teacher.
This series is so, so informative!! I wish you could make videos on dynamic navigation techniques using DRL
thank you, thank you
hats off
AMAZING SERIES! Absolutely loved it!
Your voice is just amazing 😍😍😍😍😍
channel name is creepy but explanation is amazing...
you saved my life bro
Thanks for the good explanation and all your work. A little hint, if I may: don't define the terms 'exploitation' and 'exploration' using those same words themselves.
You say that Q-learning tries to find the best policy. However, I thought Q-learning is an off-policy algorithm. I also have trouble understanding the on/off-policy concept.
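One standard way to see the distinction, borrowing the SARSA comparison from the Sutton & Barto book mentioned above rather than anything specific to this video: the two algorithms differ only in the update target.

    Q-learning (off-policy):  Q(s,a) \leftarrow Q(s,a) + \alpha \left[ R + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]
    SARSA (on-policy):        Q(s,a) \leftarrow Q(s,a) + \alpha \left[ R + \gamma Q(s',a') - Q(s,a) \right], with a' the action actually taken next

Q-learning is off-policy because its target uses the greedy max over next actions regardless of what the behaving (e.g., epsilon-greedy) policy actually did, so it learns about the best policy while following an exploratory one; SARSA evaluates the policy it is actually following.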
Luv u babe.
exploration of reinforcement learning is going fine!
What should I do in the case of continuous tasks? Like in Flappy Bird (if it's continuous, but anyway), I guess the Q-table would be infinite here, or would just have a big fixed size to save memory. Can you give some recommendations or explain, please? I want to start implementing, but I don't know how the Q-table should look in this case and how to interact with it correctly (and I hope there will be no other surprises lol)
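A common first answer is to discretize the continuous state into coarse bins so a finite, dictionary-backed Q-table still works. A minimal sketch; the state variables, bin sizes, and action count below are guesses for a Flappy Bird-like setup, not a tested implementation:

from collections import defaultdict
import numpy as np

N_ACTIONS = 2                                         # e.g. flap / do nothing
q_table = defaultdict(lambda: np.zeros(N_ACTIONS))    # rows are created on demand

def discretize(horizontal_dist, vertical_dist, velocity, bin_size=10.0):
    # round each continuous feature into a coarse bin so similar situations
    # share one Q-table row instead of needing infinitely many entries
    return (int(horizontal_dist // bin_size),
            int(vertical_dist // bin_size),
            int(round(velocity)))

state = discretize(87.3, -14.2, 5.6)
best_action = int(np.argmax(q_table[state]))          # look up / update as usual

If the binned table gets too large or learning is too slow, the usual next step is to replace the table with a function approximator such as a neural network, i.e., move from tabular Q-learning to deep Q-learning.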