Algorithms for Reinforcement Learning

Synthesis Lectures on Artificial Intelligence and Machine Learning

Csaba Szepesvári
University of Alberta

Abstract

Reinforcement learning is a learning paradigm concerned with learning to control a system so as to maximize a numerical performance measure that expresses a long-term objective. What distinguishes reinforcement learning from supervised learning is that only partial feedback is given to the learner about the learner's predictions. Further, the predictions may have long-term effects through influencing the future state of the controlled system. Thus, time plays a special role. The goal in reinforcement learning is to develop efficient learning algorithms, as well as to understand the algorithms' merits and limitations. Reinforcement learning is of great interest because of the large number of practical applications it can address, ranging from problems in artificial intelligence to operations research and control engineering. In this book, we focus on those algorithms of reinforcement learning that build on the powerful theory of dynamic programming. We give a fairly comprehensive catalog of learning problems, describe the core ideas, and note a large number of state-of-the-art algorithms, followed by a discussion of their theoretical properties and limitations.
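To make the phrase "algorithms that build on the theory of dynamic programming" a little more concrete, here is a minimal value-iteration sketch in Python. It repeatedly applies the Bellman optimality update V(s) = max_a sum_{s'} P(s'|s,a) [r + gamma V(s')] until the values stop changing, then reads off a greedy policy. The toy two-state MDP, the transition format `P[s][a] = [(probability, next_state, reward), ...]`, the discount factor `gamma`, and the function name `value_iteration` are all hypothetical choices for illustration; they are not taken from the book, whose own notation and algorithms may differ.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP encoding: P[s][a] is a list of (probability, next_state, reward) triples.
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]


def value_iteration(P: Transitions, gamma: float = 0.9,
                    tol: float = 1e-8, max_iter: int = 10_000):
    """Approximate the optimal value function by iterating the Bellman optimality update."""
    V = {s: 0.0 for s in P}
    for _ in range(max_iter):
        # One synchronous sweep: every state is updated from the previous estimate V.
        V_new = {
            s: max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in P[s].values()
            )
            for s in P
        }
        delta = max(abs(V_new[s] - V[s]) for s in P)
        V = V_new
        if delta < tol:  # stop once a sweep barely changes the values
            break
    # Greedy policy: in each state, pick the action with the best one-step lookahead.
    policy = {
        s: max(P[s], key=lambda a: sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a]))
        for s in P
    }
    return V, policy


if __name__ == "__main__":
    # Toy MDP: in state 0, action 1 moves to state 1 (reward 1); in state 1, action 0 stays (reward 2).
    P = {
        0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
        1: {0: [(1.0, 1, 2.0)], 1: [(1.0, 0, 0.0)]},
    }
    V, policy = value_iteration(P, gamma=0.9)
    print({s: round(v, 4) for s, v in V.items()}, policy)  # prints {0: 19.0, 1: 20.0} {0: 1, 1: 0}
```

Here staying in state 1 forever is worth 2 / (1 - 0.9) = 20, and moving there from state 0 is worth 1 + 0.9 * 20 = 19, which is what the iteration converges to. Because the update is a contraction with factor `gamma`, the loop terminates for any `0 <= gamma < 1`.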

Table of Contents: Markov Decision Processes / Value Prediction Problems / Control / For Further Exploration


Crossref
Edirlei Soares de Lima, Bruno Feijó​‌. 2019. Artificial Intelligence in Human-Robot Interaction. Emotional Design in Human-Robot Interaction, 187-199.
Crossref
Ala'eddin Masadeh, Zhengdao Wang, Ahmed E. Kamal. (2019) , 1.
Crossref
Mbazingwa Elirehema Mkiramweni, Chungang Yang, Jiandong Li, Wei Zhang. (2019) A Survey of Game Theory in Unmanned Aerial Vehicles Communications. IEEE Communications Surveys & Tutorials 21:4, 3386-3416.
Online publication date: 1-Jan-2019.
Crossref
Niloofar Khanzhahi, Behrouz Masoumi, Babak Karasfi. (2018) Deep Reinforcement Learning Issues and Approaches for The Multi-Agent Centric Problems. 2018 9th Conference on Artificial Intelligence and Robotics and 2nd Asia-Pacific International Symposium, 87-95.
Crossref
Michele Compare, Luca Bellani, Enrico Cobelli, Enrico Zio. (2018) Reinforcement learning-based flow management of gas turbine parts under stochastic failures. The International Journal of Advanced Manufacturing Technology 99:9-12, 2981-2992.
Online publication date: 18-Sep-2018.
Crossref
Ashkan Ertefaie, Robert L Strawderman. (2018) Constructing dynamic treatment regimes over indefinite time horizons. Biometrika 105:4, 963-977.
Online publication date: 17-Sep-2018.
Crossref
Jayakumar Subramanian, Aditya Mahajan. (2018) Renewal Monte Carlo: Renewal Theory Based Reinforcement Learning. 2018 IEEE Conference on Decision and Control (CDC), 5759-5764.
Crossref
Riad Akrour, Filipe Veiga, Jan Peters, Gerhard Neumann. (2018) Regularizing Reinforcement Learning with State Abstraction. 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 534-539.
Crossref
Xiaosha Chen, Supeng Leng, Fan Wu. (2018) Reinforcement Learning Based Safety Message Broadcasting in Vehicular Networks. 2018 10th International Conference on Wireless Communications and Signal Processing (WCSP), 1-6.
Crossref
Jean-Pierre Leduc, Khan M. Iftekharuddin, Abdul A. S. Awwal, Andrés Márquez, Mireya García Vázquez, Víctor H. Diaz-Ramirez. (2018) 3D+T motion analysis: motion sensor network versus multiple video cameras. Optics and Photonics for Information Processing XII, 33.
Crossref
Zahra Shamsi, Kevin J. Cheng, Diwakar Shukla. (2018) Reinforcement Learning Based Adaptive Sampling: REAPing Rewards by Exploring Protein Conformational Landscapes. The Journal of Physical Chemistry B 122:35, 8386-8395.
Online publication date: 20-Aug-2018.
Crossref
Pouria Tooranjipour, Ramin Vatankhah. (2018) Adaptive critic-based quaternion neuro-fuzzy controller design with application to chaos control. Applied Soft Computing 70, 622-632.
Online publication date: 1-Sep-2018.
Crossref
Zhiyuan Chen, Bing Liu. (2018) Lifelong Machine Learning, Second Edition. Synthesis Lectures on Artificial Intelligence and Machine Learning 12:3, 1-207.
Online publication date: 14-Aug-2018.
Crossref
Meghna Lowalekar, Pradeep Varakantham, Patrick Jaillet. (2018) Online spatio-temporal matching in stochastic and dynamic domains. Artificial Intelligence 261, 71-112.
Online publication date: 1-Aug-2018.
Crossref
Ying Huang, GuoLiang Wei, YongXiong Wang. (2018) V-D D3QN: the Variant of Double Deep Q-Learning Network with Dueling Architecture. 2018 37th Chinese Control Conference (CCC), 9130-9135.
Crossref
Yazhou Hu, Bailu Si. (2018) A Reinforcement Learning Neural Network for Robotic Manipulator Control. Neural Computation 30:7, 1983-2004.
Online publication date: 1-Jul-2018.
Crossref
S. S. Emam, J. Miller. (2018) Inferring Extended Probabilistic Finite-State Automaton Models from Software Executions. ACM Transactions on Software Engineering and Methodology 27:1, 1-39.
Online publication date: 5-Jun-2018.
Crossref
Jean-Pierre Leduc, M. Saif Islam, Achyut K. Dutta, Thomas George. (2018) Sensor fusion for 3D+T motion detection and target tracking. Micro- and Nanotechnology Sensors, Systems, and Applications X, 92.
Crossref
Hossein Bayat-Yeganeh, Vahid Shah-Mansouri, Hamed Kebriaei. (2018) A multi-state Q-learning based CSMA MAC protocol for wireless networks. Wireless Networks 24:4, 1251-1264.
Online publication date: 16-Nov-2016.
Crossref
Konstantin Bottinger, Patrice Godefroid, Rishabh Singh. (2018) Deep Reinforcement Fuzzing. 2018 IEEE Security and Privacy Workshops (SPW), 116-122.
Crossref
Mouhacine Benosman. (2018) Model-based vs data-driven adaptive control: An overview. International Journal of Adaptive Control and Signal Processing 32:5, 753-776.
Online publication date: 13-Mar-2018.
Crossref
Tommaso Mannucci, Erik-Jan van Kampen, Cornelis de Visser, Qiping Chu. (2018) Safe Exploration Algorithms for Reinforcement Learning Controllers. IEEE Transactions on Neural Networks and Learning Systems 29:4, 1069-1081.
Online publication date: 1-Apr-2018.
Crossref
Eric B. Laber, Eric J. Rose, Marie Davidian, Anastasios A. Tsiatis. 2018. Q-Learning. Wiley StatsRef: Statistics Reference Online, 1-10.
Crossref
Vinicius G. Goecks, Pedro B. Leal, Trent White, John Valasek, Darren J. Hartl. (2018) Control of Morphing Wing Shapes with Deep Reinforcement Learning. 2018 AIAA Information Systems-AIAA Infotech @ Aerospace.
Crossref
Tian Tan, TianShu Chu, Bo Peng, Jie Wang. 2018. Large-Scale Traffic Grid Signal Control Using Decentralized Fuzzy Reinforcement Learning. Proceedings of SAI Intelligent Systems Conference (IntelliSys) 2016, 652-662.
Crossref
Rushikesh Kamalapurkar, Patrick Walters, Joel Rosenfeld, Warren Dixon. 2018. Computational Considerations. Reinforcement Learning for Optimal Feedback Control, 227-263.
Crossref
Rafał Dreżewski, Maciej Klęczar. 2018. Artificial Intelligence Techniques for the Puerto Rico Strategy Game. Agent and Multi-Agent Systems: Technology and Applications, 77-87.
Crossref
Salma Kazemi Rashed, Reza Shahbazian, Seyed Ali Ghorashi. (2018) Learning-based resource allocation in D2D communications with QoS and fairness considerations. Transactions on Emerging Telecommunications Technologies 29:1, e3249.
Crossref
Sai Sreewathsa Kovalluri, Aravind Ashok, Hareesh Singanamala, Prabaharan P.. (2018) LSTM Based Self-Defending AI Chatbot Providing Anti-Phishing. Proceedings of the First Workshop on Radical and Experiential Security - RESEC '18, 49-56.
Crossref
Mahmoud El Chamie, Behcet Acikmese, Mehran Mesbahi. (2017) Online learning for Markov decision processes applied to multi-agent systems. 2017 IEEE 56th Annual Conference on Decision and Control (CDC), 1596-1601.
Crossref
Ion Matei, Raj Minhas, Johan de Kleer, Anurag Ganguli. (2017) Improving state-action space exploration in reinforcement learning using geometric properties. 2017 IEEE 56th Annual Conference on Decision and Control (CDC), 6403-6408.
Crossref
Daniel J. Lizotte. 2017. Reinforcement Learning. Wiley StatsRef: Statistics Reference Online, 1-9.
Crossref
Phillip Smith, Robert Hunjed, Aldeida Aleti, Jan Carlo Barca. (2017) Adaptive data transfer methods via policy evolution for UAV swarms. 2017 27th International Telecommunication Networks and Applications Conference (ITNAC), 1-8.
Crossref
Angel Martínez-Tenor, Juan Antonio Fernández-Madrigal, Ana Cruz-Martín, Javier González-Jiménez. (2017) Towards a common implementation of reinforcement learning for multiple robotic tasks. Expert Systems with Applications.
Online publication date: 1-Nov-2017.
Crossref
Jean-Pierre Leduc, Andrew G. Tescher. (2017) 3D+T motion analysis with nanosensors. Applications of Digital Image Processing XL, 59.
Crossref
Ryan Gary Kim, Wonje Choi, Zhuo Chen, Janardhan Rao Doppa, Partha Pratim Pande, Diana Marculescu, Radu Marculescu. (2017) Imitation Learning for Dynamic VFI Control in Large-Scale Manycore Systems. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25:9, 2458-2471.
Online publication date: 1-Sep-2017.
Crossref
Rongkuan Tang, Hongliang Yuan. (2017) Cyclic error correction based Q-learning for mobile robots navigation. International Journal of Control, Automation and Systems 15:4, 1790-1798.
Online publication date: 27-Jun-2017.
Crossref
Jianhui Han, Huaping Liu, Bowen Wang. (2017) A deep Q network for robotic planning from image. 2017 2nd International Conference on Advanced Robotics and Mechatronics (ICARM), 626-631.
Crossref
Jungkyu Lee, Byonghwa Oh, Jihoon Yang, Unsang Park. (2017) RLCF: A collaborative filtering approach based on reinforcement learning with sequential ratings. Intelligent Automation & Soft Computing 23:3, 439-444.
Online publication date: 30-Sep-2016.
Crossref
Tianshu Chu, Jie Wang. (2017) Traffic signal control by distributed Reinforcement Learning with min-sum communication. 2017 American Control Conference (ACC), 5095-5100.
Crossref
Heriberto Cuayáhuitl. 2017. SimpleDS: A Simple Deep Reinforcement Learning Dialogue System. Dialogues with Social Robots, 109-118.
Crossref
Thomas R. Colin, Tony Belpaeme, Angelo Cangelosi, Nikolas Hemion. (2016) Hierarchical reinforcement learning as creative problem solving. Robotics and Autonomous Systems 86, 196-206.
Online publication date: 1-Dec-2016.
Crossref
Zhiyuan Chen, Bing Liu. (2016) Lifelong Machine Learning. Synthesis Lectures on Artificial Intelligence and Machine Learning 10:3, 1-145.
Online publication date: 7-Nov-2016.
Amir Hosein Keyhanipour, Behzad Moshiri, Maryam Piroozmand, Farhad Oroumchian, Ali Moeini. (2016) Learning to rank with click-through features in a reinforcement learning framework. International Journal of Web Information Systems 12:4, 448-476.
Online publication date: 7-Nov-2016.
Crossref
Chun Wei, Zhe Zhang, Wei Qiao, Liyan Qu. (2016) An Adaptive Network-Based Reinforcement Learning Method for MPPT Control of PMSG Wind Energy Conversion Systems. IEEE Transactions on Power Electronics 31:11, 7837-7848.
Online publication date: 1-Nov-2016.
Crossref
Ming-Hsiang Su, Kun-Yi Huang, Tsung-Hsien Yang, Kuan-Jung Lai, Chung-Hsien Wu. (2016) Dialog State Tracking and action selection using deep learning mechanism for interview coaching. 2016 International Conference on Asian Language Processing (IALP), 6-9.
Crossref
Warren B. Powell. 2016. A Unified Framework for Optimization Under Uncertainty. Optimization Challenges in Complex, Networked and Risky Systems, 45-83.
Crossref
Jun-Kun Wang, Shou-De Lin. (2016) Parallel Least-Squares Policy Iteration. 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 166-173.
Crossref
Shuhui Qu, Jie Wang, Govil Shivani. (2016) Learning adaptive dispatching rules for a manufacturing process system by using reinforcement learning approach. 2016 IEEE 21st International Conference on Emerging Technologies and Factory Automation (ETFA), 1-8.
Crossref
Antonio Fernandez Anta, Chryssis Georgiou, Miguel A. Mosteiro, Daniel Pareja. (2016) Multi-round Master-Worker Computing: A Repeated Game Approach. 2016 IEEE 35th Symposium on Reliable Distributed Systems (SRDS), 31-40.
Crossref
Sreedhar S. Kumar, Jan Wülfing, Samora Okujeni, Joschka Boedecker, Martin Riedmiller, Ulrich Egert, Saad Jbabdi. (2016) Autonomous Optimization of Targeted Stimulation of Neuronal Networks. PLOS Computational Biology 12:8, e1005054.
Online publication date: 10-Aug-2016.
Crossref
Elena Daskalaki, Peter Diem, Stavroula G. Mougiakakou, Kathrin Maedler. (2016) Model-Free Machine Learning in Biomedicine: Feasibility Study in Type 1 Diabetes. PLOS ONE 11:7, e0158722.
Online publication date: 21-Jul-2016.
Crossref
Mahmoud El Chamie, Yue Yu, Behcet Acikmese. (2016) Convex synthesis of randomized policies for controlled Markov chains with density safety upper bound constraints. 2016 American Control Conference (ACC), 6290-6295.
Crossref
Tianshu Chu, Shuhui Qu, Jie Wang. (2016) Large-scale traffic grid signal control with regional Reinforcement Learning. 2016 American Control Conference (ACC), 815-820.
Crossref
Mahmoud El Chamie, Behcet Acikmese. (2016) Convex synthesis of optimal policies for Markov Decision Processes with sequentially-observed transitions. 2016 American Control Conference (ACC), 3862-3867.
Crossref
Andreas Holzinger. (2016) Interactive machine learning for health informatics: when do we need the human-in-the-loop? Brain Informatics 3:2, 119-131.
Online publication date: 2-Mar-2016.
Crossref
Georgia D. Tourassi, Samuel G. Armato, Tianshu Chu, Jie Wang, Jiayu Chen. (2016) An adaptive online learning framework for practical breast cancer diagnosis. , 978524.
Crossref
Junaid Qadir. (2016) Artificial intelligence based cognitive routing for cognitive radio networks. Artificial Intelligence Review 45:1, 25-96.
Online publication date: 3-Sep-2015.
Crossref
Evgenia Christoforou, Antonio Fernández Anta, Chryssis Georgiou, Miguel A. Mosteiro. 2016. Internet Computing: Using Reputation to Select Workers from a Pool. Networked Systems, 137-153.
Crossref
Seyedeh Sepideh Emam, James Miller. (2015) Test Case Prioritization Using Extended Digraphs. ACM Transactions on Software Engineering and Methodology 25:1, 1-41.
Online publication date: 2-Dec-2015.
Crossref
Rodolfo E. Haber, Carmelo Juanes, Raúl del Toro, Gerardo Beruvides. (2015) Artificial cognitive control with self-x capabilities: A case study of a micro-manufacturing process. Computers in Industry 74, 135-150.
Online publication date: 1-Dec-2015.
Crossref
João Cunha, Rui Serra, Nuno Lau, Luís Seabra Lopes, António J. R. Neves. (2015) Batch Reinforcement Learning for Robotic Soccer Using the Q-Batch Update-Rule. Journal of Intelligent & Robotic Systems 80:3-4, 385-399.
Online publication date: 9-Jan-2015.
Crossref
Chun Wei, Zhe Zhang, Wei Qiao, Liyan Qu. (2015) Reinforcement-Learning-Based Intelligent Maximum Power Point Tracking Control for Wind Energy Conversion Systems. IEEE Transactions on Industrial Electronics 62:10, 6360-6370.
Online publication date: 1-Oct-2015.
Crossref
Matthew Emigh, Evan Kriminger, Jose C. Principe. (2015) A model based approach to exploration of continuous-state MDPs using Divergence-to-Go. 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP), 1-6.
Crossref
Peter Nesbitt, Quinn Kennedy, Jonathan K. Alt, Ron Fricker. (2015) Iowa Gambling Task Modified for Military Domain. Military Psychology 27:4, 252-260.
Online publication date: 13-Dec-2017.
Crossref
Nina Dethlefs, Heriberto Cuayáhuitl. (2015) Hierarchical reinforcement learning for situated natural language generation. Natural Language Engineering 21:3, 391-435.
Online publication date: 10-Jan-2014.
Crossref
Robert Bauer, Alireza Gharabaghi. (2015) Reinforcement learning for adaptive threshold control of restorative brain-computer interfaces: a Bayesian simulation. Frontiers in Neuroscience 9.
Online publication date: 12-Feb-2015.
Crossref
Youssef Khaoula, P. Ravindra De Silva, Michio Okada. 2015. SDT: Maintaining the Communication Protocol Through Mixed Feedback Strategies. Social Robotics, 348-358.
Crossref
Gabriel Kronberger, Michael Kommenda, Stephan Winkler, Michael Affenzeller. 2015. Using Contextual Information in Sequential Search for Grammatical Optimization Problems. Computer Aided Systems Theory – EUROCAST 2015, 417-424.
Crossref
Tianshu Chu, Jie Wang, Jian Cao. (2014) Kernel-based reinforcement learning for traffic signal control with adaptive feature selection. 53rd IEEE Conference on Decision and Control, 1277-1282.
Crossref
Hengshuai Yao, Csaba Szepesvari, Bernardo Avila Pires, Xinhua Zhang. (2014) Pseudo-MDPs and factored linear action models. 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 1-9.
Crossref
Yuki Tezuka, Akira Notsu, Katsuhiro Honda. (2014) Utility of Turning Spot Learning under complex goal search and the limit of memory usage. 2014 Joint 7th International Conference on Soft Computing and Intelligent Systems (SCIS) and 15th International Symposium on Advanced Intelligent Systems (ISIS), 1418-1423.
Crossref
Róbert Busa-Fekete, Balázs Szörényi, Paul Weng, Weiwei Cheng, Eyke Hüllermeier. (2014) Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm. Machine Learning 97:3, 327-351.
Online publication date: 2-Jul-2014.
Crossref
Hung Ngo, Matthew Luciw, Jawas Nagi, Alexander Forster, Jürgen Schmidhuber, Ngo Anh Vien. (2014) Efficient Interactive Multiclass Learning from Binary Feedback. ACM Transactions on Interactive Intelligent Systems 4:3, 1-25.
Online publication date: 21-Nov-2014.
Crossref
Heriberto Cuayáhuitl, Ivana Kruijff-Korbayová, Nina Dethlefs. (2014) Nonstrict Hierarchical Reinforcement Learning for Interactive Systems and Robots. ACM Transactions on Interactive Intelligent Systems 4:3, 1-30.
Online publication date: 21-Nov-2014.
Crossref
Harm van Seijen, Shimon Whiteson, Leon Kester. (2014) Efficient Abstraction Selection in Reinforcement Learning. Computational Intelligence 30:4, 657-699.
Online publication date: 6-Aug-2013.
Crossref
Joan Marc Llargues Asensio, Juan Peralta, Raul Arrabales, Manuel Gonzalez Bedia, Paulo Cortez, Antonio Lopez Peña. (2014) Artificial Intelligence approaches for the generation and assessment of believable human-like behaviour in virtual characters. Expert Systems with Applications 41:16, 7281-7290.
Online publication date: 1-Nov-2014.
Crossref
Majeed Pooyandeh, Danielle J. Marceau. (2014) Incorporating Bayesian learning in agent-based simulation of stakeholders’ negotiation. Computers, Environment and Urban Systems 48, 73-85.
Online publication date: 1-Nov-2014.
Crossref
Ngo Anh Vien, Hung Ngo, Sungyoung Lee, TaeChoong Chung. (2014) Approximate planning for bayesian hierarchical reinforcement learning. Applied Intelligence 41:3, 808-819.
Online publication date: 20-Jul-2014.
Crossref
Chun Wei, Zhe Zhang, Wei Qiao, Liyan Qu. (2014) Intelligent maximum power extraction control for wind energy conversion systems based on online Q-learning with function approximation. 2014 IEEE Energy Conversion Congress and Exposition (ECCE), 4911-4916.
Crossref
Pablo Escandell-Montero, Milena Chermisi, José M. Martínez-Martínez, Juan Gómez-Sanchis, Carlo Barbieri, Emilio Soria-Olivas, Flavio Mari, Joan Vila-Francés, Andrea Stopper, Emanuele Gatti, José D. Martín-Guerrero. (2014) Optimization of anemia treatment in hemodialysis patients via reinforcement learning. Artificial Intelligence in Medicine 62:1, 47-60.
Online publication date: 1-Sep-2014.
Crossref
Xinxi Wang, Yi Wang, David Hsu, Ye Wang. (2014) Exploration in Interactive Personalized Music Recommendation. ACM Transactions on Multimedia Computing, Communications, and Applications 11:1, 1-22.
Online publication date: 1-Aug-2014.
Crossref
Adam Taylor, Ivana Dusparic, Edgar Galvan-Lopez, Siobhan Clarke, Vinny Cahill. (2014) Accelerating Learning in multi-objective systems through Transfer Learning. 2014 International Joint Conference on Neural Networks (IJCNN), 2298-2305.
Crossref
Fei Zhu, Quan Liu, Yuchen Fu, Bairong Shen, Yang Zhang. (2014) Segmentation of Neuronal Structures Using SARSA (λ)-Based Boundary Amendment with Reinforced Gradient-Descent Curve Shape Fitting. PLoS ONE 9:3, e90873.
Online publication date: 13-Mar-2014.
Crossref
Xin Xu, Cong Wang, Frank L. Lewis. (2014) Some recent advances in learning and adaptation for uncertain feedback control systems. International Journal of Adaptive Control and Signal Processing 28:3-5, 201-204.
Online publication date: 19-Mar-2014.
Crossref
Lauren A. Hannah, Warren B. Powell, David B. Dunson. (2014) Semiconvex Regression for Metamodeling-Based Optimization. SIAM Journal on Optimization 24:2, 573-597.
Online publication date: 1-Jan-2014.
Crossref
Elena Daskalaki, Peter Diem, Stavroula Mougiakakou. 2014. Adaptive Algorithms for Personalized Diabetes Treatment. Data-driven Modeling for Diabetes, 91-116.
Crossref
Qingshan Li, Hua Chu, Liang Diao, Lu Wang. 2014. Adaptive Mechanism Based on Shared Learning in Multi-agent System. Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, 113-121.
Crossref
Yu Hosoya, Motohide Umano. (2014) Dynamic Fuzzy Q-Learning with Facilities of Tuning States and Removing Pairs of State and Actions. Journal of Japan Society for Fuzzy Theory and Intelligent Informatics 26:5, 844-854.
Online publication date: 1-Jan-2014.
Crossref
Vivek S. Borkar, Adwaitvedant S. Mathkar. 2014. Reinforcement Learning for Matrix Computations: PageRank as an Example. Distributed Computing and Internet Technology, 14-24.
Crossref
Eric B. Laber, Daniel J. Lizotte, Min Qian, William E. Pelham, Susan A. Murphy. (2014) Dynamic treatment regimes: Technical challenges and applications. Electronic Journal of Statistics 8:1.
Online publication date: 1-Jan-2014.
Crossref
Evgenia Christoforou, Antonio Fernández Anta, Chryssis Georgiou, Miguel A. Mosteiro, Angel Sánchez. (2013) Applying the dynamics of evolution to achieve reliability in master-worker computing. Concurrency and Computation: Practice and Experience 25:17, 2363-2380.
Online publication date: 1-Aug-2013.
Crossref
Ali Mahmoodi, Dan Bang, Majid Nili Ahmadabadi, Bahador Bahrami, Stephen C. Pratt. (2013) Learning to Make Collective Decisions: The Impact of Confidence Escalation. PLoS ONE 8:12, e81195.
Online publication date: 6-Dec-2013.
Crossref
Juliana Nascimento, Warren B. Powell. (2013) An Optimal Approximate Dynamic Programming Algorithm for Concave, Scalar Storage Problems With Vector-Valued Controls. IEEE Transactions on Automatic Control 58:12, 2995-3010.
Online publication date: 1-Dec-2013.
Crossref
Boris Defourny, Damien Ernst, Louis Wehenkel. (2013) Scenario Trees and Policy Selection for Multistage Stochastic Programming Using Machine Learning. INFORMS Journal on Computing 25:3, 488-501.
Online publication date: 1-Aug-2013.
Crossref
Markus Peters, Wolfgang Ketter, Maytal Saar-Tsechansky, John Collins. (2013) A reinforcement learning approach to autonomous decision-making in smart electricity markets. Machine Learning 92:1, 5-39.
Online publication date: 9-Apr-2013.
Crossref
(2013) A Concise Introduction to Models and Methods for Automated Planning. Synthesis Lectures on Artificial Intelligence and Machine Learning 7:2, 1-141.
Online publication date: 1-Jul-2013.
Muge Fesci Sayit, Yagiz Kaymak, Kemal Deniz Teket, Cihat Cetinkaya, Sercan Demirci, Geylani Kardas. (2013) Parent Selection via Reinforcement Learning in Mesh-Based P2P Video Streaming. 2013 10th International Conference on Information Technology: New Generations, 546-551.
Crossref
Warren B. Powell, Ilya O. Ryzhov. 2013. Optimal Learning and Approximate Dynamic Programming. Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, 410-431.
Crossref
Pedro Sequeira, Francisco S. Melo, Ana Paiva. 2013. An Associative State-Space Metric for Learning in Factored MDPs. Progress in Artificial Intelligence, 163-174.
Crossref
Evgenia Christoforou, Antonio Fernández Anta, Chryssis Georgiou, Miguel A. Mosteiro, Angel Sánchez. 2013. Reputation-Based Mechanisms for Evolutionary Master-Worker Computing. Principles of Distributed Systems, 98-113.
Crossref
Ivo Grondman, Lucian Busoniu, Gabriel A. D. Lopes, Robert Babuska. (2012) A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42:6, 1291-1307.
Online publication date: 1-Nov-2012.
Crossref
A. Papangelis, V. Karkaletsis, F. Makedon. (2012) Online Complex Action Learning and User State Estimation for Adaptive Dialogue Systems. 2012 IEEE 24th International Conference on Tools with Artificial Intelligence, 642-649.
Crossref
Akira Notsu, Katsuhiro Honda, Hidetomo Ichihashi, Ayaka Ido, Yuki Komori. (2012) Information compression effect based on PCA for reinforcement learning agents' communication. The 6th International Conference on Soft Computing and Intelligent Systems, and The 13th International Symposium on Advanced Intelligence Systems, 1318-1321.
Crossref
Urbain Prieur, Veronique Perdereau, Alexandre Bernardino. (2012) Modeling and planning high-level in-hand manipulation actions from human knowledge and active learning from demonstration. 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 1330-1336.
Crossref
Louis Anthony Tony Cox. (2012) Confronting Deep Uncertainties in Risk Analysis. Risk Analysis 32:10, 1607-1629.
Online publication date: 10-Apr-2012.
Crossref
Jeffrey D. Rieker, John W. Labadie. (2012) An intelligent agent for optimal river‐reservoir system management. Water Resources Research 48:9.
Online publication date: 26-Sep-2012.
Crossref
Tsuyoshi Ueno, Shin-ichi Maeda, Shin Ishii. (2012) Asymptotic analysis of value prediction by well-specified and misspecified models. Neural Networks 31, 88-92.
Online publication date: 1-Jul-2012.
Crossref
(2012) Planning with Markov Decision Processes: An AI Perspective. Synthesis Lectures on Artificial Intelligence and Machine Learning 6:1, 1-210.
Online publication date: 3-Jul-2012.
Muge Sayit, Orhan Sonmez. (2012) Reinforcement learning for peer to peer video streaming applications. 2012 20th Signal Processing and Communications Applications Conference (SIU), 1-4.
Crossref
Orhan Sonmez, A. Taylan Cemgil. (2012) Importance sampling for model-based reinforcement learning. 2012 20th Signal Processing and Communications Applications Conference (SIU), 1-4.
Crossref
Yuesheng He, Yuan Yan Tang. (2012) Autonomous Behaviors of Graphical Avatars Based on Machine Learning. International Journal of Pattern Recognition and Artificial Intelligence 26:02, 1251002.
Online publication date: 28-Aug-2012.
Crossref
Pierrick Plamondon, Brahim Chaib-draa. (2012) Stochastic Resource Allocation in Multiagent Environments: An Approach Based on Distributed Q-Values and Bounded Real-Time Dynamic Programming. International Journal on Artificial Intelligence Tools 21:01, 1250003.
Online publication date: 5-Apr-2012.
Crossref
Lucian Buşoniu, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Robert Babuška, Bart De Schutter. 2012. Least-Squares Methods for Policy Iteration. Reinforcement Learning, 75-109.
Crossref
Hado van Hasselt. 2012. Reinforcement Learning in Continuous State and Action Spaces. Reinforcement Learning, 207-251.
Crossref
Matthieu Geist, Bruno Scherrer. 2012. ℓ1-Penalized Projected Bellman Residual. Recent Advances in Reinforcement Learning, 89-101.
Crossref
Markus Peters, Wolfgang Ketter, Maytal Saar-Tsechansky, John Collins. 2012. Autonomous Data-Driven Decision-Making in Smart Electricity Markets. Machine Learning and Knowledge Discovery in Databases, 132-147.
Crossref
Riad Akrour, Marc Schoenauer, Michèle Sebag. 2012. APRIL: Active Preference Learning-Based Reinforcement Learning. Machine Learning and Knowledge Discovery in Databases, 116-131.
Crossref
Nataliya Sokolovska. 2012. Sparse Gradient-Based Direct Policy Search. Neural Information Processing, 212-221.
Crossref
Francisco Martinez-Gil, Miguel Lozano, Fernando Fernández. 2012. Calibrating a Motion Model Based on Reinforcement Learning for Pedestrian Simulation. Motion in Games, 302-313.
Crossref
Yuxi Li, Dale Schuurmans. 2012. MapReduce for Parallel Reinforcement Learning. Recent Advances in Reinforcement Learning, 309-320.
Crossref
Alexandros Papangelis, Nikolaos Kouroupas, Vangelis Karkaletsis, Fillia Makedon. 2012. An Adaptive Dialogue System with Online Dialogue Policy Learning. Artificial Intelligence: Theories and Applications, 323-330.
Crossref
Evgenia Christoforou, Antonio Fernández Anta, Chryssis Georgiou, Miguel A. Mosteiro, Angel (Anxo) Sánchez. 2012. Achieving Reliability in Master-Worker Computing via Evolutionary Dynamics. Euro-Par 2012 Parallel Processing, 451-463.
Crossref
Amir-massoud Farahmand, Csaba Szepesvári. (2011) Model selection in reinforcement learning. Machine Learning 85:3, 299-332.
Online publication date: 11-Jun-2011.
Crossref
Esra Sisikoglu, Marina A. Epelman, Robert L. Smith. (2011) A sampled fictitious play based learning algorithm for infinite horizon Markov Decision Processes. Proceedings of the 2011 Winter Simulation Conference (WSC), 4086-4097.
Crossref
Sean P. Meyn, Amit Surana. (2011) TD-learning with exploration. IEEE Conference on Decision and Control and European Control Conference, 148-155.
Crossref
Warren B. Powell, Belgacem Bouzaiene-Ayari, Jean Berger, Abdeslem Boukhtouta, Abraham P. George. (2011) The Effect of Robust Decisions on the Cost of Uncertainty in Military Airlift Operations. ACM Transactions on Modeling and Computer Simulation 22:1, 1-19.
Online publication date: 1-Dec-2011.
Crossref
2011. Bibliography. Approximate Dynamic Programming, 607-621.
Crossref
(2011) Trading Agents. Synthesis Lectures on Artificial Intelligence and Machine Learning 5:3, 1-107.
Online publication date: 26-Jun-2011.
 



Author:
Csaba Szepesvári
Keywords:
reinforcement learning
Markov Decision Processes
temporal difference learning
stochastic approximation
two-timescale stochastic approximation
Monte-Carlo methods
simulation optimization
function approximation
stochastic gradient methods
least-squares methods
overfitting
bias-variance tradeoff
online learning
active learning
planning
simulation
PAC-learning
Q-learning
actor-critic methods
policy gradient
natural gradient