
Bruno Scherrer, Researcher. [French version]

Member of the BIGS team of INRIA(1) (Institut National de Recherche en Informatique et en Automatique).
Member of the Probability and Statistics team of IECL (Institut Élie Cartan de Lorraine), at Université de Lorraine.

Research themes: Stochastic optimal control, reinforcement learning, Markov decision processes, approximate dynamic programming, analysis of algorithms, stochastic processes.

Author-pays journals (sometimes misleadingly called "open access" journals): why I refuse to review articles for them.

Fishy publishers: beware of VDM Publishing (and its imprints Éditions Universitaires Européennes, LAP Lambert Academic Publishing, etc.); they will publish anything, such as this math-book parody randomly generated by mathgen (the nonsense content is available here).

E-mail:, Phone: +33 (0)3 72 74 54 04, Office: 219
Mailbox 1: Centre de recherche Inria Nancy - Grand Est, 615 rue du Jardin Botanique, 54600 Villers-lès-Nancy, FRANCE.
Mailbox 2: IECL, Université de Lorraine, Site de Nancy, B.P. 70239, F-54506 Vandœuvre-lès-Nancy Cedex, FRANCE.

(1) If the corridors smell of perspiration, that is probably normal.

Selected works (a complete list is available on the HAL server)

On algorithms for Markov decision processes / zero-sum games

M. Geist, B. Scherrer, O. Pietquin. A Theory of Regularized Markov Decision Processes. To appear in ICML 2019.

Y. Efroni, G. Dalal, B. Scherrer, S. Mannor. How to Combine Tree-Search Methods in Reinforcement Learning. AAAI 2019. Outstanding paper award.

Y. Efroni, G. Dalal, B. Scherrer, S. Mannor. Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning. NIPS 2018.

Y. Efroni, G. Dalal, B. Scherrer, S. Mannor. Beyond the One Step Greedy Approach in Reinforcement Learning. ICML 2018.

Invited talk at AWRL 2017, November 15th, 2017. Two Simple Tricks for Improving the Solution to Large RL Problems.

Invited talk at EWRL 2016, December 3rd, 2016. On Periodic MDPs.

J. Pérolat, B. Piot, M. Geist, B. Scherrer, O. Pietquin. Softened Approximate Policy Iteration for Markov Games. ICML 2016.

J. Pérolat, B. Piot, B. Scherrer, O. Pietquin. On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games. AISTATS 2016.

B. Scherrer. Improved and Generalized Upper Bounds on the Complexity of Policy Iteration. Mathematics of Operations Research, 2016.
--- A short version appeared in NIPS 2013.

Lectures given at CIMI in Toulouse, as part of the Machine Learning Trimester: "Introduction to Reinforcement Learning" (slides, code).

J. Pérolat, B. Scherrer, B. Piot and O. Pietquin. Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games. ICML 2015.

B. Lesner and B. Scherrer. Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies.
--- A short version appeared in ICML 2015.

M. Tagorti and B. Scherrer. Rate of Convergence and Error Bounds for LSTD(λ).
--- A short version appeared in ICML 2015.

B. Scherrer and M. Geist. Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search. ECML 2014.

B. Scherrer. Approximate Policy Iteration Schemes: A Comparison. ICML 2014. code.

B. Scherrer, M. Ghavamzadeh, V. Gabillon, B. Lesner and M. Geist. Approximate Modified Policy Iteration and its Application to the Game of Tetris. Journal of Machine Learning Research, 2015.
--- A short version appeared in ICML 2012.
--- Some of the empirical results appeared in NIPS 2013.

M. Geist and B. Scherrer. Off-policy Learning with Eligibility Traces: A Survey. Journal of Machine Learning Research, 2014.

B. Scherrer. Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris. Journal of Machine Learning Research, 2013.
--- A preliminary version of this article with a slightly different presentation is available in this technical report of 2007.

B. Scherrer and B. Lesner. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Discounted Markov Decision Processes. NIPS 2012.

M. Geist, B. Scherrer, A. Lazaric and M. Ghavamzadeh. A Dantzig Selector for Temporal Difference Learning. ICML 2012.

V. Gabillon, A. Lazaric, M. Ghavamzadeh and B. Scherrer. Classification-based Policy Iteration with a Critic. ICML 2011.

B. Scherrer. Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view. ICML 2010.

C. Thiéry and B. Scherrer. Least-Squares Lambda Policy Iteration: Bias-Variance Trade-off in Control Problems. ICML 2010.
--- Related tech report: Performance bound for Approximate Optimistic Policy Iteration.

M. Petrik and B. Scherrer. Biasing Approximate Dynamic Programming with a Lower Discount Factor. NIPS 2008.

On Tetris:

V. Gabillon, M. Ghavamzadeh, B. Scherrer. Approximate Dynamic Programming Finally Performs Well in the Game of Tetris. NIPS 2013.

C. Thiéry and B. Scherrer. Building Controllers for Tetris. International Computer Games Association Journal, 2009.

C. Thiéry and B. Scherrer. Improvements on Learning Tetris with Cross Entropy. International Computer Games Association Journal, 2009.

Related source code (MdpTetris): web page on INRIA Gforge, direct download, documentation.


A. Boumaza and B. Scherrer. Convergence and Rate of Convergence of a Foraging Ant Model. CEC 2007.
--- An extended version.

A. Boumaza and B. Scherrer. Optimal control subsumes harmonic control. ICRA 2007.
--- An extended version.

B. Scherrer. Asynchronous Neurocomputing for optimal control and reinforcement learning with large state spaces. Neurocomputing, 2005.