Bruno Scherrer, Researcher.
Member of the BIGS team at INRIA (Institut National de Recherche en Informatique et en Automatique).

**Research themes:** Stochastic optimal control, reinforcement learning, Markov decision processes, approximate dynamic programming, analysis of algorithms, stochastic processes.
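These themes center on dynamic programming for Markov decision processes. As a purely illustrative sketch (a toy example, not code from any of the papers below), here is value iteration on a small hypothetical MDP, with made-up transition and reward arrays:

```python
import numpy as np

# Toy MDP (hypothetical): 3 states, 2 actions.
# P[a, s, s'] = transition probability; R[a, s] = expected reward.
P = np.array([
    [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.1, 0.0, 0.9]],  # action 0
    [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.9, 0.0, 0.1]],  # action 1
])
R = np.array([[0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0]])
gamma = 0.9  # discount factor

def value_iteration(P, R, gamma, tol=1e-8):
    """Iterate the Bellman optimality operator until the sup-norm
    change falls below tol (the operator is a gamma-contraction)."""
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    while True:
        # Q[a, s] = R[a, s] + gamma * sum_{s'} P[a, s, s'] * V[s']
        Q = R + gamma * np.matmul(P, V)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)  # optimal values, greedy policy
        V = V_new

V_star, pi_star = value_iteration(P, R, gamma)
print(V_star, pi_star)
```

Many of the papers below study approximate variants of exactly this kind of scheme (approximate value/policy iteration), where the max and the expectation can only be computed approximately.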

**Author-pays journals (sometimes misleadingly called "open access" journals):** Why I refuse to review articles for them.

**Dubious publishers**: Beware of VDM Publishing (and its imprints Éditions Universitaires Européennes, LAP Lambert Academic Publishing, etc.); they will publish anything, such as this math-book parody randomly generated by mathgen (the nonsense content is available here).

**E-mail**: `Firstname.Name@inria.fr`, **Phone**: +33 (0)3 72 74 54 04, **Office**: 219
**Mailbox 1**: Centre de recherche Inria Nancy - Grand Est, 615 rue du Jardin Botanique, 54600 Villers-lès-Nancy, FRANCE.
**Mailbox 2**: IECL, Université de Lorraine, Site de Nancy, B.P. 70239, F-54506 Vandœuvre-lès-Nancy Cedex, FRANCE.

Presentation at EWRL 2016 (December 3, 2016): On Periodic MDPs.

Lectures given at CIMI in Toulouse as part of the Machine Learning Trimester: "Introduction to Reinforcement Learning" (slides, code).

J. Pérolat, B. Piot, M. Geist, B. Scherrer, O. Pietquin.
Softened Approximate Policy Iteration for Markov Games. *ICML 2016*.

J. Pérolat, B. Piot, B. Scherrer, O. Pietquin. On the Use of Non-Stationary Strategies for Solving Two-Player Zero-Sum Markov Games. *AISTATS 2016*.

B. Scherrer. Improved and Generalized Upper Bounds on the Complexity of Policy Iteration. *Mathematics of Operations Research, 2016.*

--- A short version appeared in *NIPS 2013.*

J. Pérolat, B. Scherrer, B. Piot and O. Pietquin. Approximate Dynamic Programming for Two-Player Zero-Sum Markov Games. *ICML 2015*.

B. Lesner and B. Scherrer. Tight Performance Bounds for Approximate Modified Policy Iteration with Non-Stationary Policies.

--- A short version appeared in *ICML 2015*.

M. Tagorti and B. Scherrer. Rate of Convergence and Error Bounds for LSTD(λ).

--- A short version appeared in *ICML 2015*.

B. Scherrer and M. Geist. Local Policy Search in a Convex Space and Conservative Policy Iteration as Boosted Policy Search. *ECML 2014.*

B. Scherrer. Approximate Policy Iteration Schemes: A Comparison. *ICML 2014.*

B. Scherrer, M. Ghavamzadeh, V. Gabillon, B. Lesner and M. Geist. Approximate Modified Policy Iteration and its Application to the Game of Tetris. *Journal of Machine Learning Research, 2015.*

--- A short version appeared in *ICML 2012.*

--- Some of the empirical results appeared in *NIPS 2013.*

M. Geist and B. Scherrer. Off-policy Learning with Eligibility Traces: A Survey. *Journal of Machine Learning Research, 2014.*

B. Scherrer. Performance Bounds for Lambda Policy Iteration and Application to the Game of Tetris. *Journal of Machine Learning Research, 2013.*

B. Scherrer and B. Lesner. On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Discounted Markov Decision Processes. *NIPS 2012.*

M. Geist, B. Scherrer, A. Lazaric and M. Ghavamzadeh. A Dantzig Selector for Temporal Difference Learning. *ICML 2012.*

V. Gabillon, A. Lazaric, M. Ghavamzadeh and B. Scherrer. Classification-based Policy Iteration with a Critic. *ICML 2011.*

B. Scherrer. Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view. *ICML 2010.*

C. Thiéry and B. Scherrer. Least-Squares Lambda Policy Iteration: Bias-Variance Trade-off in Control Problems. *ICML 2010.*

--- Related tech report: Performance Bound for Approximate Optimistic Policy Iteration.

M. Petrik and B. Scherrer. Biasing Approximate Dynamic Programming with a Lower Discount Factor. *NIPS 2008.*

V. Gabillon, M. Ghavamzadeh, B. Scherrer. Approximate Dynamic Programming Finally Performs Well in the Game of Tetris. *NIPS 2013.*

C. Thiéry and B. Scherrer. Building Controllers for Tetris. *International Computer Games Association Journal, 2009.*

C. Thiéry and B. Scherrer. Improvements on Learning Tetris with Cross Entropy. *International Computer Games Association Journal, 2009.*

A. Boumaza and B. Scherrer. Convergence and Rate of Convergence of a Foraging Ant Model. *CEC 2007.*

--- An extended version.

A. Boumaza and B. Scherrer. Optimal control subsumes harmonic control. *ICRA 2007*.

--- An extended version.

B. Scherrer. Asynchronous Neurocomputing for optimal control and reinforcement learning with large state spaces. *Neurocomputing, 2005.*