Probabilistic Programming Seminar 2023-2024

Description

Probabilistic programming languages (PPLs) use the syntax and semantics of programming languages to define probabilistic models. Using computer programs as a representation of probabilistic models yields two benefits. First, any computable process can be modelled as a probabilistic program. Second, PPLs enable a diverse audience – data scientists, systems designers, medical doctors, etc. – to design and reason about probabilistic systems. PPLs are becoming one of the central topics in probabilistic reasoning and programming languages, with increasing interest from industry and academia.
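
To make this concrete, here is a minimal sketch in plain Python rather than in one of the PPLs used in the course. The names `two_coins` and `rejection_sample` are illustrative assumptions. The program flips two fair coins, conditions on at least one of them showing heads, and uses rejection sampling to estimate the probability that both do.

```python
import random


def two_coins(rng):
    """A tiny generative model: flip two fair coins.

    Returns (condition, query): whether at least one coin shows heads,
    and whether both coins show heads.
    """
    a = rng.random() < 0.5
    b = rng.random() < 0.5
    return (a or b), (a and b)


def rejection_sample(model, n_samples, seed=0):
    """Estimate P(query | condition) by discarding runs that violate the condition."""
    rng = random.Random(seed)
    accepted = 0
    hits = 0
    for _ in range(n_samples):
        condition, query = model(rng)
        if condition:
            accepted += 1
            hits += query  # True counts as 1, False as 0
    return hits / accepted


estimate = rejection_sample(two_coins, n_samples=20_000)
print(f"P(both heads | at least one head) ~ {estimate:.3f}")  # exact value is 1/3
```

A PPL separates these two roles: you write only the model, and the language supplies the inference procedure.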

This course will:

introduce the core ideas of probabilistic programming – probabilistic inference, language design, and applications – as well as state-of-the-art ideas in the field that make PPLs more reliable, usable, and faster.
teach you how to critically analyse state-of-the-art ideas, their strengths and weaknesses, and propose improvements.

At the end of the course, you will be able to:

read and write probabilistic programs
understand how probabilistic programming languages are implemented and how they can be extended
understand the basic and state-of-the-art probabilistic inference procedures (which are applicable beyond probabilistic programs)

Prerequisites

The target audience for this course is Master’s and PhD students. There are no formal prerequisites, but students are expected to be comfortable with programming and mathematical notation. Basic familiarity with probability theory, artificial intelligence and machine learning is expected (equivalent to a Bachelor’s level course). We will not revise these topics during the lectures, but refresher materials are listed below.

Course Format

The course consists of several learning activities, alongside lectures by the instructor, with the points distributed as follows:

0%: Paper reviews
25%: Presentation
65%: Research report
10%: Participation

Course deliverables

Paper reviews. The course will take the format of a research seminar. This means that there will be no textbook and no blackboard lectures; instead, we will be reading and discussing state-of-the-art papers in the field. Consequently, you are expected to come prepared for the class by reading one of the papers covered in the lecture. As preparation for the class, you have to write a review of that paper. The reviews consist of a few questions that help you understand the paper to a sufficient extent; they point out the important points, what you should pay attention to, and which parts you can skip. Scientific articles are not always easy to read, as they are typically written for people who already know a lot about the field; the review questions help you navigate that. The reviews are not graded, but they are essential preparation for the classes. After the lectures, you will receive prototypical answers that help you track your progress. The questions are in the Miro board associated with each paper, and you should submit your answers in Brightspace.

Paper presentation. Only the classes in the first two weeks will be typical lectures; the goal of these lectures is to set all of you at the same starting point. Starting in week 3, you will take the lead role.

In every lecture, we will cover 2 papers on a related topic. Each paper will be presented by a student taking the course. You choose which paper you present. For every lecture, you are expected to read only one of the 2 papers we cover; you will learn about the other one from the presentation of your classmate.

For the paper you are responsible for, you are expected to understand it in detail, which includes searching for additional literature that helps you understand the paper and why it works, and to present it to your colleagues. You are allowed to use any material from the Web for your presentations.

Your objective for the presentation: explain the core idea behind the assigned paper in the clearest terms possible. Don’t try to cover everything in the paper; identify the important and essential parts. Convey the intuitions before the math.

Lastly, schedule a meeting with me at least 2 days before your presentation for feedback.

Research report. The largest part of your grade and effort goes to a research report. In contrast to standard courses, the goal of this course is not for you to just soak up what you have been taught; instead, the goal is to go beyond the materials and think about their strengths and weaknesses, and how to overcome the latter.

Your main task in the report is to design a research proposal, without executing it. You will have to dig deeper into your topic, studying papers that we will not cover in the class. The report should consist of 5 parts:

Description of the topic. Briefly describe your topic and contextualise it within the probabilistic programming field. Describe what the problem is and why it is challenging.
Literature study. Starting from a paper you have been assigned to, you should find several related papers (which either propose alternative solutions to the same problem, or use similar techniques for other problems). You should briefly summarise them and outline their strengths and weaknesses compared to the main paper you started with.
Relation to other topics in the course. You cannot pass the course by only investing time into your own topic. In the report, you have to describe how it connects to every other topic we have covered (does topic X solve the same problem as your topic? Does X employ different assumptions? Does it use different techniques? …).
Experimental test. Each paper we cover either comes with code implementing it or is easy to implement on top of existing probabilistic programming languages. You are expected to play with your method and stress-test it. You can, for example, design probabilistic programs that test the complexity of problems the inference procedures can handle, or change parts of the techniques to see whether that improves their performance or makes it worse. If you are re-implementing papers, you don’t have to implement them fully. Start from a simplified version, explain your motivation for the simplification, and add complexity later on (given the time constraints).
Research proposal(s). Describe one or more research projects you think would advance the field, focusing on your topic. The proposal should:
- clearly describe the problem you are addressing;
- explain how you would address it technically, and why that approach makes sense;
- explain how you would test it experimentally;
- discuss what could go wrong with your plan.

Note that I do understand that, for many of you, this is the first time encountering an open-ended research project. I do not expect you to design entirely new projects no one has thought of before. While you should certainly try to do that, research papers often contain ‘Future work’ sections; you are allowed to start from these suggestions and work them out in detail.

Starting in week 5, you will have some time available to request my feedback on your drafts. I will reserve a certain amount of time per week to read your drafts. Each of you will also get a fixed amount of time, which you can divide as you wish (e.g., all at once, or distributed over the weeks). The details will be known once I know how many people will be taking this course.

Class participation. As this is a seminar course, you are expected to actively participate in class. During the lectures, we will be clarifying the details from the papers, analysing their strengths, weaknesses, and evaluation; in other words, exactly what you are supposed to do in your reports. Your participation will be graded in two ways:

Sending questions in advance. To start up the discussion, I ask you to think about the papers in advance and post the questions you have about them. What is a good question? Anything that helps you understand the paper and is not of the form “What is X?” where X is something explained in the paper. Put these questions in the Miro board and don’t forget to put your name next to each question.
Helping others. Our shared goal is to understand the papers, and the field of probabilistic programming itself, together. We will maintain a shared PDF in which you can post and answer questions about papers. Answer the questions in the Miro board and, again, don’t forget to put your name.

See how to do well in a seminar course.

Schedule

You do not have to read every paper from the required literature. You will be divided into groups (the group split is on Brightspace) and each group reads only one paper for the class. You will learn about the other paper from the presentation of your colleagues.

Use the review questions to focus on the important parts. Of course, you are welcome to go into the depths of every paper.

Date	Topic
September 4, 2023 (W1 L1)	What is probabilistic programming? What is model-based reasoning? The anatomy of a probabilistic program. Course structure. [Slides]
	Chapters 3 and 4 (without 4.4) from Automating Inference, Learning, and Design using Probabilistic Programming Tom Rainforth

September 7, 2023 (W1 L2)	Generative thinking. How to write probabilistic programs? What distribution does a probabilistic program capture? [Slides]

September 11, 2023 (W2 L1)	Basic inference procedures: Enumeration, Rejection sampling, Importance Sampling, Metropolis-Hastings MCMC, Sequential Monte Carlo (Particle filtering). Why do they work? [Slides]
	Paper 1 Chapter 8 from Probabilistic models of cognition Noah D. Goodman, Joshua B. Tenenbaum
	Paper 2 Sections 4.1-4.3 from An introduction to probabilistic programming Jan-Willem van de Meent, Brooks Paige, Hongseok Yang, Frank Wood

September 14, 2023 (W2 L2)	Implementation strategies. Database view. Continuations. Message passing.
	Paper 1 Lightweight Implementations of Probabilistic Programming Languages Via Transformational Compilation David Wingate, Andreas Stuhlmüller, Noah D. Goodman [Miro board]
	Paper 2 C3: Lightweight Incrementalized MCMC for Probabilistic Programs using Continuations and Callsite Caching Daniel Ritchie, Andreas Stuhlmüller, Noah D. Goodman [Miro board]
	Paper 3 Sections 6.1, 6.4-6.7 from An introduction to probabilistic programming Jan-Willem van de Meent, Brooks Paige, Hongseok Yang, Frank Wood [Miro board]

September 18, 2023 (W3 L1)	Gradient-directed probabilistic inference
	Paper 1 MCMC using Hamiltonian dynamics (first 20 pages) Radford M. Neal [Slides] [Miro board] Code starter: HMC implementation in Gen.jl or implement it from scratch
	Paper 2 Automated Variational Inference in Probabilistic Programming David Wingate, Theo Weber [Slides] [Miro board] Code starter: Implementation in Gen.jl or Pyro, or implement a macro in Gen.jl to transform an arbitrary program into a variational one

September 21, 2023 (W3 L2)	Learning for inference
	Paper 1 Deep Amortized Inference for Probabilistic Programs Daniel Ritchie, Paul Horsfall, Noah D. Goodman [Miro board] Code starter: start from Gen.jl and implement a machine learning part
	Paper 2 Inference Compilation and Universal Probabilistic Programming Tuan Anh Le, Atılım Güneş Baydin, Frank Wood [Miro board] Code starter: implement a pipeline over Gen.jl or start from this code

September 24, 2023 (W4 L1)	Programs with stochastic support
	Paper 1 Divide, Conquer, and Combine: a New Inference Strategy for Probabilistic Programs with Stochastic Support Yuan Zhou, Hongseok Yang, Yee Whye Teh, Tom Rainforth [Slides] [Miro board] Code starter: Gen.jl provides you with everything you need to implement a simplified version of this. You are allowed to collaborate with the colleague from the same session
	Paper 2 Rethinking Variational Inference for Probabilistic Programs with Stochastic Support Tim Reichelt, Luke Ong, Tom Rainforth [Slides] [Miro board] Code starter: Gen.jl provides you with everything you need to implement a simplified version of this. You are allowed to collaborate with the colleague from the same session

September 28, 2023 (W4 L2)	Programmable inference
	Paper 1 Gen: A General-Purpose Probabilistic Programming System with Programmable Inference Marco F. Cusumano-Towner, Feras A. Saad, Alexander K. Lew, Vikash K. Mansinghka [Slides] [Miro board] Code starter: Gen.jl
	Paper 2 SMCP3: Sequential Monte Carlo with probabilistic program proposals Alexander K Lew, George Matheos, Tan Zhi-Xuan, Matin Ghavamizadeh, Nishad Gothoskar, Stuart Russell, Vikash K Mansinghka [Miro board] Code starter: Gen.jl library implementing the functionality in Gen.jl, or build a simplified version from scratch

October 2, 2023 (W5 L1)	Connection between probabilistic and logical reasoning
	Paper 1 On probabilistic inference by weighted model counting Mark Chavira, Adnan Darwiche [Miro board] Code starter: you can play with BDDs and SDDs (a better version of BDDs) through the Python library PySDD or the Julia collection of computation circuits in Juice.jl; there is also a BDD library in Julia; Problog also offers a connection to BDDs/SDDs
	Paper 2 Scaling Exact Inference for Discrete Probabilistic Programs Steven Holtzen, Guy Van den Broeck, Todd Millstein [Miro board] Code starter: Dice repository

October 5, 2023 (W5 L2)	Probabilistic logic programming
	Paper 1 ProbLog: A Probabilistic Prolog and its Application in Link Discovery Luc De Raedt, Angelika Kimmig, Hannu Toivonen [Slides] [Miro board] Code starter: Problog website
	Paper 2 k-Optimal: A novel approximate inference algorithm for Problog Joris Renkens, Guy Van den Broeck, Siegfried Nijssen [Miro board]

October 9, 2023 (W6 L1)	Deep probabilistic programming
	Paper 1 DeepProbLog: Neural Probabilistic Logic Programming Robin Manhaeve, Sebastijan Dumančić, Angelika Kimmig, Thomas Demeester, Luc De Raedt [Miro board]
	Paper 2 DeepStochLog: Neural Stochastic Logic Programming Thomas Winters, Giuseppe Marra, Robin Manhaeve, Luc De Raedt [Miro board]

October 12, 2023 (W6 L2)	Incremental and anytime inference
	Paper 1 Incremental inference for probabilistic programs Marco Cusumano-Towner, Benjamin Bichsel, Timon Gehr, Martin Vechev, Vikash K. Mansinghka [Slides] [Miro board] Code starter: Gen implementation of trace translators
	Paper 2 Anytime Inference in Probabilistic Logic Programs with TP-Compilation Jonas Vlasselaer, Guy Van den Broeck, Angelika Kimmig, Wannes Meert, Luc De Raedt [Miro board] Code starter: anytime inference is in Problog: link

October 16, 2023 (W7 L1)	Deep generative models
	Paper 1 Variational Inference with Normalizing Flows Danilo Jimenez Rezende, Shakir Mohamed [Miro board] Code starter: any implementation of normalising flows like normalizing flows in Pytorch, FlowTorch, or Pyro
	Paper 2 Denoising Diffusion Probabilistic Models Jonathan Ho, Ajay Jain, Pieter Abbeel [Miro board] Code starter: any implementation such as this one

October 19, 2023 (W7 L2)	Generalised paradigms for probabilistic programming
	Paper 1 Algebraic Model Counting Angelika Kimmig, Guy Van den Broeck, Luc De Raedt [Slides] [Miro board] Code starter: Problog allows you to play with various semirings
	Paper 2 Automating Involutive MCMC using Probabilistic and Differentiable Programming Marco Cusumano-Towner, Alexander K. Lew, Vikash K. Mansinghka [Miro board] Code starter: Gen.jl’s implementation of Involutive MCMC

October 23, 2023 (W8 L1)	No probability? No problem! Alternative sources of probabilities.
	Paper 1 Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems Tina Toni, David Welch, Natalja Strelkowa, Andreas Ipsen and Michael P.H Stumpf [Miro board] Code starter: SBI library or implement it from scratch in Gen.jl
	Paper 2 Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model Atılım Güneş Baydin et al [Miro board] Code starter: take a much simpler simulator and work with it

October 26, 2023 (W8 L2)	Learning probabilistic programs
	Paper 1 Data-Driven Synthesis of Full Probabilistic Programs Sarah Chasins, Phitchaya Mangpo Phothilimthana [Miro board] Code starter: I recommend a fresh implementation in Gen.jl, but the original code repository might have useful information
	Paper 2 Inferring Signaling Pathways with Probabilistic Programming David Merrell, Anthony Gitter [Miro board] Code starter: The code is provided in this tutorial
	Paper 3 3DP3: 3D Scene Perception via Probabilistic Programming Nishad Gothoskar, Marco Cusumano-Towner, Ben Zinberg, Matin Ghavamizadeh, Falk Pollok, Austin Garrett, Joshua B. Tenenbaum, Dan Gutfreund, Vikash K. Mansinghka [Miro board] Code starter: The code is provided in this tutorial
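
As a warm-up for the W2 L1 session, the following is a minimal sketch of one of the basic inference procedures listed there: importance sampling with the prior as the proposal (often called likelihood weighting). The model (a uniform prior on a coin's bias, with 8 heads observed in 10 flips) and all function names are illustrative assumptions, not taken from the readings.

```python
import random


def likelihood(theta, heads, flips):
    """Unnormalised probability of the observed flips given bias theta."""
    return theta ** heads * (1.0 - theta) ** (flips - heads)


def posterior_mean_bias(heads, flips, n_samples, seed=0):
    """Self-normalised importance sampling with the Uniform(0, 1) prior as proposal."""
    rng = random.Random(seed)
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(n_samples):
        theta = rng.random()                      # sample a bias from the prior
        weight = likelihood(theta, heads, flips)  # weight by how well it explains the data
        weighted_sum += weight * theta
        weight_total += weight
    return weighted_sum / weight_total


estimate = posterior_mean_bias(heads=8, flips=10, n_samples=20_000)
print(f"posterior mean of the bias ~ {estimate:.3f}")  # exact: 9/12 = 0.75, from Beta(9, 3)
```

Unlike rejection sampling, no sample is discarded here; samples that explain the data poorly simply receive a small weight.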

Other materials

Additional materials

The course does not have a required textbook; it relies on selected papers instead. Some materials that might help you better understand probabilistic programming:

Probabilistic Models of Cognition: the book uses probabilistic programs to formulate models of human cognition. The book is a great introduction to using probabilistic programming and contains plenty of examples to practice your modelling skills.
The Design and Implementation of Probabilistic Programming Languages: a high-level book covering the basic inference procedures.
An Introduction to Probabilistic Programming: a work-in-progress introductory textbook. Covers many topics of the course. Very good introduction to basic inference procedures, implementation strategies, and variational inference.
Statistical Relational Artificial Intelligence: Logic, Probability, and Computation: a high-level introduction to various paradigms of probabilistic logic programming.
Foundations of Probabilistic Logic Programming: a deeper introduction to probabilistic logic programming.
Types and Programming Languages: a good introduction to the foundations of programming languages.
Probabilistic Programming and Bayesian Methods for Hackers: an easy-to-follow book, great at explaining key ideas intuitively.

Additional topics

Probabilistic programming is a rapidly developing field. There are many topics we will not be able to cover in the class. Here is a snapshot of such topics, if you want to know more:

Other inference tasks: the inference task we look at is the characterisation of the posterior distribution. What if we are interested in other tasks, e.g., maximising the posterior?
- Bayesian Optimization for Probabilistic Programs
- Expectation Programming: Adapting Probabilistic Programming Systems to Estimate Expectations Efficiently
Learning probabilistic programs: a standard PP pipeline focuses on inference; but what if we don’t know the program?
Semantics of probabilistic programming languages: how to define the meaning of a probabilistic program?
Deep generative models: an alternative framework for generative modelling based on neural networks. It has less flexibility than PPL but better learning properties.
- Deep Generative Modelling
Analysis of probabilistic programs:
- Conditional independence by typing
- Probabilistic Termination: Soundness, Completeness, and Compositionality
Nested probabilistic programs: probabilistic programs that call probabilistic programs
Stochastic conditioning: in the course, the observations always come in the form of the exact value of a variable. What if we want to condition on a distribution?
- Probabilistic Programs with Stochastic Conditioning
Deep probabilistic programming: we have touched upon various interactions between probabilistic programming and deep neural networks. Deep probabilistic programming is a paradigm that combines the two, often with murky boundaries with, e.g., variational or amortised inference.
- Neural probabilistic logic programming
- Deep Probabilistic Programming

Courses at other universities

Probabilistic programming by Frank Wood at University of British Columbia, Canada
Seminar course by Steven Holtzen at Northeastern, USA
Seminar course by Dan Roy at University of Toronto, Canada, which focuses on theoretical aspects of probabilistic programming
Deep Probabilistic Programming course by Thomas Hamelryck and Ahmad Salim Al-Sibahi at University of Copenhagen, Denmark
Probabilistic Programming Languages course by Norman Ramsey at Tufts University, USA
Probabilistic Programming and Relational Learning by Guy Van den Broeck at UCLA, USA
Probabilistic Programming course by Hongseok Yang at KAIST, South Korea

Practicalities

How to do well in a seminar course

A seminar course is different from a standard course in two ways: (1) “just” learning the topics covered by the materials is not enough; and (2) you are expected to actively participate in the lectures. Active participation includes giving a presentation (see Preparing your presentation) and discussing the materials during the lecture. In fact, the lectures will consist only of these two activities. The exceptions are week 1, which contains two lectures by the instructor, and week 2, in which we discuss papers but no one has to prepare a presentation. Each student will present one paper and is expected to actively contribute to the discussions.

If understanding the materials is not enough, what is? In a seminar course, we teach you to think beyond what you have been served. In fact, if you only master the materials, that will get you a mere 6. Instead, you are expected to critically analyse every paper you read. For every paper you read, think about the following questions:

What is the main idea proposed in the paper?
Which problem does it solve?
What are its strengths?
What are its weaknesses?
Which parts of the paper were/and perhaps still are confusing to you?
Do you have an impression that the idea works just because the authors made some particular choices?
What are the implicit assumptions the authors are making?
Do you see multiple options to solve a subproblem X and are wondering why the authors chose the one they did?
Is the idea properly evaluated? If not, what is missing? How would you do it?
Are the arguments for the idea convincing?
How would you extend the work?

During the discussion part of the lectures, we will discuss these questions and compare your observations. Not all questions are applicable to every paper, so don’t worry if you cannot relate some of them to some papers. The quality of the discussion will therefore depend on you. Importantly, you have to come prepared for each lecture.

It is important that you actively participate in the discussion. As a rule of thumb, the more you talk, the higher your grade will be in the end. However, be constructive and avoid saying something just to say something. I don’t want you to be quiet the entire quarter, but that does not mean that you have to say something in every discussion. Sometimes you would have something brief to say, sometimes you would have some deeper insight. That is fine. As a rule of thumb, I expect you to say something at least every few lectures. Don’t be upset if I respond to you or even correct you. For most of you, this is the first time you are participating in a seminar. It is expected that you need some guidance. Moreover, I tend to jump in when you mention something interesting without (perhaps) realising it.

An important difference between a regular course and a seminar is that you will not be penalised for not understanding something. That does not mean that this is a free-pass course, but rather that it is ok to build up towards an understanding during the course. You will be reading research papers, often in depth.
In contrast to textbooks, research papers are concise and expect certain knowledge from the reader. This has two consequences. First, it is likely that you will not know some of the necessary concepts and will have to fill the gap to understand the paper. You should go through that process. Second, scientists are often not great writers – they might explain simple concepts very confusingly because they have a different audience in mind (peers, not students). For either of these reasons, you might not understand a part of the paper. This is normal and you should not hide it – confusing parts of the paper are great starters for discussions!

Here are a few situations that are acceptable in a seminar course, but perhaps not in a regular course:

You misunderstand something in a paper and provide a wrong answer in a review. If you realise that during the lecture, you can correct your answer after the lecture. That is fine.
You misunderstand something that leads you to an interesting idea for an extension. This is perfectly fine if your reasoning, taking the misunderstanding as a fact, is correct and interesting.
You just can’t understand what the paper is trying to say. Voice it in the lecture! Try to phrase it as a detailed question; point out as specifically as possible where you lost it.

Preparing your presentation

You will have to present one paper to the group. Student presentations start in week 3.

Your presentation should take 15 mins. This is an estimate; some papers might require less time, others more. If you think you have to go beyond this limit, discuss it with me.

Approach your presentation as a short lecture. Your goal is to explain the topic to your colleagues so that they learn from the lecture; only half of the students in the class will have read the paper. Think about the right way to explain the topic; this might not be the way it was explained in the paper. For instance, if you found that you had to look up a lot of other concepts, explain them before the topic you are covering. Use illustrations and animations; you can find lots of them online (if not for the exact topic, then for something closely related that could help you explain the intuition). Provide a working example, especially if such an example is not present in the original paper.

Feel free to use “non-academic” materials for your preparations: blogs, videos, informal notes… I don’t care what you use as long as you understand the topic. Importantly, if you encounter materials that you found more useful than the ones I proposed, share them with me.

Your main focus in the presentation should be to convey the idea/concept to your colleagues. This usually means that you have to think about not only how to convey the idea effectively but also which parts of the paper to include or skip. There will be no correlation between the amount of content squeezed into a presentation and the grade. Often it makes sense to skip something to optimise for clarity.

Important: schedule a meeting with me at least 2 days before your presentation; this is to ensure that your presentation is of sufficient quality to provide a valuable learning experience to your colleagues.

Project - implementation

Project - report

Course feedback

This is the first edition of the course, so you can think of it as a kind of beta version. To make sure the course offers you the best learning experience, I would appreciate it if you provided feedback throughout the course. What works for you? What doesn’t? Do you have an idea how to improve something?

You can submit the feedback anonymously HERE.