CS5733 Program Synthesis

Overview

Program Synthesis, the problem of automatically generating programs from user requirements, is a classic problem in Computer Science. These requirements can take several forms like formal logical formulas, input-output examples, and more recently, using higher-level inputs like images and natural languages. The motivation for synthesis stems from the fact that it is often easier for the user to convey the intent of the program compared to writing the complete program. Additionally, synthesis can also help in generating correct-by-construction programs, this can be particularly fruitful for tasks that are error-prone and/or safety-critical systems. Program synthesis sits at the intersection of programming languages, automated reasoning, and machine learning. This interdisciplinary nature of the field has sparked significant interest in recent years, with exciting developments in each of these areas.

This course will offer a broad study of the theory and practices of program synthesis. We will cover different classes of program synthesis techniques, like inductive synthesis vs. deductive synthesis, top-down vs. bottom-up, etc., and introduce underlying foundations for each class. The course will also cover major recent developments in the field, like type-driven synthesis, synthesizing domain-specific programs, and neuro-symbolic syntheses combining ML and PL techniques for program synthesis.

Pre-requisite(s): Open to CSE BTech 3rd year, 4th year, MTech and PhD students. No formal pre-requisite, expected background: Discrete Maths, Software Development Fundamentals and Data Structures.

Others please drop me an email about your interest and your department and course.

Classroom and Timings

Instructor: Ashish Mishra
Where: C-LH5
Discussions Google Classroom
When: Slot R, Tuesdays 14:30 - 15:55, Fridays 16:00 - 17:25

Schedule

July 30
- Introduction and Overview
August 2
- SyGuS
August 6
- Enumeration and Space Pruning
August 9
- Other Search Techniques
August 12
- Propositional Logic and Normal Forms
- Tseitin’s Encoding
August 16
- SAT
August 23
- First Order Logic and Properties
August 27
- First Order Theories and SMT
August 30
- Representation-Based Learning I
September 3
- Representation-Based Learning II
September 10
- Stochastic Search
September 13
- Sketching Language Introduction
September 20
- Sketching and constraints based search
September 24, 27, October 4
- Type Driven Syntehsis
October 15, 18
- Hoare Logic, Synthesis for Imperative Programs
October 22
- Synthesis with Abstract Interpretation
October 25, 29
- Bayesian View of Synthesis + DreamCoder
November 4
- Neural and NeuroSymbolic Program Synthesis
November 8
- DeepLearning Introduction
November 12
- Neural and NeuroSymbolic Program Synthesis
November 22
- LLMs and Program Synthesis

Textbook and Resources

Instructor-provided list of research papers.
Program Synthesis, Foundations and Trends® in Programming Languages. S. Gulwani, O. Polozov and R. Singh. ISBN: 978-1-68083-292-1
Types and Programming Languages (MIT Press) First Edition, Benjamin C. Pierce. ISBN: 9780262162098
Neuro-symbolic Programming, Foundations and Trends® in Programming Languages. Swarat Chaudhuri, Kevin Ellis, Oleksandr Polozov, Rishabh Singh, Armando Solar- Lezama and Yisong Yue.

Evaluation

Item	Nos.	Weight (%)
Paper Reviews	8	35
Class Attendance	_	5
Course Project	1	30
Mid Term Quiz	1	10
Final Exam	1	20

Reading List and Paper Reviews

Review Content

A review should start with a short, neutral summary of the paper (1-2 paragraphs), followed by a list of paper’s main contributions/insights, and a list of its limitations. Finally, answer specific questions posted for this paper on the Google Classroom about the paper List (if any). Our goal is to get you to think deeply and critically about the paper, not to test you. You won’t be graded on the correctness of your reviews; any non-trivial review will get you 5%.

Reading List

Week 2:
- EUSolver Rajeev Alur, Arjun Radhakrishna, Abhishek Udupa: Scaling Enumerative Program Synthesis via Divide and Conquer. TACAS’17
- Questions:
  - What does EUSolver use as behavioral constraints? Structural constraint? Search strategy?
  - What are the main two pruning/decomposition techniques EUSolver uses to speed up the search? What enables these techniques?
  - What would a naive alternative to decision tree learning be for synthesizing branch conditions? What are the disadvantages of this alternative?
- Submit Responses Here
Week 3:
- Euphony Woosuk Lee, Kihong Heo, Rajeev Alur, Mayur Naik: Accelerating Search-Based Program Synthesis usingLearned Probabilistic Models. PLDI’18
- Questions:
  - What does Euphony use as behavioral constraints? Structural constraint? Search strategy? How are they different from EUSolver?
  - Consider Fig 2b, where the synthesizer is unrolling the sentential form Rep(x,”-“,S). When the search is guided by a PHOG, it considers the weighted productions shown in Fig 2a (top). What would these productions look like if we replaced the PHOG with a PCFG? With 3-grams? Do you think these other probabilistic models would work as well as a PHOG?
  - Consider Example 3.2. Explain in English the intuitive meaning of h(S) = 0.1 and why it is the case.
  - Consider Theorem 3.7. Give an example of sentential forms $n_i$ , $n_j$ and set of points pts such that n_i and n_j are equivalent on pts but not weakly equivalent.
- Submit Responses Here
Week 4:
- A partial reading break.
Week 5:
- Yu Feng, Ruben Martins, Osbert Bastani, and Isil Dillig. 2018. Program synthesis using conflict-driven learning.
- Questions
  - What does Neo use as behavioral constraints? Structural constraint? Search strategy?
  - Consider the component specifications given in Table1, What family of logic and theory (if any) do these specifications belong? Are the specifications for the partial programs, i.e. SAT encoding of Programs (as done in Definition 4.1) from the same family?
  - Consider Table1 again, add a specification reverse function for the list as well. Further show, adding these, what will be the new specification for the partial program generated by replacing head in Figure 3 with reverse.
  - Analyze conflict on the new partial program given in the previous question. Find the MUC if there is a conflict.
- Submit Responses Here
Week 6:
- Sumit Gulwani, Susmit Jha, Ashish Tiwari, Ramarathnam Venkatesan: Synthesis of loop-free programs. PLDI’11
- Questions:
  - What does Brahma use as behavioral constraints? Structural constraints? Search strategy?
  - Brahma’s SMT encoding has an integer variable per component, which denotes the line where the component is to be placed. An alternative encoding is to have an integer variable per line, which denotes which component is to be placed into that line. Compare these two alternatives in terms of the size of the search space and the amount of domain knowledge they require. When should I prefer one over the other?
  - Consider the following modification of Brahma’s synthesis problem: you are given N components, but your program is only allowed to use any K of them (for a fixed K <= N). How would you modify the SMT encoding in section 5 to enforce this restriction?
  - Does the CEGIS loop in Fig 2 still work if the component specifications in $\phi_{lib}$ are imprecise? E.g. you can assume that one of the components has a trivial specification true.
- Submit Responses Here: TBA
Week 8
- Nadia Polikarpova, Ivan Kuraj, Armando Solar-Lezama: Program synthesis from Polymorphic Refinement Types. PLDI’16.
- Questions:
  - What does Synquid use as behavioral constraints? Structural constraint? Search strategy?
  - Can Synquid’s Round-Trip Type Checking (try to understand how this is bi-directional typing) discard the following incomplete terms? Explain how or why not.
    - inc ?? :: {Int $\mid$ ν = 5}, where inc :: x:Int -> {Int $\mid$ ν = x + 1}
    - duplicate ?? :: { List Int $\mid$ len ν = 5}, where duplicate :: xs:List a -> {List a $\mid$ len ν = 2*(len xs)}
    - nats ?? :: List Pos, where nats :: n:Nat -> {List Nat $\mid$ len ν = n}, Nat = {Int $\mid$ ν >= 0}, Pos = {Int $\mid$ ν > 0}
  - Compare Synquid’s condition abduction mechanism to the one from EUSolver.
  - The Liquid Abduction (Branch Abduction) uses liquid formulas in place of any FOL formula in branch conditions, why was this made. Suggest one downside in terms of expressiveness of the programs Synquid can generate, giving example of a program it cannot generate.
- Submit Responses Here
Week 9
- Nadia Polikarpova, Ivan Kuraj, Armando Solar-Lezama: Program synthesis from Polymorphic Refinement Types. PLDI’16.
- Questions:
  - What does Synquid use as behavioral constraints? Structural constraint? Search strategy?
  - Can Synquid’s Round-Trip Type Checking (try to understand how this is bi-directional typing) discard the following incomplete terms? Explain how or why not.
    - inc ?? :: {Int $\mid$ ν = 5}, where inc :: x:Int -> {Int $\mid$ ν = x + 1}
    - duplicate ?? :: { List Int $\mid$ len ν = 5}, where duplicate :: xs:List a -> {List a $\mid$ len ν = 2*(len xs)}
    - nats ?? :: List Pos, where nats :: n:Nat -> {List Nat $\mid$ len ν = n}, Nat = {Int $\mid$ ν >= 0}, Pos = {Int $\mid$ ν > 0}
  - Compare Synquid’s condition abduction mechanism to the one from EUSolver.
  - The Liquid Abduction (Branch Abduction) uses liquid formulas in place of any FOL formula in branch conditions, why was this made. Suggest one downside in terms of expressiveness of the programs Synquid can generate, giving example of a program it cannot generate.
- Submit Responses Here
Week 10
- Rishabh Singh, Armando Solar-Lezama, Synthesizing data structure manipulations from storyboards, 2011
Week 11
- Alex Graves et al.Neural Turing Machines

Course Project

Project ideas

A list of possible Ideas

Project Milestones and Evaluation Total:30 marks

M1, 20% (6 marks) : Discussion about your idea with me, what papers is your idea based on, what do you plan to do.
- It will allow us to gauge the proposal in terms of if it is doable in the given time.
- I can suggest some related papers and open-source implementations.
- We can revise the problem, making it more palpable for the class project.
M2, 20% (6 marks): Formal 1-2 page proposal, this should include:
- A concrete example, showing input and output for the synthesizer and how does the Algorithm roughly works on the input.
E1 : Execution of the Idea!
New: M2.1 : Execution Progress Meeting 10% (3 marks)
M3 : Final 50% (15 marks)
- Presentation and 30% (9 marks)
- Report (3-8 pages) in PACML PL format. 20% (6 marks)

Disclaimer All information on this class website is tentative and subject to change. This course is inspired significantly from other Program Synthesis classes at MIT, UCSD, and UW Madison. Please, check them as well.