CS5733 Program Synthesis

CS5733 Program Synthesis

Overview

Program Synthesis, the problem of automatically generating programs from user requirements, is a classic problem in Computer Science. These requirements can take several forms like formal logical formulas, input-output examples, and more recently, using higher-level inputs like images and natural languages. The motivation for synthesis stems from the fact that it is often easier for the user to convey the intent of the program compared to writing the complete program. Additionally, synthesis can also help in generating correct-by-construction programs, this can be particularly fruitful for tasks that are error-prone and/or safety-critical systems. Program synthesis sits at the intersection of programming languages, automated reasoning, and machine learning. This interdisciplinary nature of the field has sparked significant interest in recent years, with exciting developments in each of these areas.

This course will offer a broad study of the theory and practices of program synthesis. We will cover different classes of program synthesis techniques, like inductive synthesis vs. deductive synthesis, top-down vs. bottom-up, etc., and introduce underlying foundations for each class. The course will also cover major recent developments in the field, like type-driven synthesis, synthesizing domain-specific programs, and neuro-symbolic syntheses combining ML and PL techniques for program synthesis.

Pre-requisite(s): Open to CSE BTech 3rd year, 4th year, MTech and PhD students. No formal pre-requisite, expected background: Discrete Maths, Software Development Fundamentals and Data Structures.

Others please drop me an email about your interest and your department and course.

Classroom and Timings
Schedule
Textbook and Resources
  • Instructor-provided list of research papers.
  • Program Synthesis, Foundations and Trends® in Programming Languages. S. Gulwani, O. Polozov and R. Singh. ISBN: 978-1-68083-292-1
  • Types and Programming Languages (MIT Press) First Edition, Benjamin C. Pierce. ISBN: 9780262162098
  • Neuro-symbolic Programming, Foundations and Trends® in Programming Languages. Swarat Chaudhuri, Kevin Ellis, Oleksandr Polozov, Rishabh Singh, Armando Solar- Lezama and Yisong Yue.
Evaluation
Item Nos. Weight (%)
Paper Reviews 8 35
Class Attendance _ 5
Course Project 1 30
Mid Term Quiz 1 10
Final Exam 1 20

 

Reading List and Paper Reviews
Review Content

A review should start with a short, neutral summary of the paper (1-2 paragraphs), followed by a list of paper’s main contributions/insights, and a list of its limitations. Finally, answer specific questions posted for this paper on the Google Classroom about the paper List (if any). Our goal is to get you to think deeply and critically about the paper, not to test you. You won’t be graded on the correctness of your reviews; any non-trivial review will get you 5%.

Reading List
  • Week 2:
    • EUSolver Rajeev Alur, Arjun Radhakrishna, Abhishek Udupa: Scaling Enumerative Program Synthesis via Divide and Conquer. TACAS’17

    • Questions:
      • What does EUSolver use as behavioral constraints? Structural constraint? Search strategy?
      • What are the main two pruning/decomposition techniques EUSolver uses to speed up the search? What enables these techniques?
      • What would a naive alternative to decision tree learning be for synthesizing branch conditions? What are the disadvantages of this alternative?
    • Submit Responses Here
  • Week 3:
    • Euphony Woosuk Lee, Kihong Heo, Rajeev Alur, Mayur Naik: Accelerating Search-Based Program Synthesis usingLearned Probabilistic Models. PLDI’18
    • Questions:
      • What does Euphony use as behavioral constraints? Structural constraint? Search strategy? How are they different from EUSolver?
      • Consider Fig 2b, where the synthesizer is unrolling the sentential form Rep(x,”-“,S). When the search is guided by a PHOG, it considers the weighted productions shown in Fig 2a (top). What would these productions look like if we replaced the PHOG with a PCFG? With 3-grams? Do you think these other probabilistic models would work as well as a PHOG?
      • Consider Example 3.2. Explain in English the intuitive meaning of h(S) = 0.1 and why it is the case.
      • Consider Theorem 3.7. Give an example of sentential forms $n_i$ , $n_j$ and set of points pts such that n_i and n_j are equivalent on pts but not weakly equivalent.
    • Submit Responses Here
  • Week 4:
    • A partial reading break.
  • Week 5:
    • Yu Feng, Ruben Martins, Osbert Bastani, and Isil Dillig. 2018. Program synthesis using conflict-driven learning.
    • Questions
      • What does Neo use as behavioral constraints? Structural constraint? Search strategy?
      • Consider the component specifications given in Table1, What family of logic and theory (if any) do these specifications belong? Are the specifications for the partial programs, i.e. SAT encoding of Programs (as done in Definition 4.1) from the same family?
      • Consider Table1 again, add a specification reverse function for the list as well. Further show, adding these, what will be the new specification for the partial program generated by replacing head in Figure 3 with reverse.
      • Analyze conflict on the new partial program given in the previous question. Find the MUC if there is a conflict.
    • Submit Responses Here
  • Week 6:
    • Sumit Gulwani, Susmit Jha, Ashish Tiwari, Ramarathnam Venkatesan: Synthesis of loop-free programs. PLDI’11

    • Questions:
      • What does Brahma use as behavioral constraints? Structural constraints? Search strategy?
      • Brahma’s SMT encoding has an integer variable per component, which denotes the line where the component is to be placed. An alternative encoding is to have an integer variable per line, which denotes which component is to be placed into that line. Compare these two alternatives in terms of the size of the search space and the amount of domain knowledge they require. When should I prefer one over the other?
      • Consider the following modification of Brahma’s synthesis problem: you are given N components, but your program is only allowed to use any K of them (for a fixed K <= N). How would you modify the SMT encoding in section 5 to enforce this restriction?
      • Does the CEGIS loop in Fig 2 still work if the component specifications in $\phi_{lib}$ are imprecise? E.g. you can assume that one of the components has a trivial specification true.
    • Submit Responses Here: TBA
  • Week 8
    • Nadia Polikarpova, Ivan Kuraj, Armando Solar-Lezama: Program synthesis from Polymorphic Refinement Types. PLDI’16.

    • Questions:
      • What does Synquid use as behavioral constraints? Structural constraint? Search strategy?
      • Can Synquid’s Round-Trip Type Checking (try to understand how this is bi-directional typing) discard the following incomplete terms? Explain how or why not.
        • inc ?? :: {Int $\mid$ ν = 5}, where inc :: x:Int -> {Int $\mid$ ν = x + 1}
        • duplicate ?? :: { List Int $\mid$ len ν = 5}, where duplicate :: xs:List a -> {List a $\mid$ len ν = 2*(len xs)}
        • nats ?? :: List Pos, where nats :: n:Nat -> {List Nat $\mid$ len ν = n}, Nat = {Int $\mid$ ν >= 0}, Pos = {Int $\mid$ ν > 0}
      • Compare Synquid’s condition abduction mechanism to the one from EUSolver.
      • The Liquid Abduction (Branch Abduction) uses liquid formulas in place of any FOL formula in branch conditions, why was this made. Suggest one downside in terms of expressiveness of the programs Synquid can generate, giving example of a program it cannot generate.
    • Submit Responses Here
  • Week 9
    • Nadia Polikarpova, Ivan Kuraj, Armando Solar-Lezama: Program synthesis from Polymorphic Refinement Types. PLDI’16.

    • Questions:
      • What does Synquid use as behavioral constraints? Structural constraint? Search strategy?
      • Can Synquid’s Round-Trip Type Checking (try to understand how this is bi-directional typing) discard the following incomplete terms? Explain how or why not.
        • inc ?? :: {Int $\mid$ ν = 5}, where inc :: x:Int -> {Int $\mid$ ν = x + 1}
        • duplicate ?? :: { List Int $\mid$ len ν = 5}, where duplicate :: xs:List a -> {List a $\mid$ len ν = 2*(len xs)}
        • nats ?? :: List Pos, where nats :: n:Nat -> {List Nat $\mid$ len ν = n}, Nat = {Int $\mid$ ν >= 0}, Pos = {Int $\mid$ ν > 0}
      • Compare Synquid’s condition abduction mechanism to the one from EUSolver.
      • The Liquid Abduction (Branch Abduction) uses liquid formulas in place of any FOL formula in branch conditions, why was this made. Suggest one downside in terms of expressiveness of the programs Synquid can generate, giving example of a program it cannot generate.
    • Submit Responses Here
  • Week 10
  • Week 11

 

Course Project
Project ideas
Project Milestones and Evaluation Total:30 marks
  • M1, 20% (6 marks) : Discussion about your idea with me, what papers is your idea based on, what do you plan to do.
    • It will allow us to gauge the proposal in terms of if it is doable in the given time.
    • I can suggest some related papers and open-source implementations.
    • We can revise the problem, making it more palpable for the class project.
  • M2, 20% (6 marks): Formal 1-2 page proposal, this should include:
    • A concrete example, showing input and output for the synthesizer and how does the Algorithm roughly works on the input.
  • E1 : Execution of the Idea!
  • New: M2.1 : Execution Progress Meeting 10% (3 marks)
  • M3 : Final 50% (15 marks)
    • Presentation and 30% (9 marks)
    • Report (3-8 pages) in PACML PL format. 20% (6 marks)

Disclaimer All information on this class website is tentative and subject to change. This course is inspired significantly from other Program Synthesis classes at MIT, UCSD, and UW Madison. Please, check them as well.