VIJAY GANESH RESEARCH STATEMENT

The primary focus of my research is the theory and practice of mathematical logic reasoning algorithms (e.g., SAT/SMT solvers or provers) aimed at software engineering, security, and mathematics. In this context, I currently lead the following projects: MapleSAT solver, complexity-theoretic understanding of solvers, theory and practice of string solvers, MathCheck SAT+CAS system for combinatorial mathematics, Attack Resistance, SAT-based cryptanalysis, smart contract verification/testing, and testing/verification of physics software. To-date I have won over 25 honors, best-paper awards, and distinctions in my research career.

From 2005 to 2012, I led the STP project, a solver for a theory of bit-vectors and arrays, that played an important role in the development of dynamic symbolic execution or concolic testing.

Motivation

Mathematical logic reasoning engines (aka solvers or provers) are algorithms that take as input mathematical formulas and decide whether they have solutions (dually, decide validity). Over last several decades, these algorithms have had deep impacts in many areas of software engineering, security, physics, and mathematics. For example, SAT/SMT solvers (e.g., MiniSAT, Z3, and STP) have enabled many formal verification, testing, program analysis, and synthesis techniques; proof assistants (e.g., HOL and Coq) are increasingly being used to verify the correctness of mission-critical software; and computer algebra systems (e.g., Maple and Mathematica) have long been used by physicists and engineers to symbolically reason about mathematical models. Additionally, many of these tools (esp. proof assistants) are now being used by mathematicians to construct proofs for long-standing open conjectures (e.g., the Kepler conjecture).

While the dramatic impact of these tools is reason enough to continue research on this topic, I am driven by the sheer intellectual challenge of pushing the limits of "automation of mathematics" in the spirit of Leibniz and Hilbert. Despite the impossibility of this task for mathematics in general -- shown by Godel and Turing -- the work of many mathematicians and computer scientists has led to significant progress in automation of many fragments of mathematics since the 1950's. I am in pursuit of the day when solvers and provers will be a crucial part of mathematical practice, just as they already are in engineering and software development.

Significant practical contributions

Over the last several years, I have led the development of four award-winning solvers that have made significant impact on software engineering, security, and increasingly on combinatorial mathematics: MapleSAT, The Z3 String Solver, MathCheck, and STP. MapleSAT (2015 - present) is a SAT solver for solving Boolean formulas obtained from software engineering and security applications. The crucial insight in MapleSAT -- key to winning gold and silver medals at the 2016 and 2017 SAT competitions -- is that solver heuristics (e.g., branching and restarts) can be modeled as reinforcement learning techniques. The MapleSAT Boolean SAT solver is one of the first solvers where machine learning and deductive methods have been symbiotically combined for greater efficiency.

The Z3 string solver (2013 - present) is a solver for formulas from a rich theory of string equations, regular expressions, and arithmetic. The Z3 string solver has proven to be particularly useful in the context of security applications. This solver is currently in use in many companies including Amazon, IBM, Microsoft, and Google.

Prior to developing MapleSAT and the Z3 string solver, I designed and implemented the STP bit-vector and array solver (2005 - 2012), a popular SMT solver used in software verification and testing. The STP solver played an enabling role in the development of dynamic symbolic analysis, widely considered a breakthrough technology in software engineering and security, and is currently in use in more than 100 research projects in academia and industry. STP has been used to find over a thousand security bugs in Debian Linux tools. For this work on SMT solvers and symbolic analysis, my collaborators and I won an ACM Test of Time Award at the CCS 2016 conference.

Significant theoretical contributions

On the theoretical front, I have resolved several open problems in logic and complexity theory. The most prominent being decidability and undecidability theorems for different fragments of theories over string equations, regular expressions, arithmetic, and the string-integer conversion predicate.

In a separate line of work, my students and I developed MathCheck: a SAT solver-based tool to construct counterexamples for math conjectures. The key insight behind MathCheck is that the problem of finite verification or counterexample construction for mathematical conjectures can be reduced to a combinatorial search problem. While SAT solvers are effective at combinatorial search, they are severly handicapped by the fact they don't come equipped with domain-specific knowledge about any particular mathematical domain or the mathematical-conjecture-under-test. On the other hand, Computer Algebra Systems (CAS) can be viewed as storehouses of significant mathematical knowledge, but are poor at combinatorial search. By combining the search capabilities of SAT solvers with domain-specific knowledge of CAS, we can make the problem of searching for counterexamples to combinatorial conjectures tractable. MathCheck is one such SAT+CAS combination. We used MathCheck to resolve many open problems in combinatorial mathematics. Perhaps our most prominent success was to find the smallest counterexample to the Williamson conjecture -- a question that had been open since 1944.

In another line of work, my collaborators and I recently wrote one of the first papers on the proof complexity of SMT solvers. I am also deeply interested in proof complexity of SAT solvers, and am working towards understanding the power of restarts, extension rules, and algebraic proof systems within this context.

Current and future directions

All my current and future projects, at their core, are about "automation of mathematics via solver/provers". For example, I am not only interested in novel SAT/SMT algorithms (see 2. and 3. below), but also interested in their proof complexity (see 1. below). Further, I am interested in their applications in mathematics and physics via SAT+CAS tools like MathCheck (see 4. and 6. below), as well as in security (see 5. and 7. below).

  1. Proof complexity of SAT/SMT solvers and formal verification algorithms: One of the most important problem in solver and verification research is "why do solvers work efficiently, even though the problems they solve are traditionally considered hard (NP-complete, PSPACE-complete,...)." One of my research goal is to understand this phenomenon through the lens of proof complexity theory, which is the ideal setting for a theoretical understanding of SAT/SMT solvers and formal verification algorithms in general. Within this context, my collaborators and I have already made significant progress. We have identified parameters such as mergeability of Boolean formulas that go a long way in explaining the structure of industrial instances, and why solvers are efficient on such instances. The eventual goal of this research program is to prove parameterized proof complexity-theoretic theorems that explain precisely why solvers are efficient for large classes of industrial instances.

  2. Novel SAT/SMT algorithms via machine learning: As mentioned above, one of the solvers my students and I developed is the MapleSAT solver, one of the most efficient SAT solvers today. MapleSAT combines (inductive) reinforcement learning algorithms for branching and restarts with (deductive) conflict analysis. It is important to note that modern SAT solvers are algorithms that inherently perform decisions during their search whose rewards may only known far into the future. Such algorithms can particularly benefit from reinforcement learning methods as we have already demonstrated in MapleSAT. My goal is to continue with the integration of machine learning algorithms with deductive methods inside solvers. Given the success of MapleSAT I expect this line of research to be very fruitful for developing powerful new methods for both SAT and SMT solvers.

  3. Novel SMT algorithms for first-order and higher-order theories (e.g., theories over strings): I plan to continue to work on fragments and extensions of string theories as part of the Z3 string solver.  Theories over strings are of great importance in the context of computer security, and developing practically efficient algorithms for them is very challenging. The Z3 string solver is an important development, but a lot remains to be done. For example, there are virtually no good practical solvers for context-free grammar constraints or combinations of word equations, arithmetic over length, and context-free grammars. Another theory of interest is floating-point arithmetic, especially given its relevance to physics simulation software.

  4. Combining SAT, CAS, and the probabilistic method: As mentioned above, one of my enduring interests is to develop effective tools for counterexample construction and finite verification of math conjectures. In this context, my students and I developed the MathCheck SAT+CAS tool. We plan to continue extending this tool, especially in the context of the probabilistic method. The key idea underlying the probabilistic method is the use of probability theory to non-constructively establish the existence of large combinatorial objects. The problem we would like to address is "how can we constructively establish the existence of large combinatorial objects whose existence has been established non-constructively via the probabilistic method". Given the ability of SAT+CAS to search large combinatorial spaces it is natural to use SAT+CAS tools to constructively establish the existence of large combinatorial objects, thus augmenting the power of the probabilistic method.

  5. Attack-resistance of security defense mechanisms: Computer security researchers often propose defense mechanism without providing formal guarantees of security. To address this problem, my collaborators and I developed an idea we call attack-resistance, wherein we formally analyze security defense mechanisms via the idea of computational indistinguishability from cryptography. We say that a defense mechanism is attack-resistant provided a resource-bounded attacker cannot computationally distinguish the mechanism-under-analysis from an ideal unbreakable system. We applied this to several well-known defense mechanisms, and were able to show that these mechanisms were indeed attack-resistant in a formal sense. Additionally, our analysis led us to finding weaknesses in other mechanisms. We published our first big paper on this topic at the Euro Security & Privacy conference 2017 held in Paris, France.

  6. Verification and testing of physics software: In 2016, we learnt from the LIGO gravitational wave observatory that gravitation waves are produced by colliding binary black holes. These observatories (aka gravitational wave telescopes) crucially rely on software to model colliding black holes, and use machine learning techniques to filter away noise detected by LIGO. Given my background in formal methods, I have been wondering about the following question for some time: “what if there is a bug in physics software that misleads physicists into thinking that they have made a new discovery (false positive) or masks potentially interesting new physics (false negative)”. It turns out that this is a widespread problem for scientific software in general, and there are many categories of problems in this context that largely remain unexplored. One class of problems has to do with errors for floating-point arithmetic computations, e.g., overflow errors. Another problem has to do with approximation errors in floating point computations vis-a-vis the corresponding real-valued functions. Yet another important problem in this context is the following: how one can establish equivalence between different implementations of the same real-valued function. I plan to leverage my experience in building SMT solvers and software testing/verification tools to address some of these questions.

  7. Verification and testing of smart contracts: Since the advent of bitcoin, the term blockchain has become commonplace and represents an exciting new class of technologies that have the potential to fundamentally transform many industries. There are many blockchain technologies out there, e.g., Ethereum from Waterloo and Hyperledger from IBM. The term smart contract refers to programs that represent contracts between entities, wherein, the contract resides on the blockchain and leverages all the features that blockchain provides (e.g., immutability after parties commit to the contract). Unfortunately, smart contracts are plagued by security problems. This is a great opportunity for verification/testing experts to apply their expertise to the smart contract world. My students and I are researching novel SAT and SMT solvers algorithms aimed at security analysis of smart contracts.