VIJAY GANESH RESEARCH STATEMENT

The primary focus of my research is the theory and practice of mathematical logic reasoning algorithms (e.g., SAT/SMT solvers or provers) aimed at software engineering, security, and mathematics. In this context, I currently lead the following projects: MapleSAT, The Z3 String Solver, MathCheckAttack Resistance, smart contract verification/testing, and testing/verification of physics software. From 2005 to 2012, I led the STP project, a solver for a theory of bit-vectors and arrays, that played an important role in the development of dynamic symbolic execution or concolic testing.

Motivation for research in logic reasoning algorithms (aka constraint solvers or provers)

Mathematical logic reasoning algorithms decide whether mathematical formulas have solutions (dually, decide validity). These tools are interesting because they have had, and will continue to have, deep impacts in many areas of software engineering, security, physics, and mathematics. For example, SAT/SMT solvers (e.g., MiniSAT, Z3, and STP) have enabled many formal verification, testing, program analysis, and synthesis techniques; proof assistants (e.g., HOL and Coq) are increasingly being used to verify the correctness of mission-critical software; and computer algebra systems (e.g., Maple and Mathematica) have long been used by physicists and engineers to symbolically reason about mathematical models. Additionally, many of these tools (esp. proof assistants) are now being used by mathematicians to construct proofs for long-standing open conjectures (e.g., the Kepler conjecture).

While the dramatic impact of these tools is reason enough to continue research on this topic, I am driven by the sheer intellectual challenge of pushing the limits of "automation of mathematics" in the spirit of Leibniz and Hilbert. Despite the impossibility of this task in general--shown by Godel and Turing--we have made significant progress since the 1940's. I am in pursuit of the day when solvers and provers will be a crucial part of mathematical practice, just as they already are in engineering and software development.

Significant practical contributions

I have lead the development of four award-winning solvers that have made significant impact on software engineering, security, and increasingly on combinatorial mathematics: MapleSAT, The Z3 String Solver, MathCheck, and STP. MapleSAT (2015 - present) is a SAT solver for solving Boolean formulas obtained from software engineering and security applications. The crucial insight in MapleSAT--key to winning the 2016 SAT competition and getting second place in the 2017 SAT competition--is that many solver heuristics (e.g., branching and restarts) can be modeled as reinforcement learning techniques.

The Z3 string solver (2013 - present), which solves formulas from a rich theory of string equations, regular expressions, and arithmetic, has proven to be particularly useful in security. This solver is currently in use in many companies including Amazon, IBM, Microsoft, and Google.

Prior to developing MapleSAT and the Z3 string solver, I designed and implemented the STP bit-vector and array solver (2005 - 2012): a popular SMT solver used in software verification and testing. The STP solver played an enabling role in the development of dynamic symbolic analysis, widely considered a breakthrough technology in software engineering and security, and has been used to find over a thousand security bugs in  Debian Linux tools. For this work on SMT solvers and symbolic analysis, my collaborators and I  won an ACM Test of Time Award at the CCS 2016 conference.

Significant theoretical contributions

On the theoretical front, I have resolved several open problems. The most prominent being decidability and undecidability theorems for different fragments of theories over string equations, regular expressions, arithmetic, and the string-integer conversion predicate.

In a separate line of work, my students and I developed MathCheck: a SAT solver-based tool to construct counterexamples for math conjectures. The key insight behind MathCheck is that the problem of finite verification or counterexample construction for mathematical conjectures can be reduced to a combinatorial search problem. While SAT solvers can be effective at combinatorial search up to a certain degree, they are severly handicapped by the fact they don't come equipped with domain-specific knowledge about the mathematical-conjecture-under-test. On the other hand,  Computer Algebra Systems (CAS) can be viewed as storehouses of significant mathematical knowledge, but are poor at combinatorial search. By combining the search capabilities of SAT solvers with domain-specific knowledge of CAS, we can make the problem of searching for counterexamples to combinatorial conjectures tractable. MathCheck is one such SAT+CAS combination. We used MathCheck to resolve many open problems in combinatorial mathematics; our most prominent success was to find the smallest counterexample to the Williamson conjecture--a question that had been open since 1944.

In another line of work, my collaborators and I recently wrote one of the first papers on the proof complexity of SMT solvers. I am also deeply interested in proof complexity of SAT solvers, and am working towards understanding the power of restarts, extension rules, and algebraic proof systems within this context.

To-date I have won over 25 honors, best-paper awards, and distinctions in my research career since 1999.

Current and future directions

All my current and future projects, at their core, are about "automation of mathematics via solver/provers". For example, I am not only interested in novel SAT/SMT algorithms (see 2. and 3. below), but also interested in their proof complexity (see 1. below). Further, I am interested in their applications in mathematics and physics via SAT+CAS tools like MathCheck (see 4. and 6. below), as well as in security (see 5. and 7. below).

  1. Proof complexity of SAT/SMT solvers and formal verification algorithms: One of the most important problem in solver and verification research is "why do solvers work efficiently., even though the problems they solve are traditionally considered hard (NP-complete, PSPACE-complete,...)." One of my research goal is to understand this phenomenon through the lens of proof complexity theory, which is the ideal setting for a theoretical understanding of SAT/SMT solvers and formal verification algorithms in general. Within this context, my students and I have already made significant progress. We have identified parameters such as community structure of Boolean formulas that go a long way to explaining the structure of industrial instances, and why solvers are efficient on such instances. The eventual goal of this research program is to prove parameterized proof complexity-theoretic theorems that explain precisely why solvers are efficient for instances with good community structure.

  2. Novel SAT/SMT algorithms via machine learning and extension rules: As mentioned above, one of the solvers my students and I developed is the MapleSAT solver, one of the most efficient SAT solvers today. MapleSAT combines (inductive) reinforcement learning algorithms for branching and restarts with (deductive) conflict analysis. It is important to note that modern SAT solvers are algorithms that inherently perform decisions during their search whose rewards may only known far into the future. Such algorithms can particularly benefit from reinforcement learning methods as we have already demonstrated in MapleSAT. My goal is to continue with the integration of machine learning algorithms with deductive methods inside solvers. Given the success of MapleSAT I expect this line of research to be very fruitful for developing powerful new methods for both SAT and SMT solvers.

  3. Novel SMT algorithms for first-order and higher-order theories: I plan to continue to work on fragments and extensions of theories of strings as part of the Z3 string solver.  Theories over strings are of great importance in the context of computer security, and developing practically efficient algorithms for them is very challenging. The Z3 string solver is an important development, but a lot remains to be done. For example, there are virtually no good practical solvers for context-free grammar constraints or combinations of word equations, arithmetic over length, and context-free grammars. Another theory of interest is floating-point arithmetic, especially given its relevance to physics simulation software.

  4. Combining SAT, CAS, and the probabilistic method: The MathCheck project will continue to have significant impact on combinatorial mathematics. One particularly interesting application in this context is the probabilistic method, a field of study developed by Paul Erdos where probability theory is used to non-constructively establish the existence of large combinatorial objects. Given the ability of SAT+CAS to search large combinatorial spaces, it is natural to use SAT+CAS tools to constructively establish the existence of objects which otherwise have been established via the probabilistic method.

  5. Attack-resistance of security defense mechanisms: Computer security researchers often propose defense mechanism without providing formal guarantees of security. To address this problem, my collaborators and I developed an idea we call attack-resistance, wherein we formally analyze security defense mechanisms via the idea of computational indistinguishability from cryptography. We say that a defense mechanism is attack-resistant provided a resource-bounded attacker cannot computationally distinguish the mechanism-under-analysis from an ideal unbreakable system. We applied this to several well-known defense mechanisms, and were able to show that these mechanisms were indeed attack-resistant in a formal sense. Additionally, our analysis led us to finding weaknesses in other mechanisms. We published our first big paper on this topic at the Euro Security & Privacy conference 2017 held in Paris, France.

  6. Verification and testing of physics software: In 2016, we learnt from the LIGO gravitational wave observatory that gravitation waves are produced by colliding binary black holes. These observatories (aka gravitational wave telescopes) crucially rely on software to model colliding black holes, and use machine learning techniques to filter away noise detected by LIGO. Given my background in formal methods, I have been wondering about the following question for some time: “what if there is a bug in physics software that misleads physicists into thinking that they have made a new discovery (false positive) or masks potentially interesting new physics (false negative)”. It turns out that this is a widespread problem for scientific software in general, and there are many categories of problems in this context that largely remain unexplored. One class of problems has to do with errors for floating-point arithmetic computations, e.g., overflow errors. Another problem has to do with approximation error in floating point computations vis-a-vis the corresponding real-valued functions. Yet another important problem in this context is the following: how one can establish equivalence between different implementations of the same real-valued function. I plan to leverage my experience in building SMT solvers and software testing/verification tools to address some of these questions.

  7. Verification and testing of blockchain smart contracts: Since the advent of bitcoin, the term blockchain has become commonplace and represents an exciting new class of technologies that have the potential to fundamentally transform many industries. There are many blockchain technologies out there, e.g., Ethereum from Waterloo and Hyperledger from IBM. The term smart contract refers to programs that represent contracts between entities, wherein, the contract resides on the blockchain and leverages all the features that blockchain provides (e.g., immutability after parties commit to the contract). Unfortunately, smart contracts are plagued by security problems. This is a great opportunity for anyone interested in verification and security to apply them to the smart contract world. My students and I are researching novel SAT and SMT solvers algorithms and their implementation aimed at dynamic symbolic security analysis of smart contracts.