VIJAY GANESH RESEARCH STATEMENT

The primary focus of my research is the theory and practice of mathematical logic reasoning algorithms (e.g., SAT/SMT solvers or provers) aimed at software engineering, security, and mathematics. In this context, I currently lead the following projects: MapleSAT, The Z3 String Solver, MathCheck, Attack Resistance, smart contract verification/testing, and testing/verification of physics software. From 2005 to 2012, I led the STP project, a solver for a theory of bit-vectors and arrays, that played an important role in the development of dynamic symbolic execution or concolic testing.

Motivation for research in logic reasoning algorithms (aka constraint solvers or provers)

Mathematical logic reasoning algorithms decide whether mathematical
formulas have solutions (dually, decide validity). These tools are
interesting because they have had, and will continue to have, deep
impacts in many areas of software engineering, security, physics, and
mathematics. For example, SAT/SMT solvers (e.g., MiniSAT, Z3, and STP)
have enabled many formal verification, testing, program analysis, and
synthesis techniques; proof assistants (e.g., HOL and Coq) are
increasingly being used to verify the correctness of mission-critical
software; and computer algebra systems (e.g., Maple and Mathematica)
have long been used by physicists and engineers to symbolically reason
about mathematical models. Additionally, many of these tools (esp.
proof assistants) are now being used by mathematicians to construct
proofs for long-standing open conjectures (e.g., the Kepler conjecture).

While the dramatic impact of these tools is reason enough to continue research on this topic, I am driven by the sheer intellectual challenge of pushing the limits of "automation of mathematics" in the spirit of Leibniz and Hilbert. Despite the impossibility of this task in general--shown by Godel and Turing--we have made significant progress since the 1940's. I am in pursuit of the day when solvers and provers will be a crucial part of mathematical practice, just as they already are in engineering and software development.

While the dramatic impact of these tools is reason enough to continue research on this topic, I am driven by the sheer intellectual challenge of pushing the limits of "automation of mathematics" in the spirit of Leibniz and Hilbert. Despite the impossibility of this task in general--shown by Godel and Turing--we have made significant progress since the 1940's. I am in pursuit of the day when solvers and provers will be a crucial part of mathematical practice, just as they already are in engineering and software development.

Significant practical contributions

I have lead the development of four award-winning solvers that have
made significant impact on software engineering, security, and
increasingly on combinatorial mathematics: MapleSAT, The Z3 String
Solver, MathCheck, and STP. MapleSAT (2015 - present) is a SAT solver
for solving Boolean formulas obtained from software engineering and
security applications. The crucial insight in MapleSAT--key to winning
the 2016 SAT competition and getting second place in the 2017 SAT
competition--is that many solver heuristics (e.g., branching and
restarts) can be modeled as reinforcement learning techniques.

The Z3 string solver (2013 - present), which solves formulas from a rich theory of string equations, regular expressions, and arithmetic, has proven to be particularly useful in security. This solver is currently in use in many companies including Amazon, IBM, Microsoft, and Google.

Prior to developing MapleSAT and the Z3 string solver, I designed and implemented the STP bit-vector and array solver (2005 - 2012): a popular SMT solver used in software verification and testing. The STP solver played an enabling role in the development of dynamic symbolic analysis, widely considered a breakthrough technology in software engineering and security, and has been used to find over a thousand security bugs in Debian Linux tools. For this work on SMT solvers and symbolic analysis, my collaborators and I won an ACM Test of Time Award at the CCS 2016 conference.

The Z3 string solver (2013 - present), which solves formulas from a rich theory of string equations, regular expressions, and arithmetic, has proven to be particularly useful in security. This solver is currently in use in many companies including Amazon, IBM, Microsoft, and Google.

Prior to developing MapleSAT and the Z3 string solver, I designed and implemented the STP bit-vector and array solver (2005 - 2012): a popular SMT solver used in software verification and testing. The STP solver played an enabling role in the development of dynamic symbolic analysis, widely considered a breakthrough technology in software engineering and security, and has been used to find over a thousand security bugs in Debian Linux tools. For this work on SMT solvers and symbolic analysis, my collaborators and I won an ACM Test of Time Award at the CCS 2016 conference.

Significant theoretical contributions

On the theoretical front, I have resolved several open problems. The
most prominent being decidability and undecidability theorems for
different fragments of theories over string equations, regular
expressions, arithmetic, and the string-integer conversion predicate.

In a separate line of work, my students and I developed MathCheck: a SAT solver-based tool to construct counterexamples for math conjectures. The key insight behind MathCheck is that the problem of finite verification or counterexample construction for mathematical conjectures can be reduced to a combinatorial search problem. While SAT solvers can be effective at combinatorial search up to a certain degree, they are severly handicapped by the fact they don't come equipped with domain-specific knowledge about the mathematical-conjecture-under-test. On the other hand, Computer Algebra Systems (CAS) can be viewed as storehouses of significant mathematical knowledge, but are poor at combinatorial search. By combining the search capabilities of SAT solvers with domain-specific knowledge of CAS, we can make the problem of searching for counterexamples to combinatorial conjectures tractable. MathCheck is one such SAT+CAS combination. We used MathCheck to resolve many open problems in combinatorial mathematics; our most prominent success was to find the smallest counterexample to the Williamson conjecture--a question that had been open since 1944.

In another line of work, my collaborators and I recently wrote one of the first papers on the proof complexity of SMT solvers. I am also deeply interested in proof complexity of SAT solvers, and am working towards understanding the power of restarts, extension rules, and algebraic proof systems within this context.

In a separate line of work, my students and I developed MathCheck: a SAT solver-based tool to construct counterexamples for math conjectures. The key insight behind MathCheck is that the problem of finite verification or counterexample construction for mathematical conjectures can be reduced to a combinatorial search problem. While SAT solvers can be effective at combinatorial search up to a certain degree, they are severly handicapped by the fact they don't come equipped with domain-specific knowledge about the mathematical-conjecture-under-test. On the other hand, Computer Algebra Systems (CAS) can be viewed as storehouses of significant mathematical knowledge, but are poor at combinatorial search. By combining the search capabilities of SAT solvers with domain-specific knowledge of CAS, we can make the problem of searching for counterexamples to combinatorial conjectures tractable. MathCheck is one such SAT+CAS combination. We used MathCheck to resolve many open problems in combinatorial mathematics; our most prominent success was to find the smallest counterexample to the Williamson conjecture--a question that had been open since 1944.

In another line of work, my collaborators and I recently wrote one of the first papers on the proof complexity of SMT solvers. I am also deeply interested in proof complexity of SAT solvers, and am working towards understanding the power of restarts, extension rules, and algebraic proof systems within this context.

To-date I have won over 25 honors, best-paper awards, and distinctions in my research career since 1999.

Current and future directions

All my current and future projects, at
their core, are about "automation of mathematics via solver/provers". For example, I am not only interested in novel
SAT/SMT algorithms (see 2. and 3. below), but also interested in their
proof complexity (see 1. below). Further, I am interested in their
applications in mathematics and physics via SAT+CAS tools like
MathCheck (see 4. and 6. below), as well as in security (see 5. and 7. below).

- Proof complexity of SAT/SMT solvers and formal verification algorithms:
One of the most important problem in solver and verification research
is "why do solvers work efficiently., even though the problems they
solve are traditionally considered hard (NP-complete,
PSPACE-complete,...)." One of my research goal is to understand
this phenomenon through the lens of proof complexity theory, which is
the ideal setting for a theoretical understanding of SAT/SMT solvers
and formal verification algorithms in general. Within this context, my
students and I have already made significant progress. We have
identified parameters such as community structure of Boolean formulas
that go a long way to explaining the structure of industrial instances,
and why solvers are efficient on such instances. The eventual goal of
this research program is to prove parameterized proof
complexity-theoretic theorems that explain precisely why solvers are
efficient for instances with good community structure.

- Novel SAT/SMT algorithms via machine learning and extension rules: As
mentioned above, one of the solvers my students and I developed is the MapleSAT solver, one of
the most efficient SAT solvers today. MapleSAT combines (inductive)
reinforcement learning algorithms for
branching and restarts with (deductive) conflict analysis. It is
important to note that modern SAT solvers are algorithms that
inherently perform decisions during their search whose rewards may only
known far into the future. Such algorithms can particularly benefit
from reinforcement learning methods as we have already demonstrated in
MapleSAT. My goal is to continue with the integration of machine
learning algorithms with deductive methods inside solvers. Given the
success of MapleSAT I expect this line of research to be very fruitful
for developing powerful new methods for both SAT and SMT solvers.

- Novel SMT algorithms for first-order and higher-order theories: I
plan to continue to work on fragments and extensions of theories of
strings as part of the Z3 string solver. Theories over strings
are of great importance in the context of computer security, and
developing practically efficient algorithms for them is very
challenging. The Z3 string solver is an important development, but a
lot remains to be done. For example, there are virtually no good
practical solvers for context-free grammar constraints or combinations
of word equations, arithmetic over length, and context-free grammars.
Another theory of interest is
floating-point arithmetic, especially given its relevance to physics
simulation software.

- Combining SAT, CAS, and the probabilistic method: The
MathCheck project will continue to have significant impact on
combinatorial mathematics. One particularly interesting application in
this context is the probabilistic method, a field of study developed by
Paul Erdos where probability theory is used to non-constructively
establish the existence of large combinatorial objects. Given the
ability of SAT+CAS to search large combinatorial spaces, it is natural
to use SAT+CAS tools to constructively establish the existence of
objects which otherwise have been established via the probabilistic
method.

- Attack-resistance of security defense mechanisms: Computer security researchers often propose defense mechanism
without providing formal guarantees of security. To address this
problem, my collaborators and I developed an idea we call
attack-resistance, wherein we formally analyze security defense
mechanisms via the idea of computational indistinguishability from
cryptography. We say that a defense mechanism is attack-resistant
provided a resource-bounded attacker cannot computationally distinguish
the mechanism-under-analysis from an ideal unbreakable system. We
applied this to several well-known defense mechanisms, and were able to
show that these mechanisms were indeed attack-resistant in a formal
sense. Additionally, our analysis led us to finding weaknesses in other
mechanisms. We published our first big paper on this topic at the Euro
Security & Privacy conference 2017 held in Paris, France.

- Verification and testing of physics software:
In
2016, we learnt from the LIGO gravitational wave observatory that
gravitation waves are produced by colliding binary black holes. These
observatories (aka gravitational wave telescopes) crucially rely on
software to model colliding black holes, and use machine learning
techniques to filter away noise detected by LIGO. Given my background
in formal methods, I have been wondering about the following question
for some time: “what if there is a bug in physics software that
misleads physicists into thinking that they have made a new discovery
(false positive) or masks potentially interesting new physics (false
negative)”. It turns out that this is a
widespread problem for scientific software in general, and there are
many categories of problems in this context that largely remain
unexplored. One class of problems has to do with errors for
floating-point arithmetic computations, e.g., overflow errors. Another
problem has to do with approximation error in floating point
computations vis-a-vis the corresponding real-valued functions. Yet
another important problem in this context is the following: how one can
establish equivalence between different implementations of the same
real-valued function. I plan to leverage my experience in building SMT
solvers and software testing/verification tools to address some of
these questions.

- Verification and testing of blockchain smart contracts: Since
the advent of bitcoin, the term blockchain has become
commonplace and represents an exciting new class of technologies that
have the potential to fundamentally transform many industries.
There are many blockchain
technologies out there, e.g., Ethereum from Waterloo and Hyperledger
from IBM. The term smart contract refers to programs that represent
contracts between entities, wherein, the contract resides on the
blockchain and leverages all the features that blockchain provides
(e.g., immutability after parties commit to the contract).
Unfortunately, smart contracts are plagued by security problems. This
is a great opportunity for anyone interested in verification and
security to apply them to the smart contract world. My students and I
are researching novel SAT and SMT solvers algorithms and their
implementation aimed at dynamic symbolic security analysis of smart
contracts.