VIJAY GANESH RESEARCH STATEMENT
The primary focus of my research is the theory and practice of mathematical logic reasoning algorithms (e.g., SAT/SMT solvers or provers) aimed at software engineering, security, and mathematics. In this context, I currently lead the following projects: MapleSAT solver, complexity-theoretic understanding of solvers, theory and practice of string solvers, MathCheck SAT+CAS system for combinatorial mathematics, Attack Resistance, SAT-based cryptanalysis, smart contract verification/testing, and testing/verification of physics software. To-date I have won over 25 honors, best-paper awards, and distinctions in my research career.
From 2005 to 2012, I led the STP project, a solver for a theory of bit-vectors and arrays, that played an important role in the development of dynamic symbolic execution or concolic testing.
Motivation
Mathematical logic reasoning engines
(aka solvers or provers) are algorithms that take as input mathematical
formulas and decide whether they have solutions (dually, decide
validity). Over last several decades, these algorithms have had deep
impacts in many areas of software engineering, security, physics, and
mathematics. For example, SAT/SMT solvers (e.g., MiniSAT, Z3, and STP)
have enabled many formal verification, testing, program analysis, and
synthesis techniques; proof assistants (e.g., HOL and Coq) are
increasingly being used to verify the correctness of mission-critical
software; and computer algebra systems (e.g., Maple and Mathematica)
have long been used by physicists and engineers to symbolically reason
about mathematical models. Additionally, many of these tools (esp.
proof assistants) are now being used by mathematicians to construct
proofs for long-standing open conjectures (e.g., the Kepler conjecture).
While the dramatic impact of these tools is reason enough to continue research on this topic, I am driven by the sheer intellectual challenge of pushing the limits of "automation of mathematics" in the spirit of Leibniz and Hilbert. Despite the impossibility of this task for mathematics in general -- shown by Godel and Turing -- the work of many mathematicians and computer scientists has led to significant progress in automation of many fragments of mathematics since the 1950's. I am in pursuit of the day when solvers and provers will be a crucial part of mathematical practice, just as they already are in engineering and software development.
While the dramatic impact of these tools is reason enough to continue research on this topic, I am driven by the sheer intellectual challenge of pushing the limits of "automation of mathematics" in the spirit of Leibniz and Hilbert. Despite the impossibility of this task for mathematics in general -- shown by Godel and Turing -- the work of many mathematicians and computer scientists has led to significant progress in automation of many fragments of mathematics since the 1950's. I am in pursuit of the day when solvers and provers will be a crucial part of mathematical practice, just as they already are in engineering and software development.
Significant practical contributions
Over the last several years, I have led the development of four award-winning solvers that have
made significant impact on software engineering, security, and
increasingly on combinatorial mathematics: MapleSAT, The Z3 String
Solver, MathCheck, and STP.
MapleSAT (2015 - present) is a SAT solver
for solving Boolean formulas obtained from software engineering and
security applications. The crucial insight in MapleSAT -- key to
winning gold and silver medals at the 2016 and 2017 SAT competitions --
is that solver heuristics (e.g., branching and
restarts) can be modeled as reinforcement learning techniques. The
MapleSAT Boolean SAT solver is one of the first solvers where machine
learning and deductive methods have been symbiotically combined for
greater efficiency.
The Z3 string solver (2013 - present) is a solver for formulas from a rich theory of string equations, regular expressions, and arithmetic. The Z3 string solver has proven to be particularly useful in the context of security applications. This solver is currently in use in many companies including Amazon, IBM, Microsoft, and Google.
Prior to developing MapleSAT and the Z3 string solver, I designed and implemented the STP bit-vector and array solver (2005 - 2012), a popular SMT solver used in software verification and testing. The STP solver played an enabling role in the development of dynamic symbolic analysis, widely considered a breakthrough technology in software engineering and security, and is currently in use in more than 100 research projects in academia and industry. STP has been used to find over a thousand security bugs in Debian Linux tools. For this work on SMT solvers and symbolic analysis, my collaborators and I won an ACM Test of Time Award at the CCS 2016 conference.
The Z3 string solver (2013 - present) is a solver for formulas from a rich theory of string equations, regular expressions, and arithmetic. The Z3 string solver has proven to be particularly useful in the context of security applications. This solver is currently in use in many companies including Amazon, IBM, Microsoft, and Google.
Prior to developing MapleSAT and the Z3 string solver, I designed and implemented the STP bit-vector and array solver (2005 - 2012), a popular SMT solver used in software verification and testing. The STP solver played an enabling role in the development of dynamic symbolic analysis, widely considered a breakthrough technology in software engineering and security, and is currently in use in more than 100 research projects in academia and industry. STP has been used to find over a thousand security bugs in Debian Linux tools. For this work on SMT solvers and symbolic analysis, my collaborators and I won an ACM Test of Time Award at the CCS 2016 conference.
Significant theoretical contributions
On the theoretical front, I have resolved several open problems in logic and complexity theory. The
most prominent being decidability and undecidability theorems for
different fragments of theories over string equations, regular
expressions, arithmetic, and the string-integer conversion predicate.
In a separate line of work, my students and I developed MathCheck: a SAT solver-based tool to construct counterexamples for math conjectures. The key insight behind MathCheck is that the problem of finite verification or counterexample construction for mathematical conjectures can be reduced to a combinatorial search problem. While SAT solvers are effective at combinatorial search, they are severly handicapped by the fact they don't come equipped with domain-specific knowledge about any particular mathematical domain or the mathematical-conjecture-under-test. On the other hand, Computer Algebra Systems (CAS) can be viewed as storehouses of significant mathematical knowledge, but are poor at combinatorial search. By combining the search capabilities of SAT solvers with domain-specific knowledge of CAS, we can make the problem of searching for counterexamples to combinatorial conjectures tractable. MathCheck is one such SAT+CAS combination. We used MathCheck to resolve many open problems in combinatorial mathematics. Perhaps our most prominent success was to find the smallest counterexample to the Williamson conjecture -- a question that had been open since 1944.
In another line of work, my collaborators and I recently wrote one of the first papers on the proof complexity of SMT solvers. I am also deeply interested in proof complexity of SAT solvers, and am working towards understanding the power of restarts, extension rules, and algebraic proof systems within this context.
In a separate line of work, my students and I developed MathCheck: a SAT solver-based tool to construct counterexamples for math conjectures. The key insight behind MathCheck is that the problem of finite verification or counterexample construction for mathematical conjectures can be reduced to a combinatorial search problem. While SAT solvers are effective at combinatorial search, they are severly handicapped by the fact they don't come equipped with domain-specific knowledge about any particular mathematical domain or the mathematical-conjecture-under-test. On the other hand, Computer Algebra Systems (CAS) can be viewed as storehouses of significant mathematical knowledge, but are poor at combinatorial search. By combining the search capabilities of SAT solvers with domain-specific knowledge of CAS, we can make the problem of searching for counterexamples to combinatorial conjectures tractable. MathCheck is one such SAT+CAS combination. We used MathCheck to resolve many open problems in combinatorial mathematics. Perhaps our most prominent success was to find the smallest counterexample to the Williamson conjecture -- a question that had been open since 1944.
In another line of work, my collaborators and I recently wrote one of the first papers on the proof complexity of SMT solvers. I am also deeply interested in proof complexity of SAT solvers, and am working towards understanding the power of restarts, extension rules, and algebraic proof systems within this context.
Current and future directions
All my current and future projects, at
their core, are about "automation of mathematics via solver/provers". For example, I am not only interested in novel
SAT/SMT algorithms (see 2. and 3. below), but also interested in their
proof complexity (see 1. below). Further, I am interested in their
applications in mathematics and physics via SAT+CAS tools like
MathCheck (see 4. and 6. below), as well as in security (see 5. and 7. below).
- Proof complexity of SAT/SMT solvers and formal verification algorithms:
One of the most important problem in solver and verification research
is "why do solvers work efficiently, even though the problems they
solve are traditionally considered hard (NP-complete,
PSPACE-complete,...)." One of my research goal is to understand
this phenomenon through the lens of proof complexity theory, which is
the ideal setting for a theoretical understanding of SAT/SMT solvers
and formal verification algorithms in general. Within this context, my
collaborators and I have already made significant progress. We have
identified parameters such as mergeability of Boolean formulas
that go a long way in explaining the structure of industrial instances,
and why solvers are efficient on such instances. The eventual goal of
this research program is to prove parameterized proof
complexity-theoretic theorems that explain precisely why solvers are
efficient for large classes of industrial instances.
- Novel SAT/SMT algorithms via machine learning: As
mentioned above, one of the solvers my students and I developed is the MapleSAT solver, one of
the most efficient SAT solvers today. MapleSAT combines (inductive)
reinforcement learning algorithms for
branching and restarts with (deductive) conflict analysis. It is
important to note that modern SAT solvers are algorithms that
inherently perform decisions during their search whose rewards may only
known far into the future. Such algorithms can particularly benefit
from reinforcement learning methods as we have already demonstrated in
MapleSAT. My goal is to continue with the integration of machine
learning algorithms with deductive methods inside solvers. Given the
success of MapleSAT I expect this line of research to be very fruitful
for developing powerful new methods for both SAT and SMT solvers.
- Novel SMT algorithms for first-order and higher-order theories (e.g., theories over strings): I
plan to continue to work on fragments and extensions of
string theories as part of the Z3 string solver. Theories over strings
are of great importance in the context of computer security, and
developing practically efficient algorithms for them is very
challenging. The Z3 string solver is an important development, but a
lot remains to be done. For example, there are virtually no good
practical solvers for context-free grammar constraints or combinations
of word equations, arithmetic over length, and context-free grammars.
Another theory of interest is
floating-point arithmetic, especially given its relevance to physics
simulation software.
- Combining SAT, CAS, and the probabilistic method: As
mentioned above, one of my enduring interests is to develop effective
tools for counterexample construction and finite verification of math
conjectures. In this context, my students and I developed the MathCheck SAT+CAS tool.
We plan to continue extending this tool, especially in the context of
the probabilistic method. The key idea underlying the probabilistic
method is the use of probability theory to non-constructively
establish the existence of large combinatorial objects. The problem we
would like to address is "how
can we constructively establish the existence of large combinatorial
objects whose existence has been established non-constructively via the
probabilistic method". Given the
ability of SAT+CAS to search large combinatorial spaces it is natural
to use SAT+CAS tools to constructively establish the existence of
large combinatorial objects, thus augmenting the power of the probabilistic
method.
- Attack-resistance of security defense mechanisms: Computer security researchers often propose defense mechanism
without providing formal guarantees of security. To address this
problem, my collaborators and I developed an idea we call
attack-resistance, wherein we formally analyze security defense
mechanisms via the idea of computational indistinguishability from
cryptography. We say that a defense mechanism is attack-resistant
provided a resource-bounded attacker cannot computationally distinguish
the mechanism-under-analysis from an ideal unbreakable system. We
applied this to several well-known defense mechanisms, and were able to
show that these mechanisms were indeed attack-resistant in a formal
sense. Additionally, our analysis led us to finding weaknesses in other
mechanisms. We published our first big paper on this topic at the Euro
Security & Privacy conference 2017 held in Paris, France.
- Verification and testing of physics software:
In
2016, we learnt from the LIGO gravitational wave observatory that
gravitation waves are produced by colliding binary black holes. These
observatories (aka gravitational wave telescopes) crucially rely on
software to model colliding black holes, and use machine learning
techniques to filter away noise detected by LIGO. Given my background
in formal methods, I have been wondering about the following question
for some time: “what if there is a bug in physics software that
misleads physicists into thinking that they have made a new discovery
(false positive) or masks potentially interesting new physics (false
negative)”. It turns out that this is a
widespread problem for scientific software in general, and there are
many categories of problems in this context that largely remain
unexplored. One class of problems has to do with errors for
floating-point arithmetic computations, e.g., overflow errors. Another
problem has to do with approximation errors in floating point
computations vis-a-vis the corresponding real-valued functions. Yet
another important problem in this context is the following: how one can
establish equivalence between different implementations of the same
real-valued function. I plan to leverage my experience in building SMT
solvers and software testing/verification tools to address some of
these questions.
- Verification and testing of smart contracts: Since
the advent of bitcoin, the term blockchain has become
commonplace and represents an exciting new class of technologies that
have the potential to fundamentally transform many industries.
There are many blockchain
technologies out there, e.g., Ethereum from Waterloo and Hyperledger
from IBM. The term smart contract refers to programs that represent
contracts between entities, wherein, the contract resides on the
blockchain and leverages all the features that blockchain provides
(e.g., immutability after parties commit to the contract).
Unfortunately, smart contracts are plagued by security problems. This
is a great opportunity for verification/testing experts to apply their expertise to the smart contract world. My students and I
are researching novel SAT and SMT solvers algorithms aimed at security analysis of smart
contracts.