Many software developers spend a lot of time debugging. We believe debugging technology is in its infancy and improved debugging tools can significantly increase the productivity of even the best developers. Even state-of-the-art debugging tools like rr only scratch the surface of what's possible. We have demonstrated this by building Pernosco, a debugger which dramatically improves the state of the art: powerful new features that leverage omniscient debugging to make debugging faster and more fun; novel workflow integrations delivering "debugging as a service"; and new implementation techniques to make omniscient debugging practical and cost-effective. Yet Pernosco is not just a research project; it is being used by real developers and improved in response to their feedback.
The limitations of existing debuggers have convinced many in the software industry that specialized debugging tools are not worth using, or even investing in. This has created a vicious cycle of underinvestment and inadequate tools. We hope to change minds on this issue, inspire developers, break that cycle, and build a sustainable business. To follow our progress, please stay in touch.
Existing debugger interfaces typically show the program state at a particular point in time, with some ability to shift that point forwards in time (or backwards, for debuggers with reverse execution). They're designed that way not because it's the ideal way for developers to understand bugs, but because that's what can be easily implemented. However, many debugging tasks benefit from integrating information across multiple moments in time (e.g., visualizing control flow). Furthermore, forward or reverse execution typically suffers from noticeable delays while application code actually runs.
An alternative approach is "omniscient debugging": collect all program states into a database indexed for efficient queries (e.g. containing every memory and register write), and implement a debugging interface using those queries. This eliminates delays during debugging (thereby eliminating productivity-destroying context switches). It also enables debugger visualizations that seamlessly integrate information across time. Omniscience makes debugging a data analysis problem.
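As a toy illustration of the idea (not Pernosco's actual implementation), the core of such a database can be sketched in a few lines: log every write as a (time, address, value) triple, index the log per address, and answer "what was the value of this location at time T?" with a binary search rather than by re-executing anything. All names here are hypothetical.

```python
import bisect
from collections import defaultdict

class OmniscientStore:
    """Toy omniscient-debugging store: index every memory write by
    address so any past state can be queried without re-execution."""

    def __init__(self):
        # addr -> parallel lists of write times and written values,
        # kept sorted by time.
        self._times = defaultdict(list)
        self._values = defaultdict(list)

    def record_write(self, time, addr, value):
        # Assumes writes arrive in increasing time order, as they
        # would when produced by replaying a recording.
        self._times[addr].append(time)
        self._values[addr].append(value)

    def value_at(self, addr, time):
        """Value of `addr` as of `time`: the last write at or before it."""
        times = self._times[addr]
        i = bisect.bisect_right(times, time) - 1
        return self._values[addr][i] if i >= 0 else None

store = OmniscientStore()
store.record_write(1, 0x1000, 7)
store.record_write(5, 0x1000, 42)
print(store.value_at(0x1000, 3))  # -> 7
print(store.value_at(0x1000, 9))  # -> 42
```

The point of the sketch is that once the data is indexed this way, "jumping to a moment in time" is just a query, which is why the delays of forward/reverse execution disappear.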
The obvious barrier to omniscient debugging is scalability: building and storing that database is very expensive. We have made tremendous technical improvements over previous implementations of omniscient debugging, and can demonstrate cost-effective debugging of complex applications with recorded execution times of many minutes (though not yet hours).
We record application execution with rr and then build an omniscient database of CPU-level state by replaying execution with binary instrumentation. Deferring database construction to the replay phase keeps the initial overhead low while the application is interacting with its environment (e.g., avoiding spurious timeouts). We don't waste much effort if tests don't fail. Even more importantly, it lets us speed up database building by processing different sections of a single execution in parallel.
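The parallelism claim can be illustrated with a deliberately simplified sketch: slice the recorded execution into independent sections, build a partial index for each in parallel, then merge. The section indexer here just counts writes per address; the real per-section work, and all names below, are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

def index_section(section):
    """Build a partial index (here: write counts per address) for one
    contiguous slice of the execution. Stand-in for the real
    per-section database build."""
    index = {}
    for _time, addr, _value in section:
        index[addr] = index.get(addr, 0) + 1
    return index

def build_database(trace, num_sections=4):
    # Split the recorded execution into independent sections...
    size = max(1, len(trace) // num_sections)
    sections = [trace[i:i + size] for i in range(0, len(trace), size)]
    # ...index each section in parallel...
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(index_section, sections))
    # ...then merge the partial indexes into one database.
    merged = {}
    for partial in partials:
        for addr, n in partial.items():
            merged[addr] = merged.get(addr, 0) + n
    return merged

# Synthetic trace: 100 writes alternating between two addresses.
trace = [(t, 0x1000 + (t % 2) * 8, t) for t in range(100)]
db = build_database(trace)
print(db)  # -> {4096: 50, 4104: 50}
```

Because each section is processed independently, adding workers shortens wall-clock database-build time even for a single long execution.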
We provide our system as a Web service, and run database builds in the cloud. This allows for much more efficient hardware utilization than doing the work on local developer machines. In many deployment scenarios we can build databases using cloud "spot instances" (i.e., deeply discounted excess capacity).
A less obvious issue with omniscient debugging is the challenge of designing a debugger interface once freed from most of the implementation constraints that "traditional" debuggers are subject to: given we can provide almost any desired query efficiently over all program states, what is the best way to convey that data to developers so they can fix bugs in the shortest amount of time? This is a challenging intellectual problem, because existing implementation-constrained interfaces are unreliable guides and the space of possible new interfaces is very large. For the same reasons it is also very exciting!
We believe that many features in "traditional" debuggers are hacks to get around the limitations of being confined to a single moment in time. Single-stepping, for example, is used when developers are afraid of going too far forward in program execution, or when they want to see how control flow or data values evolve over time. Omniscient debugging enables better solutions to these problems.
We also believe that most "cool features" we can imagine are probably not in the ideal set of features needed to understand most bugs in nearly-minimal time, especially when you consider the costs of a large complex interface with many hard-to-discover features. We have tried to avoid the temptation of "wouldn't it be cool if ...?" Instead, we have started with a minimal set of features that seem obviously essential, and incrementally added features to address problems encountered by real users, trying to choose the simplest and most general solution to each problem. To ease Pernosco adoption, we have implemented some not-ultimately-optimal features to make Pernosco more familiar to users, e.g. gdb integration.
There are many features that would clearly be beneficial.
Pernosco should display the values of variables as they change across source lines, so data changes over time can be visualized directly instead of having to move the current moment in time through successive states.
Pernosco supports prettyprinting values, but currently not in a user-extensible way. Clearly some framework for extensible prettyprinting is needed. This needs to be powerful enough to express the extensions found in JsDbg, for example.
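One common shape for such a framework (used in a similar spirit by gdb's Python prettyprinters) is a registry mapping type names to user-supplied formatters, with a fallback for unregistered types. This Python sketch is purely illustrative; the registry API and the `std::vector` formatter are hypothetical.

```python
class PrettyPrinterRegistry:
    """Sketch of a user-extensible prettyprinting framework: users
    register a formatter per type name, with a fallback default."""

    def __init__(self):
        self._printers = {}

    def register(self, type_name):
        def decorator(fn):
            self._printers[type_name] = fn
            return fn
        return decorator

    def format(self, type_name, value):
        printer = self._printers.get(type_name, repr)
        return printer(value)

registry = PrettyPrinterRegistry()

@registry.register("std::vector")
def format_vector(value):
    # `value` is a plain Python list standing in for debuggee memory.
    return "vector[%d] {%s}" % (len(value), ", ".join(map(str, value)))

print(registry.format("std::vector", [1, 2, 3]))
# -> vector[3] {1, 2, 3}
print(registry.format("unknown_type", 42))  # fallback: repr -> 42
```

In an omniscient setting the interesting twist is that a formatter's reads of debuggee memory become database queries, so the same extension can render a value at any point in time.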
The notebook can be extended to capture more context around the notes. In general the notebook is a great platform for improving how users think about their debugging problems, individually and together. There is probably a lot to learn from what users record in their notebooks.
Supporting more statically compiled languages such as Go would be pretty easy. Some internal refactoring would be needed to support goroutines.
Supporting more simple interpreted languages such as Python should be easy. Supporting some level of JIT compilation in various VMs would be harder but seems tractable.
Pernosco is a natural platform for integrating dynamic analysis. For example, Valgrind/ASAN-style memory checking and dynamic race detection could be implemented as post-recording dynamic analyses in Pernosco's internal framework. Furthermore, the usability of dynamic bug-detection analysis would benefit tremendously from being supported by Pernosco's powerful debugging tools. This makes Pernosco a strategic platform for dynamic analysis; a dynamic analysis is more valuable integrated with Pernosco than standing alone.
This applies to performance analysis as well as correctness. rr recording invalidates some kinds of performance analysis (e.g. inter-core interactions), but a lot of performance analysis is still meaningful because the application mostly executes the same code as without rr. We could, for example, replay the application under a cache simulator to estimate the cache behaviour of code, validate the simulation by cross-checking against measurements from hardware performance counters gathered during recording, and visualize the results in the context of the complete Pernosco debugging experience.
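To make the cache-simulation idea concrete, here is a minimal direct-mapped cache simulator that a recorded memory-access trace could be replayed through; hit/miss counts per code location would then be visualized alongside the debugger. The cache geometry and the sequential-scan trace below are arbitrary choices for illustration.

```python
class DirectMappedCache:
    """Minimal direct-mapped cache simulator. Replaying a recorded
    address trace through it estimates hit/miss behaviour."""

    def __init__(self, num_lines=64, line_size=64):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # tag stored per cache line
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        line = addr // self.line_size
        index = line % self.num_lines   # which cache line this maps to
        tag = line // self.num_lines    # identifies the memory block
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.tags[index] = tag      # fill on miss
            self.misses += 1

cache = DirectMappedCache()
# Sequential 8-byte reads over 4 KiB: one miss per 64-byte line,
# then hits within the line.
for addr in range(0, 4096, 8):
    cache.access(addr)
print(cache.hits, cache.misses)  # -> 448 64
```

The cross-check mentioned above would compare totals like these against hardware performance-counter measurements taken during recording, to validate that the simulation is in the right ballpark.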
Pernosco often has to decide what information is most relevant to the user, e.g. when selecting the stack frame to show when jumping to a new point in time, determining which alert to highlight at the start of a session, or ordering the results of the search box. Given data collected from user debugging sessions, we might be able to learn good heuristics for which results are most likely to be relevant for specific projects or even across all projects.
One of the most difficult debugging scenarios is explaining why something didn't happen. One way to attack this problem would be to compare an execution where something didn't happen with a "closely related" execution where it did.
Closely-related executions where one passes and the other fails are commonly available. For example, if a failure is reproduced by a minimized testcase, then a small change to the testcase will produce a passing execution. Alternatively, if the failure is a regression due to a small code change, the two different code versions produce closely-related pass/fail executions. If the failure is intermittent, then we have closely-related executions with the same code and testcase. Visualizing the differences between these executions seems likely to be valuable, especially if we can learn good relevance heuristics.
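The comparison step can be prototyped with standard sequence alignment: treat each execution as a stream of events (say, function calls) and report where the streams diverge. This sketch uses Python's `difflib`; the event streams are invented for illustration.

```python
import difflib

def diff_executions(passing, failing):
    """Align two recorded event streams (e.g. function-call traces)
    and report the spans where they diverge."""
    sm = difflib.SequenceMatcher(a=passing, b=failing)
    divergences = []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op != "equal":
            divergences.append((op, passing[i1:i2], failing[j1:j2]))
    return divergences

passing = ["main", "parse", "validate", "commit", "exit"]
failing = ["main", "parse", "validate", "abort", "exit"]
print(diff_executions(passing, failing))
# -> [('replace', ['commit'], ['abort'])]
```

Pinpointing the first divergence answers "why didn't X happen?" by showing where the failing execution left the path on which X would have happened; the hard, interesting work is choosing event streams and relevance heuristics so the reported divergence is the meaningful one.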