Computer Science and Software Engineering

PHD 03/07

Understanding and Improving Object-Oriented Software Through Static Software Analysis

Warwick Irwin
Department of Computer Science
University of Canterbury

Abstract

Software engineers need to understand the structure of the programs they construct. This task is made difficult by the intangible nature of software, and its complexity, size and changeability. Static analysis tools can help by extracting information from source code and conveying it to software engineers. However, the information provided by typical tools is limited, and some potentially rich veins of information, particularly metrics and visualisations, are under-utilised because developers cannot easily acquire or make use of the data.

This thesis documents new tools and techniques for static analysis of software. It addresses the problem of generating parsers directly from standard grammars, thus avoiding the common practice of customising grammars to comply with the limitations of a given parsing algorithm, typically LALR(1). This is achieved by a new parser generator that applies a range of bottom-up parsing algorithms to produce a hybrid parsing automaton. Consequently, we can generate more powerful deterministic parsers, up to and including LR(k), without incurring the combinatorial explosion that makes canonical LR(k) parsers impractical. The range of practical parsers is further extended to include GLR, which was originally developed for natural language parsing but is shown here to also have advantages for static analysis of programming languages. This emphasis on conformance to standard grammars improves the rigour of static analysis tools and allows clearer definition and communication of derived information, such as metrics.

Beneath the syntactic structure of software (exposed by parsing) lies the deeper semantic structure of declarations, scopes, classes, methods, inheritance, invocations, and so on. In this work, we present a new tool that performs semantic analysis on parse trees to produce a comprehensive semantic model suitable for processing by other static analysis tools.

An XML pipeline approach is used to expose the syntactic and semantic models of the software and to derive metrics and visualisations. The approach is demonstrated producing several types of metrics and visualisations for real software, and the value of static analysis for informing software engineering decisions is shown.