e-Stat software development

The planned software development work can be broken down into six packages, each involving various members of the node. The project was initiated by our much-missed colleague Professor Jon Rasbash, and we have named the software package STAT-JR in his honour.

Work package 1.1: Core model processing software

(Browne, Cameron, Charlton, Szmaragd)

These developments will allow advanced users to specify and estimate complex models. They will also allow developers of statistical algorithms to experiment with new estimation engines and algorithms for calculating results. These developments will integrate with the LEMMA 2 authoring tools, which allow advanced practitioners to create “user friendly” interfaces to complex models and efficient estimation procedures. Advanced users will therefore be able to create domain-specific model templates that make particular instances of complex models more accessible to novice practitioners. The system is also being designed so that mathematically orientated users with little computing expertise can reason about models and algorithms in mathematical terms, while the system does the “grubby” business of generating efficient computer code to fit the models.
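To make the idea of generating code from a model description a little more concrete, here is a minimal Python sketch. It is not STAT-JR's implementation: the function names `generate_loglik_source` and `compile_model`, and the choice of a normal linear regression as the "template", are hypothetical assumptions for illustration only. The user states the model in terms of a response and predictors; the template writes and compiles the log-likelihood code.

```python
import math

# Hypothetical illustration only: a tiny "template" that turns a model
# specification into executable code. This is not STAT-JR's actual design.

def generate_loglik_source(response, predictors):
    """Return Python source for the log-likelihood of a normal linear model."""
    # Linear predictor: intercept beta[0] plus one coefficient per predictor.
    linpred = " + ".join(
        ["beta[0]"]
        + [f"beta[{i}] * data[{p!r}][n]" for i, p in enumerate(predictors, start=1)]
    )
    return (
        "def loglik(beta, sigma, data):\n"
        "    total = 0.0\n"
        f"    for n in range(len(data[{response!r}])):\n"
        f"        mu = {linpred}\n"
        f"        r = data[{response!r}][n] - mu\n"
        "        total += -0.5 * math.log(2 * math.pi * sigma ** 2)"
        " - r ** 2 / (2 * sigma ** 2)\n"
        "    return total\n"
    )

def compile_model(response, predictors):
    """Generate the source, compile it, and return the resulting function."""
    namespace = {"math": math}
    exec(generate_loglik_source(response, predictors), namespace)
    return namespace["loglik"]

# The user only names the variables; the code is written for them.
loglik = compile_model("y", ["x"])
# Both residuals are zero here, so the result is -log(2*pi).
value = loglik([0.0, 1.0], 1.0, {"y": [1.0, 2.0], "x": [1.0, 2.0]})
```

A real system would emit code in a faster target language and cover far richer model classes; the point of the sketch is only the separation between the mathematical specification and the generated code.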

Work package 1.2: Multiprocessing

(De Roure, Michaelides, Cameron, Charlton, Yang)

Work package 1.1 will result in very efficient code for running on a single processor. Work package 1.2 extends these developments so that calculations can be shared across multiple processors in a single workstation and across multiple workstations, i.e. harnessing the computational power of the grid. Because our single-processor code is efficient, many users will not need to resort to the grid to achieve acceptable performance, so the grid's scarce computational resources can be reserved for more demanding problems.
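One common way to share a calculation across processors is to split the data into chunks, compute a partial result for each chunk on a separate worker, and combine the partial results. The Python sketch below does this for a normal log-likelihood using the standard `concurrent.futures` module. The function names and the choice of example are assumptions for illustration, not the node's actual design, and the same split-compute-combine pattern would apply across workstations with a different executor.

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import partial

def split_into_chunks(data, n_chunks):
    """Split a list into at most n_chunks contiguous pieces of similar size."""
    size = max(1, math.ceil(len(data) / n_chunks))
    return [data[i:i + size] for i in range(0, len(data), size)]

def chunk_loglik(values, mu, sigma):
    """Normal log-likelihood contribution of one chunk (runs on one worker)."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma ** 2) - (v - mu) ** 2 / (2 * sigma ** 2)
        for v in values
    )

def parallel_loglik(data, mu, sigma, n_workers=4, executor_cls=ProcessPoolExecutor):
    """Sum per-chunk log-likelihoods computed by n_workers workers.

    executor_cls defaults to separate processes; ThreadPoolExecutor can be
    passed instead for quick checks inside a single process.
    """
    chunks = split_into_chunks(data, n_workers)
    task = partial(chunk_loglik, mu=mu, sigma=sigma)
    with executor_cls(max_workers=n_workers) as pool:
        partial_results = list(pool.map(task, chunks))
    # Observations are independent, so the chunk contributions simply add up.
    return sum(partial_results)
```

To run the chunks in separate processes, call `parallel_loglik` with the default `ProcessPoolExecutor` from inside an `if __name__ == "__main__":` block, which Python requires on platforms that start workers by spawning a fresh interpreter.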

Work package 1.3: Interoperability

(Browne, Michaelides, Cameron, Charlton, Szmaragd)

There are two main lines of work here. First, we will provide an interface that allows other statistical packages to access the eStat system; in this case eStat is 'driven' by another package. Second, there is currently no way for statistical packages to 'read' each other's statistical models; we will provide a means of doing this for multilevel models across the set of packages commonly used by social scientists.

Work package 1.4: Electronic notebook

(Michaelides, Rasbash, Cameron, Charlton, Parker)

In this work package we embed the objects produced by the above analysis tools (tables, statistical models, diagrams and graphs) in executable books. As previously described, executable books combine the narrative advantages of traditional books with the experimental and interactive advantages of software packages. Users will often want to export the contents of an electronic notebook to other authoring environments they are familiar with.

Work package 1.5: Integration of electronic notebooks, pre-analysis workflows, and user tools into myExperiment

(De Roure, Michaelides, Yang)

myExperiment is a repository for digital items that supports social networking. It has been designed according to Web 2.0 principles and supports the discovery and sharing of specific kinds of digital items, notably scientific workflows. Under eStat, myExperiment will be extended to fully support eStat’s electronic books, in addition to workflows for managing pre-analysis tasks (locating and matching data, and manipulating it into a form suitable for analysis) and user-constructed tools built from eStat core tools (these are analogous to R or Python packages; case studies 2.3, 2.4 and 2.5 are examples).

These extensions require myExperiment to have some understanding of the semantics and structure of eStat electronic notebooks, workflows and user-constructed tools. That understanding in turn enables more effective searching and navigation of the myExperiment repository for eStat users. The electronic book extends the workflow concept and requires new tooling to execute the various forms of script that will occur within it.

Work package 1.6: Workflow tools for pre-analysis tasks

(De Roure, McDonald, Michaelides, Lambert, Goldstein, Yang)

This work package centres on integrating critical ‘pre-analysis’ tasks with the eStat modelling tools, with particular emphasis on integrating relevant tools from the DSR DAMES Node and the NCRM ADMIN Node, both of which provide resources in this area. Pre-analysis tasks of ‘data management’ or ‘data manipulation’ include linking external data files using deterministic and/or probabilistic techniques, and the tasks involved in operationalising variables.
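As an illustration of the two linkage styles mentioned above, the Python sketch below contrasts deterministic linkage (exact agreement on every key field) with a simple probabilistic score in the style of Fellegi and Sunter, where each agreeing field adds log2(m/u) and each disagreeing field adds log2((1-m)/(1-u)). Here m is the probability that a field agrees for a true match and u the probability that it agrees by chance. The field names, m and u values and threshold used with it are invented for the example (real applications estimate them from the data), and this is not the DAMES or ADMIN tooling.

```python
import math

def deterministic_link(left, right, keys):
    """Link records that agree exactly on every field in `keys`."""
    # Index the right-hand file by its key values for fast lookup.
    index = {}
    for rec in right:
        index.setdefault(tuple(rec[k] for k in keys), []).append(rec)
    links = []
    for rec in left:
        for match in index.get(tuple(rec[k] for k in keys), []):
            links.append((rec["id"], match["id"]))
    return links

def match_weight(a, b, m, u):
    """Fellegi-Sunter style weight summed over the fields listed in m."""
    weight = 0.0
    for field in m:
        if a[field] == b[field]:
            weight += math.log2(m[field] / u[field])  # agreement
        else:
            weight += math.log2((1 - m[field]) / (1 - u[field]))  # disagreement
    return weight

def probabilistic_link(left, right, m, u, threshold):
    """Compare every pair and keep those whose weight reaches the threshold."""
    links = []
    for a in left:
        for b in right:
            w = match_weight(a, b, m, u)
            if w >= threshold:
                links.append((a["id"], b["id"], w))
    return links
```

With hypothetical values such as m = 0.9 and u = 0.05 for birth year, a record pair that agrees on surname and postcode but differs only on birth year can still pass a moderate threshold, whereas the deterministic rule on all three fields would reject it. Comparing every pair is quadratic in file size; practical linkage usually restricts comparisons with blocking.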
