VOTech Project: Infrastructure Design Study Report
(released April 2006)
Introduction
- why this report
- what it contains
Executive Summary
- what we said we'd do
- what we did
- what we think should happen next
VOTech Project Background
Project Summary
A Design Study will be undertaken aimed at completing all technical preparatory work necessary for the construction of the European Virtual Observatory (EuroVO). The concept of the Virtual Observatory (VObs) is that all the world's data should feel as if it sits on the astronomer's desktop, analysable with a user-selected workbench of tools, made available through a standard interface. Internationally this is set to transform and restructure the way astronomy is done.
EuroVO is a specifically European implementation of this idea, and will produce a world-leading infrastructure providing a unified virtual data resource and the ability to perform complex data discovery and manipulation tasks across the whole range of astronomy. Access to data and tools will be equally good across Europe, regardless of location. This will require establishing an alliance of data centres, and a VObs facility centre in support of the community, but crucially requires the construction of an infrastructural glue of software components, in the context of rapidly evolving background developments in IT and the grid.
The VO-TECH project aims specifically at feasibility studies and design work for integrating such new technologies into the EuroVO. Key IT advances to build on are in intelligent resource discovery (ontology and the semantic web), data mining, and visualisation capabilities. These will be integrated via global astronomical interoperability standards coupled with the latest distributed grid computing services. Additionally this project covers design and preparatory work to ensure that data from the major European telescopes and facilities (as represented by the Opticon and RadioNet networks) are fully accessible through the EuroVO and, where required, can offload mass-scale computational processes onto the EGEE backbone.
In summary, VO-TECH will lay the technical foundations of the EuroVO, a European infrastructure to revolutionise the scientific process.
Project Objectives
The top-level objective of the VO-TECH proposal is to complete all technical preparatory work necessary for the construction of the European Virtual Observatory.
Figure 1: Conceptual structure of VO-TECH
Context
Figure 1 shows the conceptual structure of VO-TECH, and how it relates to EuroVO as a whole, the various classes of user, and the general astronomical infrastructure.
The VO-TECH preparatory work needs to link closely with the work of the Data Centre Alliance (DCA) and VO Facility Centre (VOFC), keeping in mind the final construction of EuroVO that follows the VO-TECH design study. The work also takes place in the context of extensive developments (new algorithms, technologies, and protocols) in academic and commercial IT, and especially in generic grid middleware. This work will not be repeated. The job needed is to assess these developments and design astronomy-specific modules based on them. The working links of the project partners with this external world are excellent. Several of the VO-TECH partners have active working relationships with the academic and commercial IT communities, and the project will work with EGEE as an exemplar application area. It is also worth noting that astronomy in general and the VObs projects in particular have attracted attention as leading-edge but pragmatic exemplars of the new e-science approach: for example, some of the VO-TECH co-Is have been invited to talk at Bio-Informatics meetings, as well as general Grid meetings.
Objectives of Project:
- To assess new technologies and study the feasibility of their incorporation in EuroVO
- To create designs of new infrastructure components based on those new technologies
- To create designs of science user tools and datamining services
- To develop trial versions of new infrastructure components, tools, and datamining services and to test them
- To decide what new interoperability standards are required, and to define those standards with international partners
- To liaise with the larger EuroVO structure, gaining refreshed versions of science functionality and architecture, and feeding back component test results, designs, and trial components for demonstration suites.
- To liaise with computer science, IT industry, and related applications projects in order to mesh with larger standards and to save work wherever possible
Scope of DS3
DS3 covers the following areas:
- To design and deliver mature infrastructure components for EuroVO
- To ensure those components comprise a platform upon which the other Design Studies can undertake their work
- To assess key new infrastructural technologies and incorporate them as appropriate
- To ensure interoperability and international integration
This Study aims at producing final designs of mature components, as well as assessments, designs and trials of new components that don't fit into the major categories of DS4-6. It is expected that the priority areas will be in:
- authentication/authorisation (and eventually accounting)
- an easy to use API for VO compliant services
- access to astronomical datasets
- work-flow
- distributed storage (VOSpace).
In addition, DS3 has a responsibility for considering interoperability, integration and testing within the context of the overall EuroVO architecture and hence liaising with the VOFC. This will also include full internationalisation of the EuroVO programme, designing customisation tools for deployment across Europe and mix-and-match integration with other projects. DS3 will also have the prime responsibility for liaising with the NA4 work-package of the FP6 programme Enabling Grids for E-Science in Europe (EGEE).
Science Requirements of VObs Infrastructure
- summary of what science needs from the infrastructure
- pointers into Science Framework document
Studies and Tasks Undertaken in VOTech
Web presence
VOTech uses its Internet presence for a number of activities:
These services are hosted and maintained at the IoA Edinburgh.
Reports
Client-side User Interface report
See: this link.
Wrapping heterogeneous data
[CDS]
Investigations
Provide access to SIAP, SSAP, ConeSearch, OpenSkyQuery etc
A number of early studies were undertaken to evaluate how these protocols might best be supported from within the AstroGrid infrastructure. The DSA, CEA and JES components were all considered. This has resulted in recent work on the following:
- DSA: This component is now a family of components rather than a monolith. Thus we are working on DSA/Catalogue, DSA/SIAP, DSA/SSAP etc.
- The AR is now capable of accessing these and other IVOA-compliant services. Existing AstroGrid components will be refactored over time to use the AR. Once this is complete, it will be possible to call all IVOA-compliant services from within any AstroGrid component.
See: AR below.
Prototypes
VizieR ADQL-SkyNode
[CDS]
Mediation tools
[CDS]
Intelligent Agents
- Prototype
- Build a pure database system
- VOEvent
- Investigate wrappers
- Network prototype
- First cut infrastructure libraries
- Agent with workflow core
- Recommendation report
VOTechBroker
The VOTechBroker (VOTB) acts as a bridge for submitting parameter sweep computations from the Virtual Observatory to the Grid, and other distributed resources. It provides a number of features, including:
- Flexibility: Most existing projects provide interaction with a given Grid middleware only, for example Globus. In some cases interaction is tied to a particular version of Globus. Our approach is to utilise whatever middleware a site already has installed. For example if Globus or Condor are not available we can still execute the client's algorithm. We do not require system administrators to install additional software.
- Transparency: A standards based job description language abstracts clients from the syntax of underlying grid middleware. Conversion between this abstract language and middleware specific syntax is performed by drivers that plug in to the submission framework.
- Ease of use: Details of parameter sweep job submissions are hidden from the client. Issues such as the location of resources, the staging of application binaries and data, the monitoring (and restarting) of jobs, and the staging of results are taken care of by the broker.
- Distribution: The broker does not assume a shared file system between resources. It is the responsibility of the information service to detect which endpoints have shared file systems and optimise file staging appropriately.
- Integration: The VOTechBroker provides a SOAP interface so that a diverse range of clients can submit jobs and monitor progress. Current clients include CEA-wrapped 'thin clients' for integration with the AstroGrid Workflow, Java (Web Start) applications and Web forms.
See this link
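The driver mechanism described under Flexibility and Transparency can be pictured with a minimal Python sketch. It is not the broker's code: the class and function names are hypothetical, the abstract job is a plain data class standing in for the standards-based job description language, and the rendered strings are only loosely modelled on a Condor submit description and a shell command line.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AbstractJob:
    """Middleware-neutral job description (hypothetical stand-in)."""
    executable: str
    arguments: List[str] = field(default_factory=list)


# Registry of drivers: middleware name -> function that renders an AbstractJob.
DRIVERS: Dict[str, Callable[[AbstractJob], str]] = {}


def driver(name: str):
    """Decorator that plugs a conversion function into the registry."""
    def register(func: Callable[[AbstractJob], str]):
        DRIVERS[name] = func
        return func
    return register


@driver("condor")
def to_condor(job: AbstractJob) -> str:
    # Loosely modelled on a Condor submit description; illustrative only.
    return (f"executable = {job.executable}\n"
            f"arguments = {' '.join(job.arguments)}\n"
            "queue")


@driver("shell")
def to_shell(job: AbstractJob) -> str:
    # Fallback for a site with no grid middleware installed.
    return " ".join([job.executable, *job.arguments])


def render(job: AbstractJob, available: List[str]) -> str:
    """Render the job for the first middleware the site already offers."""
    for name in available:
        if name in DRIVERS:
            return DRIVERS[name](job)
    raise LookupError(f"no driver for any of {available}")
```

The client only ever builds an `AbstractJob`; supporting another middleware means registering one more driver, with no change to client code.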
Security
Components are being developed following the emerging IVOA architecture. These stand as reference implementations to validate the IVOA architecture and allow EuroVO to implement SSO in advance of IVOA’s final standard.
The AstroGrid components are as follows.
- Community services, including MyProxy, user management and the attribute service.
- Facilities in the Astro Client Runtime (ACR) to obtain and store credentials for use of a client application.
- Java classes to obtain credentials from a MyProxy service.
- “Security facade” of Java classes to:
  - sign outgoing SOAP requests;
  - check signatures on incoming SOAP requests;
  - handle authorization decisions based on attributes;
  - participate in credential delegation.
See here, here and here.
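The order of checks performed by the security facade (verify the signature first, then decide authorization from attributes) can be illustrated with a short sketch. This is an assumption-laden simplification: the real facade is a set of Java classes working with credentials obtained from MyProxy and signed SOAP messages, whereas the sketch below uses a shared-key HMAC from the Python standard library purely as a stand-in for a digital signature, and the attribute strings are invented.

```python
import hashlib
import hmac
from typing import Set


def sign(body: bytes, key: bytes) -> str:
    """Stand-in for signing an outgoing request (HMAC, not an X.509 signature)."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def check(body: bytes, signature: str, key: bytes) -> bool:
    """Stand-in for checking the signature on an incoming request."""
    return hmac.compare_digest(sign(body, key), signature)


def authorize(caller_attributes: Set[str], required: Set[str]) -> bool:
    """Grant access only if the caller holds every required attribute."""
    return required <= caller_attributes


def handle(body: bytes, signature: str, key: bytes,
           caller_attributes: Set[str], required: Set[str]) -> str:
    # Authentication comes first: an unsigned or altered message is never
    # considered for authorization.
    if not check(body, signature, key):
        return "rejected: bad signature"
    if not authorize(caller_attributes, required):
        return "rejected: not authorized"
    return "accepted"
```

Keeping the two decisions separate mirrors the split in the list above between signature handling and attribute-based authorization.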
Astro Runtime
The Astro Runtime is a programming interface for any client code that wants to access Virtual Observatory services. Its facilities are exposed via a range of technologies (HTTP, XML-RPC and Java RMI), making it easily accessible from almost all programming and scripting languages. It hides the complexities of the VO (security, configuration, service resolution) and lets developers get on with the interesting stuff.
The runtime simplifies access to:
- IVOA standard services: Registries, SIAP, SSAP
- All AstroGrid services: MySpace, Workflow, CEA
- Other popular protocols: NVO Cone Search
- Useful one-off services: CDS Simbad, etc.
It also provides GUI components: simple dialogues that can be reused by client applications to perform common tasks (MySpace microbrowser, registry browser, etc.). Other benefits of the runtime include single sign-on, a single configuration, and a single cache of service responses, all of which make the implementation of client-side applications simpler.
See this link
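Because the runtime is reachable over XML-RPC, a scripting-language client needs very little code. The sketch below uses only Python's standard `xmlrpc.client` module and stays offline: it encodes a call and decodes it again to show what travels over the wire. The method name `example.registry.search` and its arguments are invented for illustration and are not taken from the Astro Runtime interface.

```python
import xmlrpc.client


def build_call(method: str, *params) -> str:
    """Encode an XML-RPC request body for the given method and parameters."""
    return xmlrpc.client.dumps(params, methodname=method)


# Hypothetical method name and arguments, for illustration only.
request_xml = build_call("example.registry.search", "cone search", 10)

# Decoding the request recovers the same method name and parameters,
# which is what the server side would see.
params, method = xmlrpc.client.loads(request_xml)
```

Against a running runtime, a client would normally use `xmlrpc.client.ServerProxy` with the runtime's endpoint URL and call methods as attributes of the proxy; the endpoint and the real method names would have to come from the runtime's own documentation.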
PLASTIC
PLASTIC is a protocol for communication between client-side astronomy applications. It is very simple for application developers to adopt and is easily extended. Through PLASTIC, applications can instruct each other to load VOTables, highlight a subset of rows or load an image of a particular area of sky. Although such operations are quite simple, they enable powerful collaborations between tools. The philosophy is that the astronomer should have a suite of interoperating tools at their disposal, each of which does one thing well and which can be composed according to need.
See this link
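The hub-based message routing that PLASTIC relies on can be sketched in a few lines of Python. This is an in-memory illustration only, not the PLASTIC wire protocol: the `Hub` class, the application names and the message identifier `example:loadVOTable` are all hypothetical.

```python
from typing import Any, Callable, Dict, List

# An application is represented by a handler taking (message, args).
Handler = Callable[[str, List[Any]], Any]


class Hub:
    """Routes each message from one application to all the others."""

    def __init__(self) -> None:
        self._apps: Dict[str, Handler] = {}

    def register(self, name: str, handler: Handler) -> None:
        self._apps[name] = handler

    def request(self, sender: str, message: str, args: List[Any]) -> Dict[str, Any]:
        """Deliver to every application except the sender; keep non-empty replies."""
        replies: Dict[str, Any] = {}
        for name, handler in self._apps.items():
            if name == sender:
                continue
            reply = handler(message, args)
            if reply is not None:
                replies[name] = reply
        return replies


def table_viewer(message: str, args: List[Any]) -> Any:
    if message == "example:loadVOTable":
        return f"loaded {args[0]}"
    return None  # messages the tool does not understand are ignored


def image_viewer(message: str, args: List[Any]) -> Any:
    return None  # this tool handles no messages in the sketch


hub = Hub()
hub.register("tables", table_viewer)
hub.register("images", image_viewer)
replies = hub.request("images", "example:loadVOTable", ["http://example.org/t.xml"])
```

Each tool only needs to know the hub and the messages it supports, which is what allows independently written tools to be composed.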
Workflow
Develop Use Cases and a prototype for generic workflow requirements
[CDS]
Workflow within AstroGrid Workbench
A workflow aims to accomplish a complex piece of work, for example an astronomical investigation. The workflow builder is designed to enable astronomers to design and develop these complex workflows in a simple and intuitive manner, whilst hiding many of the intricacies of the underlying XML document structure.
The use of familiar drag-and-drop features, tooltips, examples and continuous error checking means that a novice user can quickly produce simple workflows, and rapidly progress to ever more complex pieces of work. The workflow builder is also designed to interact seamlessly with other features of the workbench (e.g. MySpace and the resource browser), and the wider VO community.
See here and here (http://www2.astrogrid.org/Members/PhilNicolson/workflow-builder).
Standards
VO WS Profile
[CDS]
UWS (w/ ESO)
An interface for controlling long-running activities ('jobs') via an asynchronous SOAP service. The interface follows the WS-ResourceFramework (draft) standard of OASIS and includes ideas from AstroGrid's Common Execution Architecture. If implemented in full, the interface defines a universal worker service.
See: here and here.
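The asynchronous pattern behind such an interface (submit a job, receive an identifier at once, poll its state, collect the result later) can be sketched with Python's standard thread pool. This is an in-process illustration, not the SOAP interface itself; the class name and the phase names QUEUED, EXECUTING, COMPLETED, ERROR and ABORTED are assumptions chosen for the example.

```python
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, Dict, Optional


class JobService:
    """Minimal asynchronous job controller (illustrative only)."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=2)
        self._jobs: Dict[str, Future] = {}

    def submit(self, work: Callable[[], Any]) -> str:
        """Start the work in the background and return a job identifier at once."""
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = self._pool.submit(work)
        return job_id

    def phase(self, job_id: str) -> str:
        """Report the current state of a job without blocking."""
        future = self._jobs[job_id]
        if future.cancelled():
            return "ABORTED"
        if future.running():
            return "EXECUTING"
        if not future.done():
            return "QUEUED"
        return "ERROR" if future.exception() is not None else "COMPLETED"

    def abort(self, job_id: str) -> bool:
        """Cancel a job that has not started; returns False if it is too late."""
        return self._jobs[job_id].cancel()

    def result(self, job_id: str, timeout: Optional[float] = None) -> Any:
        """Block until the job finishes, then return its result or raise its error."""
        return self._jobs[job_id].result(timeout)
```

The caller never holds a connection open while the job runs; it keeps only the identifier, which is the property that makes the pattern suitable for long-running queries and computations.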
VO Data Storage (w/ ESO)
Overview
There is a strong desire within the Virtual Observatory to have a distributed storage mechanism that will allow users to refer easily to a piece of data without needing to concern themselves with its physical location. The task of defining this system has been given to the IVOA Grid and Web Services WG. This conceptual storage space is called VOSpace. It is envisaged that VOSpace will manage references to the physical location of data. A primary design goal of VOSpace should be to allow easy integration of existing systems such as SRB or NGAS, which have similar goals but differing levels of abstraction and implementation.
There are two primary use cases:
- A certain dataset is needed for analysis. For efficiency reasons it would be best if the data could reside "close", in network terms, to the compute resource on which the analysis will be performed. In the ideal case the data will be located on the analysis computer's hard disks. With potentially many analyses being performed on the same data at many locations, it is more efficient if the data are gradually replicated onto these locations.
- A user would like to publish and share their own data easily.
See this link
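The first use case, in which a logical reference resolves to the nearest physical copy and data are gradually replicated towards the places where they are analysed, can be sketched as follows. This is not the VOSpace interface: the class, the identifier format, the site names and the distance function are all hypothetical.

```python
from typing import Callable, Dict, List

# distance(site_a, site_b) returns a network "cost"; smaller means closer.
Distance = Callable[[str, str], float]


class SpaceIndex:
    """Maps a logical data identifier to the sites holding a physical copy."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[str]] = {}

    def publish(self, logical_id: str, site: str) -> None:
        """Record that a site holds a copy of the data."""
        sites = self._replicas.setdefault(logical_id, [])
        if site not in sites:
            sites.append(site)

    def locate(self, logical_id: str, compute_site: str, distance: Distance) -> str:
        """Return the site whose copy is closest to the compute resource."""
        return min(self._replicas[logical_id],
                   key=lambda site: distance(site, compute_site))

    def fetch(self, logical_id: str, compute_site: str, distance: Distance) -> str:
        """Resolve the nearest copy, then record a new replica at the compute site."""
        source = self.locate(logical_id, compute_site, distance)
        self.publish(logical_id, compute_site)  # gradual replication
        return source


# Hypothetical network costs between sites, for illustration only.
COST = {("edinburgh", "cambridge"): 1.0, ("garching", "cambridge"): 5.0}


def example_distance(a: str, b: str) -> float:
    if a == b:
        return 0.0
    return COST.get((a, b), COST.get((b, a), 99.0))
```

After the first `fetch` from a given compute site, later analyses at that site resolve to the local copy, while the user continues to refer to the same logical identifier.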
Prototypes
- VOStore interface to NGAS created
- One-to-many FileManager-FileStore support prototyped at ADASS, Oct 2005
with DS5: Intelligent Resource Discovery
To date there has been little interaction between DS5 and the Infrastructure effort. This is expected to change in the future as DS5 develops tools that may change the way that resources are searched for and used in the VObs.
with DS6: Data Exploration
PLASTIC
The DS6 effort provided the impetus for developing the PLASTIC interface when it was realised that client-side applications which were exploring and visualising data needed to be able to interact seamlessly. The specification of the PLASTIC interface was mainly developed as a DS6 task and its initial prototyping as a joint DS3/6 effort. While future development of the interface will be under DS3, it is fully expected that DS6 tool development will continue to drive the requirements for PLASTIC.
Grid Integration
As with PLASTIC, DS6 needs drove the development of interfaces from the AstroGrid infrastructure to Grid and HPC resources, in the shape of the VOTechBroker (VOTB). VOTB builds on the GridSAM system developed by the London e-Science Centre, and offers users a way to submit jobs to a wide range of Grid resources using the same job description file, written in Job Submission Description Language (JSDL). The requirements for VOTB arose from a data mining application, which requires a large number (~10,000 or more) of small parameter-sweep jobs to be executed, but the system developed will be applicable to a wide range of situations, and it can provide a bridge between the Virtual Observatory and the Grid.
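To give a feel for the scale involved, the sketch below expands a grid of parameter values into one job description per combination. It is illustrative only: the descriptions are plain Python dictionaries rather than JSDL documents, and the executable name and parameter names are invented.

```python
from itertools import product
from typing import Dict, Iterator, List, Sequence


def sweep(executable: str, grid: Dict[str, Sequence]) -> Iterator[Dict[str, object]]:
    """Yield one job description for every combination of parameter values."""
    names: List[str] = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield {
            "executable": executable,
            "arguments": [f"--{name}={value}" for name, value in zip(names, values)],
        }


# Two parameters with 100 values each give 100 x 100 = 10,000 small jobs.
jobs = list(sweep("./classify", {"k": range(100), "seed": range(100)}))
```

Every job shares the same template and differs only in its arguments, so a broker can hide staging, monitoring and restarting behind a single submission.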
Tools integration
As mentioned above, several tools used in Data Exploration have been integrated into the developing Euro-VO infrastructure. Foremost amongst these has been VisIVO, a sophisticated visualization tool. As further data exploration tools are integrated, they will begin to test the scalability of the infrastructure.
Conclusions Based on Studies
- Infrastructure is workable
- Component approach is scalable
- Early technology can be deployed now
- More work needed to meet requirements of other environments
- More effort needed on:
  - VObs-enabling tools
  - Making data available
  - Adapting infrastructure to new technologies:
    - Semantic web
    - Mobile technologies
  - Linking to Grid and HPC resources
Recommendations for Future Euro-VO
- Deploy AstroGrid components to all locations
- Engage with astronomical tools providers to design new tools to be VO-enabled from the outset
- Outreach and education
  - Build upon the AstroGrid experience of running science and technical workshops
  - Create educational materials (on-line, off-line, self-paced instruction tutorials etc.)
- All data resources to be accessible via VObs
  - Using AstroGrid DSA
  - Or by adapting local services
- Integrate AAA into all VObs components
- Further EU funding into:
  - Data alliance:
    - Making data available
    - Creating VObs Resource Centres
  - Adapting to 'new technologies'
  - Providing integrated support facility
References
Appendices
Appendix 1: AstroGrid Infrastructure in Detail
Current infrastructure
The architecture document for AstroGrid v1 is a good description of the current system, release 2006.2. Since v1.1 of the system, there have been three changes in architecture.
- The ACR was refactored to become the AR (Astro Runtime) and can be used in both services and desktop clients.
- The PLASTIC protocol, for communication between desktop clients, was introduced.
- IVOA-standard services are accessible via the AR.
The interface contracts for the service components did not change between v1.1 and v2006.2.
PLASTIC (PLatform for AStronomical Tool InterConnection) is a VOTech initiative described in the main body of this report. PLASTIC routes messages between programmes through a communications hub, and the AR contains the reference implementation of the hub. The AstroScope feature in the AstroGrid workbench is a "plasticized" application and can display discovered data in a range of plasticized, third-party programmes.
The AR can make direct calls to data-access-layer services conforming to the standards Cone Search (by US-NVO), Simple Image Access Protocol (SIAP, by IVOA) and Simple Spectral Access Protocol (SSAP, by IVOA). Services of this kind are now available without the need to establish a CEA proxy for each one. Thus, IVOA-standard services published outside AstroGrid can be discovered and used without intervention by AstroGrid personnel. The AstroScope application depends on this capability.
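A direct call of this kind is simple because the data-access-layer protocols are plain HTTP requests. As an illustration, the sketch below builds a Cone Search query URL from a position and search radius, using the RA, DEC and SR parameters (decimal degrees) of the Cone Search standard. The base URL is a placeholder; a real one would be discovered through a registry. The sketch shows only the form of the request and is not how the AR itself is implemented.

```python
from urllib.parse import urlencode


def cone_search_url(base_url: str, ra_deg: float, dec_deg: float,
                    radius_deg: float) -> str:
    """Build a Cone Search query URL; all angles are in decimal degrees."""
    if not 0.0 <= ra_deg <= 360.0:
        raise ValueError("right ascension must lie in [0, 360] degrees")
    if not -90.0 <= dec_deg <= 90.0:
        raise ValueError("declination must lie in [-90, 90] degrees")
    if radius_deg < 0.0:
        raise ValueError("search radius must not be negative")
    query = urlencode({"RA": ra_deg, "DEC": dec_deg, "SR": radius_deg})
    # A base URL may already end in '?' or '&', or may carry other parameters.
    if base_url.endswith(("?", "&")):
        separator = ""
    elif "?" in base_url:
        separator = "&"
    else:
        separator = "?"
    return f"{base_url}{separator}{query}"


# Placeholder service address, for illustration only.
url = cone_search_url("http://example.org/cone", 180.0, -30.5, 0.1)
```

The service answers such a request with a VOTable, which the client can then parse or pass on to another tool.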
Planned improvements to infrastructure
New security standards
The IT industry and the grid-computing movement have many detailed standards for control of access to services and on-line resources. IVOA is developing a meta-standard, a profile for use of Single Sign-On authentication standards. Clients and services following the IVOA profile should remain interoperable when secured.
Work is underway to make the AstroGrid components conform to the IVOA SSO profile. This involves:
- changes to the server components to support authenticated request-messages;
- rewriting the Community component to issue credentials via the MyProxy protocol;
- additions to the AR to allow access to credentials and authenticated identities.
The SSO standards allow services to authenticate a caller's identity. AstroGrid also needs software to enforce authorization policies based on these authenticated identities. The form of this software is being researched in VOTech DS3. AstroGrid components with authentication and simple, interim authorization software should become available during the summer of 2006.
VOSpace
AstroGrid MySpace is a prototype. The current implementation does not scale well to thousands of users with millions of files, so we expect to replace it. The replacement will conform to IVOA's VOSpace standard as that standard emerges. This will allow exchange of data in VOSpace between AstroGrid and other IVO projects.
VOSpace is semantically similar to MySpace but has different interface-contracts. Applications using MySpace through the AR should not be disrupted by the changes.
IVOA expects to finalize VOSpace-1 (formerly known as VOStore) during 2006 and VOSpace-2 during 2007. VOSpace-1 has only a sub-set of the features of VOSpace-2: it lacks a hierarchical directory-structure and links between spaces. VOSpace-2 should be a complete replacement for MySpace but VOSpace-1 is not. However, useful VOSpace-1 services are likely to appear in late 2006. Therefore, the transition from MySpace to VOSpace should start this year and extend into 2007.
Universal Worker Service
AstroGrid server components have asynchronous SOAP interfaces where they need to accommodate long-running queries or computations. The controls for the asynchronous executions are parts of the Common Execution Connector (CEC) service-contract that is in turn part of the Common Execution Architecture. The CEC contract is highly successful in service, but is not an IVOA standard. Indeed, IVOA has no standard for asynchronous interfaces.
The Universal Worker Service (UWS) pattern is a work in progress by IVOA to standardize asynchronous executions. It is informed by AstroGrid CEA. Each detailed application of the UWS pattern defines a contract for an IVOA-standard service. UWS for Parameterized Applications (UWS-PA) is a proposed contract for a service that can replace AstroGrid's CEC. When UWS-PA is standardized by IVOA, AstroGrid expects to migrate its server components (the application servers and DSA) from CEC to UWS-PA. The migration will be gradual and, as always, the AR hides the details from application code.
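The way a runtime layer can hide such a migration from application code is essentially the facade pattern. The sketch below is a hypothetical illustration in Python: the class names, the `execute` method and the tuple returned by each backend are invented, and nothing here reflects the actual CEC or UWS-PA contracts.

```python
from typing import Any, Dict, Protocol, Tuple


class ExecutionBackend(Protocol):
    """What the runtime needs from any execution service contract."""

    def run(self, application: str,
            parameters: Dict[str, Any]) -> Tuple[str, str, Dict[str, Any]]: ...


class CecBackend:
    """Stand-in for a client of the current service contract."""

    def run(self, application, parameters):
        return ("CEC", application, parameters)


class UwsPaBackend:
    """Stand-in for a client of the proposed replacement contract."""

    def run(self, application, parameters):
        return ("UWS-PA", application, parameters)


class Runtime:
    """Facade used by applications; the backend can change underneath it."""

    def __init__(self, backend: ExecutionBackend) -> None:
        self._backend = backend

    def execute(self, application: str, **parameters: Any):
        return self._backend.run(application, parameters)


def application_code(runtime: Runtime):
    # Application code is written once, against the facade only.
    return runtime.execute("example-app", threshold=3)
```

Because `application_code` never names a backend, servers can be migrated one at a time while existing applications keep working unchanged.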
JES rebuild
The current job-entry service can talk only to CEA services (including DSA installations). It cannot talk to IVOA-standard services that provide data, so these services cannot be used in workflows except via CEA proxies. The AR can talk to the IVOA-standard services directly, but cannot execute workflows. AstroGrid intends to rewrite or replace JES such that it provides all the AR features to workflows.
Using the AR in JES insulates JES from any later architectural changes, such as the introduction of VOSpace. Calling JES via the AR insulates applications from possible changes in the JES service-contract.