r14 - 12 Mar 2007 - 15:07:05 - TonyLinde

VOTech Project: Infrastructure Design Study Report

(released April 2006)

(see also InfrastructureDesignStudyReportCDS and InfrastructureDesignStudyReportESO)

Introduction

  • why this report
  • what it contains

Executive Summary

  • what we said we'd do
  • what we did
  • what we think should happen next

VOTech Project Background

Project Summary

A Design Study will be undertaken aimed at completing all technical preparatory work necessary for the construction of the European Virtual Observatory (EuroVO). The concept of the Virtual Observatory (VObs) is that all the world's data should feel like it sits on the astronomer's desktop, analysable with a user-selected workbench of tools, made available through a standard interface. Internationally this is set to transform and re-structure the way astronomy is done. EuroVO is a specifically European implementation of this idea, and will produce a world-leading infrastructure providing a unified virtual data resource and the ability to perform complex data discovery and manipulation tasks across the whole range of astronomy. Access to data and tools will be equally good across Europe, regardless of location. This will require establishing an alliance of data centres, and a VObs facility centre in support of the community, but crucially requires the construction of an infrastructural glue of software components, in the context of rapidly evolving background developments in IT and the grid.

The VO-TECH project aims specifically at feasibility studies and design work to integrate such new technologies into the EuroVO. Key IT advances to build on are in intelligent resource discovery (ontology and the semantic web), data mining, and visualisation capabilities. These will be integrated via global astronomical interoperability standards coupled with the latest distributed grid computing services. Additionally this project covers design and preparatory work to ensure that data from the major European telescopes and facilities (as represented by the Opticon and RadioNet networks) is fully accessible through the EuroVO and, where required, is able to offload mass-scale computational processes onto the EGEE backbone.

In summary, VO-TECH will lay the technical foundations of the EuroVO, a European infrastructure to revolutionise the scientific process.

Project Objectives

The top-level objective of the VO-TECH proposal is to complete all technical preparatory work necessary for the construction of the European Virtual Observatory.

[Figure 1: conceptual structure of VO-TECH]

Context

Figure 1 shows the conceptual structure of VO-TECH, and how it relates to EuroVO as a whole, the various classes of user, and the general astronomical infrastructure.

The VO-TECH preparatory work needs to link closely with the work of the Data Centre Alliance (DCA) and VO Facility Centre (VOFC), with the final construction of EuroVO following the VO-TECH design study kept in mind. The work also takes place in the context of extensive developments - new algorithms, technologies, and protocols - in academic and commercial IT, and especially of course in generic grid middleware. This work will not be repeated. The job needed is to assess these developments and design astronomy-specific modules based on them. The working links of the project partners with this external world are excellent. Several of the VO-TECH partners have active working relationships with the academic and commercial IT communities, and the project will work with EGEE as an exemplar application area. It is also worth noting that Astronomy in general and the VObs projects in particular have attracted attention as leading edge but pragmatic exemplars of the new e-science approach - for example some of the VO-TECH co-Is have been invited to talk at Bio-Informatics meetings, as well as general Grid meetings.

Objectives of Project:

  1. To assess new technologies and study the feasibility of their incorporation in EuroVO
  2. To create designs of new infrastructure components based on those new technologies
  3. To create designs of science user tools and datamining services
  4. To develop trial versions of new infrastructure components, tools, and datamining services and to test them
  5. To decide what new interoperability standards are required, and to define those standards with international partners
  6. To liaise with the larger EuroVO structure, gaining refreshed versions of science functionality and architecture, and feeding back component test results, designs, and trial components for demonstration suites.
  7. To liaise with computer science, IT industry, and related applications projects in order to mesh with larger standards and to save work wherever possible

Scope of DS3

DS3 covers the following areas:
  • To design and deliver mature infrastructure components for EuroVO
  • To ensure those components comprise a platform upon which the other Design Studies can undertake their work
  • To assess key new infrastructural technologies and incorporate them as appropriate
  • To ensure interoperability and international integration

This Study aims at producing final designs of mature components, as well as assessments, designs and trials of new components that don't fit into the major categories of DS4-6. It is expected that the priority areas will be in:

  • authentication/authorisation (and eventually accounting)
  • an easy to use API for VO compliant services
  • access to astronomical datasets
  • work-flow
  • distributed storage (VOSpace).

In addition, DS3 has a responsibility for considering interoperability, integration and testing within the context of the overall EuroVO architecture and hence liaising with the VOFC. This will also include full internationalisation of the EuroVO programme, designing customisation tools for deployment across Europe and mix-and-match integration with other projects. DS3 will also have the prime responsibility for liaising with the NA4 work-package of the FP6 programme Enabling Grids for E-Science in Europe (EGEE).

Science Requirements of VObs Infrastructure

  • summary of what science needs from the infrastructure
  • pointers into Science Framework document

Studies and Tasks Undertaken in VOTech

Web presence

VOTech uses its Internet presence for a number of activities. These services are hosted and maintained at the IoA Edinburgh.

Reports

Client-side User Interface report

See: this link.

Wrapping heterogeneous data

[CDS]

Investigations

Provide access to SIAP, SSAP, ConeSearch, OpenSkyQuery etc

A number of early studies were undertaken to evaluate how these protocols might best be supported from within the AstroGrid infrastructure. The DSA, CEA and JES components were all considered. This has resulted in recent work on the following:
  • DSA: This component is now a family of components rather than a monolith. Thus we are working on DSA/Catalogue, DSA/SIAP, DSA/SSAP etc.
  • The AR is now capable of accessing these and other IVOA compliant services. Existing AstroGrid components will be refactored over time to use the AR. Once this is complete, it will be possible to call all IVOA compliant services from within any AstroGrid component.

See: AR below.

Prototypes

VizieR ADQL-SkyNode

[CDS]

Mediation tools

[CDS]

Intelligent Agents

  • Prototype
  • Build a pure database system
  • VOEvent
  • Investigate wrappers
  • Network prototype
  • First cut infrastructure libraries
  • Agent with workflow core
  • Recommendation report

VOTechBroker

The VOTechBroker (VOTB) acts as a bridge for submitting parameter sweep computations from the Virtual Observatory to the Grid, and other distributed resources. It provides a number of features, including:

  • Flexibility: Most existing projects provide interaction with a given Grid middleware only, for example Globus. In some cases interaction is tied to a particular version of Globus. Our approach is to utilise whatever middleware a site already has installed. For example if Globus or Condor are not available we can still execute the client's algorithm. We do not require system administrators to install additional software.
  • Transparency: A standards based job description language abstracts clients from the syntax of underlying grid middleware. Conversion between this abstract language and middleware specific syntax is performed by drivers that plug in to the submission framework.
  • Ease of use: Details of parameter sweep job submissions are hidden from the client. Issues such as location of resources, staging of application binaries and data, the monitoring (and restarting) of jobs, and the staging of results are taken care of by the broker.
  • Distribution: The broker does not assume a shared file system between resources. It is the responsibility of the information service to detect which endpoints have shared file systems and optimise file staging appropriately.
  • Integration: The VOTechBroker provides a SOAP interface so that a diverse range of clients can submit jobs and monitor progress. Current clients include CEA-wrapped 'thin-clients' for integration with the AstroGrid Workflow, Java (Web Start) applications and Web forms.
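
The "Transparency" and "Flexibility" points above can be illustrated with a minimal sketch of pluggable drivers. This is not the VOTechBroker code: the `JobDescription` fields, the driver names and the `render` function are assumptions made for illustration. Each driver converts one middleware-neutral job description into the syntax of a particular submission mechanism, and the framework uses whichever mechanism the site already has.

```python
from dataclasses import dataclass, field


@dataclass
class JobDescription:
    """Middleware-neutral description of one job (hypothetical fields)."""
    executable: str
    arguments: list = field(default_factory=list)


def condor_driver(job):
    """Render the job as a Condor-style submit description."""
    return "\n".join([
        f"executable = {job.executable}",
        f"arguments = {' '.join(job.arguments)}",
        "queue",
    ])


def shell_driver(job):
    """Fallback for sites with no grid middleware: a plain command line."""
    return " ".join([job.executable, *job.arguments])


# Drivers plug in to the submission framework by name (names are assumed).
DRIVERS = {"condor": condor_driver, "shell": shell_driver}


def render(job, available):
    """Use the first listed mechanism for which a driver is installed."""
    for name in available:
        if name in DRIVERS:
            return DRIVERS[name](job)
    raise LookupError(f"no driver for any of {available}")


job = JobDescription("/bin/classify", ["--k", "5"])
# No driver is registered for "globus" here, so the shell driver is used.
print(render(job, ["globus", "shell"]))
```

The client only ever builds a `JobDescription`; supporting another middleware means adding one entry to `DRIVERS`.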

See this link

Security

Components are being developed following the emerging IVOA architecture. These stand as reference implementations to validate the IVOA architecture and allow EuroVO to implement SSO in advance of IVOA’s final standard.

The AstroGrid components are as follows.

  • Community services, including MyProxy, user management and the attribute service.
  • Facilities in the Astro Client Runtime (ACR) to obtain and store credentials for use of a client application.
  • Java classes to obtain credentials from a MyProxy service.
  • “Security facade” of Java classes to:
    • sign outgoing SOAP requests;
    • check signatures on incoming SOAP requests;
    • handle authorization decisions based on attributes;
    • participate in credential delegation

See here, here and here.

Astro Runtime

The Astro Runtime is a programming interface for any client code that wants to access Virtual Observatory services. Its facilities are exposed via a range of technologies - HTTP / XML-RPC / Java-RMI, making it easily accessible from almost all programming and scripting languages. It hides the complexities of the VO - security, configuration, service resolution - and lets developers get on with the interesting stuff.

The runtime simplifies access to

  • IVOA standard services: Registries, SIAP, SSAP
  • All AstroGrid services: MySpace, Workflow, CEA
  • Other popular protocols: NVO Cone-search
  • Useful one-off services: CDS Simbad, etc.

It also provides GUI components - simple dialogues that can be reused by client applications to perform common tasks (MySpace microbrowser, registry browser, etc). Other benefits of the runtime include single sign-on, a single configuration, and a single cache of service responses - making implementation of client-side applications simpler.
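
To show why an XML-RPC interface is reachable from most scripting languages, the sketch below uses only Python's standard `xmlrpc.client` module to encode a call as the XML document that would be sent over HTTP, and then decodes it again. The method name and the argument list are invented for illustration; they are not taken from the Astro Runtime's actual interface.

```python
import xmlrpc.client

# Hypothetical method name and arguments: a cone search around
# RA = 180.0, Dec = 2.0 with a 0.1 degree radius.
method = "example.cone.search"
params = ("ivo://example.org/catalogue", 180.0, 2.0, 0.1)

# Encode the call as the XML document that would be POSTed over HTTP.
request_xml = xmlrpc.client.dumps(params, methodname=method)
print(request_xml)

# Decode it again, as the receiving side would.
decoded_params, decoded_method = xmlrpc.client.loads(request_xml)
```

Because the request is plain XML over HTTP, any language with an XML-RPC library can make the same call without Java-specific tooling.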

See this link

PLASTIC

PLASTIC is a protocol for communication between client-side astronomy applications. It is very simple for application developers to adopt and is easily extended. Through PLASTIC, applications can do tasks such as instruct each other to load VOTables, highlight a subset of rows or load an image of a particular area of sky. Although such operations are quite simple, they enable powerful collaborations between tools. The philosophy is that the astronomer should have a suite of interoperating tools at his disposal, each of which does one thing well and which can be composed according to need.
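
The idea of tools instructing each other through a shared intermediary can be sketched as follows. This is a toy in-process model, not the PLASTIC specification: the `Hub` class, the handler signature and the message names beginning with `example:` are all assumptions made for illustration.

```python
class Hub:
    """Toy message hub: applications register, then message each other."""

    def __init__(self):
        self._apps = {}

    def register(self, name, handler):
        """handler(sender, message, args) is called for each delivery."""
        self._apps[name] = handler

    def request(self, sender, message, args):
        """Send to every registered application except the sender.

        Returns a dict mapping each recipient name to its reply.
        """
        return {
            name: handler(sender, message, args)
            for name, handler in self._apps.items()
            if name != sender
        }


def table_viewer(sender, message, args):
    # Hypothetical message name; this tool only understands table loading.
    if message == "example:votable/load":
        return f"loaded {args[0]}"
    return None  # message not understood


def image_viewer(sender, message, args):
    # Hypothetical message name; this tool only understands sky pointing.
    if message == "example:image/point":
        return f"centred on {args[0]}, {args[1]}"
    return None


hub = Hub()
hub.register("tables", table_viewer)
hub.register("images", image_viewer)
replies = hub.request("browser", "example:votable/load", ["http://example.org/t.xml"])
print(replies)
```

Each tool handles only the messages it understands and ignores the rest, which is what lets small single-purpose tools be combined freely.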

See this link

Workflow

Develop Use Cases and a prototype for generic workflow requirements

[CDS]

Workflow within AstroGrid Workbench

A workflow aims to accomplish a complex piece of work, for example an astronomical investigation. The workflow builder is designed to enable astronomers to design and develop these complex workflows in a simple and intuitive manner, whilst hiding many of the intricacies of the underlying XML document structure. The use of familiar drag and drop features, tooltips, examples and continuous error checking means that a novice user can quickly produce simple workflows, and rapidly progress to ever more complex pieces of work. The workflow builder is also designed to interact seamlessly with other features of the workbench (e.g. MySpace and resource browser), and the wider VO community.

See here and [[http://www2.astrogrid.org/Members/PhilNicolson/workflow-builder][here]].

Standards

VO WS Profile

[CDS]

UWS (w/ ESO)

An interface for controlling long-running activities ('jobs') via an asynchronous SOAP service. The interface follows the WS-ResourceFramework (draft) standard of OASIS and includes ideas from AstroGrid's Common Execution Architecture. If implemented in full, the interface defines a universal worker service.
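
The asynchronous pattern described above (submit a job, receive an identifier at once, then poll for its state and collect the result later) can be sketched in a few lines. This is a toy in-process model rather than the UWS interface itself: the class, its method names and the phase names are assumptions made for illustration, and a real service would expose these operations over SOAP and run jobs in the background.

```python
import itertools


class JobService:
    """Toy asynchronous service: submit now, poll for the outcome later."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}

    def submit(self, func, *args):
        """Register a job and return its identifier without running it."""
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = {"phase": "PENDING", "call": (func, args), "result": None}
        return job_id

    def phase(self, job_id):
        return self._jobs[job_id]["phase"]

    def abort(self, job_id):
        """Cancel a job that has not run yet."""
        if self._jobs[job_id]["phase"] == "PENDING":
            self._jobs[job_id]["phase"] = "ABORTED"

    def run_pending(self):
        """Stand-in for the server's worker: execute every pending job."""
        for job in self._jobs.values():
            if job["phase"] != "PENDING":
                continue
            func, args = job["call"]
            try:
                job["result"] = func(*args)
                job["phase"] = "COMPLETED"
            except Exception as exc:
                job["result"] = str(exc)
                job["phase"] = "ERROR"

    def result(self, job_id):
        return self._jobs[job_id]["result"]


service = JobService()
ok = service.submit(sum, [1, 2, 3])
bad = service.submit(int, "not a number")
cancelled = service.submit(sum, [4, 5])
service.abort(cancelled)
service.run_pending()
print(service.phase(ok), service.result(ok))
```

The client never blocks on the computation; it holds only the job identifier and asks for the phase when convenient.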

See: here and here.

VO Data Storage (w/ ESO)

Overview

There is a strong desire within the Virtual Observatory to have a distributed storage mechanism that will allow users to refer easily to a piece of data without needing to concern themselves with its physical location. The task of defining this system has been given to the IVOA Grid and Web Services WG. This conceptual storage space is called VOSpace. It is envisaged that VOSpace will manage references to the physical location of data. A primary design goal of VOSpace should be to allow easy integration of existing systems such as SRB or NGAS, which have similar goals but differing levels of abstraction and implementation.

There are two primary use cases

  1. A certain dataset is needed for analysis - for efficiency reasons it would be best if the data could reside "close" in network terms to the compute resource on which the analysis will be performed. In the ideal case this will involve the data being located on the analysis computer's hard disks. With potentially many analyses being performed on the same data at many locations, it is more efficient if the data are gradually replicated onto these locations.
  2. A user would like to publish and share his own data easily.
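
The first use case can be sketched as a catalogue that maps one logical identifier to several physical copies and returns the copy that is "closest" to the compute resource. This is an illustration only: the class, the identifier scheme, the site names and the cost figures are all invented, and the sketch does not reflect the VOSpace interface.

```python
class ReplicaCatalogue:
    """Maps a logical data identifier to the places a copy is held."""

    def __init__(self):
        self._replicas = {}

    def add_replica(self, logical_id, site, url):
        self._replicas.setdefault(logical_id, {})[site] = url

    def resolve(self, logical_id, costs):
        """Return (site, url) of the cheapest replica to reach.

        costs maps site name to a network cost as seen from the compute
        resource; sites without an entry are treated as unreachable.
        """
        reachable = {
            site: url
            for site, url in self._replicas.get(logical_id, {}).items()
            if site in costs
        }
        if not reachable:
            raise LookupError(f"no reachable replica of {logical_id}")
        site = min(reachable, key=lambda s: costs[s])
        return site, reachable[site]


catalogue = ReplicaCatalogue()
catalogue.add_replica("vos://example/survey.fits", "edinburgh", "http://a.example.org/survey.fits")
catalogue.add_replica("vos://example/survey.fits", "garching", "http://b.example.org/survey.fits")

# Costs as seen from one compute node; the numbers are made up.
costs = {"edinburgh": 40, "garching": 5}
print(catalogue.resolve("vos://example/survey.fits", costs))
```

The user refers only to the logical identifier; adding a replica near a frequently used compute resource changes the answer without changing any client code.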

See this link

Prototypes

  • VOStore interface to NGAS created
  • 1::Many FileManager-FileStore support prototyped at ADASS, Oct 2005

with DS5: Intelligent Resource Discovery

To date there has been little interaction between DS5 and the Infrastructure effort. This is expected to change in the future as DS5 develops tools that may change the way that resources are searched for and used in the VObs.

with DS6: Data Exploration

PLASTIC

The DS6 effort provided the impetus for developing the PLASTIC interface when it was realised that client-side applications which were exploring and visualising data needed to be able to interact seamlessly. The specification of the PLASTIC interface was mainly developed as a DS6 task and its initial prototyping as a joint DS3/6 effort. While future development of the interface will be under DS3, it is fully expected that DS6 tool development will continue to drive the requirements for PLASTIC.

Grid Integration

As with PLASTIC, DS6 needs drove the development of interfaces from the AstroGrid infrastructure to Grid and HPC resources, in the shape of the VOTechBroker (VOTB). VOTB builds on the GridSAM system developed by the London e-Science Centre, and offers users a way to submit jobs to a wide range of Grid resources using the same job description file, written in Job Submission Description Language (JSDL). The requirements for VOTB arose from a data mining application, which requires a large number (~10,000 or more) of small, parameter sweep jobs to be executed, but the system developed will be applicable to a wide range of situations, and it can provide a bridge between the Virtual Observatory and the Grid.
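
A parameter sweep of the size mentioned above is simply the set of all combinations of the swept values, with one small job per combination. The sketch below expands such a sweep; the executable path, the parameter names and the dictionary layout of a job are hypothetical, and a real submission would express each job in JSDL rather than as a Python dictionary.

```python
import itertools


def expand_sweep(executable, sweep):
    """Yield one job per combination of the swept parameter values."""
    names = sorted(sweep)
    for values in itertools.product(*(sweep[n] for n in names)):
        arguments = [f"--{n}={v}" for n, v in zip(names, values)]
        yield {"executable": executable, "arguments": arguments}


# Hypothetical data-mining sweep: 100 x 100 = 10,000 small jobs.
sweep = {
    "neighbours": range(1, 101),
    "threshold": [t / 100 for t in range(100)],
}
jobs = list(expand_sweep("/bin/classify", sweep))
print(len(jobs))
print(jobs[0])
```

Two modest ranges already produce 10,000 jobs, which is why staging, monitoring and restarting need to be handled by a broker rather than by hand.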

Tools integration

As mentioned above, several tools used in Data Exploration have been integrated into the developing Euro-VO infrastructure. Foremost amongst these has been VisIVO, a sophisticated visualization tool. As further data exploration tools are integrated, they will begin to test the scalability of the infrastructure.

Conclusions Based on Studies

  • Infrastructure is workable
    • Component approach is scalable
  • Early technology can be deployed now
  • More work needed to meet requirements of other environments
  • More effort needed on:
    • VObs-enabling tools
    • Making data available
    • Adapting infrastructure to new technologies:
      • Semantic web
      • Mobile technologies
      • Linking to Grid and HPC resources

Recommendations for Future Euro-VO

  • Deploy AstroGrid components to all locations
  • Engage with astronomical tools providers to design new tools to be VO enabled from the outset
  • Outreach and education
    • Build upon the AstroGrid experience of running science and technical workshops
    • Create educational materials (on-line, off-line and Self Paced Instruction tutorials etc)
  • All data resources to be accessible via VObs
    • Using AstroGrid DSA
    • Or by adapting local services
  • Integrate AAA into all VObs components

  • Further EU funding into:
    • Data alliance:
      • Making data available
      • Creating VObs Resource Centres
    • Adapting to 'new technologies'
    • Providing integrated support facility

References

Appendices

Appendix 1: AstroGrid Infrastructure in Detail

Current infrastructure

The architecture document for AstroGrid v1 is a good description of the current system, release 2006.2. Since v1.1 of the system, there have been three changes in architecture.

  • The ACR was refactored to become the AR (Astro Runtime) and can be used in both services and desktop clients.
  • The PLASTIC protocol, for communication between desktop clients, was introduced.
  • IVOA-standard services are accessible via the AR.

The interface contracts for the service components did not change between v1.1 and v2006.2.

PLASTIC (PLatform for AStronomical Tool InterConnection) is a VOTech initiative described in the main body of this report. PLASTIC routes messages between programmes through a communications hub, and the AR contains the reference implementation of the hub. The AstroScope feature in the AstroGrid workbench is a "plasticized" application and can display discovered data in a range of plasticized, 3rd-party programmes.

The AR has the ability to make direct calls to data-access-layer services conforming to the standards Cone Search (by US-NVO), Simple Image Access Protocol (SIAP, by IVOA) and Simple Spectral Access Protocol (SSAP, by IVOA). Services of this kind are now available without the need to establish a CEA proxy for each one. Thus, IVOA-standard services published outside AstroGrid can be discovered and used without intervention by AstroGrid personnel. The AstroScope application depends on this capability.

Planned improvements to infrastructure

New security standards

The IT industry and the grid-computing movement have many, detailed standards for control of access to services and on-line resources. IVOA is developing a meta-standard, a profile for use of Single Sign-On authentication standards. Clients and services following the IVOA profile should remain interoperable when secured.

Work is underway to make the AstroGrid components conform to the IVOA SSO profile. This involves:

  • changes to the server components to support authenticated request-messages;
  • rewriting the Community component to issue credentials via the MyProxy protocol;
  • additions to the AR to allow access to credentials and authenticated identities.

The SSO standards allow services to authenticate a caller's identity. AstroGrid also needs software to enforce authorization policies based on these authenticated identities. The form of this software is being researched in VOTech DS3.

AstroGrid components with authentication and simple, interim authorization software should become available during the summer of 2006.

VOSpace

AstroGrid MySpace is a prototype. The current implementation does not scale well to thousands of users with millions of files, so we expect to replace it. The replacement will conform to IVOA's VOSpace standard as that standard emerges. This will allow exchange of data in VOSpace between AstroGrid and other IVO projects.

VOSpace is semantically similar to MySpace but has different interface-contracts. Applications using MySpace through the AR should not be disrupted by the changes.

IVOA expects to finalize VOSpace-1 (formerly known as VOStore) during 2006 and VOSpace-2 during 2007. VOSpace-1 has only a sub-set of the features of VOSpace-2: it lacks a hierarchical directory-structure and links between spaces. VOSpace-2 should be a complete replacement for MySpace but VOSpace-1 is not. However, useful VOSpace-1 services are likely to appear in late 2006. Therefore, the transition from MySpace to VOSpace should start this year and extend into 2007.

Universal Worker Service

AstroGrid server components have asynchronous SOAP interfaces where they need to accommodate long-running queries or computations. The controls for the asynchronous executions are parts of the Common Execution Connector (CEC) service-contract that is in turn part of the Common Execution Architecture. The CEC contract is highly successful in service, but is not an IVOA standard. Indeed, IVOA has no standard for asynchronous interfaces.

The Universal Worker Service (UWS) pattern is a work in progress by IVOA to standardize asynchronous executions. It is informed by AstroGrid CEA. Each detailed application of the UWS pattern defines a contract for an IVOA-standard service. UWS for Parameterized Applications (UWS-PA) is a proposed contract for a service that can replace AstroGrid's CEC. When UWS-PA is standardized by IVOA, AstroGrid expects to migrate its server components (the application servers and DSA) from CEC to UWS-PA. The migration will be gradual and, as always, the AR hides the details from application code.

JES rebuild

The current job-entry service can talk only to CEA services (including DSA installations). It cannot talk to IVOA-standard services that provide data, so these services cannot be used in workflows except via CEA proxies. The AR can talk to the IVOA-standard services directly, but cannot execute workflows. AstroGrid intends to rewrite or replace JES such that it provides all the AR features to workflows.

Using the AR in JES insulates JES from any later architectural changes, such as the introduction of VOSpace. Calling JES via the AR insulates applications from possible changes in the JES service-contract.
