Semantic Astronomy SeRQL Tutorial
Practical Semantic Astronomy Workshop Hack-a-thon
Elizabeth Auden, AstroGrid / VOTech
15 February 2008
Overview
OpenRDF's Sesame knowledge base stores RDF data and provides a mechanism for queries and inferences. The examples in this tutorial cover the following topics:
- Using multiple ontologies to describe astronomical data
- Creating RDF files that provide astronomical metadata
- Uploading RDF data and ontologies to a Sesame endpoint
- Querying RDF data in a Sesame endpoint using SeRQL
If you want to install a Sesame knowledge base on your local machine, start with Setting Up a Sesame Knowledge Base. If you want to try out SeRQL queries first, go straight to Example SeRQL Queries.
Choosing Ontologies
The OWL Web Ontology Language, or OWL, is the W3C recommendation for a language that can process semantic resource descriptions. OWL is used to construct ontologies, and the language is built on top of RDF. Three ontologies hosted by Ed Shaya and Brian Thomas at http://archive.astro.umd.edu/ivoa-onto/src/main/resources/ were uploaded to the MSSL Sesame knowledge base:
- astronomy.owl
- physics.owl
- IVOAO.owl
These ontologies demonstrate how existing vocabularies can be used to build ontologies. They are based on vocabulary work being done by the International Virtual Observatory Alliance (IVOA) Semantics group, but they are not official IVOA documents.
Generating RDF Data
The Resource Description Framework, or RDF, is a W3C recommendation for representing metadata about online resources. RDF statements are "triples" consisting of subjects, predicates, and objects. The sample data used in this tutorial consists of RDF descriptions of 110 Messier objects. Information about each object, including identifier, right ascension, declination, flux (B), object type, and spectral classification, has been described using classes defined in astronomy.owl, physics.owl, and IVOAO.owl.
Each Messier object is described by its object type. For example, the RDF description of M 10 contains a resource that is a globular cluster with an RA of 254.287, a Dec of -4.09933, a spectral classification of F3, and an empty value for flux. The M 10 RDF file can be found at http://msslxt.mssl.ucl.ac.uk:8080/sesame/messier/rdf/m10.rdf, and RDF files for all 110 Messier objects can be found in the same place (replace "m10.rdf" with m1.rdf, m2.rdf, and so on). M 10 RDF example:
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ast="http://www.ivoa.net/ontology/astronomy.owl#"
         xmlns:phys="http://www.ivoa.net/ontology/physics.owl#"
         xmlns:ivoao="http://www.ivoa.net/ontology/IVOAO.owl#">
  <ast:globularCluster rdf:ID="m10">
    <ivoao:identification>M 10</ivoao:identification>
    <ast:RightAscension>254.287</ast:RightAscension>
    <ast:Declination>-4.09933</ast:Declination>
    <phys:flux></phys:flux>
    <ast:SpectralClassification>F3</ast:SpectralClassification>
  </ast:globularCluster>
</rdf:RDF>
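To make the subject, predicate, and object structure concrete, the following is a minimal sketch that lists the triples in the M 10 file using only the JDK's XML parser. It is not a general RDF parser: it assumes the flat layout shown above (one typed resource with an rdf:ID and literal-valued child elements). The class name MessierTriples and the choice of base URI are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Lists the triples in a flat RDF/XML file shaped like m10.rdf. Not a general RDF parser. */
final class MessierTriples {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    /** The M 10 description from this tutorial. */
    static final String M10 = String.join("\n",
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
        "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"",
        "  xmlns:ast=\"http://www.ivoa.net/ontology/astronomy.owl#\"",
        "  xmlns:phys=\"http://www.ivoa.net/ontology/physics.owl#\"",
        "  xmlns:ivoao=\"http://www.ivoa.net/ontology/IVOAO.owl#\">",
        "<ast:globularCluster rdf:ID=\"m10\">",
        "<ivoao:identification>M 10</ivoao:identification>",
        "<ast:RightAscension>254.287</ast:RightAscension>",
        "<ast:Declination>-4.09933</ast:Declination>",
        "<phys:flux></phys:flux>",
        "<ast:SpectralClassification>F3</ast:SpectralClassification>",
        "</ast:globularCluster>",
        "</rdf:RDF>");

    /** Returns one "subject | predicate | object" string per triple. */
    static List<String> triples(String rdfXml, String baseUri) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            root = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse RDF/XML", e);
        }
        List<String> out = new ArrayList<>();
        for (Element resource : childElements(root)) {
            // rdf:ID="m10" resolves against the base URI to <base>#m10.
            String subject = baseUri + "#" + resource.getAttributeNS(RDF_NS, "ID");
            // The element name of a typed node is shorthand for an rdf:type statement.
            out.add(subject + " | " + RDF_NS + "type | " + name(resource));
            for (Element property : childElements(resource)) {
                out.add(subject + " | " + name(property)
                    + " | \"" + property.getTextContent() + "\"");
            }
        }
        return out;
    }

    private static List<Element> childElements(Element parent) {
        List<Element> elements = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i).getNodeType() == Node.ELEMENT_NODE) {
                elements.add((Element) nodes.item(i));
            }
        }
        return elements;
    }

    private static String name(Element e) {
        return e.getNamespaceURI() + e.getLocalName();
    }

    public static void main(String[] args) {
        for (String t : triples(M10, "http://msslxt.mssl.ucl.ac.uk:8080/sesame/")) {
            System.out.println(t);
        }
    }
}
```

The file yields six statements: one rdf:type statement from the ast:globularCluster element, plus one statement per property, including the empty flux literal.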
The W3C RDF validation service is a helpful tool when building RDF files. Before you automate the generation of RDF files for a dataset, it is helpful to copy and paste the RDF for one file into the W3C's RDF validator. The online application will report any formatting or validation errors, and if the RDF parses successfully, it will display the triple statements contained in the RDF. Once an optimal RDF structure for the Messier object data had been defined, individual files were generated with the following process:
- An ASCII file containing one Messier identifier per line was uploaded to the SIMBAD query by list of identifiers service.
- SIMBAD produced a table containing astronomical information for one Messier object per line.
- The table was processed by a shell script that extracted the following metadata: ID, object type, spectral classification, flux (B), and RA (in hh mm ss) and Dec (in dd mm ss). RA and Dec were converted to decimal degrees.
- Metadata for each Messier object was written to a file using the RDF template above.
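The last two steps of this process can be sketched in Java. The original workflow used a shell script that is not reproduced here, so this is only an illustration: the class name MessierRdfWriter, its method names, the fixed five-decimal formatting, and the sexagesimal input strings in main are assumptions. The conversion itself is the standard one: RA in degrees is 15 × (h + m/60 + s/3600), and Dec in degrees is ±(d + m/60 + s/3600).

```java
import java.util.Locale;

/** Converts sexagesimal coordinates to decimal degrees and fills in the RDF template. */
final class MessierRdfWriter {

    /** "hh mm ss" to degrees. One hour of right ascension is 15 degrees. */
    static double raToDegrees(String hms) {
        String[] p = hms.trim().split("\\s+");
        double hours = Double.parseDouble(p[0])
            + Double.parseDouble(p[1]) / 60.0
            + Double.parseDouble(p[2]) / 3600.0;
        return 15.0 * hours;
    }

    /** "dd mm ss" to degrees, with an optional leading sign. */
    static double decToDegrees(String dms) {
        String[] p = dms.trim().split("\\s+");
        // Read the sign from the text so that "-00 30 00" stays negative.
        double sign = p[0].startsWith("-") ? -1.0 : 1.0;
        double degrees = Math.abs(Double.parseDouble(p[0]))
            + Double.parseDouble(p[1]) / 60.0
            + Double.parseDouble(p[2]) / 3600.0;
        return sign * degrees;
    }

    /**
     * Renders one object with the template used for m10.rdf.
     * Assumes the values contain no XML special characters (no escaping is done).
     */
    static String toRdf(String id, String objectClass, String identification,
                        double raDeg, double decDeg, String flux, String spectralClass) {
        return String.join("\n",
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>",
            "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"",
            "         xmlns:ast=\"http://www.ivoa.net/ontology/astronomy.owl#\"",
            "         xmlns:phys=\"http://www.ivoa.net/ontology/physics.owl#\"",
            "         xmlns:ivoao=\"http://www.ivoa.net/ontology/IVOAO.owl#\">",
            "  <ast:" + objectClass + " rdf:ID=\"" + id + "\">",
            "    <ivoao:identification>" + identification + "</ivoao:identification>",
            "    <ast:RightAscension>" + fmt(raDeg) + "</ast:RightAscension>",
            "    <ast:Declination>" + fmt(decDeg) + "</ast:Declination>",
            "    <phys:flux>" + flux + "</phys:flux>",
            "    <ast:SpectralClassification>" + spectralClass + "</ast:SpectralClassification>",
            "  </ast:" + objectClass + ">",
            "</rdf:RDF>");
    }

    private static String fmt(double degrees) {
        return String.format(Locale.ROOT, "%.5f", degrees);
    }

    public static void main(String[] args) {
        // Illustrative inputs chosen to land near the M 10 values quoted in this tutorial.
        double ra = raToDegrees("16 57 08.9");
        double dec = decToDegrees("-04 05 57.6");
        System.out.println(toRdf("m10", "globularCluster", "M 10", ra, dec, "", "F3"));
    }
}
```

Taking the sign of the declination from the text rather than from the parsed number matters for objects between 0 and -1 degrees, where the degrees field parses as zero.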
Semantic Astronomy Sesame Knowledge Base
Sesame Endpoint and Links
All queries can be performed as user "anonymous" without being logged in. In the top menu bar next to "Read actions", click "SeRQL-S" and try out the example queries.
Sesame 1.x vs. Sesame 2.x
There are two Sesame endpoints hosted at MSSL: one uses the version 1.2.6 release, and the other uses the version 2.0 release. The Sesame 2.0 User Guide explains the key features of the new version. In particular, Sesame 2.0 implements the W3C SPARQL protocol, while prior versions implement SeRQL, the Sesame RDF Query Language developed for Sesame. Sesame 2.0 is not backwards-compatible with prior releases.
Despite Sesame 2.0's support for SPARQL, the version 1.2.6 release is used in this tutorial for two reasons. First, the MSSL version 1.2.6 endpoint provides RDF storage through MySQL, while version 2.0 does not yet support persistent storage through an RDBMS (RDBMS support is under development for upcoming releases). Second, the version 1.2.6 endpoint provides a simple web interface for SeRQL queries and RDF data uploads, while version 2.0 focuses on the use of Sesame as a library.
Setting Up a Sesame Knowledge Base
Pre-Requisites
To install Sesame on your local machine, you'll need the following software: Java, MySQL, Tomcat, and the Sesame web application (sesame.war).
Installation and Configuration
To mimic the Sesame knowledge base setup used in this tutorial:
- Ensure that Java has been installed
- Install MySQL
- Install Tomcat. Start the Tomcat server. (Assuming the Tomcat root directory is $TOMCAT_HOME, go to $TOMCAT_HOME/bin and execute startup.sh (Linux / Unix) or startup.bat (Windows).)
- Once Tomcat is running, deploy Sesame. Copy the file sesame.war into $TOMCAT_HOME/webapps.
- Create a MySQL database for use with your knowledge base.
- Start up MySQL
- Create database: mysql# create database sesame_kb;
- Create a user that can access the database: mysql# grant all on sesame_kb.* to 'user'@'your.machine.name' identified by 'password'; (fill in your own values for user, your.machine.name, and password, but leave in the single quotes)
- Create the Sesame repository
- Go to $TOMCAT_HOME/webapps/sesame/WEB-INF/bin and execute the Sesame configuration client: configSesame.sh (linux / unix) or configSesame.bat (windows).
- See Chapter 3 of the Sesame 1.x User Guide for details on setting up server configuration, users, and administrative privileges.
- Load the server configuration into the Sesame configuration client. Click the "Users" tab and create at least one password-protected Sesame user. There should also exist a user "anonymous" with no password.
- Click the "Repositories" tab.
- To create a new repository, click the "Add new repository" button (looks like a single treasure chest with a "+" symbol).
- Fill in values for "id" (example: rdbms-rdfs-db-sesame_test_kb) and "title" (example: My Sesame Test KB). Hit enter.
- Click the new repository's id once, and then click the "Show details" button (looks like a treasure chest with a magnifying glass).
- Click the "Sail Configuration" tab and add the following values under "Sail stack":
- Under Class, click "Add" and type org.openrdf.sesame.sailimpl.sync.SyncRdfSchemaRepository. Hit enter.
- Under Class, click "Add" again and type org.openrdf.sesame.sailimpl.rdbms.RdfSchemaRepository. Hit enter.
- Click org.openrdf.sesame.sailimpl.rdbms.RdfSchemaRepository once to highlight it.
- Under Parameters, click "Add" and under Key enter jdbcDriver, then under Value enter com.mysql.jdbc.Driver. Hit enter.
- Under Parameters, click "Add" and under Key enter jdbcUrl, then under Value enter jdbc:mysql://your.machine.name:3306/sesame_kb (the same database name as the MySQL database created above). Hit enter.
- Under Parameters, click "Add" and under Key enter password, then under Value enter the MySQL password used with the "grant" syntax above. Hit enter.
- Under Parameters, click "Add" and under Key enter user, then under Value enter the MySQL user name used with the "grant" syntax above. Hit enter.
- Click the "Access rights" tab. Give read access to the anonymous user, and give read and write access to at least one of your password-protected Sesame users.
- Click "OK" to save user and repository changes, and then send the new configuration back up to the server. You're ready to upload and query data!
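If the repository fails to initialise, one thing worth checking is whether the jdbcUrl, user, and password values entered above work outside Sesame. The following is a minimal sketch using plain JDBC. The class name MySqlCheck is hypothetical, the host, user, and password are the placeholders from the steps above, and the MySQL JDBC driver jar is assumed to be on the classpath when the program runs.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

/** Checks that the MySQL settings given to the Sail configuration are usable. */
final class MySqlCheck {

    /** Builds the same jdbcUrl value that is entered in the Sail parameters. */
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    /** Returns true if a connection can be opened and validated within 5 seconds. */
    static boolean canConnect(String url, String user, String password) {
        try (Connection connection = DriverManager.getConnection(url, user, password)) {
            return connection.isValid(5);
        } catch (SQLException e) {
            // Also reached when no MySQL driver is on the classpath.
            System.err.println("Connection failed: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholders: substitute your own host, user, and password.
        String url = jdbcUrl("your.machine.name", 3306, "sesame_kb");
        System.out.println(url + " reachable: " + canConnect(url, "user", "password"));
    }
}
```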
Upload Data
You can access your new Sesame knowledge base through a web browser at
http://your.tomcat.installation:8080/sesame. Log in as a password-protected Sesame user, and choose the repository you created above, such as "My Sesame Test KB". Click "go".
First, add data to your new knowledge base. Sesame 1.x offers three options for adding data through the browser. Under "Modify actions", you should see
- Add (file) - upload triple-formatted data from a local file system
- Add (www) - upload triple-formatted data from a URL
- Add (copy-paste) - copy triple-formatted data directly into the text area
Upload Ontologies
To continue replicating the Sesame knowledge base used in this tutorial, first add the three ontologies hosted at the University of Maryland.
- Click "Add (www)"
- In the "URL of data to add" text box, enter http://archive.astro.umd.edu/ivoa-onto/src/main/resources/astronomy.owl.
- In the "base URL to use..." text box, enter the xml:base value used in astronomy.owl: http://www.ivoa.net/ontology/astronomy.owl.
- Click "Add data". You should receive a series of status messages during the statement upload, finishing with "Transaction complete".
Repeat the process for physics.owl and IVOAO.owl.
Note: if you receive an error message about a subclass of time when uploading IVOAO.owl, save a copy of the ontology to your local machine. Locate the class declaration for
UniversalTime and try modifying the subClassOf declaration to look like
. Save the modified IVOAO.owl and upload to your knowledge base using Add (file).
Upload RDF data
Once the ontologies have been uploaded, you can upload "real" data formatted as RDF that can then be queried using classes and relationships described in the ontologies. First, upload the contents of a single RDF file using the "Add (copy-paste)" facility. If the transaction completes with no errors, then further RDF files can be uploaded in bulk using the Sesame API. The following Java file can be modified to upload a directory full of RDF files to a Sesame knowledge base:
package org.workshop.semast;

import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.util.Arrays;

import org.openrdf.sesame.Sesame;
import org.openrdf.sesame.admin.AdminListener;
import org.openrdf.sesame.admin.StdOutAdminListener;
import org.openrdf.sesame.config.AccessDeniedException;
import org.openrdf.sesame.config.ConfigurationException;
import org.openrdf.sesame.config.UnknownRepositoryException;
import org.openrdf.sesame.constants.RDFFormat;
import org.openrdf.sesame.repository.SesameRepository;
import org.openrdf.sesame.repository.SesameService;

public class SesameUploadFiles {

    public static void main(String[] args) throws AccessDeniedException,
            UnknownRepositoryException, ConfigurationException {
        try {
            // Connect to the Sesame server and log in as a user with write access.
            URL sesameServerURL = new URL("http://your.tomcat.installation:8080/sesame");
            SesameService service = Sesame.getService(sesameServerURL);
            service.login("sesameusername", "sesameuserpassword");
            SesameRepository myRepository = service.getRepository("rdbms-rdfs-db-sesame_test_kb");

            // Full path to the directory where the RDF files are stored,
            // followed by the sorted names of the files in that directory.
            String baseDir = "/full/path/to/your/RDF/";
            String[] dir = new File(baseDir).list();
            if (dir == null) {
                throw new IOException("Cannot list directory " + baseDir);
            }
            Arrays.sort(dir);

            // Settings shared by every upload.
            String baseURI = "http://your.tomcat.installation:8080/sesame/";
            boolean verifyData = true;
            AdminListener myListener = new StdOutAdminListener();

            // Loop through the RDF files and upload each one to Sesame.
            for (int i = 0; i < dir.length; i++) {
                File myRDFData = new File(baseDir + dir[i]);
                myRepository.addData(myRDFData, baseURI, RDFFormat.RDFXML, verifyData, myListener);
            }
        } catch (IOException e) {
            System.err.println(e);
        }
    }
}
Save this file as org/workshop/semast/SesameUploadFiles.java and replace the occurrences of http://your.tomcat.installation:8080/sesame, sesameusername, sesameuserpassword, rdbms-rdfs-db-sesame_test_kb, and /full/path/to/your/RDF/ with your own values. The Sesame libraries must be on your classpath when you compile and run the program.
- Compile: % javac org/workshop/semast/SesameUploadFiles.java
- Execute: % java org.workshop.semast.SesameUploadFiles
When SesameUploadFiles has finished executing, the contents of all RDF files in /full/path/to/your/RDF/ will have been ingested into your Sesame knowledge base along with the ontologies uploaded using "Add (www)" or "Add (file)". You're ready to try a few SeRQL queries.
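The upload program passes every entry in the directory to Sesame, so a stray non-RDF file would also be submitted. One possible refinement, sketched here with a hypothetical helper class RdfFileSelector that is not part of the Sesame API, is to keep only names ending in ".rdf" before looping.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Picks the RDF files out of a directory listing. Hypothetical helper, not part of Sesame. */
final class RdfFileSelector {

    /** Keeps names ending in ".rdf" (case-insensitive), sorted as plain strings. */
    static List<String> selectRdf(String[] names) {
        List<String> selected = new ArrayList<>();
        for (String name : names) {
            if (name.toLowerCase(Locale.ROOT).endsWith(".rdf")) {
                selected.add(name);
            }
        }
        // Plain string order puts m10.rdf before m2.rdf; upload order does not matter here.
        Collections.sort(selected);
        return selected;
    }

    /** Lists the RDF file names in a directory, failing clearly if it cannot be read. */
    static List<String> selectRdf(File directory) {
        String[] names = directory.list();
        if (names == null) {
            throw new IllegalArgumentException("Not a readable directory: " + directory);
        }
        return selectRdf(names);
    }

    public static void main(String[] args) {
        System.out.println(selectRdf(new String[] {"m2.rdf", "notes.txt", "m10.rdf", "M1.RDF"}));
    }
}
```

In SesameUploadFiles, the returned list could replace the dir array in the upload loop.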
Example SeRQL Queries
To try out these SeRQL queries, you can either use the Sesame knowledge base created on your local machine, or you can try the queries as user "anonymous" on the MSSL Sesame endpoint using knowledge base "MySQL RDFS DB Semantic Astronomy".
Caution: none of the queries below use "OR". Do not attempt an "OR" query during the workshop as this can cause out of memory problems.
Queries about ontology classes
- See all class names included with uploaded ontologies
select C from {C} rdf:type {rdfs:Class}
- Select all subclasses of "galaxy":
SELECT * FROM {sub} rdfs:subClassOf {<http://www.ivoa.net/ontology/astronomy.owl#galaxy>}
- Select all direct subclasses of "galaxy":
SELECT * FROM {directSub} serql:directSubClassOf {<http://www.ivoa.net/ontology/astronomy.owl#galaxy>}
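The difference between the two queries is that rdfs:subClassOf follows the class hierarchy to any depth, while serql:directSubClassOf returns only the classes one level down. The sketch below mimics that difference in memory. The hierarchy in it is invented for illustration and is not copied from astronomy.owl, and the class name SubclassDemo is hypothetical. Under RDFS entailment every class is also a subclass of itself, so the transitive result here includes the starting class.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Contrasts transitive and direct subclass lookups on a small invented hierarchy. */
final class SubclassDemo {

    /** Child class mapped to its direct parent. Invented for illustration only. */
    static final Map<String, String> DIRECT_PARENT = new LinkedHashMap<>();
    static {
        DIRECT_PARENT.put("activeGalaxy", "galaxy");
        DIRECT_PARENT.put("starburstGalaxy", "galaxy");
        DIRECT_PARENT.put("SeyfertGalaxy", "activeGalaxy");
    }

    /** Analogue of serql:directSubClassOf: one level down only. */
    static Set<String> directSubclasses(String cls) {
        Set<String> out = new TreeSet<>();
        for (Map.Entry<String, String> entry : DIRECT_PARENT.entrySet()) {
            if (entry.getValue().equals(cls)) {
                out.add(entry.getKey());
            }
        }
        return out;
    }

    /** Analogue of rdfs:subClassOf with inferencing: the class itself plus all descendants. */
    static Set<String> allSubclasses(String cls) {
        Set<String> out = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.add(cls);
        while (!todo.isEmpty()) {
            String current = todo.remove();
            if (out.add(current)) {
                todo.addAll(directSubclasses(current));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("direct: " + directSubclasses("galaxy"));
        System.out.println("all:    " + allSubclasses("galaxy"));
    }
}
```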
Basic data queries
- Select all entries for a certain object type (e.g., supernova remnants)
SELECT * FROM {C} rdf:type {<http://www.ivoa.net/ontology/astronomy.owl#supernovaRemnant>}
- Click the entry (e.g. http://msslxt.mssl.ucl.ac.uk:8080/sesame/#m1) to see its statements in the format subject | predicate | object.
- Other object types to query in place of ast:supernovaRemnant:
- ast:activeGalacticNucleus
- ast:galacticCluster
- ast:galaxy
- ast:globularCluster
- ast:HeIiIonizationZone
- ast:interactingGalaxy
- ast:linerGalaxy
- ast:openStarCluster
- ast:planetaryNebula
- ast:reflectionNebula
- ast:SeyfertGalaxy
- ast:starburstGalaxy
- ast:starCluster
- Select all properties and values from the RDF entry for M23:
select * from {<http://msslxt.mssl.ucl.ac.uk:8080/sesame/#m23>} p {value}
- Select entries for all globular clusters:
select * from {C} rdf:type {ast:globularCluster} USING NAMESPACE ast = <http://www.ivoa.net/ontology/astronomy.owl#>
- Select Messier object ID, right ascension, and declination from all entries:
select ID, RA, Dec from {C} ivoao:identification {ID}; ast:RightAscension {RA}; ast:Declination {Dec} using namespace ivoao = <http://www.ivoa.net/ontology/IVOAO.owl#>, ast = <http://www.ivoa.net/ontology/astronomy.owl#>
- Select the Messier object ID and spectral classification of all globular clusters:
select object, ID, spectralClass, objectType from {object} ivoao:identification {ID}; ast:SpectralClassification {spectralClass}; serql:directType {objectType} rdfs:subClassOf {ast:globularCluster} using namespace ast = <http://www.ivoa.net/ontology/astronomy.owl#>, ivoao = <http://www.ivoa.net/ontology/IVOAO.owl#>
- Select object entry, ID, RA, Dec for all entries:
select object, ID, RA, Dec from {object} ivoao:identification {ID}; ast:RightAscension {RA}; ast:Declination {Dec} using namespace ast = <http://www.ivoa.net/ontology/astronomy.owl#>, ivoao = <http://www.ivoa.net/ontology/IVOAO.owl#>
- Select object entry, ID, spectral classification, and object type where ID equals "M 61":
select object, ID, SpecClass, objectType from {object} ivoao:identification {ID}; ast:SpectralClassification {SpecClass}; serql:directType {objectType} where ID = "M 61" using namespace ast = <http://www.ivoa.net/ontology/astronomy.owl#>, ivoao = <http://www.ivoa.net/ontology/IVOAO.owl#>
- Select object entry, ID, spectral classification, flux, and object type for any object with an "F" type spectral classification:
select object, ID, SpecClass, flux, objectType from {object} ivoao:identification {ID}; ast:SpectralClassification {SpecClass}; phys:flux {flux}; serql:directType {objectType} where SpecClass like "F*" using namespace ast = <http://www.ivoa.net/ontology/astronomy.owl#>, ivoao = <http://www.ivoa.net/ontology/IVOAO.owl#>, phys = <http://www.ivoa.net/ontology/physics.owl#>
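In the query above, like "F*" keeps values that start with "F", with * standing for zero or more characters. If results need to be filtered again on the client side, the same wildcard can be reproduced in Java. This is a minimal sketch: the class name SerqlLike is hypothetical, and matching is assumed to be case-sensitive.

```java
import java.util.regex.Pattern;

/** Client-side analogue of a SeRQL "like" pattern in which * matches any run of characters. */
final class SerqlLike {

    /** Returns true if the whole value matches the wildcard pattern (case-sensitive). */
    static boolean like(String value, String pattern) {
        String[] parts = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each * in the pattern becomes "any characters"
            }
            regex.append(Pattern.quote(parts[i])); // everything else is matched literally
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(value).matches();
    }

    public static void main(String[] args) {
        for (String spectralClass : new String[] {"F3", "F", "G5", "f3"}) {
            System.out.println(spectralClass + " like \"F*\": " + like(spectralClass, "F*"));
        }
    }
}
```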
More interesting data queries
- Select only entries whose type is a subclass of ast:starCluster:
select distinct C from {C} rdf:type {} rdfs:subClassOf {ast:starCluster} using namespace ast = <http://www.ivoa.net/ontology/astronomy.owl#>
- Select ID, RA, Dec, flux, and spectral class of all entries that are subclasses of ast:starCluster:
select object, ID, RA, Dec, Flux, SpecClass, objectType from {object} ivoao:identification {ID}; ast:RightAscension {RA}; ast:Declination {Dec}; phys:flux {Flux}; ast:SpectralClassification {SpecClass}; serql:directType {objectType} rdfs:subClassOf {ast:starCluster} using namespace ast = <http://www.ivoa.net/ontology/astronomy.owl#>, ivoao = <http://www.ivoa.net/ontology/IVOAO.owl#>, phys = <http://www.ivoa.net/ontology/physics.owl#>
- Box query: select all objects plus their ID, RA, Dec, and object type that fall into a box defined by 180.0 < RA < 190.0 and +11.0 < Dec < +14.0:
select object, ID, RA, Dec, objectType from {object} ivoao:identification {ID}; ast:RightAscension {RA}; ast:Declination {Dec}; serql:directType {objectType} where RA > "180.0"^^xsd:float and Dec < "14.0"^^xsd:float and RA < "190.0"^^xsd:float and Dec > "11.0"^^xsd:float using namespace ast = <http://www.ivoa.net/ontology/astronomy.owl#>, ivoao = <http://www.ivoa.net/ontology/IVOAO.owl#>, phys = <http://www.ivoa.net/ontology/physics.owl#>
- More specific box query: select all galaxies (and subclasses of galaxies) plus their ID, RA, Dec, and object type that fall into a box defined by 180.0 < RA < 190.0 and +11.0 < Dec < +14.0:
select object, ID, RA, Dec, objectType from {object} ivoao:identification {ID}; ast:RightAscension {RA}; ast:Declination {Dec}; serql:directType {objectType} rdfs:subClassOf {ast:galaxy} where RA > "180.0"^^xsd:float and Dec < "14.0"^^xsd:float and RA < "190.0"^^xsd:float and Dec > "11.0"^^xsd:float using namespace ast = <http://www.ivoa.net/ontology/astronomy.owl#>, ivoao = <http://www.ivoa.net/ontology/IVOAO.owl#>, phys = <http://www.ivoa.net/ontology/physics.owl#>
- Put it all together: select all types of star cluster (including globular clusters and open star clusters) with "F" type spectral classifications that fall into a box defined by 244.0 < RA < 276.0 and -30.0 < Dec < -10.0:
select object, ID, RA, Dec, SpecClass, objectType from {object} ivoao:identification {ID}; ast:RightAscension {RA}; ast:Declination {Dec}; ast:SpectralClassification {SpecClass}; serql:directType {objectType} rdfs:subClassOf {ast:starCluster} where SpecClass like "F*" and Dec < "-10.0"^^xsd:float and RA < "276.0"^^xsd:float and Dec > "-30.0"^^xsd:float and RA > "244.0"^^xsd:float using namespace ast = <http://www.ivoa.net/ontology/astronomy.owl#>, ivoao = <http://www.ivoa.net/ontology/IVOAO.owl#>, phys = <http://www.ivoa.net/ontology/physics.owl#>
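The box queries above differ only in the class name and the four bounds, so an application that issues them can assemble the query text from parameters. The following is a minimal sketch of such a builder: the class name BoxQuery and its method names are hypothetical, and it only produces the SeRQL string, leaving submission to the endpoint to the caller.

```java
/** Builds the SeRQL box query from this tutorial for a given class and coordinate range. */
final class BoxQuery {
    private static final String NAMESPACES =
        "using namespace ast = <http://www.ivoa.net/ontology/astronomy.owl#>, "
        + "ivoao = <http://www.ivoa.net/ontology/IVOAO.owl#>";

    /** Formats a bound as a typed literal such as "180.0"^^xsd:float. */
    static String floatLiteral(double value) {
        return "\"" + Double.toString(value) + "\"^^xsd:float";
    }

    /**
     * Selects objects whose direct type is a subclass of ast:astClass inside
     * raMin < RA < raMax and decMin < Dec < decMax (all in decimal degrees).
     */
    static String build(String astClass, double raMin, double raMax,
                        double decMin, double decMax) {
        // Accept only simple local names so the class cannot alter the query structure.
        if (!astClass.matches("[A-Za-z][A-Za-z0-9]*")) {
            throw new IllegalArgumentException("Unexpected class name: " + astClass);
        }
        return "select object, ID, RA, Dec, objectType "
            + "from {object} ivoao:identification {ID}; "
            + "ast:RightAscension {RA}; ast:Declination {Dec}; "
            + "serql:directType {objectType} rdfs:subClassOf {ast:" + astClass + "} "
            + "where RA > " + floatLiteral(raMin) + " and RA < " + floatLiteral(raMax)
            + " and Dec > " + floatLiteral(decMin) + " and Dec < " + floatLiteral(decMax)
            + " " + NAMESPACES;
    }

    public static void main(String[] args) {
        // The galaxy box from the example above.
        System.out.println(build("galaxy", 180.0, 190.0, 11.0, 14.0));
    }
}
```

Restricting the class name to letters and digits is a simple guard against malformed input when the name comes from a user.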
Screenshot of final Sesame query results:
Suggestions for further discussion
- What other queries would you like to try? The OpenRDF SeRQL examples are a good place to start.
- More advanced queries: what sort of inferences would you like to try with your ontologies?
- Do you have existing data sets that could be reformatted as RDF in a new Sesame knowledge base?
- Do existing ontologies created by IVOA members cover your knowledge domain sufficiently, or would you like to augment them with your own ontologies (or ontologies from other disciplines)?
- What about the use of RDFS or SKOS in place of the OWL files used here?
- Do you have Java, Perl, or Python applications into which SeRQL queries could be built?
With thanks to the University of Southampton for assistance with both RDF structure and SeRQL queries.