European Semantic Web Conference 2006
Budva, Montenegro
June 11-15 2006
Introduction
These are notes from the sessions I attended at ESWC'06 last week. For more detailed (and accurate) information, check out the following:
I (Norman) have also interleaved some notes, though only for some talks which seem slightly more generally useful.
W5: S4: Querying
Xcerpt
- Rule based query language
- XML vs RDF data
- RDF inherently graph structured
- Augment data terms by edges
- Vs Sparql
- Allows ordering
- Versatile access
- Query construct using conjuncts from XML & RDF simultaneously
- Use xml query to find terms on which RDF query is based
- All in one query statement
XAMaXoS
- Need efficient implementation of Xcerpt
- Abstract Machine
- Hw virtualisation
- Wine
- JVM, CLI
- Instruction set + machine model
- Cf. algebra = operators + data model
- Precise query semantics
- ==> optimizability
- Aims
- Language neutral
- Focus on in-memory processing of distrib data
- Initially
- Ad-hoc index creation
- Distributed evaluation
- Split into base relations -> conjunctive queries
- Optimization: move 'tough' decisions to compile-time
- Data model
- Basic type: node with properties
- Memory model: memoization matrix
- Compute spanning tree from graph
- Use tree to compute matrix
- Three phase algorithm
- Matrix population
- Expansion of non-tree joins
- Matrix consumption
- First version end of the year
More precise typing rules for Xcerpt
- Descriptive typing for Xcerpt
- Type inference
- Type checking
- => locating errors in programs
- Formalisms
- Data terms
- Type definitions
W5: S5: Reasoning II
Extending OWL web node with Reactive behaviour
- Triggers
- ON event WHEN rule DO action
- Events on OWL level can be derived
- No distinction between base & derived relations
- ON DELETE OF hasHusband DO
- If (dept, hasProfessor, …) rule added
- ON INSERTION OF hasEmployee OF department
- RAISE EVENT (new_employee(…))
- => reacts on change in model
- Pre-reasoning triggers
- Post-reasoning triggers
- React on changes to model
- See example (CREATE TRIGGER test …)
- Architecture
- Web service
- Jena based module with active functionality
- PostgreSQL db + RDF facts
- DIG tell&ask i/f
- Algorithm
- Verify direct triggers: rollback if not consistent
- Jena RDF-framework
- See alternatives
- Conclusion
- Simple solution
- Ready-to-use
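A minimal sketch of the ON event WHEN rule DO action pattern from the notes above, assuming a toy in-memory triple store. The class `TriggerStore` and its methods are made up for illustration; the system in the talk was a Jena-based module, which this does not reproduce.

```python
from collections import defaultdict


class TriggerStore:
    """Toy triple store with ON-event / WHEN-condition / DO-action triggers."""

    def __init__(self):
        self.triples = set()
        self.triggers = defaultdict(list)  # (event, predicate) -> [(condition, action)]
        self.raised = []                   # events raised by trigger actions

    def on(self, event, predicate, condition, action):
        self.triggers[(event, predicate)].append((condition, action))

    def _fire(self, event, triple):
        for condition, action in self.triggers[(event, triple[1])]:
            if condition(self, triple):
                action(self, triple)

    def insert(self, s, p, o):
        triple = (s, p, o)
        if triple not in self.triples:
            self.triples.add(triple)
            self._fire("INSERT", triple)

    def delete(self, s, p, o):
        triple = (s, p, o)
        if triple in self.triples:
            self.triples.remove(triple)
            self._fire("DELETE", triple)


store = TriggerStore()
# ON INSERTION OF hasEmployee WHEN the subject is a department DO raise new_employee
store.on(
    "INSERT", "hasEmployee",
    condition=lambda st, t: (t[0], "type", "Department") in st.triples,
    action=lambda st, t: st.raised.append(("new_employee", t[2])),
)
store.insert("cs", "type", "Department")
store.insert("cs", "hasEmployee", "alice")
store.insert("pub", "hasEmployee", "bob")  # no event: "pub" is not a department
```

This only reacts to explicit inserts and deletes; the pre- and post-reasoning triggers mentioned above would also need a reasoner in the loop.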
Open & Closed-World reasoning
- Desirable to apply closed world reasoning to subset of knowledge
- Where knowledge complete
- E.g. list of EU countries
- Extended logic programming
Reasoning with Temporal constraints
- Temporal RDF
- TRDF + intervals
- TRDF + temporal constraints
- Create entailment of intervals
T5: Application development using Sesame
Introduction
Tutorial
Sesame framework
- Sesame
- For storage, querying and inferencing
- Java library
- Repository
- Reasoning support
- Custom rule engine (coming)
- Various backends
- Rio toolkit
- Architecture (see slide)
- Http server (superset of SPARQL)
- Repository access api
- SeRQL, SPARQL (declarative querying)
- SAIL api (storage and inference)
- Rio (RDF io)
- Application either direct to rep api or via Http
- SAIL api
- Abstraction from physical
- System level api (not dev level)
- Repository access api (dev level)
- Installation
- Java 5.0 + Tomcat 5.x
- Deploy sesame.war
- Configure server
- Reasoning
Sesame includes an HTTP server, which allows you to add data to a knowledgebase through the web and (I'm pretty sure) query the result. Sesame will support 10^7 triples on desktop hardware (I think this means that you can load that many, and query the result, but not necessarily inference with them, in the sense of query the implied triples). Sesame's reasoning support includes RDFS and OWLIM (see below), and it allows you to plug in your own (domain-specific) custom rule engine, presuming you have one handy.
Differences to Jena: Sesame started out with a focus on declarative querying and added the API (I imagine this is somewhat simplified), whereas Jena started out as an RDF API and added querying later. They're converging, though the Sesame download is about 1MB against Jena's 8MB, which the Sesame folk are rather pleased about. Relative performance depends heavily on the case. [NG]
Querying
- http://www.openrdf.org/
- SeRQL vs SPARQL
- Similar
- SeRQL (pronounced `circle')
- Nested queries (IN, EXISTS operators)
- Efficient Sesame impl
- SPARQL
- W3C std
- Tool interop: Jena, Redland, 3Store, Sesame
- See examples slide
- Can they be transformed
- Most queries
- Internally parsing to same query object model
- Path expression: {X} P {Y}
- Chaining
- Branching
- Comparison operators: string, boolean
- Query composition
- Overlay query graph onto actual
- Optional path expressions
- RDF is semi-structured
- […]: if matched then return that data
- CONSTRUCT queries
- Graph transformations
- E.g. playsInMovies
- Create new graph
- Can use api to feed back into repository
- Nested query
- SeRQL has IN, ANY & ALL, EXISTS
- See examples
- Speed
- Can apply optimization
- Rules of thumb
- SAIL allows 'prepare' to test for optimization
- For some SAILs
- Equality for RDBMS backend
- SPARQL
- Allows functions to be defined (sorts of properties)
- Query returns graph
- So can populate new in-memory store
SeRQL allows nested queries, for example all movies which don't have a rating: that is, all movies where a query for the rating fails. Or find the highest rating for each user: that is, the film for which the rating is bigger than any other match for a given user (the implication was that this is difficult or impossible in SPARQL). The two languages don't look massively different on the page; SPARQL possibly looks a bit more SQL-like. [NG]
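To make the two nested-query examples concrete, here is a sketch of their logic in plain Python rather than in SeRQL. The sample data and the function names `unrated` and `top_rated` are invented; the point is only the shape of the inner query (a failed existence test, and a comparison against all other matches).

```python
ratings = [
    # (user, movie, rating) -- made-up sample data
    ("ann", "Alien", 4),
    ("ann", "Brazil", 5),
    ("bob", "Alien", 3),
]
movies = {"Alien", "Brazil", "Casablanca"}


def unrated(movies, ratings):
    """All movies without a rating: keep a movie when the inner query
    for its ratings returns nothing (a NOT EXISTS style test)."""
    return sorted(m for m in movies
                  if not any(movie == m for _, movie, _ in ratings))


def top_rated(ratings):
    """Highest rating per user: keep a row when its rating is at least
    as large as ALL other ratings by the same user."""
    return sorted(
        (user, movie) for user, movie, r in ratings
        if all(r >= r2 for u2, _, r2 in ratings if u2 == user)
    )
```

With the data above, `unrated(movies, ratings)` picks out Casablanca and `top_rated(ratings)` pairs each user with their best-rated film.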
Using the Sesame API
- Using Sesame as library
- E.g. (see slide) create repository object
- Querying
- Execute query and iterate over results
- Transactions
- Context support
- May come from separate RDF files
- => provenance tracking, versioning & time-tracking
- Default vs named context
- Querying: FROM CONTEXT
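A rough illustration of what context support buys you, assuming statements are stored as quads (subject, predicate, object, context). This is not the Sesame API: `Quad`, `match` and `provenance` are hypothetical names, and the data is invented.

```python
from collections import namedtuple

Quad = namedtuple("Quad", "s p o context")
DEFAULT = None  # stand-in for the default context

quads = [
    Quad("ex:film1", "ex:title", "Alien", "file:movies.rdf"),
    Quad("ex:film1", "ex:rating", "4", "file:ratings-2006.rdf"),
    Quad("ex:film1", "ex:year", "1979", DEFAULT),
]


def match(quads, s=None, p=None, o=None, contexts=None):
    """Return quads matching the pattern; None is a wildcard.
    `contexts` restricts the search, much like FROM CONTEXT in a query."""
    return [q for q in quads
            if (s is None or q.s == s)
            and (p is None or q.p == p)
            and (o is None or q.o == o)
            and (contexts is None or q.context in contexts)]


def provenance(quads, s, p, o):
    """Which contexts assert this statement?"""
    return {q.context for q in match(quads, s, p, o)}
```

Because each statement remembers the file it came from, provenance tracking and versioning reduce to filtering on the fourth element.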
Elmo
- http://www.workbrain.com/
- Maps sesame repository to java beans
- Uses query expansion & caching
- AugurRepository method
- Can be used outside of Elmo
- See Architecture on slide
- Other tools
- Scutter: crawls distrib RDF networks
- Smusher: find duplicates
OWLIM
- http://www.ontotext.com/owlim
- Supports RDF(S) & limited OWL-Lite
- Semantics
- Reasoning customizable
- : empty, RDFS, owl-horst, owl-max
- OWL support
- Almost all primitives
- Close to OWL-Lite
- RDFS support
- In-memory reasoning & Reliable persistence
- Uses TRREE
- Relatively slow delete operation
- Fast upload, retrieval & query
- Forward-reasoning avoids need for query optimization
- Configurable SAIL for Sesame
- Almost ready with Sesame 2.0
- OWLIM & Sesame serve WSML
- WSMO infrastructure
- Loads LUBM(50,0) in 15 mins
- Only other one able to load it takes 12 hours
- BigOWLIM
- Passed LUBM(8000,0)
- 1.06 billion explicit statements
- 69 hours to load
- (The original OWLIM is now referred to as SwiftOWLIM) [NG]
- Only need to use Sesame
- And use OWLIM as SAIL
- Can also go via Elmo
- OWLIM is open source
- But TRREE is not
- Free for in-memory version
- €1000 per cpu for big version
OWLIM is described as a `scalable semantic repository', using a `pragmatic subset of OWL'. In ascending order of complexity/expressivity, the various languages are:
- RDFS
- OWL DLP (Description Logic Programming: the intersection of OWL and Prolog, more or less)
- OWLIM (includes some overlap with OWL Lite)
- OWL Lite (all the easy bits of OWL)
- OWL DL
- SWRL (OWL plus some rules; heading towards standardisation)
- OWL Full
Most scalable RDF repositories tend, it seems, to be somewhere between RDFS and OWL Lite.
OWLIM supports almost all the OWL primitives, except for Thing, Nothing, differentFrom, and complementOf, though some primitives are only partially supported. However it doesn't have the DL-specific constraints, so that meta-classes (classes of classes), and properties linking classes to instances, are OK. [NG]
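The forward-reasoning idea mentioned in the OWLIM notes (infer everything at load time, so queries become plain lookups) can be sketched with two standard RDFS entailment rules. This is a toy fixpoint loop, not OWLIM's TRREE engine; `materialise` is a made-up name and the facts are invented.

```python
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"


def materialise(triples):
    """Forward-chain two RDFS rules to a fixpoint and return the closure.
    rdfs11: (A subClassOf B), (B subClassOf C) => (A subClassOf C)
    rdfs9:  (x type A), (A subClassOf B)       => (x type B)
    """
    closure = set(triples)
    while True:
        subclass = [(s, o) for s, p, o in closure if p == SUBCLASS]
        new = set()
        for a, b in subclass:
            for s, p, o in closure:
                if p == SUBCLASS and s == b:
                    new.add((a, SUBCLASS, o))
                if p == TYPE and o == a:
                    new.add((s, TYPE, b))
        if new <= closure:   # nothing new was derived: fixpoint reached
            return closure
        closure |= new


facts = {
    ("ex:Penguin", SUBCLASS, "ex:Bird"),
    ("ex:Bird", SUBCLASS, "ex:Animal"),
    ("ex:pingu", TYPE, "ex:Penguin"),
}
closure = materialise(facts)
```

After loading, asking whether `ex:pingu` is an `ex:Animal` is a set membership test; the cost has been moved to upload time.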
Tutorial: Semantic web policies
[NG] This was a half-day tutorial, though Piero Bonatti also gave a talk on the issues later in the meeting.
Part of the REWERSE network, which is concerned with Rules on the Web
Policies aren't just about security but include business rules, workflow management and so on. It's risky and costly to encode such policies in code, and then to change them.
Current technologies include XACML (built around 'rules', but very simple, without chaining) and P3P (rudimentary ontology, ill-specified).
Open systems authorisation
- Example: getting wireless service in an airport lounge via a frequent flier card, pre-pay card, credit card, airline employee status
- Illustrates privacy issues. Why should you disclose sensitive information to this server?
- Self-regulation: SEAL program (TRUSTe, BBBOnLine, WebTrust), following best practices, subject to audit
Expressiveness requirements
- 'Policy' means security, business rules, QoS, etc, but all make decisions based on attributes, such as age, possession of ID, and so on
- Policies can be active as well as passive: for example triggering manual registration procedures, or logging
- Evidence: Strong (easy to reason about: ID, credit cards, subscriptions), soft (inc PGP, possibly strong but hard to reason about web of trust, eBay reputations), lightweight.
They talk of `co-operative policy enforcement', by which they mean never just saying no, but instead explaining why not (you don't have a valid card) and what the user might do about it (ask X for an account), and supporting what-if and query scenarios.
Requirements
- Well-defined semantics. No surprises: conclusions should be the same for any reasoner
- Monotonic: disclosing more information should never decrease access (so rules like 'grant access if requester is not a student' are out). This is because you can't reliably find all the properties a requester has, so making deductions based on the absence of properties is unreliable.
- Some exceptions: might check with VISA for the absence of a revocation
- Doesn't apply to time-based policies: might have access before 5, but not after
- Delegation is necessary in many cases, both for privacy and efficiency reasons (the predicate has a valid VISA card is evaluated by VISA, not by the policy owner). In this context, loops are a problem, but aren't errors
- Policies can be sensitive: 'access only Sun&Microsoft' (so they're cooperating), 'records available only to psychiatrist/parole officer', 'my pictures only to my friends' (but I can't see them, so I'm not a friend!)
- Must be able to reason in situations involving loops. Alice will let her friends see her photos, and decides that her friends include Bob's friends; Bob decides that his friends include Alice's friends; hence a loop. There's another standard scenario: two CIA agents meet, and each one says I'll show you my CIA credentials if you show me yours first.
- Usability is hard: `too often only the PhD student who designed a policy language can use it effectively' [Seaham, ESWC'02?]
- Conflict resolution
- different policies may apply
- one policy permits, another denies
- one policy obliges, another denies
- Might need a proof that a policy has been satisfied, to pass on to a third party
- Many policy languages lack (non-prototype) implementation
- And then you need tools
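The Alice and Bob loop from the requirements above is a problem only for a naive recursive evaluator. One way to reason in its presence, sketched here under my own assumptions (the tutorial did not give an algorithm), is to iterate to a least fixpoint. `friends_fixpoint` and the data are hypothetical.

```python
def friends_fixpoint(direct, includes):
    """direct: owner -> set of people named directly as friends.
    includes: owner -> set of other owners whose friends also count.
    Iterates to the least fixpoint, so cyclic 'includes' still terminate."""
    friends = {owner: set(people) for owner, people in direct.items()}
    changed = True
    while changed:
        changed = False
        for owner, others in includes.items():
            for other in others:
                extra = friends.get(other, set()) - friends[owner]
                if extra:
                    friends[owner] |= extra
                    changed = True
    return friends


direct = {"alice": {"carol"}, "bob": {"dave"}}
includes = {"alice": {"bob"}, "bob": {"alice"}}  # the loop from the notes
friends = friends_fixpoint(direct, includes)


def may_see_photos(owner, requester):
    return requester in friends[owner]
```

The evaluation is also monotonic in the sense used above: adding facts can only grow the friend sets, never revoke access.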
Current policy languages
- Languages differ in whether they have well-defined (formal) semantics, and in whether evaluation is centralised or distributed
- XACML: distributed policies, but centralised evaluation, no formal semantics
- procedural semantics defined in Haskell, and incompletely, so that extensions can be made in an ad hoc way (XACML is at v2.0, but there's only a draft semantics document for v1.0)
- not declarative
- No negation, so not monotonic
- no variables, so not rules in the rules-community sense, and no chaining
- no delegation
- all policies are public (since they're evaluated centrally)
- conflict resolution, with deny/permit overrides
- there are multiple implementations and tools
- P3P: platform for privacy preferences
- schema, not a language
- informational, and doesn't enforce compliance
- policies may be ambiguous
- Kaos
- policies brought to central point for evaluation
- uses OWL ontologies
- 4 types of policy: positive/negative authorization/obligation
- Uses DL subsumption reasoning to reason over policies: check for applicability of policies
- Provides a static conflict resolution algorithm, using policy precedences, and punts to user if the algorithm fails
- Well-defined semantics; no negation, so no monotonicity
- PeerTrust
- guarded distributed logic; distributed/delegated evaluation
- policy protection, so negotiation
- has an implementation in a jar file (works, but not production quality)
- there are tools; policies can be specified using Protégé
- Protune
- general provisional-style actions: actions are performed (and are true if they succeed?)
- big language, by the look of it, with bells, whistles and gongs
- supports negotiation
- implementation ongoing, built on PeerTrust; there are a few tools
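For the XACML bullet on conflict resolution with deny/permit overrides, here is a simplified sketch of the two combining strategies. It deliberately ignores XACML's Indeterminate decision and its real policy syntax; the two example policies and the `evaluate` helper are invented.

```python
PERMIT, DENY, NOT_APPLICABLE = "Permit", "Deny", "NotApplicable"


def deny_overrides(decisions):
    """Any Deny wins; otherwise any Permit wins."""
    if DENY in decisions:
        return DENY
    if PERMIT in decisions:
        return PERMIT
    return NOT_APPLICABLE


def permit_overrides(decisions):
    """Any Permit wins; otherwise any Deny wins."""
    if PERMIT in decisions:
        return PERMIT
    if DENY in decisions:
        return DENY
    return NOT_APPLICABLE


def evaluate(policies, request, combine):
    return combine([policy(request) for policy in policies])


# Two hypothetical policies that conflict for staff after 17:00.
def staff_policy(req):
    return PERMIT if req.get("role") == "staff" else NOT_APPLICABLE


def after_hours_policy(req):
    return DENY if req.get("hour", 0) >= 17 else NOT_APPLICABLE
```

The same pair of policies gives opposite answers for a staff member at 18:00 depending on which combining function is chosen, which is the whole point of making the strategy explicit.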
W7: S1: Semantic wiki
GraphingWiki
- http://graphingwiki.virtues.fi/
- Background in computer security
- Black box testing for sw vulnerabilities
- Traditional protocol views (see slide)
- Early viz attempts
- Trying to understand linkages
- Protocol data stored in MoinMoin
- Then extract into views
- Began with extra markup
- Commonalities with RDF and semantic wikis
- Exported to RDF
Reusing Ontological Background Knowledge
- http://wiki.ontoworld.org
- MediaWiki
- PHP/MySQL
- Not many SemWeb tools for this
- Semantic MediaWiki (SMW)
- Installations
- Ontoworld
- WWW2006
- ESWC2006
- Bible wiki
- Esoteric knowledge wiki
- …
- Mapping of OWL to SMW
- Ontology import
- Upload using mapping
- Kickstart wiki
- Can enrich an existing wiki
- Only works for simple parts: mapped stuff
- OWL more expressive
- Subproperties
- Inverse, functional, transitive props
- Number constraints
- Class constructors: negation, conjunction, disjunction
- …
- But how to add to wiki?
- Extend wiki syntax
- But usable?
- KISS
- But can we keep the sophisticated knowledge
- Architecture
- KAON2 loaded with knowledge
- SMW
- Edit on SMW, check for consistency on KAON2
- So, use background knowledge to check SMW consistency
- Can also categorise page from statements
- Map existing URIs
- More powerful queries
- Bg knowledge & reasoner
- Query interface? SPARQL?
Kaukolu
- http://kaukoluwiki.opendfki.de/cgi-bin/trac.cgi
- Corporate memory via intranet wiki
- How to map intranet information to RDF?
- Wiki pages do not represent ontological resources
- Want to construct RDF data that complies with existing ontologies
- Shallow vs deep ontologies
- Kaukolu
- Based on JSPWiki
- Sesame 2 as RDF repository & inference engine
- Supports importing ontologies
- Ontology-driven autocompletion in editor
- Import RDFS, author RDF, reuse info in other apps
- Future
- UI improvements
- Text-to-RDF wizard
- Customized plugins
- Essential for rendering and entering RDF
- Discuss
- Is mapping RDF resources to wiki pages way to go?
- Are RDF triples sufficient for 'everyday knowledge'?
- Shouldn't real basis of wiki semantics be a foundational ontology instead of basic RDF triples?
W7: S2: Lightning panel
Semantic wiki engines
- Makna
- http://makna.ag-nbi.de
- Create & manage semantic info using wikis
- Collab ontology engineering
- Main features (see slide)
- JspWiki
- Semantic additions
- Jena
- User + Admin
- Implementation
- Semantic content authoring
- Extended wiki syntax
- Context-based presentation/navigation
- Content- & structure-based retrieval
- Future
- Ontology engineering support
- Multimedia extension
- Evaluate usability
- Annotation, Representation and Navigation
- Common theme?
- SemperWiki
- Annotation
- Dimensions: Attribution, granularity, representation distinction, terminology reuse, object type, context (provenance & scope)
- Representation
- Annotations -> both docs & concepts
- Allow annotation of both
- Navigation
- SweetWiki
Future of Semantic Wikis
- iMapping Wikis
- ABCDE format
- Semantic conf proceedings
- Think in stories
- Take Latex and add new lines
- Learning with semantic wikis
- http://ikewiki.salzburgresearch.at/
- Recursive self-referential process
- Self-directed learning
- Challenge reader to contribute
- Educational env
- Collaborative features
- Story writing
- Wikis as ePortfolios
- Benefits of SWs
- Annotations allow reflection
- Share models between learners
- Reasoning
- Reusability
From Wikipedia to Ontology
- Harvesting wiki consensus
- URIs in wikipedia identify ontology concepts
- Ontology tools & languages barrier to users
- Wikis
- Collab ontology creation
- Use of multimedia elements
- Richness of informal concept definitions
- Results
- From wikipedia to semantic relationships
- Semi-automated annotation approach
- Aim
- Identify relations in free text
- Resources
- No training
- Extract relations automatically
- Minimal manual intervention
- Use NLP
- Use wikipedia
- Method
- Extract pairs in NL
- Extract patterns
- Apply patterns
- Produce wikipedia list pages
- More than 23000 related pairs for 20000 wikipedia pages
- Good precision on some pages
- Extracting semantic relationships
- Q: how to do complex structured queries on wikipedia
- E.g. find countries which had non-violent revolutions
- Connectivity ratio
- Correlation with semantic connection strength
- Inset better than outset
- COUNTRY too broad a category
- Future
From semantics to wikis
- Wiki and semantics
- Ideliance
- Semantic wiki = slow quick
- Need to design a user-level semantic language
- Extensible RDF presentation engine
- Hyena
- Separate RDF & wiki editing
- RDF by Hyena
- Wiki by Wikked
- Hyena plugins: Vodules
W3: Semantic Network Analysis (SNA)
Representing Social & Cognitive Networks
- Graph is basic entity
- SNA, RCA (Relational content analysis)
- Content analysis
- Social science discipline
- Who influences whom
- Relational content analysis
- Extract relationships between actors, issues, values, facts from texts
- Issue position of actors
- Causal relationships
- RCAs challenges for SW
- Actors are nodes in network
- Complex relations: n-ary relations
- Not binary
- Vectors of real values
- Ambiguity
- Meta-info about documents with triplets
- Example 1: Islam issue
- Info
- Document sets: web sites, newspapers
- Doc info
- Medium, contributor, date
- Object info
- Simple object ontology
- Data analysis
- Who influences whom
- ==> Action/reaction feedback
- Example 2: Pim Fortuyn
- Increases his standing after negative comments about immigrants
- Towards SW-RDF solutions
- RDFS
- Object info
- Predicate info
- Subtype scheme of limited use
- RDF reification
- Of little use to add metadata to triplets
- Ambiguous
- Poorly implemented
- Dummy nodes ==> n-ary predicates
- Also ambiguous
- Redundant to have > 2-ary
- Named Graphs
- Express explicitly whether RDF enrichment deals with doc info, triplet info or predicate info
- SW solution: RDFS + Named Graphs
From Semantic to Social
- How to introduce human and social perspectives into KM
- Integrated process
- Heterogeneous data
- Generate exploitable data from rough data
- Can manipulate great volume
- Used graphs
- See slide: process based on unified graph
- Text mining
- Graph mining
- SNA
- Steps
- Docs & users interconnected
- Links
- File analysis
- Semantic propagation
- Profile similarity
- #1: File Analysis
- Colocation matrix (HAL-like techniques)
- Link words with weights
- Graph clustering
- => clusters with ordered words
- #2: Semantic Propagation
- Select most frequent concepts
- Normalize
- Apply to docs & users
- => graph connecting users, docs & concepts
- #3: Profile Similarity
- Concept graph
- Compute similar docs & k-nearest neighbours
- => browsable graph structure
- Communities of docs and people
- Unsupervised & integrated process
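As an illustration of step #1 above (a HAL-like matrix linking words with weights), here is a small sketch of a windowed co-occurrence count. The weighting `window - d + 1` is the usual HAL scheme as I understand it; the symmetric pairing and the function name `cooccurrence` are simplifying assumptions, not details from the talk.

```python
from collections import defaultdict


def cooccurrence(tokens, window=3):
    """HAL-style weights: a word d positions after another (1 <= d <= window)
    adds window - d + 1 to their link, so closer pairs weigh more.
    Pairs are stored unordered, which drops HAL's direction information."""
    weights = defaultdict(int)
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            if i + d < len(tokens):
                pair = tuple(sorted((word, tokens[i + d])))
                weights[pair] += window - d + 1
    return dict(weights)


tokens = "semantic web query semantic web".split()
links = cooccurrence(tokens, window=2)
```

The resulting weighted word graph is what a graph clustering step would then operate on.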
Exploring Social-Topic networks
- Using Author-Topic model
- Allow researcher to explore community
- How are people organised
- Support for scientific community
- ?Information needs
- Motivation:
- DBLP: topic links
- Flink: people links
- Goal: social networks with topic communities
- Topic extraction + community identification
- Unsupervised techniques
- Process
- Corpus as bag-of-words with known authors
- => learned topic model
- People + topic distribution
- Topic + keyword distribution
- Topic similarity
- See prototype slide
- Identify topic communities
Measuring Semantic Centrality
- Semantic social network (SSN)
- People (or actors)
- Personal ontologies
- Concepts (or classes)
- 3 layered arch (see slides)
- Centrality
- What does this mean?
- On SSN, who has most powerful interop between heterogeneous users
- Measures
- Semantic Centrality
- Power of structural position on social network
- Use: find shortest path between 2 users for them to communicate
- Can predict who can help whom
- Consensual ontology
- Extract most freq and common classes
- Two kinds of semantic centrality
- Local
- Global
- Bridging power between subgroups
Emergent social networks
- Multi-layered model to cluster users' preferences & find semantic relations between them
- Apps: Group Profiles, Recommender systems
- Ontology based user profiles
- Emergent SSNs
- #1: semantic preference extension
- Based on Constrained Spreading Activation (CSA)
- Propagate user pref weights through ontology concepts
- #2: semantic concept clustering
- Classical hierarchical clustering strategy
- Find groups of prefs shared by users
- #3: semantic user clustering
- Assign users to concept clusters
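Step #1 above (propagating user preference weights through ontology concepts) might look roughly like the following. The specific constraints (a per-hop decay, an activation threshold, a hop limit) and the choice to keep the maximum rather than sum incoming activation are my assumptions; the talk's exact CSA variant is not recorded in these notes.

```python
def spread(preferences, relations, decay=0.5, threshold=0.1, max_hops=3):
    """preferences: concept -> initial user weight in [0, 1].
    relations: concept -> list of related concepts in the ontology.
    Constraints: decay per hop, minimum activation to keep spreading,
    and a limit on the number of hops."""
    activation = dict(preferences)
    frontier = dict(preferences)
    for _ in range(max_hops):
        next_frontier = {}
        for concept, weight in frontier.items():
            passed = weight * decay
            if passed < threshold:
                continue  # too weak to spread any further
            for neighbour in relations.get(concept, []):
                if passed > activation.get(neighbour, 0.0):
                    activation[neighbour] = passed
                    next_frontier[neighbour] = passed
        frontier = next_frontier
    return activation


relations = {"Jazz": ["Music"], "Music": ["Art"], "Art": ["Culture"]}
profile = spread({"Jazz": 1.0}, relations, decay=0.5, threshold=0.2)
```

Here a single stated preference for Jazz yields weaker, inferred preferences for Music and Art, and stops before reaching Culture.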
Topic communities in P2P networks
- http://www.aifb.uni-karlsruhe.de/Projekte/viewProjektenglish?id_db=30
- SWAP project & TAGORA project
- Opposite challenges
- Analysis
- What is happening in network
- Construction
- Nodes/agents: 'peers'
- (see slide)
- Use cases
- Bibster network
- Virtual organization
- Basic idea: Shortcut Creation
- Based on SN Metaphors
- Query-dependent vs Query-independent
- Ask question:
- Content provider (has answered question in past)
- Recommender (has asked question in past)
- Bootstrapping network (has good links)
- =>
- Content shortcut
- Recommender shortcut
- Bootstrapping shortcut
- INGA motivation: Social Expert Network
- Implement in p2p network
- So, Build content shortcut index
- #1: Send query using most promising layer of semantic overlay topology
- #2: Evaluate result of query
- #3: Update shortcut index
- Active vs Passive
- Active: based on last but one person in query answer
- Passive: listen to incoming queries
- Register person interested in a topic so that he asks question
- Simulation environment
- Semantic Similarity leads to strong clustering
- But does not give good rates of recall
- Revisiting construction
- Peers
- Query forwarding
- LRU
- Interest-based locality
- Index update
- Success criteria
- Effectiveness: recall
- Efficiency: nr messages
- Robustness: reaction to change
- Toolset?
- Construct SN with help from SN Analysis
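A sketch of the shortcut index from the notes above: a bounded table mapping topics to peers, updated after each query and trimmed with LRU. The class `ShortcutIndex`, its methods and the peer names are hypothetical; the real system's index layout was not described in enough detail to copy.

```python
from collections import OrderedDict


class ShortcutIndex:
    """Bounded topic -> peer index with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # (topic, peer) -> kind of shortcut

    def update(self, topic, peer, kind):
        key = (topic, peer)
        self.entries[key] = kind
        self.entries.move_to_end(key)          # mark as most recently used
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # evict least recently used

    def peers_for(self, topic):
        hits = [(peer, kind) for (t, peer), kind in self.entries.items()
                if t == topic]
        for peer, _ in hits:
            self.entries.move_to_end((topic, peer))  # using a shortcut refreshes it
        return hits


index = ShortcutIndex(capacity=2)
index.update("owl", "peerA", "content")      # peerA answered a query on owl
index.update("rdf", "peerB", "recommender")  # peerB asked about rdf
index.peers_for("owl")
index.update("p2p", "peerC", "content")      # evicts the rdf shortcut
```

Query forwarding would consult `peers_for` first and fall back to the bootstrapping layer when it returns nothing.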
Keynote: Frank van Harmelen
This was a rather good keynote, on why the Semantic Web isn't just plain old Computer Science. The slides are at http://www.eswc2006.org/keynote-frank-van-harmelen.pdf
There was quite a lot of stuff in it, but the main point was that there are at least four basic assumptions of traditional computer science that aren't true, or at least are interestingly more subtle, on the Semantic Web. These are:
- Traditional complexity measures are poorly applicable. Part of the staple diet of first-year computer scientists is the analysis of complexity measures of algorithms, identifying linear, polynomial and exponential algorithms, and running in terror from the latter. But these are generally worst-case measures, and if the exptime case is in practice exponentially unlikely, then this bad behaviour doesn't matter almost all of the time.
- Some things are hard in theory but easy in practice. For example, reasoning with inconsistent ontologies is both important (reasoning with defaults: the statements birds fly, penguins are birds and penguins don't fly are formally inconsistent, in the sense that the statements penguins fly and not (penguins fly) can both be validly deduced) and terribly complicated (see `defeasible logics', `non-monotonic logic', and whole sessions at meetings such as these). But very simple approaches to this problem -- he mentioned an algorithm consisting of simply adding statements until you find one conclusion or the other -- though they have little formal support, can actually work rather well.
- Context-specific reasoning is important (this includes semantic search, I suppose, though I'm doubtful about how necessary semantic search really is in general)
- And fuzzy logic, or logic with statistics, is much more important than it is in general CS.
Semantic Annotations #1
DEMO
- http://omv.ontoware.org
- Annotate ontologies => reuse
- OMV: Ontology Metadata Vocabulary
- Tools
- Current
- Lots of ontologies
- Methodologies
- Tools
- Approach
- Methods & tools for:
- Ontology sharing, discovery & usability
- DEMO
- Design environment for metadata ontology
- Create framework
- Objectives
- Organization
- Develop and promote core & extensions
- Tech infrastructure
- Components
- Engineering
- Evolution
- Extensions
- Applications
- OMV
- Metadata schema
- Core + extensions
- Designed as ontology
- More controlled description
- XML + OWL Lite
- Omv Core
- Conceptualisation
- Implementation
- Concepts
- Extensions
- Tools
- OYSTER
- Onthology
- OntoMeta
seMouse
- Motivation
- File mgt system has not kept pace with capacity of hdd
- Classification capabilities
- E.g. paper: Title, authors, year, conference
- Automatic metadata extraction based on file format
- File-centric vs User-centric view
- seMouse features
- Ontology aligned
- Doc format & editor independent
- Interface
- Menu based, context-based
- Operations
- Load ontology
- Classification
- Annotation
- Doc relationship
- Authoring
- Browsing
- Ontology loading
- Classification
- Annotation
- Select part of doc & apply annotation
- Doc relationships
- Authoring
- Browsing
- Current
- Integration of seMouse with semantic desktop ontology (Gnowsis)
- ? Ontology creation on the fly
Annotated RDF: aRDF
Semantic Annotations #2
An Environment for Semi-Automatic Annotation of Ontological Knowledge with Linguistic Content
- OntoLing
- Support linguistic enrichment
- Plugin for Protégé
- -> linguistic KB explorer
- Access to linguistic resources (LRs): WordNet, FreeLang, Dict
- Linguistic Watermark
- LR access
- Offers classification of diff LRs
- Provides api for accessing content
- Scenarios
- Explicit ling enrichment
- Produce multilingual ontologies
- LexicoSemantic enrichment of onts
- Automatize LexicoSemantic enrichment of ontologies
- Identify pointers (lexico-semantic anchors) from ontological objects to semantic indexes of a LR
- Experimental results
- Good precision and reasonable recall
Managing Information Quality in e-Science
- http://www.qurator.org
- Information and quality in e-Science
- Reqmt on scientists to place data in public domain
- But have to decide if data is okay
- Variations in quality of data
- No control over quality
- No stds for measuring quality
- Scenario: qualitative proteomics
- Quality is personal
- Reqmts for IQ ontology
- Establish common vocab
- Let users contribute while ensuring consistency
- Making IQ computable in practice
- Quality indicators
- Hit ratio, mass coverage, ELDP
- Need to experimentally establish correlation between indicators and probability of mismatch
- => HitList {proteinID, HitRatio, Coverage, …}
- QA: Quality Assertions
- Formally capture clues as functions of indicators
- Acceptability criteria are conditions on QAs
- See myGrid
- Let users add to ontology
- Use reasoning to check consistency
- See paper re PI-acceptability
- Computing quality in practice
- Need to add:
- Annotation model
- Rep of indicator values as semantic annotations
- Binding model
- Data ontology classes -> data resources
- Functions ontology classes -> service resources
- Can then build architecture to compute quality
- But how is HitRatio calculated?
- Programmatically defined currently in web services
- Future: use rules?
- Have an arch which allows users to compute quality criteria
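To make the chain indicators -> quality assertion -> acceptability criterion concrete, here is a toy version. The weighted-sum form of the QA, the threshold and all the numbers are assumptions made for illustration; the notes do not say how the project actually defines them.

```python
hit_list = [
    # made-up indicator values for illustration
    {"proteinID": "P1", "HitRatio": 0.9, "Coverage": 0.6},
    {"proteinID": "P2", "HitRatio": 0.4, "Coverage": 0.2},
]


def quality_assertion(entry, w_hit=0.5, w_cov=0.5):
    """A QA as a function of indicators: here a weighted score (assumed form)."""
    return w_hit * entry["HitRatio"] + w_cov * entry["Coverage"]


def acceptable(entry, min_score=0.5):
    """An acceptability criterion expressed as a condition on the QA."""
    return quality_assertion(entry) >= min_score


accepted = [e["proteinID"] for e in hit_list if acceptable(e)]
```

Since quality is personal, the weights and threshold are exactly the parts each scientist would want to set for themselves.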
A Lexicon Model for Multilingual/Multimedia Ontologies
- Motivation
- Information extraction
- Providing lexicon for ontology-based info extraction
- General:
- Semiotic triangle
- De Saussure
- Adopted in KR (Sowa, 1984)
- Features: Interacting layers
- Images, text etc
- Content : Features : Feature associations : Ontology
- Feature associations (to ontology)
- LingInfo is an RDFS ontology
- Comparisons
- SKOS
- WordNet, OntoWordNet
- GOLD ontology
- Lexical Markup Framework
- Applications
- LingInfo developed in SmartWeb project
- Upper model DOLCE
- Domain indep model SUMO
- Other domain ontologies
- Sports events
- Navigation
- Discourse
- Multimedia
- German & English
- Ontology-based Info Extraction
- TDL: type description lang: representation lang used by SProUT
- SProUT extraction patterns can be triggered by lexical types
- KB generation
- Duplicate detection & redundancy removal
- Apps: image2text
- Extract features
- Look up ontology class using dictionary associations
- Extract features from surrounding text
- Link text features to class
- Apps: text2text
- English -> German
- -> german classifiers
- Other apps
- Dialog processing
- Ontology learning
- WiP
- Lexical acquisition
- Predicate-argument structure
- …
Semantic Web Mining and Personalisation
Semantic Network Analysis (SNA) of Ontologies
- Centrality measures & Eigensystem analysis
- Semantic Network Analysis (SemNA)
- Test cases
- SWRC: sem web for research communities
- SUMO: considered in paper
- Preprocessing
- Each concept and property is node in graph
- Directed edges
- Concept and property hierarchy
- Domain & range of properties
- Centrality measures
- Degree centrality
- Counts in/out connections per node
- Betweenness centrality
- Normalized number of shortest paths between any two nodes that pass through the given node
- Is on many communication paths
- Eigenvector centrality
- Related to other relevant nodes
- E.g. page rank
- Eigensystem analysis
- Math tool: structural analysis of graphs
- Allows for directional information and for 'zooming' into substructures
- Complex Hermitian Matrix
- Subspaces used to describe patterns
- Sum of all patterns is original eigenvalue matrix
- Eigenspectrum of SWRC
- Point symmetric -> star structure
- => concept hierarchy is predominant structure in SWRC
- 14 eigenvectors -> 70% relevance
- Representation
- Colour : relevance
- Saturation
- Identify two patterns (brightest red)
- Analysis of SWRC
- Relevance: academic staff > person > employee (mostly irrelevant) : see slides
- Projectors
- BibTeX part most prominent but non-hierarchical
- Structure: five stars centred around:
- Organization
- Academic staff
- Project
- Event
- Person
- Conclusion
- Comparison of centrality measures
- Eigensystem analysis shows same as degree centrality & betweenness centrality but much more
- Shows that certain concepts can be removed (e.g. employee)
- Open issues
- Compare with OntoClean
- Needs tuning for search, navigation, browse ontologies
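Two of the centrality measures listed above are easy to sketch by hand. Degree centrality counts connections per node; eigenvector centrality is computed here by power iteration on the undirected version of the graph. Treating the graph as undirected is my simplification: the talk used a complex Hermitian matrix precisely to keep direction. The edge list is invented, loosely modelled on the SWRC concepts named above.

```python
def degree_centrality(edges):
    """Count in- and out-connections per node of a directed graph."""
    degree = {}
    for src, dst in edges:
        degree[src] = degree.get(src, 0) + 1
        degree[dst] = degree.get(dst, 0) + 1
    return degree


def eigenvector_centrality(edges, iterations=100):
    """Power iteration on the undirected version of the graph.
    Iterating with A + I instead of A avoids oscillation on bipartite
    graphs such as stars and leaves the eigenvectors unchanged."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    score = {n: 1.0 for n in neighbours}
    for _ in range(iterations):
        new = {n: score[n] + sum(score[m] for m in neighbours[n])
               for n in neighbours}
        norm = max(new.values())
        score = {n: v / norm for n, v in new.items()}
    return score


# A small star around Person (made-up subclass edges).
edges = [("AcademicStaff", "Person"), ("Employee", "Person"),
         ("Student", "Person"), ("PhDStudent", "Student")]
degree = degree_centrality(edges)
eigen = eigenvector_centrality(edges)
```

Both measures put Person at the centre, but eigenvector centrality also ranks Student above Employee because Student has a neighbour of its own.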
Content Aggregation on Knowledge Bases Using Graph Clustering
- Summarization of KBs
- Semantic P2P overlays for KM
- Metrics on ontologies
- Length of path
- Perceived distance reduces with depth
- k-Modes clustering
- Mode := element with largest closeness centrality
- Evaluation
- Choose peers with self-description close to query
- Papers from DBLP & ACM DL
- Evaluate against nr authors
- 40k papers, 317 authors
- Query for each of 1474 acm topics
- Fuser concept (Hovy/Lin 1999)
- Good summary iff subtopics have similar weights
- To get 70% recall only need to query 10% peers
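The k-Modes bullet above defines the mode of a cluster as the element with the largest closeness centrality. A sketch of that selection step, assuming an unweighted, connected, undirected concept graph (the talk's metric apparently shrinks distances with depth, which this ignores); the topic graph and function names are invented.

```python
from collections import deque


def distances_from(graph, start):
    """Breadth-first path lengths from `start` in an undirected graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def closeness(graph, node, members):
    """Inverse of the total distance from `node` to the other cluster members."""
    dist = distances_from(graph, node)
    total = sum(dist[m] for m in members if m != node)
    return 1.0 / total if total else 0.0


def mode(graph, members):
    """The cluster element with the largest closeness centrality."""
    return max(sorted(members), key=lambda n: closeness(graph, n, members))


graph = {
    "Databases": ["Query Languages", "Transactions", "Indexing"],
    "Query Languages": ["Databases"],
    "Transactions": ["Databases"],
    "Indexing": ["Databases"],
}
```

For this cluster `mode(graph, set(graph))` picks Databases, which is the topic a peer would advertise as its self-description.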
Dynamic Assembly of Personalized Learning Content on the Semantic Web
- http://goodoldai.org.yu
- Ontology-based approach
- Learning paths ontology
- Optimal learning strategy
- User model ontology
- TANGRAM
- Architecture (see slide)
- Content mgt
- UM mgt
- Dynamic assembly
- Coordinator
- UI module
- Ontologies:
- ALOCoM-based ontologies
- Split into:
- Content structure ontology
- Content type ontology
- IIS domain ontology
- Learning Paths ontology
- User Model ontology
- User modelling stds:
- IEEE: PAPI & PAPI Learner
- IMS LIP
- & other researchers
- See slide of resulting ontology
- Personalized learning
- Functionality
- Provision of learning content
- Access to content
- Future
- More precise formal desc of IIS domain
- Improve TANGRAM subsystem
- Repurposing content
- http://ariadne.fon.bg.ac.yu/TANGRAM/app ??
Interactive Ontology-Based User Knowledge Acquisition
- SW and personalisation
- Two-fold relationship
- Personalisation techniques to enhance usability of SW apps
- SW technologies to enhance user-adaptive apps
- Focus on second point
- SW techs to solve modelling problem
- e-Learning domain
- e-Learning
- Traditional personalisation domain
- User knowledge acquisition
- Difficult to keep current state of user
- Scenario
- Problem
- Can user's conceptual model be used to enable personalisation and adaptation of learning envs on SW
- Approach: via dialogue
- OntoAIMS architecture (see slides)
- Interactive user modelling
- Dialog agent
- => long-term conceptual state (user model)
- Graphical dialogue screen: OWL-OLM
- Task recommendation / resource browsing
Ontology Alignment #1
Matching Hierarchical Classifications with Attributes
- Using ontologies to match schemas
- CtxMatch 1.0
- Matching hier classifications: taxonomies
- CtxMatch 2.0
- Deals with richer schemas
- Include explicit attributes & implicit roles
- Methodology
- Elicited schemas
- Matching is then trivial
- Classifications with attributes
- Images … Italy … Beaches
- But Italy is not subclass of Images!
- Role is implicit
- So images 'about' Italy is subclass of Images
- And Beaches are locatedIn Italy
- Implicit roles are often hidden in the lexical meaning of the node
- CtxMatch 2.0
- Construct meaning skeletons
- Construct local meaning of nodes
- Filter out incompatible skeletons
- Meaning skeletons
- WDL: lexicalized representation language
- Local meanings
- WordNet used
- But any other dictionary would do
- Filtering local meanings
- Discard senses not found in the relations of ontology
- May end up with several alternatives
- Relations between local meanings
- Matching using standard reasoning techniques
- Compute mapping
- using formulae for node pairs
- Lexical + Domain knowledge => inference of equivalence
- Peer-to-peer schema matching
- Agents with diff schema
- Will need either to use the same dictionary or a mapping between the two
- Applications: see slide
Community-Driven Ontology Matching
- CDOM
- Involve end users
- Output = annotated mappings
- Architecture
- Overview
- ~50 existing matching systems/approaches
- Hard to reuse
...so this talk was describing a system for making some subset of these matching tools generally usable. If you go to http://align.deri.org you can put in URLs for a pair of ontologies, and it'll use a few different tools/strategies to match them. It produces suggestions which you can edit, and results in a list of equivalentClass and presumably subClass assertions. [NG]
Empiric Merging of Ontologies - A Proposal of Universal Uncertainty Representation Framework
- Background
- Ontology Learning (OLE project)
- Uncertain acquisition of knowledge
- Crisp ontology acquisition
- Preprocessing NL texts
- Taxonomy extraction methods
- Pattern-based
- Clustering-based
- Motivations
- Precision-recall trade-off
- Noise introduction
- Knowledge inconsistencies from diff domains
- Introduced integration & refinement of inconsistent kn
- Use empirical consistency measure
- Reflect human mental models
- Not crisp structures
- Vague, overlapping referential associations
- Framework
- Format called ANUIC
- Adaptive Net of Universally Interrelated Concepts
- Conviction function
- Utilisation
- 3000 texts from CS
- Used pattern-based for ontology
- Merged ontologies
- Into one ANUIC structure
- 5k classes, 9k indivs
- Very rough taxonomy
- See slide for sample
- Improvement by ANUIC 130-200%
An iterative algorithm for ontology matching - Andreas Heß
[NG] There are a variety of underlying use-cases here: perhaps two folk annotate pictures using different ontologies, or web services are described using different ontologies. The underlying algorithm involves representing the match between two ontologies as a weighted graph.
Given two ontologies which are mapped, you can improve the mapping of a third by mapping it to both and then combining the similarities.
Evaluation
- in some cases, lexical matching does more of the work than structural mapping does
- minimum-similarity thresholds have an effect
- there appears to be no overall best mapping strategy
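The idea of improving a mapping for a third ontology by mapping it to two already-aligned ontologies and combining the similarities can be sketched as below. This is an illustration only: the averaging rule, the function name and all concept names and scores are assumptions, not the combination scheme from the talk.

```python
def combine_similarities(sim_to_a, sim_to_b, a_to_b):
    """Combine similarity scores for one concept of a third ontology C.

    sim_to_a: {concept in A: similarity to the C concept}
    sim_to_b: {concept in B: similarity to the C concept}
    a_to_b:   known mapping {concept in A: equivalent concept in B}
    Returns {concept in A: averaged score} (averaging is an assumption).
    """
    combined = {}
    for a_concept, score_a in sim_to_a.items():
        b_concept = a_to_b.get(a_concept)
        # A missing counterpart in B contributes no supporting evidence.
        score_b = sim_to_b.get(b_concept, 0.0)
        combined[a_concept] = (score_a + score_b) / 2
    return combined


# Hypothetical scores for a concept "Author" in the third ontology.
sim_to_a = {"Writer": 0.6, "Person": 0.5}
sim_to_b = {"Creator": 0.9, "Agent": 0.2}
a_to_b = {"Writer": "Creator", "Person": "Agent"}

combined = combine_similarities(sim_to_a, sim_to_b, a_to_b)
best = max(combined, key=combined.get)
print(best, round(combined[best], 2))
```

Here the evidence from the second ontology widens the gap between the two candidates, which is where a minimum-similarity threshold of the kind mentioned under Evaluation would then be applied.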
Heterogeneous ontologies - Chiara Ghidini
Very interesting. Concerned with matching/mapping different ontologies describing the same thing, by hand, in complicated cases. They've developed a Distributed Description Logic (DDL) (this is just the stuff I'm interested in!).
Questions:
- Are mappings between concepts the only form of mappings?
- Do people always represent the same knowledge using the same ontological concepts? ('marriage' could be a concept or a relation)
- No to both questions!
So mappings are more complex than concept-concept mappings.
LatLong can be modelled as a pair of concepts, or a class with two (real) properties. A marriage can be modelled as (Man, marriedTo, Woman) or as the pair of assertions (WeddingCertificate, husband, Man) and (WeddingCertificate, wife, Woman). Their syntax distinguishes homogeneous (concept-concept) and heterogeneous (concept-role) bridge rules.
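To make the two ways of modelling a marriage concrete, here is a minimal sketch that rewrites one representation into the other using plain tuples. It is a data-level illustration only and does not implement DDL bridge rules or their semantics. The function names, the generated certificate identifiers and the sample individuals are hypothetical.

```python
def relation_to_certificates(triples):
    """Rewrite (man, marriedTo, woman) triples as wedding-certificate pairs."""
    out = []
    for index, (man, predicate, woman) in enumerate(triples, start=1):
        if predicate != "marriedTo":
            continue
        cert = f"cert{index}"  # invented identifier for the certificate
        out.append((cert, "husband", man))
        out.append((cert, "wife", woman))
    return out


def certificates_to_relation(triples):
    """Rewrite certificate triples back into (man, marriedTo, woman)."""
    husbands = {s: o for s, p, o in triples if p == "husband"}
    wives = {s: o for s, p, o in triples if p == "wife"}
    return [
        (husbands[cert], "marriedTo", wives[cert])
        for cert in sorted(husbands)
        if cert in wives
    ]


as_relation = [("john", "marriedTo", "mary")]
as_certificates = relation_to_certificates(as_relation)
print(as_certificates)
print(certificates_to_relation(as_certificates))
```

The forward direction has to invent a certificate individual that the relation form never mentions, which is one way to see why such a mapping is not a simple concept-to-concept correspondence.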
Distributed Reasoning Architecture for a Galaxy of Ontologies: <http://drago.itc.it/>
Questions:
- Effectively use concepts in ontology B using concepts in A? Yes, precisely that.
- Given the set of bridge rules, can you deduce one ontology from the other? Yes, in one direction, but you can't necessarily go the other way without specifying a new set of rules.
- Is that symmetric? So, no.
- How well do chains of bridges work in practice? They don't really have enough experience to tell
Ontology Learning
Automatic Extraction of Hierarchical Relations From Text
- Machine learning for relation extraction
- Many algorithms
- SVM using rich features got good results
- Wide data sets
- Good at finding relevant features
- Paper
- Application of SVM
- Investigate variety of NLP features
- Evaluate kernels of SVM
- Used ACE04 Corpus
- Topology of entities and relations
- 7 relation types, 23 subtypes
- Relation hierarchy (see slide)
- Using SVM
- Binary classifier
- One-against-one method
- NLP features
- E.g. Pair of entities in sentence
- Used GATE & plugins
- => 94 features for each possible pair
- Simple features
- Words
- POS tags (part of speech)
- Features in ACE04 corpus
- Overlap features
- Relative position of two mentions
- Syntactic features
- Chunk features
- Dependency feature from MiniPar
- Parse tree feature from BuChart
- Semantic features
- From SQLF produced by BuChart
- From WordNet
- Discussion
- Every feature has some contribution
- Most features improve the recall
- Complex features do not contribute as much as hoped
- But pay off deeper in the hierarchy
- Diff kernel types
- Linear kernel best performance
- http://gate.ac.uk
- http://nlp.sheff.ac.uk
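The one-against-one method noted above turns a binary classifier into a multi-class one by training one classifier per pair of classes and taking a majority vote. The sketch below shows only the pairing and voting logic. The stand-in `duel` function and its scores are hypothetical and replace a trained SVM, and the integer labels are placeholders for the relation types.

```python
from collections import Counter
from itertools import combinations


def one_against_one_pairs(labels):
    """All unordered label pairs: one binary classifier is needed per pair."""
    return list(combinations(labels, 2))


def predict(sample, labels, binary_classifier):
    """Majority vote over all pairwise classifiers."""
    votes = Counter()
    for first, second in combinations(labels, 2):
        votes[binary_classifier(sample, first, second)] += 1
    return votes.most_common(1)[0][0]


def duel(sample, first, second):
    """Hypothetical stand-in for a trained binary SVM.

    sample maps each label to a made-up score; the higher score wins.
    """
    return first if sample[first] >= sample[second] else second


relation_types = list(range(7))      # 7 relation types
relation_subtypes = list(range(23))  # 23 subtypes
print(len(one_against_one_pairs(relation_types)))
print(len(one_against_one_pairs(relation_subtypes)))

sample = {label: 0.1 * label for label in relation_types}
sample[3] = 0.9  # make label 3 the clear winner
print(predict(sample, relation_types, duel))
```

The number of binary classifiers grows as k(k-1)/2, so 7 types need 21 classifiers and 23 subtypes need 253.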
An Infrastructure for Acquiring High Quality Semantic Metadata
- http://kmi.open.ac.uk/people/index.cfm?id=60
- Quality
- Accurately capture meaning of data object
- Each entity maps to one and only one data object
- Semantic metadata should be correctly populated
- Current support
- On-To-Knowledge
- SCORE
- CS AKtive Portal
- Flink
- …
- Relatively weak support for quality control
- Mainly manual
- Some co-relation and disambiguation
- Case study: KMi web portal
- Requirements
- Automated & adaptive extraction
- Address heterogeneity
- Minimize extraction errors
- Proper population
- Update from new sources
- See ADSI framework slide
- Layers
- Source
- Extraction
- Verification
- Application
- Extraction
- ESpotter tool: info extraction
- Semantic transformation engine
- Instance mapping ontology (KCAP2005)
- Domain ontology
- Verification
- Instance classification tool: PANKOW + WordNet
- Data querying engine
- Verification engine
Extracting Instances of Relations From Web Documents Using Redundancy
- Relation instantiation
- Relations defined at the instance level
- But which one is right for the instances of the classes
- Existing methods on QA/I.E.
- Use redundancy to make up for loss of performance
- Assumptions
- Not 1-1 relation
- Instantiated concepts
- Must be on web
- Seed set
- Outline
- Retrieve/select corpus
- Identify instances
- Rank candidates
- MultimediaN E-culture project
- Art & Architecture thes (AAT)
- Unified List of Artist Names (ULAN)
- Triple20 ontology browser
- Redundancy method: CHD
- Google
- ULAN
- Rank candidate artists
- Gold standard manually created using authoritative pages:
- 30 expressionists & 17 impressionists
- Discussion
- F1 promising: 0.68-0.83
- Iterative method
- Comparable F1: 0.71
- Eventually recall is higher
- Future
- Use other domains
- Investigate threshold for when to stop iterations
- Add e.g. time constraints for art styles
- Related
- KnowItAll: Etzioni et al
- Armadillo: Ciravegna et al
- Normalized Google Distance: Cilibrasi et al
- Semantic distance between terms
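The outline above (select a corpus, identify instances, rank candidates) and the F1 figures in the discussion can be illustrated with a small sketch. It assumes the simplest form of redundancy, counting how many documents mention each known name, which may differ from the ranking actually used in the talk. The snippets, the candidate list and the gold standard are invented for illustration.

```python
from collections import Counter


def rank_by_redundancy(documents, known_names):
    """Rank candidate names by the number of documents that mention them."""
    counts = Counter()
    for text in documents:
        for name in known_names:
            if name in text:
                counts[name] += 1  # each document counts once per name
    return counts.most_common()


def f1_score(predicted, gold):
    """Harmonic mean of precision and recall over two sets of names."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Invented snippets standing in for retrieved web pages.
documents = [
    "Monet and Renoir led the Impressionist movement.",
    "The Impressionist painter Monet worked in Giverny.",
    "Kirchner visited an Impressionist exhibition.",
]
names = ["Monet", "Renoir", "Kirchner"]  # stand-in for a list such as ULAN

ranking = rank_by_redundancy(documents, names)
top_two = [name for name, _ in ranking[:2]]
print(ranking)
print(f1_score(top_two, {"Monet", "Renoir", "Degas"}))
```

Cutting the ranked list at different positions trades precision against recall, which is the threshold question listed under Future.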
Closing plenary
Usability and the Semantic Web
Anthony Jameson (DFKI and International University in Germany)
- Challenges
- Searching/querying
- Minimize complexity for end user
- Ensure minimally necessary understanding
- Adding information to ontologies
- Induce users to do work
- Involve users in design
- Focus on users
- seMouse
- Halo2: knowledge querying
- The 'Digital Aristotle' vision
- DarkMatter
- Minimizing complexity and cognitive effort
- Strategies
- Recognition rather than recall
- Domain-specific interfaces
- System mapping from input to formal rep
- Don’t require adherence to ontology
- Support trial and error
- OntoIR
- Concept pick out too complex
- Label parts and present to user
- Expected benefits
- Often little result from semantically based system
- Strategy:
- Allow easy refinement
- Piggyback on methods that yield some benefit
- http://tap.stanford.edu
- SmartWeb: Stadium
- How to convey understanding to user
- Mental model of system
- When needed:
- When something goes wrong
- Understand unexpected behaviour
- Distinguish design model vs user model
- Ways of conveying mental models
- Suggest what user can do
- Appearance of input elements
- Examples of possible inputs
- Suggest what system has done
- Indications of information used to derive responses
- SemIPort document manager
- Intermediate motivation
- Mangrove: Annotation tool
- Long-term
- Community Navigator (Hideaki Takeda)
- Existing research
- Social psychology
- E.g.:
- Collective effort theory
- Goal setting theory
- Utility:
- Yield unobvious predictions
- Often not confirmed
- Groupware, online community
- Tested in practical settings
- Diff from SW apps
- How to motivate users to contribute
- 'Only you can do it'
- Remind of benefits
- Publicize contributions
- Offer money, iPods, chocolate
- Caveat
- Try it out in your setting first
- Conduct user studies throughout design and development
- How to exploit knowledge about users
- Analysis of reqmts
- Look at what they do now and understand it
- i/f design
- Design principles & guidelines, psychological knowledge
- Iterative testing & prototypes
- Summative evaluation of final version
--
TonyLinde - 18 Jun 2006
--
NormanGray - 29 Jun 2006