
Monday, March 14, 2011

IBM Research | SHER


SHER is an OWL reasoner that is designed to provide semantic querying of large relational datasets using OWL ontologies.

SHER Overview

SHER (Scalable Highly Expressive Reasoner) is a breakthrough technology that provides ontology analytics (OWL-DL without nominals) over highly expressive ontologies.
  • SHER does not do any inferencing on load, and hence deals better with fast-changing data (the downside, of course, is that all reasoning is performed at query time).
  • SHER can reason on ~7 million triples in seconds, and scales to datasets with 60 million triples, responding to queries in minutes. SHER has been used to semantically index 300 million triples from the medical literature.
  • SHER tolerates logical inconsistencies in the data, can quickly point users to them, and helps users clean them up before issuing semantic queries.
  • SHER provides explanations (or justifications) for why a particular result set is an answer to the query. This is useful for validation by domain experts.

  • For a high-level overview, here's a podcast that describes SHER and some use cases for the technology.

    How it works

    SHER's reasoning technique relies on a novel way of indexing the instances of the database from the perspective of reasoning. This indexing summarizes the instance data into a very compact representation, and reasoning is then performed over that summary. For details, see Summarization, ISWC 2006 and Tech report for ISWC 2006.

    SHER uses this representation to efficiently filter instance data that is irrelevant for answering a certain query, and selectively uncompresses portions of the summarized representation relevant for the query, in a process called refinement. The combination of summarization and refinement is key to SHER's scalability. For details, see Paper in AAAI 2007 and Tech report for AAAI 2007. Internally, SHER uses the popular open-source OWL-DL reasoner, Pellet, to reason over the summarized data and obtain justifications for the data inconsistency.
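As a rough illustration of the summarization idea described above, here is a minimal Python sketch. It assumes that individuals are grouped by their set of asserted concepts, so that all individuals with the same concept set collapse into one summary node; that grouping criterion, and the names `summarize`, `types`, and `role_assertions`, are assumptions made for this example and are not taken from SHER's code. Refinement and the actual reasoning step are not shown; the papers cited above describe the real construction.

```python
from collections import defaultdict


def summarize(types, role_assertions):
    """Collapse individuals that share the same concept set into one summary node.

    types: dict mapping each individual to its set of asserted concept names.
    role_assertions: iterable of (subject, role, object) triples whose subject
        and object both appear in `types`.
    """
    # Assumption: the summary node of an individual is simply its concept set.
    node_of = {ind: frozenset(concepts) for ind, concepts in types.items()}

    # Remember which individuals each summary node stands for.
    members = defaultdict(set)
    for ind, node in node_of.items():
        members[node].add(ind)

    # Many role assertions can collapse into a single edge between summary nodes.
    summary_edges = {(node_of[s], role, node_of[o]) for s, role, o in role_assertions}
    return members, summary_edges


# Hypothetical toy data.
types = {
    "p1": {"Patient"}, "p2": {"Patient"}, "p3": {"Patient"},
    "d1": {"Drug"}, "d2": {"Drug"},
}
role_assertions = [
    ("p1", "takes", "d1"), ("p2", "takes", "d1"), ("p3", "takes", "d2"),
]

members, summary_edges = summarize(types, role_assertions)
print(len(types), "individuals ->", len(members), "summary nodes")
print(len(role_assertions), "role assertions ->", len(summary_edges), "summary edges")
```

In this toy input, five individuals and three role assertions shrink to two summary nodes and one summary edge, which gives a feel for why a summary can be much smaller than the instance data it stands for.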

    SHER performs membership query answering as well as conjunctive query answering using a set of optimization techniques described in this paper on optimizing membership query answering and conjunctive querying. These optimization techniques leverage summarization in the context of conjunctive querying, and also incorporate faster incomplete reasoning techniques into query answering. SHER therefore has an internal knob which can be used to get fast, incomplete answers to queries. This faster algorithm can help retrieve large result sets for most queries within a minute or two.

    Use cases

    Automated Clinical trials matching using ontologies

    In collaboration with researchers at Columbia University Medical Center, we used SHER to find electronic patient records that match clinical trials criteria. The problem in automating clinical trials matching is that patient data is noisy, coded in local terminologies, and highly specific. Clinical trials queries, however, tend to be much more general. Bridging the gap between the two requires significant knowledge engineering, which has to be customized for each institution. For example, the patient records contain entries such as "Patient X was medicated with a vendor-specific drug Y". Clinical trials criteria, however, are specified in terms of a broad class of drugs, such as "patients that are on medications involving an active ingredient Z".

    Together with researchers from Columbia, we investigated whether it was possible to re-use the knowledge in the SNOMED ontology to bridge the gap between electronic medical records and clinical trials queries. SHER successfully found matches for the clinical trials queries on a large one-year patient dataset from Columbia (60 million triples). For details, see ISWC 2007, Clinical trials matching paper. For a set of slides about this case study, see Clinical trials matching case study slides.
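The gap between a specific record ("drug Y") and a general criterion ("medications involving ingredient Z") can be pictured with a toy example. The sketch below is only a plain transitive lookup over a hand-written hierarchy, which is far weaker than the OWL-DL reasoning SHER performs over SNOMED; all concept names, patient identifiers, and function names are hypothetical.

```python
def ancestors(concept, parents):
    """Return the concept itself plus everything reachable through parent links."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def match_patients(records, parents, criterion):
    """Patients with at least one medication that falls under the criterion class."""
    return sorted({patient for patient, drug in records
                   if criterion in ancestors(drug, parents)})


# Hypothetical hierarchy: each concept maps to its direct, more general concepts.
parents = {
    "BrandY": ["ProductContainingZ"],
    "GenericZ": ["ProductContainingZ"],
    "ProductContainingZ": ["Medication"],
    "BrandQ": ["Medication"],
}

# Hypothetical patient records, coded with vendor-specific drug names.
records = [("patient1", "BrandY"), ("patient2", "BrandQ"), ("patient3", "GenericZ")]

matches = match_patients(records, parents, "ProductContainingZ")
print(matches)
```

The records never mention the general class directly; the match comes entirely from the background hierarchy, which is the role the SNOMED ontology plays in the case study.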

    Cleaning up text extraction output using ontologies

    SHER has been used in the context of SemanticClean, a project that examines whether it is possible to use OWL reasoning to clean up inconsistencies in data generated by text extraction. Depending on the number of inconsistent patterns present in the data, SHER can detect several thousand inconsistencies, taking between 10 and 67 minutes on datasets of roughly 800K to 2 million triples. For details, see ISWC 2007, paper on SemanticClean.

    Searching PubMed with ontologies on AnatomyLens

    AnatomyLens provides a semantic, concept-based search over annotations of PubMed articles and GOA annotations using the GO (Gene Ontology) and FMA (Foundational Model of Anatomy) ontologies. Users enter anatomy terms, MeSH terms, and biological processes as search keywords. AnatomyLens is more precise and has better recall than text search. For example, for the query Alzheimer's, brain, neuron development, AnatomyLens will match Alzheimer's articles that discuss dendrite development in the hippocampus, whereas a standard text search will only find articles that contain the queried keywords explicitly and might also return unrelated articles (such as articles about neuron development in the spine).

    Here's a presentation of SHER and its applications: SHERAndItsApplications.ppt


    SHER is available at the following URL, free for academic use: Try SHER

    Platform requirements

    Operating systems: Windows® and Linux®

    Installation instructions

    Please see the included SHER-Documentation.pdf file, located in the /doc folder when you expand the tarball.


    Linked Open Data and Extraction of Vocabularies from Linked Open Data




    Technology
    3D Fast Fourier Transform Library for Blue Gene/L and Blue Gene/P
    A C++ library for computing distributed complex-to-complex, three-dimensional Fast Fourier Transforms on the Blue Gene/L and Blue Gene/P supercomputer.
    Application Advancement Assistant for WebSphere Application Server Community Edition
    A tool that enables developers of Java EE applications to migrate from IBM WebSphere Application Server, Community Edition, to the more advanced WebSphere Application Server family of products.
    Automatic Testing Toolkit for Virtualization Providers
    A smart cross-platform toolkit for developing virtualization provider test frameworks.
    Centralized User Management for the IBM Virtualization Engine
    On-demand systems management via a single, consolidated interface, using IBM's Virtualization Engine.
    CIM Repository Synchronization for Cloud Computing
    A tool to synchronize Common Information Model (CIM) Repositories which are deployed on different servers.
    A Java-based, real-time programming game based on the Eclipse platform.
    Collaborative Code Review Tool
    A collaborative code review plug-in for Eclipse.
    Compound XML Document Toolkit
    A standards-based, schema-driven toolkit for mixed-namespace XML documents.
    Custom Math Functions for High Performance Computing
    Implementations of various transcendental math functions, including "erfc", with no conditional branches.
    Data Discovery and Query Builder
    A Web-based framework for searching database records to identify and correlate data based on semantic concepts rather than specific data layouts.
    Design Pattern Toolkit
    An Eclipse-enabled template engine for generating applications based on customizable, model-driven architecture transformations.
    Dynamic Logical Partition Command Line Tool for Integrated Virtualization Manager
    A tool that enhances the usability of the Dynamic Logical PARtition (DLPAR) feature of systems which are managed with Integrated Virtualization Manager (IVM).
    Eclipse Based Foundation Toolkit for Heterogeneous Database Applications
    A lightweight heterogeneous database tool for developers, enabled by deployment onto an existing Eclipse shell.
    EMBL/FASTA Wrapper for WebSphere Information Integrator
    A tool that enables (by using SQL) access, retrieval, and federation of bio-sequences and data stored in flat-file, specialized data sources in either EMBL or FASTA format.
    Expedited Real-Time Task Graphs
    A deterministic, real-time programming model for Java with supporting tools and run-time environment.
    Expert System for Tuning Optimizations (ESTO)
    A tool that tunes the set of parameters for optimizing a specific target program.
    Flexible Internet Evaluation Report Architecture
    A highly flexible architecture for the design, display, and reporting of Internet surveys.
    Full-System Simulator for IBM PowerPC 970
    A full-system simulation infrastructure and tools for the PowerPC 970 instruction set.
    GAIAN Database
    A distributed federated database using a biologically inspired self-organization principle to minimize management.
    Graphical LPAR Monitor for System p5 Servers
    A graphical LPAR monitor for the System p5 Server that allows the status of CPU and memory resources used by one or more LPARS to be monitored side by side.
    A graphical tool for discovering possible Java heap leaks.
    High Productivity Computing Systems Toolkit
    A framework and toolkit that automates the detection of bottlenecks in application performance.
    IBM Broadband Transmission-line Characterization Using Short-Pulse Propagation
    A software toolkit containing an advanced 2D field solver and signal-processing facility for extracting broadband transmission line properties.
    IBM Cluster Monitor
    A one-stop, easy-to-use, configurable, fully automated, Web-based tool for monitoring and observing cluster-wide performance in the form of visible graphs.
    IBM Dynamic Application Virtualization
    A technology that enables computationally-intensive applications to take advantage of accelerated libraries on remote, back-end systems (including Cell Broadband Engine), reducing time to deployment and disruption to business.
    IBM Electromagnetic Field Solver Suite of Tools
    A suite of full-wave and quasi-static electromagnetic field solver tools used to calculate the electrical parameters for interconnection and packaging design.
    IBM Full-System Simulator for the Cell Broadband Engine Processor
    A full-system simulation infrastructure and tools for the Cell Broadband Engine processor.
    IBM Hash Suffix Array Delta Compression
    A new differential compression algorithm that combines the hash value and suffix array techniques.
    IBM Image Construction and Composition Tool
    The IBM Image Construction and Composition Tool enables users to construct custom virtual images that they can provision with Tivoli Provisioning Manager and IBM WebSphere CloudBurst Appliance, or use in IBM Smart Business Development and Test on IBM Cloud.
    IBM Integrated Ontology Development Toolkit
    An ontology toolkit for storage, manipulation, query, and inference of ontologies and corresponding instances.
    IBM Lock Analyzer for Java
    A cross-platform tool that provides an insight into how well Java locks are performing in a live Java application.
    IBM MapReduce Tools for Eclipse
    An Eclipse plug-in that simplifies the creation and deployment of MapReduce programs.
    IBM Parallel Machine Learning Toolbox
    A toolbox for running machine learning algorithms on parallel computing platforms.
    IBM Performance Simulator for Linux on POWER
    A tool that provides users of Linux on POWER a set of performance models for IBM's POWER processors.
    IBM Real-Time Class Analysis Tool for Java
    A tool enabling the deployment of real-time Java applications without manual definition of classes to be preloaded.
    IBM Resource Monitor for BladeCenter Server Room
    An easy-to-use and flexible tool to monitor and search modules in a BladeCenter server room, to improve hardware resource utilization and asset security.
    IBM Scheduler for High Throughput Computing on IBM Blue Gene P
    A lightweight scheduler that supports high-throughput computing (HTC) applications on Blue Gene/P.
    IBM TuningFork Visualization Tool for Real-Time Systems
    An Eclipse-based visualization and performance analysis tool for real-time applications with support for Java, C++, IBM's Real-time JVM, and Linux.
    IBM Web Ontology Manager
    A Web-based system for managing Web Ontology Language (OWL) ontologies.
    IBM Workplace Server Performance and Health Monitor
    A Web application that enables users to gauge the performance and health of a Workplace server.
    IBM XL UPC Compilers
    A compiler with implementation for Unified Parallel C (UPC) High-Performance Computing (HPC) applications on large-scale, parallel processing machines.
    Open Virtualization Format Toolkit
    A composition tool to build software virtual appliances in the new standard format, Open Virtualization Format.
    OpenCL Development Kit for Linux on Power
    OpenCL - The open standard for parallel programming of heterogeneous systems
    Pmcount for Linux on Power Architecture
    A hardware performance counter tool for the IBM POWER4, POWER4+, POWER5, POWER5+, POWER6, and PowerPC 970 processors.
    Post-Link Optimization for Linux on POWER
    A post-link optimization utility for the POWER architecture that optimizes an executable program or a shared library, based on its run-time profile.
    Preservation DataStores
    An OAIS-based preservation-aware storage component that supports the future usability of digital information.
    Provider Acceptance Test Suite
    A tool that validates vendor and IBM providers before integrating them into the IBM Storage Management Product Portfolio.
    Really Small Message Broker
    A very small messaging server that uses the lightweight MQTT publish/subscribe protocol to distribute messages between applications.
    Resource Simulator for IBM Systems Director Data Model
    An extension for IBM Systems Director whose commands can simulate and manipulate resources according to the IBM Systems Director data model.
    Scalable Highly Expressive Reasoner
    A technology that provides ontology analytics (OWL-DL without nominals) over highly expressive ontologies.
    Script Monitor Extension for IBM Director
    An extension for IBM Director that makes it easy to customize Resource Monitor items with scripts.
    Scripting Tools for SAN Volume Controller
    An interface and scripting tools using Perl for automating tasks in SAN Volume Controller.
    Semantic Tools for Web Services
    A set of Eclipse plug-ins that can be installed on WebSphere Integration Developer (WID) 6.0.1 for semantic matching and composition of Web services.
    Service Integration Bus Explorer
    A stand-alone GUI tool that allows exploration and management of the messaging resources in a Service Integration Bus.
    Sparse Matrix-Vector Multiplication Toolkit for Graphics Processing Units
    A sparse matrix-vector multiplication library optimized for NVIDIA GPUs using CUDA
    Terminal Automation Tool
    A flexible, lightweight framework for automating screen-based applications.
    The IBM Distribution of Apache Hadoop
    IBM Distribution of Apache Hadoop
    Unstructured Information Management Architecture
    IBM technology that supports the implementation, composition, and deployment of UIMA applications.
    Visual Performance Analyzer
    An Eclipse-based visual performance toolkit.
    Visual XForms Designer
    A standards-based, easy-to-use Eclipse plug-in enabling the rapid development of documents with XForms mark-up using a visual user interface.
    Watson Sparse Matrix Package
    A package of libraries for solving sparse systems of linear equations on serial and parallel computers.
    Web Service Engine for Accelerating SOA System Development
    A toolkit that can transform schemas to any desired schema type; drive and stub enterprise applications; facilitate generation of highly customized data files; and reverse-engineer legacy data files.
    A tool kit that can be used for the deployment and administration of Linux clusters.
    XML Diff and Merge Tool
    A Java program that can compare or reconcile changes in an XML document.
    XML Enhancements for Java
    A set of language extensions that facilitate XML processing in Java.
    XML Forms Generator
    A standards-based, data-driven Eclipse plug-in that generates functional forms with XForms mark-up embedded within an XHTML document from a XML data instance or a WSDL document.

