
Monday, March 14, 2011

IBM Research | SHER

SHER

SHER is an OWL reasoner that is designed to provide semantic querying of large relational datasets using OWL ontologies.




SHER Overview



SHER (Scalable Highly Expressive Reasoner) is a breakthrough technology that provides ontology analytics (OWL-DL without nominals) over highly expressive ontologies.
  • SHER performs no inferencing at load time, so it copes better with fast-changing data. The trade-off is that all reasoning is performed at query time.
  • SHER can reason on ~7 million triples in seconds, and scales to datasets with 60 million triples, responding to queries in minutes. SHER has been used to semantically index 300 million triples from the medical literature.
  • SHER tolerates logical inconsistencies in the data, quickly points users to them, and helps users clean them up before issuing semantic queries.
  • SHER provides explanations (or justifications) for why a particular result set is an answer to the query. This is useful for validation by domain experts.


  • For a high-level overview, here's a podcast that describes SHER and some use cases for the technology.




    How it works


    SHER's reasoning technique relies on a novel way of indexing the instances in the database from the perspective of reasoning. The index summarizes the instance data into a very compact representation, and reasoning is performed on that representation. For details, see Summarization, ISWC 2006 and Tech report for ISWC 2006.

    SHER uses this representation to efficiently filter instance data that is irrelevant for answering a certain query, and selectively uncompresses portions of the summarized representation relevant for the query, in a process called refinement. The combination of summarization and refinement is key to SHER's scalability. For details, see Paper in AAAI 2007 and Tech report for AAAI 2007. Internally, SHER uses the popular open-source OWL-DL reasoner, Pellet, to reason over the summarized data and obtain justifications for the data inconsistency.
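To make the idea of summarization and refinement concrete, here is a minimal Python sketch. It is not SHER's implementation. It assumes a deliberately simplified scheme: individuals that carry the same set of concepts collapse into one summary node, role assertions are lifted to those nodes, and refinement splits a node according to its members' outgoing edges. The function names (`summarize`, `refine`) and the toy data are hypothetical; the cited papers describe the real algorithm.

```python
from collections import defaultdict


def summarize(types, edges):
    """Collapse individuals with identical concept sets into one summary node.

    types: dict mapping individual -> set of concept names
    edges: iterable of (subject, role, object) role assertions
    Returns (node_of, summary_edges); a summary node is a frozenset of concepts.
    """
    node_of = {ind: frozenset(concepts) for ind, concepts in types.items()}
    # Lift every role assertion to the summary nodes; duplicates collapse.
    summary_edges = {(node_of[s], role, node_of[o]) for s, role, o in edges}
    return node_of, summary_edges


def refine(node_of, edges, target):
    """Split the members of one summary node by their outgoing edge signature."""
    signature = defaultdict(set)
    for s, role, o in edges:
        if node_of[s] == target:
            signature[s].add((role, node_of[o]))
    parts = defaultdict(list)
    for ind, node in node_of.items():
        if node == target:
            # Individuals without outgoing edges get the empty signature.
            parts[frozenset(signature[ind])].append(ind)
    return sorted(sorted(group) for group in parts.values())


# Hypothetical toy data: three patients, two drugs, two role assertions.
types = {
    "p1": {"Patient"}, "p2": {"Patient"}, "p3": {"Patient"},
    "d1": {"Drug"}, "d2": {"Drug"},
}
edges = [("p1", "takes", "d1"), ("p2", "takes", "d2")]

node_of, summary_edges = summarize(types, edges)
parts = refine(node_of, edges, frozenset({"Patient"}))
print(len(set(node_of.values())), "summary nodes,", len(summary_edges), "summary edge")
print(parts)
```

In this toy case five individuals and two assertions shrink to two summary nodes and one summary edge. Refining the Patient node separates the patients who take a drug from the one who does not, which is the kind of selective expansion the refinement step relies on.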

    SHER performs membership query answering as well as conjunctive query answering using a set of optimization techniques described in this paper on optimizing membership query answering and conjunctive querying. These optimization techniques leverage summarization in the context of conjunctive querying, and also incorporate faster incomplete reasoning techniques into query answering. SHER therefore has an internal knob which can be used to get fast, incomplete answers to queries. This faster algorithm can help retrieve large result sets for most queries within a minute or two.




    Use cases


    Automated Clinical trials matching using ontologies



    In collaboration with researchers at Columbia University Medical Center, we used SHER to find electronic patient records that match clinical trial criteria. The problem in automating clinical trial matching is that patient data is noisy, coded in local terminologies, and highly specific. Clinical trial queries, however, tend to be much more general. Bridging the gap between the two requires significant knowledge engineering, which has to be customized for each institution. For example, a patient record may state that "Patient X was medicated with vendor-specific drug Y," whereas clinical trial criteria are specified in terms of a broad class of drugs, such as "patients on medications involving active ingredient Z."

    Together with researchers from Columbia, we investigated whether it was possible to reuse the knowledge in the SNOMED ontology to bridge the gap between electronic medical records and clinical trial queries. SHER successfully found matches for the clinical trial queries on a large one-year patient dataset from Columbia (60 million triples). For details, see ISWC 2007, Clinical trials matching paper. For a set of slides about this case study, see Clinical trials matching case study slides.
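The gap between specific records and general criteria can be illustrated with a small Python sketch. It is only a toy lookup over a hand-written hierarchy, not OWL reasoning and not how SHER or SNOMED work internally. All names (`BrandY`, `IngredientZ`, `BroadDrugClass`, `patients_on`) are hypothetical stand-ins for the "drug Y" and "ingredient Z" of the example above.

```python
def ancestors(concept, parents):
    """Return the concept plus everything reachable by following parent links."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


# Hypothetical background knowledge, standing in for an ontology such as SNOMED.
has_ingredient = {"BrandY": {"IngredientZ"}, "BrandW": {"IngredientQ"}}
subclass_of = {"IngredientZ": {"BroadDrugClass"}}

# Patient records are specific: they name the vendor drug only.
records = [("patientX", "BrandY"), ("patientV", "BrandW")]


def patients_on(ingredient_class, records):
    """Find patients whose drug contains an ingredient in the given class."""
    matches = set()
    for patient, drug in records:
        for ingredient in has_ingredient.get(drug, ()):
            if ingredient_class in ancestors(ingredient, subclass_of):
                matches.add(patient)
    return sorted(matches)


print(patients_on("IngredientZ", records))
print(patients_on("BroadDrugClass", records))
```

Neither record mentions `IngredientZ` or `BroadDrugClass`, yet the query finds `patientX` because the background knowledge links the vendor drug to its ingredient and the ingredient to its class. A real ontology encodes far richer relationships, which is why a reasoner is needed rather than a lookup like this one.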

    Cleaning up text extraction output using ontologies


    SHER has been used in the context of SemanticClean, a project that examines whether it is possible to use OWL reasoning to clean up inconsistencies in data generated by text extraction. Depending on the number of inconsistent patterns present in the data, SHER can detect several thousand inconsistencies. This takes between 10 and 67 minutes for datasets of roughly 800K to 2 million triples. For details, see ISWC 2007, paper on SemanticClean.

    Searching PubMed with ontologies on AnatomyLens


    AnatomyLens provides a semantic, concept-based search over annotations of PubMed articles and GOA annotations using the GO (Gene Ontology) and FMA (Foundational Model of Anatomy) ontologies. Users enter anatomy terms, MeSH terms, and biological processes as search keywords. AnatomyLens is more precise and has better recall than text search. For example, for the query "Alzheimer's, brain, neuron development," AnatomyLens will match Alzheimer's articles that discuss dendrite development in the hippocampus, whereas a standard text search will only find articles that contain the queried keywords explicitly, and might also return unrelated articles (such as articles about neuron development in the spine).


    Here's a presentation of SHER and its applications: SHERAndItsApplications.ppt



    Availability


    SHER is available at the following URL, free for academic use: Try SHER

    Platform requirements

    Operating systems: Windows® and Linux®

    Installation instructions

    Please see the included SHER-Documentation.pdf file, located in the /doc folder when you expand the tarball.

    Go there...
    http://www.alphaworks.ibm.com/tech/sher/download

    Linked Open Data and Extraction of Vocabularies from Linked Open Data

    Read More...
    http://domino.research.ibm.com/comm/research_projects.nsf/pages/iaa.index.html

    Platforms

    66 results:

    Technology / Description / Date
    3D Fast Fourier Transform Library for Blue Gene/L and Blue Gene/P
    A C++ library for computing distributed complex-to-complex, three-dimensional Fast Fourier Transforms on the Blue Gene/L and Blue Gene/P supercomputer.
    04/06/2010
    Application Advancement Assistant for WebSphere Application Server Community Edition
    A tool that enables developers of Java EE applications to migrate from IBM WebSphere Application Server, Community Edition, to the more advanced WebSphere Application Server family of products.
    10/28/2008
    Automatic Testing Toolkit for Virtualization Providers
    A smart cross-platform toolkit for developing test frameworks for virtualization providers.
    02/25/2009
    Centralized User Management for the IBM Virtualization Engine
    On-demand systems management via a single, consolidated interface, using IBM's Virtualization Engine.
    02/02/2006
    CIM Repository Synchronization for Cloud Computing
    A tool to synchronize Common Information Model (CIM) Repositories which are deployed on different servers.
    07/21/2009
    CodeRally
    A Java-based, real-time programming game based on the Eclipse platform.
    06/29/2006
    Collaborative Code Review Tool
    A collaborative code review plug-in for Eclipse.
    09/13/2010
    Compound XML Document Toolkit
    A standards-based, schema-driven toolkit for mixed-namespace XML documents.
    08/31/2006
    Custom Math Functions for High Performance Computing
    Implementations of various transcendental math functions, including "erfc", with no conditional branches.
    05/10/2006
    Data Discovery and Query Builder
    A Web-based framework for searching database records to identify and correlate data based on semantic concepts rather than specific data layouts.
    02/11/2011
    Design Pattern Toolkit
    An Eclipse-enabled template engine for generating applications based on customizable, model-driven architecture transformations.
    04/24/2007
    Dynamic Logical Partition Command Line Tool for Integrated Virtualization Manager
    A tool that enhances the usability of the Dynamic Logical PARtition (DLPAR) feature of systems which are managed with Integrated Virtualization Manager (IVM).
    02/11/2009
    Eclipse Based Foundation Toolkit for Heterogeneous Database Applications
    A lightweight heterogeneous database tool for developers, enabled by deployment onto an existing Eclipse shell.
    02/23/2009
    EMBL/FASTA Wrapper for WebSphere Information Integrator
    A tool that enables (by using SQL) access, retrieval, and federation of bio-sequences and data stored in flat-file, specialized data sources in either EMBL or FASTA format.
    02/21/2008
    Expedited Real-Time Task Graphs
    A deterministic, real-time programming model for Java with supporting tools and run-time environment.
    08/28/2007
    Expert System for Tuning Optimizations (ESTO)
    A tool that tunes the set of parameters for optimizing a specific target program.
    08/23/2007
    Flexible Internet Evaluation Report Architecture
    A highly flexible architecture for the design, display, and reporting of Internet surveys.
    03/18/2005
    Full-System Simulator for IBM PowerPC 970
    A full-system simulation infrastructure and tools for the PowerPC 970 instruction set.
    04/06/2006
    GAIAN Database
    A distributed federated database using a biologically inspired self-organization principle to minimize management.
    12/10/2010
    Graphical LPAR Monitor for System p5 Servers
    A graphical LPAR monitor for the System p5 Server that allows the status of CPU and memory resources used by one or more LPARS to be monitored side by side.
    04/05/2007
    HeapAnalyzer
    A graphical tool for discovering possible Java heap leaks.
    02/17/2011
    High Productivity Computing Systems Toolkit
    A framework and toolkit that automates the detection of bottlenecks in application performance.
    07/07/2010
    IBM Broadband Transmission-line Characterization Using Short-Pulse Propagation
    A software toolkit containing an advanced 2D field solver and signal-processing facility for extracting broadband transmission line properties.
    07/30/2010
    IBM Cluster Monitor
    A one-stop, easy-to-use, configurable, fully automated, Web-based tool for monitoring and observing cluster-wide performance in the form of visible graphs.
    05/06/2008
    IBM Dynamic Application Virtualization
    A technology that enables computationally-intensive applications to take advantage of accelerated libraries on remote, back-end systems (including Cell Broadband Engine), reducing time to deployment and disruption to business.
    04/02/2009
    IBM Electromagnetic Field Solver Suite of Tools
    A suite of full-wave and quasi-static electromagnetic field solver tools used to calculate the electrical parameters for interconnection and packaging design.
    07/30/2010
    IBM Full-System Simulator for the Cell Broadband Engine Processor
    A full-system simulation infrastructure and tools for the Cell Broadband Engine processor.
    06/01/2009
    IBM Hash Suffix Array Delta Compression
    A new differential compression algorithm that combines the hash value and suffix array techniques.
    11/27/2007
    IBM Image Construction and Composition Tool
    The IBM Image Construction and Composition Tool enables users to construct custom virtual images that they can provision with Tivoli Provisioning Manager and IBM WebSphere CloudBurst Appliance, or use in IBM Smart Business Development and Test on IBM Cloud.
    03/01/2011
    IBM Integrated Ontology Development Toolkit
    An ontology toolkit for storage, manipulation, query, and inference of ontologies and corresponding instances.
    12/07/2007
    IBM Lock Analyzer for Java
    A cross-platform tool that provides an insight into how well Java locks are performing in a live Java application.
    09/11/2007
    IBM MapReduce Tools for Eclipse
    An Eclipse plug-in that simplifies the creation and deployment of MapReduce programs.
    04/16/2007
    IBM Parallel Machine Learning Toolbox
    A toolbox for running machine learning algorithms on parallel computing platforms.
    11/27/2007
    IBM Performance Simulator for Linux on POWER
    A tool that provides users of Linux on POWER a set of performance models for IBM's POWER processors.
    07/24/2009
    IBM Real-Time Class Analysis Tool for Java
    A tool enabling the deployment of real-time Java applications without manual definition of classes to be preloaded.
    02/11/2009
    IBM Resource Monitor for BladeCenter Server Room
    An easy-to-use and flexible tool to monitor and search modules in a BladeCenter server room, to improve hardware resource utilization and asset security.
    09/09/2009
    IBM Scheduler for High Throughput Computing on IBM Blue Gene P
    A lightweight scheduler that supports high-throughput computing (HTC) applications on Blue Gene/P.
    11/26/2008
    IBM TuningFork Visualization Tool for Real-Time Systems
    An Eclipse-based visualization and performance analysis tool for real-time applications with support for Java, C++, IBM's Real-time JVM, and Linux.
    09/09/2009
    IBM Web Ontology Manager
    A Web-based system for managing Web Ontology Language (OWL) ontologies.
    04/25/2006
    IBM Workplace Server Performance and Health Monitor
    A Web application that enables users to gauge the performance and health of a Workplace server.
    06/28/2006
    IBM XL UPC Compilers
    A compiler with implementation for Unified Parallel C (UPC) High-Performance Computing (HPC) applications on large-scale, parallel processing machines.
    06/02/2010
    Open Virtualization Format Toolkit
    A composition tool to build software virtual appliances in the new standard format, Open Virtualization Format.
    06/05/2009
    OpenCL Development Kit for Linux on Power
    OpenCL - The open standard for parallel programming of heterogeneous systems
    06/30/2010
    Pmcount for Linux on Power Architecture
    A hardware performance counter tool for the IBM POWER4, POWER4+, POWER5, POWER5+, POWER6, and PowerPC 970 processors.
    09/10/2009
    Post-Link Optimization for Linux on POWER
    A post-link optimization utility for the POWER architecture that optimizes an executable program or a shared library, based on its run-time profile.
    01/13/2011
    Preservation DataStores
    An OAIS-based preservation-aware storage component that supports the future usability of digital information.
    12/22/2009
    Provider Acceptance Test Suite
    A tool that validates vendor and IBM providers before integrating them into the IBM Storage Management Product Portfolio.
    09/11/2008
    Really Small Message Broker
    A very small messaging server that uses the lightweight MQTT publish/subscribe protocol to distribute messages between applications.
    08/20/2010
    Resource Simulator for IBM Systems Director Data Model
    An extension for IBM Systems Director whose commands can simulate and manipulate resources according to the IBM Systems Director data model.
    11/25/2008
    Scalable Highly Expressive Reasoner
    A technology that provides ontology analytics (OWL-DL without nominals) over highly expressive ontologies.
    07/15/2008
    Script Monitor Extension for IBM Director
    An extension for IBM Director that makes it easy to customize Resource Monitor items with scripts.
    12/04/2007
    Scripting Tools for SAN Volume Controller
    An interface and scripting tools using Perl for automating tasks in SAN Volume Controller.
    05/04/2010
    Semantic Tools for Web Services
    A set of Eclipse plug-ins that can be installed on WebSphere Integration Developer (WID) 6.0.1 for semantic matching and composition of Web services.
    06/09/2005
    Service Integration Bus Explorer
    A stand-alone GUI tool that allows exploration and management of the messaging resources in a Service Integration Bus.
    12/21/2006
    Sparse Matrix-Vector Multiplication Toolkit for Graphics Processing Units
    A sparse matrix-vector multiplication library optimized for NVIDIA GPUs using CUDA
    04/21/2009
    Terminal Automation Tool
    A flexible, lightweight framework for automating screen-based applications.
    08/13/2009
    The IBM Distribution of Apache Hadoop
    IBM Distribution of Apache Hadoop
    06/23/2010
    Unstructured Information Management Architecture
    IBM technology that supports the implementation, composition, and deployment of UIMA applications.
    06/13/2008
    Visual Performance Analyzer
    An Eclipse-based visual performance toolkit.
    11/27/2009
    Visual XForms Designer
    A standards-based, easy-to-use Eclipse plug-in enabling the rapid development of documents with XForms mark-up using a visual user interface.
    08/31/2006
    Watson Sparse Matrix Package
    A package of libraries for solving sparse systems of linear equations on serial and parallel computers.
    01/27/2011
    Web Service Engine for Accelerating SOA System Development
    A toolkit that can transform schemas to any desired schema type; drive and stub enterprise applications; facilitate generation of highly customized data files; and reverse-engineer legacy data files.
    02/05/2010
    xCAT
    A tool kit that can be used for the deployment and administration of Linux clusters.
    04/28/2006
    XML Diff and Merge Tool
    A Java program that can compare or reconcile changes in an XML document.
    03/27/2001
    XML Enhancements for Java
    A set of language extensions that facilitate XML processing in Java.
    11/22/2006
    XML Forms Generator
    A standards-based, data-driven Eclipse plug-in that generates functional forms with XForms mark-up embedded within an XHTML document from a XML data instance or a WSDL document.
    01/08/2009

    Don

