PORTAL-DOORS Project NPDS Cyberinfrastructure System Design
For comprehensive reports on NPDS, please refer to "DOORS to the Semantic Web and Grid With a PORTAL for Biomedical Computing", published online 3 August 2007 in the journal IEEE Transactions on Information Technology in Biomedicine, and "A Distributed Infrastructure for Metadata about Metadata: The HDMM Architectural Style and PORTAL-DOORS System", published online 1 June 2010 in the journal Future Internet. More recent information can be found in the documents available at PDP Papers. These reports explain the principles, concepts, models, schemas, and URL and querystring designs used by PDP for NPDS.
- Abstract from 2006 Blueprint Paper for the Nexus-PORTAL-DOORS System (NPDS)
- Hierarchically Distributed Mobile Metadata (HDMM) as an Architectural Style
- Architectural Design of NPDS Cyberinfrastructure
- Cyberinfrastructure System versus Tools and Applications versus Content
- General Usage Scenarios for NPDS
- Specific Use Cases for NPDS
- Development Roadmap for NPDS
See also other NPDS related papers and presentations.
Abstract: The semantic web remains in the early stages of development. It has not yet achieved the goals envisioned by its founders as a pervasive web of distributed knowledge and intelligence. Success will be attained when a dynamic synergism can be created between people and a sufficient number of infrastructure systems and tools for the semantic web in analogy with those for the original web. The domain name system (DNS), web browsers, and the benefits of publishing web pages motivated many people to register domain names and publish web sites on the original web. An analogous resource label system, semantic search applications, and the benefits of collaborative semantic networks will motivate people to register resource labels and publish resource descriptions on the semantic web. The Domain Ontology Oriented Resource System (DOORS) and Problem Oriented Registry of Tags And Labels (PORTAL) are proposed as infrastructure systems for resource metadata within a paradigm that can serve as a bridge between the original web and the semantic web. The Internet Registry Information Service (IRIS) registers domain names while the Domain Name System (DNS) publishes domain addresses with mapping of names to addresses for the original web. Analogously, PORTAL registers resource labels and tags while DOORS publishes resource locations and descriptions with mapping of labels to locations for the semantic web. Beacon is proposed as a prototype PORTAL registry specific for the problem domain of biomedical computing.
Citation: Carl Taswell, 2008, IEEE Transactions on Information Technology in Biomedicine Vol 12 No 2 pages 191-204; manuscript received 31 October 2006, published online 3 August 2007.
IRIS registries and DNS directories provide the model for the architectural style that inspired the design of PORTAL registries and DOORS directories. The most essential characteristics of this architectural style can be summarized by the following principles:
- Pervasively distributed and shared infrastructure, content, and control of content (including distributed and shared control over both the contribution and distribution of the content).
- A hierarchy of both authoritative and non-authoritative servers (root, primary, secondary, forwarding and caching servers, etc) enabling global interoperable communication while permitting local control of policies.
- A separation of concerns with registries for identification and directories for location.
- A freedom of choice in the selection of identifiers with purposeful absence of any requirement to use the same top-level root "name" or "label" for all identifiers, thus enabling essentially unrestricted choice of naming or labeling schemes for identification and avoiding monopolistic control by any single organization.
- A focus on moving the metadata for "who what where" as fast as possible from servers in response to requests from clients (that access non-authoritative local forwarding and caching servers updated regularly by the authoritative servers).
Users of today's web browsers may not be familiar with the engineering of the hidden infrastructure system that enables them to navigate to any web site around the world. But it is the IRIS-DNS infrastructure system, which is responsible for registering domain names and mapping them to numerical IP addresses, that makes it possible for the user to browse the web in such an effortless manner almost always without ever typing or even seeing the numerical IP addresses.
Moreover, from the user's perspective, what matters most is that the conversion from domain name to IP address occurs so rapidly that the user does not experience it as a hindrance or delay in browsing. Even if a particular web page downloads and displays slowly, the web site address is usually found quickly. That happens because the small amount of metadata (domain name and IP address) moves quickly across the internet even when the larger amount of data (web page text and media) does not. Because of this important point, the phrase Hierarchically Distributed Mobile Metadata (HDMM) is introduced here (9 May 2009) as a name for the architectural style that characterizes both IRIS-DNS and PORTAL-DOORS.
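The role of non-authoritative forwarding and caching servers in this style can be illustrated with a minimal Python sketch. It is not taken from the NPDS code base: the class names, the time-to-live policy, and the example label and URL are all assumptions chosen for illustration. The point is only that a small label-to-location record can be answered locally after the first lookup, so the authoritative server is contacted rarely.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class AuthoritativeServer:
    """Hypothetical authoritative source of label-to-location records."""
    records: Dict[str, str]
    lookups: int = 0  # counts how often the authoritative server is asked

    def resolve(self, label: str) -> Optional[str]:
        self.lookups += 1
        return self.records.get(label)


@dataclass
class CachingForwarder:
    """Hypothetical non-authoritative server: answers from its cache,
    and forwards to the upstream server on a miss or after expiry."""
    upstream: AuthoritativeServer
    ttl_seconds: float = 300.0  # assumed expiry policy, not from the source
    _cache: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def resolve(self, label: str, now: Optional[float] = None) -> Optional[str]:
        now = time.monotonic() if now is None else now
        hit = self._cache.get(label)
        if hit is not None and hit[1] > now:
            return hit[0]  # fresh cached metadata, no upstream traffic
        location = self.upstream.resolve(label)
        if location is not None:
            self._cache[label] = (location, now + self.ttl_seconds)
        return location
```

A client would call `CachingForwarder.resolve` with a label; repeated requests within the time-to-live never reach the authoritative server, which is the property that keeps metadata "mobile" in the sense described above.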
Whereas IRIS-DNS implements the HDMM architectural style for the original web, PORTAL-DOORS extends and implements this style for the semantic web and grid. Further, PORTAL-DOORS enhances the separation of concerns principle to include the additional notion of separately optimizing the directories for semantic services (with use of the RDF/OWL/SPARQL stack of technologies) and the registries for lexical services (with use of character string processing and only those XML technologies that do not require use of RDF triples). This separation of concerns enables the back-end use of traditional relational database stores for PORTAL registries and RDF-triple database stores for DOORS directories. Of course, hybrid stores can be used for both PORTAL and DOORS.
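As a rough illustration of this split between back-end stores, the following Python sketch keeps a registry in a relational table (via the standard `sqlite3` module) and a directory as a list of subject-predicate-object triples. The table layout, the predicate names, and the `match` helper are assumptions made for the example; a plain list of tuples stands in for a real RDF triple store, and simple pattern matching stands in for SPARQL.

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def make_registry() -> sqlite3.Connection:
    """Create an in-memory relational store for a hypothetical registry."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE registry (label TEXT PRIMARY KEY, tags TEXT NOT NULL)")
    return conn


def register(conn: sqlite3.Connection, label: str, tags: Sequence[str]) -> None:
    """Register a label with space-separated tags (lexical metadata only)."""
    conn.execute("INSERT INTO registry (label, tags) VALUES (?, ?)",
                 (label, " ".join(tags)))


def lexical_search(conn: sqlite3.Connection, keyword: str) -> List[str]:
    """Character-string search over tags; note % and _ act as LIKE wildcards."""
    rows = conn.execute(
        "SELECT label FROM registry WHERE tags LIKE ? ORDER BY label",
        (f"%{keyword}%",))
    return [row[0] for row in rows]


def match(triples: Sequence[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a pattern; None means 'any value'."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]
```

The registry side needs nothing beyond string comparison, while the directory side is organized around triples, which is why the two can be optimized and stored independently.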
In accordance with the HDMM architectural style, PORTAL-DOORS has been designed to serve the semantic web and grid in a manner analogous to the way that IRIS-DNS has served the original web. The blueprint paper (see abstract) specifying the original design for PORTAL-DOORS was submitted to IEEE in 2006 by Carl Taswell, published online in 2007 at www.IEEE.org, and appeared in print in IEEE Transactions on Information Technology in Biomedicine 2008 Vol 12 No 2 pages 191-204. The figures and tables below have been adapted from the original paper and updated with revisions. Note that the original separate design of PORTAL registries and DOORS directories has been supplemented with a new bootstrapping combined design with integrated NEXUS registrars. The two designs can coexist.
Resource metadata is registered and published by agents for search by users in the PORTAL-DOORS server networks. Semantic services here are defined as those using the RDF/OWL/SPARQL stack of technologies, whereas lexical services are defined as those using only character string processing. Fields within data records are considered required or permitted with respect to the schemas maintained by the root servers. The figure above displays only the most important fields; for all fields, see the reference model implemented with XML Schema.
Resource metadata server networks for PORTAL registering of labels and tags and DOORS publishing of locations and descriptions are analogous to domain metadata server networks for IRIS registering of names and DNS publishing of addresses. Primary PORTAL registries may be established by any individual or organization with or without any local policies governing registration of resources. Examples shown here (GeneScene, BrainWatch, ManRay) implement policies with a problem-oriented focus on their respective specialty domains. Specific criteria for registration are determined by the local schema of the PORTAL primary which must nevertheless comply with the global requirements of the PORTAL root.
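The relationship between required fields, permitted fields, and a local schema that must comply with the root can be sketched as follows. This is an illustration only: the field names are hypothetical, and the compliance rule is an assumption (a local schema may promote permitted fields to required, but may not drop a root-required field or introduce a field unknown to the root). The actual rules are those defined by the project's XML Schema reference model.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Schema:
    required: FrozenSet[str]
    permitted: FrozenSet[str]

    @property
    def allowed(self) -> FrozenSet[str]:
        """All fields a record may contain under this schema."""
        return self.required | self.permitted


def local_complies_with_root(local: Schema, root: Schema) -> bool:
    """Assumed rule: keep every root-required field, add no unknown fields."""
    return root.required <= local.required and local.allowed <= root.allowed


def validate(record: Dict[str, str], schema: Schema) -> List[str]:
    """Return a sorted list of problems; an empty list means the record passes."""
    problems = [f"missing required field: {name}"
                for name in schema.required - record.keys()]
    problems += [f"field not allowed: {name}"
                 for name in record.keys() - schema.allowed]
    return sorted(problems)
```

For example, a root schema might require `label` and `owner` and permit `tag`, while a problem-oriented primary registry additionally requires `tag`; under the assumed rule that local schema complies, and a record without a `tag` would be rejected locally even though the root would accept it.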
PORTAL-DOORS as a lower-level infrastructure system must be distinguished from higher-level tools and applications built on the foundation of the infrastructure. PORTAL-DOORS as a metadata management, communication, and distribution system must also be distinguished from the actual metadata that the infrastructure is designed to send, receive, and exchange throughout the system. Fundamentally, the PORTAL-DOORS System establishes an interoperable, platform-independent, application-independent interface standard for information exchange over the internet. Its design is guided by the HDMM architectural style, is specified to fulfill additional requirements to serve both the original web and semantic web as described in the PORTAL-DOORS blueprint paper, and is currently partially detailed in a draft reference implementation written in XML Schema *.xsd files.
Work to complete a reference implementation must clarify not only the structural data model for metadata records, but also the functional behavioral model for the PORTAL and DOORS services in response to requests from clients. Servers and clients must also communicate over transport protocols. The PORTAL-DOORS Project maintains a vision of serving more than one transport protocol as discussed in Section VII.E of the PORTAL-DOORS blueprint paper. Initial drafts (prior to version 0.5) assumed use of the IRIS core protocol. The current draft (version 0.5) addresses only the structural data model. The next draft (version 0.6) will reintroduce a specific transport protocol, replacing the IRIS core protocol with HTTP using a RESTful web services model. At present, in a bootstrapping stage of development for PORTAL-DOORS, web services provide a more favorable environment for spreading adoption of the system. However, a fully dedicated and optimized protocol specifically for PORTAL-DOORS may ultimately prove necessary to achieve speed and efficiency comparable to that which exists now for IRIS-DNS.
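To make the idea of a RESTful model over HTTP concrete, the sketch below builds a lookup URL in which the resource label is part of the path and optional parameters go in the querystring. The base URL, the `portal`/`doors` path segments, and the `format` parameter are purely hypothetical and do not reproduce the URL and querystring designs documented in the project's reports.

```python
from urllib.parse import quote, urlencode


def build_lookup_url(base_url: str, service: str, label: str, **params: str) -> str:
    """Build a hypothetical GET URL for looking up one resource label.

    service must be "portal" (registry) or "doors" (directory); both
    segment names are assumptions made for this example.
    """
    if service not in ("portal", "doors"):
        raise ValueError(f"unknown service: {service!r}")
    # Percent-encode the label so it is safe as a single path segment.
    path = f"{base_url.rstrip('/')}/{service}/{quote(label, safe='')}"
    # Sort parameters so the same request always yields the same URL,
    # which helps intermediate caches recognize repeated requests.
    query = urlencode(sorted(params.items()))
    return f"{path}?{query}" if query else path
```

Addressing each record by a stable URL and retrieving it with a plain GET is what lets ordinary HTTP caches and proxies play the forwarding and caching roles during the bootstrapping stage.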
As the PORTAL-DOORS System continues to be developed and implemented, any tool, application, or web site that accesses the PORTAL-DOORS System must be distinguished from the system itself. The PORTAL-DOORS System should not be considered either a single site or repository any more than the IRIS-DNS System of domain name registries and directories could be construed to be a single site or repository. For both IRIS-DNS and PORTAL-DOORS infrastructure systems, server data stores and client tools and applications can be written in any language on any platform. Client tools are necessary for agents to edit the information maintained at an individual server data store. Client tools are also necessary for agents and users to navigate, search and query the information stored not only at a particular server but also throughout the entire network of servers. These tools include faceted browsers, keyword search utilities, and SPARQL query interfaces.
Even more complex applications can be built in which the navigation, search, and query tools may be embedded within more sophisticated applications that hide these tools from the user interface. An important example would be an application component that provides natural language answers to natural language questions in the context of the overall function of the software application. In this example, the component converts the user's natural language question to a SPARQL query submitted to the PORTAL-DOORS System, and then converts the response from the PORTAL-DOORS System to a natural language answer for presentation to the user.
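A toy version of such a component is sketched below in Python. It recognizes a single question pattern, turns it into a SPARQL query string, and renders returned locations as a sentence. The question pattern, the `ex:` vocabulary, and both function names are assumptions for illustration; a real component would need genuine natural language processing and the vocabulary actually published by the directories.

```python
import re
from typing import List, Optional

# Only one hypothetical question form is recognized: "Where is <label>?"
_QUESTION = re.compile(r"^where is (?P<label>[\w-]+)\??$", re.IGNORECASE)


def question_to_sparql(question: str) -> Optional[str]:
    """Translate a recognized question into a SPARQL query, else return None."""
    m = _QUESTION.match(question.strip())
    if m is None:
        return None
    label = m.group("label")  # restricted to word characters and hyphens
    return (
        "PREFIX ex: <http://example.org/terms#> "
        "SELECT ?location WHERE { "
        f'?resource ex:label "{label}" ; ex:location ?location . '
        "}"
    )


def locations_to_answer(label: str, locations: List[str]) -> str:
    """Render query results as a natural language sentence."""
    if not locations:
        return f"No location is published for {label}."
    return f"{label} can be found at " + " and ".join(locations) + "."
```

The query string would be submitted to a SPARQL endpoint by whatever client library the application uses; that step is omitted here because it depends on the deployment.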
PORTAL-DOORS has been designed to be as flexible as possible with both backward and forward compatibility from Web 1.0 to Web 3.0. Given the partition with non-semantic services on the PORTAL side and semantic services (with the RDF/OWL/SPARQL stack) on the DOORS side, and also the partition with both required and permitted elements for each of PORTAL and DOORS, there are many possible scenarios for usage of the entire PORTAL-DOORS System. Some examples include:
- Minimal use of required elements for both PORTAL and DOORS: This scenario essentially reduces use of the system to an alternative to PURLs and other similar services.
- Maximal use of permitted elements for PORTAL but minimal use of required elements for DOORS: This scenario enables exploiting the full metadata management facilities of the PORTAL non-semantic services (which include provisions for tags, micro-formats, cross-references, etc) without any obligation to use the DOORS semantic services (that necessitate use of the RDF/OWL/SPARQL stack of technologies and tools). This scenario enables resource agents to publish metadata now in non-semantic formats and defer until later any possible transition to semantic formats which would then be facilitated by the prior staging in the non-semantic formats.
- Minimal use of required elements for PORTAL but maximal use of permitted elements for DOORS: This scenario serves those situations where there is no barrier to transition the metadata from original web formats to semantic web formats, and the resource owner and agent do not wish to maintain the metadata in both semantic and non-semantic formats. This scenario requires that the resource agent registering and publishing the metadata already has access to established ontologies that can be referenced by semantic tools for describing the resource.
- Maximal use of permitted elements for both PORTAL and DOORS: This usage scenario provides the significant benefit of exposing as much metadata as possible to as many clients as possible including both older non-semantic as well as newer semantic tools and applications.
The original PORTAL-DOORS blueprint paper discussed the following use cases:
- Assisting with organization of the "bioinformatics resourceome" and the description, discovery and use of resources for e-science, e-medicine, and e-business in health care and life sciences (see Section III).
- Cataloguing resources for biomedical computing (see Sections IV and VIII).
- Cataloguing patents and trademarks and relating them to products and services (see Section IX).
- Assisting with semantic search, decision support and knowledge management applications in translational research and drug discovery for personalized medicine (see Section XI).
More detailed descriptions of examples in the context of translational research include the following use cases of PORTAL-DOORS as an information-seeking support system for:
- Pharmacogenomic molecular imaging (see AMIA STB 2008 poster).
- PET and SPECT brain imaging (see W3C HCLS F2F 2009 slides).
Although originally conceived and described in the context of health care and life sciences, the diversity of possible use cases for PORTAL-DOORS remains as universal as the diversity of possible use cases for IRIS-DNS.
The PORTAL-DOORS Project maintains the following goals:
- Development of a complete specification model for the PORTAL-DOORS System.
- Development of a reference implementation with XML Schemas for the interoperable communication interfaces.
- Development of the software necessary for simple functional servers and clients to demonstrate the system.
- Demonstration of the system for the use cases of pharmacogenomic molecular imaging and brain imaging with the support of the prototype registries (BrainWatch, ManRay, GeneScene, etc) for the problem-oriented specialty domains relevant to these use cases.
Currently, development plans envision following a roadmap with these milestones:
- Version 0.5: Current live implementation with back-end database and front-end web browser client for partial PORTAL functionality and partial DOORS functionality
- Version 0.6: Implementation as RESTful web services with both ASP.NET desktop and browser clients
- Version 0.7: Implementation of servers and clients for Java-based environments
- Version 0.8: Completion and revision of lexical PORTAL functionality including terminology tools
- Version 0.9: Completion and revision of semantic DOORS functionality including ontology tools
- Version 1.0: Release of PORTAL-DOORS System model and schemas for an authoritative server at a single site
- Version 2.0: Multi-site functionality including security for distributed interacting authoritative servers
- Version 3.0: Multi-site functionality including provenance for distributed interacting non-authoritative servers operating with request forwarding and response caching amongst the distributed servers