Towards Universal Serial Item Names


Robert D. Cameron
School of Computing Science, Simon Fraser University,
Burnaby, B.C., Canada V5A 1S6
Contact for correspondence: cameron@cs.sfu.ca

Abstract

The Universal Serial Item Name (USIN) scheme is proposed as a framework for a single global namespace of articles and other contributions published in organized serial collections. Requirements for USINs are analysed with an emphasis on the use of USINs in scholarly communication. A uniform naming model is described based on the hierarchical naming of serial publications and the hierarchical numbering of serial items. A number of concrete design ideas for USIN syntax are presented. A USIN Global Registry and a USIN Global Database are proposed and analysed in terms of specific architectural features that interact to meet the requirements of publishers, librarians and scholars. Applications of the USIN concept to literature research, document retrieval, bibliography preparation and addressing the 'broken links' problem of the World Wide Web are considered.

1 Introduction

The Universal Serial Item Name (USIN) scheme is proposed as a framework for a single global namespace of articles and other contributions published in organized serial collections. Although the initial focus is scholarly literature published in journals, conference proceedings, technical reports and books, the scheme is intended to accommodate extensions to include other types of serialized contributions such as magazine articles, bills of a legislature, decisions of a court or minutes of university committee meetings. The USIN is intended as a vehicle for interoperability between various bibliographic citation applications, including finding citations (literature research), retrieving citations (from online sources, libraries or document delivery services), citation indexing, and citation formatting (bibliography preparation). The USIN is also intended as one possible mechanism for migrating the World Wide Web away from dependence on Uniform Resource Locators (URLs) (Berners-Lee et al. 1994) to a system meeting the requirements for Uniform Resource Names (URNs) (Sollins and Masinter 1994).

The USIN concept is related to the Serial Item and Contribution Identifier (SICI) (NISO 1996), the Publisher Item Identifier (PII) (Anon 1997), and the Digital Object Identifier (DOI 1997) schemes. However, the USIN approach is primarily concerned with the task of document identification in human communication, particularly scholarly, technical and legal communication, whereas the other schemes are more concerned with document delivery, library processing and publisher perspectives. In particular, the USIN should use mnemonic coding and be reproducible by ordinarily literate people (authors, students, librarians, law clerks, and so on) without the need for specialized coding knowledge and check-sum algorithms. The USIN system is also intended for serialized material that is not, or cannot be, registered with an International Standard Serial Number (ISSN); both SICI and PII rely on ISSNs for serial item identification. Philosophically, the USIN concept is most closely related to the SICI scheme in that each identifies documents with their publication in a particular organized series. The PII and DOI schemes identify documents as items owned by publishers, with numbers possibly assignable in advance of publication and independent of publication numbering. Green and Bide (1997) and Paskin (1997) provide good overviews of the various current approaches to identification of published articles or other items.

Central to the USIN concept is the notion of publication in an organized serial collection. This is a generalization of the traditional notion of a serial publication. An organized serial collection is defined to be any series of items published with a specific publication numbering framework. A (volume, issue, page) numbering framework might be used for a particular journal. The framework may change over time (e.g. changes in the number of issues per volume), but the numbering for any particular item is set when it is published. Both explicit and implicit elements may be used in the numbering framework, so long as they are fixed at the time of publication. For example, numbering of articles may be by explicit (volume, issue, page) numbering, with a counting rule based on page layout to distinguish multiple articles on a page. The authority for number assignment is usually, but not always, the publisher. For example, ISBN numbering of books satisfies the USIN definition of publication numbering framework and so allows the USIN scheme to be applied to books as well as to conventional serials.

In application to scholarly writing and bibliography preparation, the USIN concept is envisioned to be used with bibliographic processing 'plug-ins' to standard word-processing software. These plug-ins should be capable of resolving USIN references into appropriately formatted citations consistent with chosen style guidelines. USIN resolution may be achieved through locally-mounted databases coupled with World Wide Web access as a backup. Authors could thus use USINs as citation tags for papers of interest, much as they use similar tags with BibTeX, ProCite, EndNote or other bibliographic formatting tools. However, with the USIN approach, authors will be spared the drudgery of creating their own bibliographic databases for use with these products, editors will be spared the task of correcting author errors in citations, and readers will be spared the difficulty of resolving errors in citations that authors and editors miss.

In application to literature databases, the USIN can serve as a standard notation to report the results of a search process. This could open up new opportunities for combining search results from distinct databases. For example, duplications could be filtered by USIN matching, or relevant items from one search might be fed back into a search on a different database. In fact, the USIN idea is intended to serve as the core data element in a scheme for universal citation databases: databases that link every document to the documents it cites and vice versa (Cameron 1997).

In application to the Web, the USIN concept has considerable promise as a potential partial solution to the problem of 'broken links' (Cameron 1994; Fielding 1994). In short, the URLs that are currently used for hypertext links on the Web are based on 'locations' that specify documents in terms of access protocols, port numbers, directory paths, and filenames. For various reasons, all of these attributes of document location are subject to change and Web links frequently become broken as a result. Many proposals to resolve this problem through the creation of some form of Uniform Resource Name have been put forward, but none seems to have progressed beyond the experimental stage (Daniel 1997; Daniel and Mealling 1997).

In comparison to the URN approach, the USIN scheme concentrates on the somewhat smaller problem of establishing a universal naming scheme for publications in serialized collections only. One could imagine that USINs could be developed within the overall URN structure as one particular 'namespace' (Moats 1997). On the other hand, there are several reasons why it may be best to focus on a specific solution for USINs instead of the general URN problem. First, it could be argued that the best focus for perpetual naming schemes is to concentrate on those items actually intended to be long-term contributions to the global knowledge archive. From this perspective, publication in an organized serial collection may be the best single indication of such an intent. Second, the act of assigning a document a number within a serial collection represents an important technical opportunity unavailable for general Web resources; a specific event in the publication process to which naming scheme protocols can be tied. Third, focussing on the evolving global knowledge archive as a development from the present international network of libraries may suggest different approaches to identifying the 'resolution service' for a USIN. For example, users could be allowed to choose their own resolution service from those offered by different local libraries, instead of being forced to accept a network-specified service. In the terminology of the Dexter Hypertext Reference Model (Halasz and Schwartz 1994), we can take advantage of the flexibilities afforded by resolution within the run-time layer to overcome difficulties in storage-layer resolution. For all these reasons, focussing on publications in organized serial collections may be both the right problem to solve and the one for which URN solutions are most feasible.

Applications of the USIN scheme to other areas such as legal citation and legal research are also envisaged. However, these are at present beyond the scope of this paper and are left as an area for future consideration.

This paper is intended as a discussion document to set the framework for development of the USIN concept. Overall, the goal is to propose the requirements that must be met by any USIN system, and to suggest some reasonably concrete design ideas that meet those requirements. Section 2 focusses on the requirements analysis with a particular emphasis on the concept of scholar-friendly naming. Sections 3 and 4 focus on design concepts that satisfy the USIN requirements, broken down into two main tasks: globally unique naming of serial publications and hierarchical identification of serial items within a particular publication series. Requirements for important USIN support technologies are discussed in section 5.

2 Requirements analysis

The goal of this section is to discuss the general requirements that any USIN system must meet, without making premature commitments to particular USIN design ideas. At the same time, the requirements are used to analyse some of the inadequacies of the existing identification standards, primarily SICI and ISSN. This serves both to help establish the need for a new identification scheme and to bring some concreteness to the discussion. The reader who prefers additional concreteness may wish to look ahead to some example design ideas for journal article citation in Section 4.

2.1 Requirement #1: unambiguous article identification

It may seem obvious that a USIN scheme must meet the basic goal of unambiguous article identification: every article must be denotable and every USIN denoting an article must denote no other article. However, there are difficulties in achieving this goal and the goal is not achieved by the existing SICI coding scheme. In essence, the SICI scheme is prone to failure in some rare cases involving articles appearing on the same page and having similarly abbreviated titles. To deal with the multiple-article-per-page problem, SICI uses a 'title code' of up to six characters, usually formed from the initial letters of title words. Different articles on a page can usually be distinguished by this title abbreviation. In principle, however, it is possible to have two or more articles with the same SICI title abbreviation and hence the same overall SICI code. Presumably this is one of the reasons for the 12 ambiguities reported within 4 million SICI strings stored in the Uncover database (Schwarz and Hepfer 1996). Another problem with SICI serial title abbreviation is that it requires human judgment when the title contains symbology; this is a further possible source of ambiguity.
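The title-code collision described above can be illustrated with a deliberately simplified model. The sketch below assumes that a title code is just the initial letters of the first six title words; the actual SICI rules are more detailed, and the function name and sample titles are hypothetical.

```python
def title_code(title: str, max_len: int = 6) -> str:
    """Simplified title code: initial letters of the first max_len words, upper-cased.

    This is an assumption made for illustration, not the full SICI rule set.
    """
    words = [w for w in title.split() if w[0].isalnum()]
    return "".join(w[0].upper() for w in words[:max_len])


# Two hypothetical short items that start on the same page of the same issue.
code_a = title_code("Notes on Parsing")
code_b = title_code("News of Publishers")

# Identical codes mean the two items would receive the same overall identifier.
collision = code_a == code_b
```

Under this model, both titles abbreviate to the same three letters, so nothing in the identifier separates the two items.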

To ensure that every article is denotable, a logical first step is to ensure that every serial is denotable. Unfortunately, the existing international standard in serial identification, ISSN, has an insufficiently large denotation space. The ISSN system is based on an eight-digit identifier with seven working digits and a check digit. The upper limit on the number of serials that can be accommodated is therefore 10 million. When contemplating a universal designation scheme for serial items as fine-grained as the minutes of curriculum committee meetings of a particular university department, it should become clear that the ISSN system as presently constituted will not suffice.
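The arithmetic behind the 10 million limit can be made concrete. The sketch below uses the standard ISSN check-digit rule (weights 8 down to 2 on the seven working digits, modulo 11, with X standing for 10), which is not spelled out in this paper; the function name is an assumption chosen for illustration.

```python
def issn_check_digit(first_seven: str) -> str:
    """Compute the ISSN check character from the seven working digits."""
    if len(first_seven) != 7 or not first_seven.isdigit():
        raise ValueError("expected exactly seven digits")
    # Weights run 8, 7, ..., 2 across the seven working digits.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


# The check digit is fully determined by the other seven, so it adds no
# capacity: only 10**7 distinct serials can ever be numbered.
ISSN_CAPACITY = 10 ** 7
```

Because the eighth character carries no independent information, lengthening the check mechanism would not help; only more working digits, or a different naming model, enlarges the denotation space.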

2.2 Requirement #2: canonical USINs

Although every USIN must denote at most one article, it is reasonable to allow different USINs to denote the same article. For example, issue numbers may be an optional part of the USIN syntax, required only when journals are paginated by issue. In the case that journals are paginated by volume, it could be desirable to allow either form (with or without issue numbers) as an acceptable USIN form. There are many other reasons that alternative forms of a USIN might be desirable and there is no particular reason to rule this option out in the initial requirement specification for USINs.

Nevertheless, of the set of USINs that may legally denote an article, exactly one of them should be specified as the canonical or preferred form. One use for canonical forms is to make it easy to determine whether two different USINs denote the same article: convert them both to canonical form and see if they are the same. For example, if a user searches two distinct databases for articles of interest on a particular topic and both databases return USINs in canonical form, then it is an easy matter to filter out duplicate references to the same article because they are represented by exactly the same string. A second important role for canonical forms is to support indexing of information by USIN. By always associating information with the canonical form of a USIN, it will be possible to retrieve that information given any legal USIN form by first converting to the canonical form.

A further requirement for USINs is that conversion to canonical form be an algorithmic process based on globally available information. In this way, separate software systems will be able to interoperate by conversion to the common canonical form. The requirement for globally-available information is not particularly a restriction on the syntax of USINs, but is a constraint on the implementation of the overall USIN system and how the basic information on USINs and their formation on a serial-by-serial basis must be shared.
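A minimal sketch of canonicalization and duplicate filtering follows. It assumes a purely hypothetical syntax, SERIAL:VOLUME(ISSUE):PAGE with the issue optional, and a small in-memory table standing in for the globally available information about how each serial is paginated. None of these names or conventions is proposed as the actual USIN design.

```python
import re

# Hypothetical shared registry: serial designation -> True if paginated by volume.
PAGINATED_BY_VOLUME = {"S.ACM/TOPLAS": True, "X.EXAMPLE/NEWS": False}

# Hypothetical syntax: SERIAL:VOLUME(ISSUE):PAGE, where "(ISSUE)" is optional.
_PATTERN = re.compile(
    r"^(?P<serial>[^:]+):(?P<vol>\d+)(?:\((?P<issue>\d+)\))?:(?P<page>\d+)$"
)


def canonical(usin: str) -> str:
    """Convert any accepted form to the single canonical string."""
    m = _PATTERN.match(usin.strip())
    if m is None:
        raise ValueError(f"not a recognised USIN form: {usin!r}")
    serial = m["serial"].upper()
    by_volume = PAGINATED_BY_VOLUME.get(serial)
    if by_volume is None:
        raise ValueError(f"unregistered serial: {serial}")
    vol, page = int(m["vol"]), int(m["page"])
    if by_volume:
        # The issue number is redundant, so the canonical form omits it.
        return f"{serial}:{vol}:{page}"
    if m["issue"] is None:
        raise ValueError("issue number required for a serial paginated by issue")
    return f"{serial}:{vol}({int(m['issue'])}):{page}"


def merge_results(*result_lists):
    """Combine search results from several databases, dropping duplicates."""
    seen, merged = set(), []
    for results in result_lists:
        for usin in results:
            key = canonical(usin)
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged
```

For example, merge_results(["S.ACM/TOPLAS:12(3):45"], ["s.acm/toplas:12:045"]) yields a single entry, because both inputs reduce to the same canonical string.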

2.3 Requirement #3: identification of secondary serial components

Although the primary focus of the USIN concept is on the identification of published articles, there are a number of other related elements worthy of identification at both coarser and finer granularities. On the coarser side, this includes identification of the serial itself, volumes or volume ranges of serials, an index to the volume, individual issues and issue ranges, contents of an issue or special sections of an issue. At a finer level of granularity, it may include named or numbered components of articles, such as article abstracts or individual sections, figures, tables, or equations. Scholars may sometimes want to make reference to these components; other applications include identifying library holdings on a volume/issue basis, checking in serial issues when they arrive at the library or submitting claims for them when they are late, and ordering table of contents pages for awareness services. The SICI scheme includes capabilities for designating some of these components through its code structure identifier (CSI) and derivative part identifier (DPI); the PII and DOI schemes do not appear to account for such components. ANSI Serials Holdings Statements, used to identify holdings in library catalogues, includes a variety of conventions for specifying volumes, issues, ranges of volumes and issues and similar units of collection (ANS 1986).

It is neither possible nor desirable to define a priori the specific set of secondary serial components that are identifiable in the USIN syntax. Instead, the requirement presented here is that the USIN scheme should accommodate specification of these elements through an extensible syntax that can be coupled with a specification of what elements exist on a serial-by-serial basis.

2.4 Requirement #4: scholar-friendliness

A key requirement central to the entire focus of the USIN concept is that it emphasizes the needs of the people who use USINs over the needs of computers that process them. This encompasses many aspects that can be generally grouped under the term 'scholar-friendliness'. However, this term is not intended to restrict the set of people whose requirements are considered. Instead, it reflects the notion that anyone who uses a USIN to cite prior works may be said to be taking on the role of a scholar in that act.

One might consider that there is a middle ground between accommodating the needs of scholars and the needs of computer systems. However, the goal of establishing USINs as names that will serve to denote published items over the long term should be considered. From this viewpoint, apparent requirements that might derive from the limitations of present-day computer systems (e.g. fixed-length fields, limited storage capacity, etc.) should be avoided. There is little doubt that the processing and storage capabilities of the computer systems that will be available in coming decades will be vastly superior to those of their present-day counterparts.

Nevertheless, scholar-friendliness cannot be considered an absolute requirement at the possible expense of unambiguous article identification, canonical forms or other requirements. Instead, scholar-friendliness should be considered as a desirable trait to be maximized subject to the constraints imposed by other requirements.

2.4.1 Requirement #4.1: no required redundancy

Scholars will often need to write down USINs of interest or type them into their computers. To minimize the tedium and the chance of error in these manual processes, USINs should be designed to include only that information necessary to clearly identify the cited work. Redundant forms that include additional information may be allowed but must not be required. For example, for a journal that is paginated by volume and that follows the convention of beginning each article on a new page, it is sufficient to specify the journal, volume number and initial page number to uniquely identify an article. In this case, a USIN specification must not require the inclusion of additional information such as issue number, date or complete page range.

One counter-argument is that redundant information helps prevent errors, but one can in turn counter that this approach to error control is obsolescent and inferior. Historically, the requirement for redundant information at data-entry time is designed to allow error detection at some future processing step. This is the basis for three forms of redundancy in the existing SICI scheme for article identification: chronology (date of publication), title codes and check digits. However, these devices provide error detection without error correction. When an error is encountered, there may be a considerable delay (e.g. days in inter-library loan applications) before the error can be corrected and processing resumed. Consider instead an interactive process supported by a global network. When a scholar enters a USIN, interactive software could immediately consult the global USIN database to verify its correctness and to allow any necessary corrections or resolutions of ambiguity. One existing model for this is the immediate feedback one receives when entering an incorrect URL on the Web. In this way, an interactive data entry process can both avoid the tedium of redundant data entry and support a process of immediate error correction as well as detection. Construction of such a global USIN system is probably feasible using the present-day technology of Internet-connected computers; if not, it will certainly become feasible within a small number of years.
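The interactive alternative might look something like the following sketch. A short list stands in for the global USIN database, the USIN strings use an illustrative syntax only, and approximate string matching from Python's standard difflib module is one possible way to offer corrections; the paper does not prescribe any particular technique.

```python
import difflib

# Hypothetical stand-in for the global USIN database (illustrative syntax only).
KNOWN_USINS = ["S.ACM/TOPLAS:12:45", "CA.SFU.CMPT/TR:97-3"]


def check_usin(entered, known=KNOWN_USINS):
    """Return (is_valid, suggestions) for a USIN typed by a scholar."""
    if entered in known:
        return True, []
    # Offer the closest registered names so the error can be fixed immediately.
    return False, difflib.get_close_matches(entered, known, n=3, cutoff=0.6)
```

A mistyped page number is thus caught and a correction proposed at entry time, without the scholar having supplied a date, title code or check digit.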

2.4.2 Requirement #4.2: standard mnemonics

A second requirement deriving from scholar-friendliness is to emphasize the use of mnemonic forms for identifying serial publications, and whenever possible, the standard mnemonic forms that are actually used by the community of scholars that use a particular serial. For example, the journal ACM Transactions on Programming Languages and Systems published by the Association for Computing Machinery is widely known by the acronym TOPLAS. An acceptable mnemonic form for identifying this serial might thus be S.ACM/TOPLAS where S might denote a global domain of scholarly societies and ACM is a unique code for the Association for Computing Machinery within that domain. As a second example, a designation such as CA.SFU.CMPT/TR might be acceptable as a globally unique mnemonic code for the Technical Report series of the School of Computing Science at Simon Fraser University. This code is mnemonic and builds on several accepted and standard abbreviations: CA as the ISO country code for Canada, SFU as a unique institutional code for Simon Fraser University in the CA domain (cf. the Internet DNS name sfu.ca), CMPT as the standard four-letter department ID used by Simon Fraser University for the School of Computing Science and TR as the abbreviation for Technical Report, as used by the School. The syntax shown in these examples is intended to be illustrative of a possible realization of this USIN requirement, but not prescriptive.
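To show how such designations decompose, the sketch below splits the illustrative forms above into a dotted naming-authority path and a series mnemonic. Like the syntax itself, the type and function names are assumptions made for illustration rather than part of any specification.

```python
from typing import NamedTuple, Tuple


class SerialName(NamedTuple):
    authority: Tuple[str, ...]  # hierarchical path, most general domain first
    series: str                 # mnemonic for the serial within that authority


def parse_serial_name(name: str) -> SerialName:
    """Split an illustrative designation such as 'CA.SFU.CMPT/TR'."""
    authority, slash, series = name.partition("/")
    parts = tuple(authority.split("."))
    if not slash or not series or any(not p for p in parts):
        raise ValueError(f"malformed serial designation: {name!r}")
    return SerialName(parts, series)


report_series = parse_serial_name("CA.SFU.CMPT/TR")
journal = parse_serial_name("S.ACM/TOPLAS")
```

Here report_series.authority is the path ("CA", "SFU", "CMPT") and journal.series is "TOPLAS", mirroring the reading of the two examples given in the text.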

Incorporation of existing standard mnemonic codes within USIN designations will assist scholars in a number of ways:

  • USINs will be reported to scholars as the results of bibliographic search processes;
  • scholars will enter USINs when doing citation searches;
  • scholars will use USINs when including references in papers;
  • and scholars will make note of USINs when they find papers of interest.

In all these applications, scholars will find mnemonic forms easier to read, easier to reproduce and generally more useful. However, note that these requirements are met if the mnemonic forms are acceptable as one of the alternative forms of USIN input and are produced during output by any USIN-generating software. More precisely, the requirement for using mnemonic forms applies to the definition of the canonical form of USINs, but does not preclude alternative non-mnemonic forms.

Adopting scholar-friendly mnemonic identification necessarily imposes a further limit on the role of ISSNs within the USIN scheme. Where a serial is unambiguously known by a mnemonic form, that form must be used as canonical in place of the ISSN. Nevertheless, ISSNs are likely to have an important role both in identifying serials for which no mnemonic abbreviation has been defined and for initially identifying serials before their mnemonic identifications have been registered and accepted as globally unique.

2.4.3 Requirement #4.3: publication numbering

A further requirement deriving from the general principle of scholar-friendliness is that existing publication numbering conventions should be employed or adapted wherever possible to identify published articles within a particular serial. For example, articles in traditional print journals will typically be identified by volume number, issue number (if required) and page number, with the possible addition of a code to discriminate multiple articles on a single page. This will be of the greatest assistance to scholars when forming USINs from either copies of the article in question or from a citation of the article in a reference list. It will also be helpful to scholars in decoding USINs and retrieving the items from (physical or virtual) library shelves.

The requirement for the use of publication numbering rules out the article identification mechanisms contemplated by the PII and DOI schemes as a basis for canonical USINs. Both of those schemes emphasize publisher-generated numbers that may be different from the actual numbering on the published serial. This requirement also rules out other reasonable schemes for unambiguous article identification. For example, a scheme based on volume number and sequential article number would be widely applicable as an unambiguous numbering scheme for many journals. But scholars may be unable to easily determine the sequential article number from either a printed copy of the article or a conventional bibliographic citation. If publication numbering exists, it should be used.

One might argue that identification by publication numbering is less scholar-friendly than identification using more mnemonic article attributes, such as author name and key title words. However, this is an instance in which scholar-friendliness should not be considered an absolute at the expense of a system of unambiguous article identification.

One might also prefer to use publication chronology (e.g. dates, month-year combinations) instead of publication numbering. In fact, chronology is a form of numbering that happens also to be correlated with the passage of time. For some types of publication, chronology may be the only numbering that exists and hence must be used. In other cases, acceptable alternative USIN forms may be defined based on chronology. However, chronology is generally more complex and involves more identification pitfalls. For example, if (volume, page) identification generally suffices for article identification in a particular journal, it may be the case that (year, page) identification is inadequate for at least two reasons. First, the journal may publish multiple volumes per year. Second, even if volumes are annual, they may not correspond to calendar years; articles with the same starting page number in two consecutive volumes could still end up being published in the same year. In other cases, serial items may have duplicated and hence ambiguous chronology, for example, when two technical reports are issued on the same date. There are also a number of annoying coding problems for chronology. If numeric codes are used for months, how do you code for month combinations or seasons? If non-numeric coding is used should it be in English or the original language and should abbreviations be used? For all these reasons of potential ambiguity and complexity, identification by simple publication numbering should be used in preference to chronology.
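The first pitfall, multiple volumes per year, can be checked mechanically. The records below are invented for illustration: a hypothetical journal that published volumes 7 and 8 in the same year, each starting at page 1.

```python
from collections import Counter

# Invented records for a hypothetical journal with two volumes in one year.
articles = [
    {"volume": 7, "year": 1996, "page": 1},
    {"volume": 8, "year": 1996, "page": 1},
    {"volume": 8, "year": 1996, "page": 37},
]


def ambiguous_keys(records, fields):
    """Return the key values that identify more than one record."""
    counts = Counter(tuple(r[f] for f in fields) for r in records)
    return sorted(key for key, n in counts.items() if n > 1)


by_numbering = ambiguous_keys(articles, ("volume", "page"))
by_chronology = ambiguous_keys(articles, ("year", "page"))
```

With these records, (volume, page) identifies every article uniquely, whereas (year, page) maps the key (1996, 1) to two different articles.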

2.4.4 Requirement #4.4: standard numbering syntax

Technical reports, government publications, court documents and journal papers have various numbering schemes, so alternative syntactic conventions for each type of publication will likely be necessary. In principle, each serial should be accompanied by a definition of its numbering scheme, including syntax and semantics of the USIN designations. However, to ease the burden on scholars, efforts should be made to limit the syntactic variations wherever possible. Thus, there should also be methods for defining standardized numbering schemes, with the goal that the vast majority of serials will use one of the standard schemes rather than one of their own design.

2.4.5 Requirement #4.5: brevity of article identification

From the scholar's point of view, the primary role and need for USINs is in identification of articles. Identifying secondary serial components (volumes, issues, special sections, abstracts, etc.) is a secondary issue of considerably less importance. The requirement for scholar-friendliness, then, is that the syntax for article identification not be complicated by codes to distinguish articles from other types of component. Instead, where necessary, the syntax for secondary components should include additional coding to indicate that a secondary component is being identified; the absence of such coding should be taken to indicate an article identification.

2.4.6 Requirement #4.6: ease of construction and analysis

It should be easy for scholars to construct and analyse USINs manually. Checksums and other calculations should be avoided. Appropriate punctuation should be used to avoid running numeric items together. For example, the code 20000229 used as the SICI specification for February 29, 2000 violates this requirement. Arcane numeric codes should also be avoided. Although numeric month codes 1 through 12 are arguably acceptable, the SICI code 23 meaning 'Fall' is not.

2.4.7 Requirement #4.7: media independent specification

It is not uncommon to find a particular serial published in two or more formats, for example, in HTML format on the Web and on paper. From the scholar's viewpoint, it is usually the case that it is the content of the article, not the form of its presentation, that matters. When there is no difference in content, the USIN specification for articles should fundamentally be independent of publication medium. This requirement does not preclude media specification from inclusion as an optional element in a USIN syntax. However, the SICI convention of including the medium format identifier (MFI) as standard practice would not satisfy the requirement for USINs.

It may be the case that a publisher creates separate designations for different formats of a serial, particularly when there may be significant differences in content. In this case, the publication medium or format may be implicitly identified by the choice of publication series designation. However, this does not represent a violation of format independence of the USIN syntax itself.

2.4.8 Requirement #4.8: embedding USINs in context

Scholars will need to make use of USINs as notational elements in a variety of contexts, both formal and informal. Formal contexts include use of USINs as citation tags for bibliographic formatting software and data elements for bibliographic database queries. Informal contexts are generally oriented to the human reader, such as presentation of USINs in reference lists or direct use of USINs as nouns in sentences. In any of these contexts, there is a potential for confusion to be created by interaction of the syntax of the USIN with the notational conventions of its embedding.

The syntax of USINs should be designed to avoid confusion that can be created by common notational features that may be expected in typical embeddings. In particular, both formal and informal settings may embed USINs as notational elements within structures delimited by parentheses, braces or similar bracketing structures. To avoid confusion, USIN syntax should be constrained to allow bracketing symbols only if they occur in matched pairs. For example, if a USIN X is to be acceptable as a parameter in a BibTeX citation tag of the form \cite{X}, then any unmatched braces within X would surely cause confusion. It may be worthwhile to avoid braces altogether because of their use in the TeX family of document languages and similarly to avoid angle brackets ('<' and '>') because of their use in HTML and SGML.
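The matched-pair constraint is straightforward to check with a stack. In the sketch below, the choice of which brackets are permitted, and the option of rejecting braces and angle brackets outright, are assumptions based on the suggestions in this section and not a settled design.

```python
OPENERS = {")": "(", "]": "["}  # closing bracket -> required opening bracket
AVOIDED = set("{}<>")           # characters used by TeX, HTML and SGML


def brackets_ok(usin: str, reject_avoided: bool = True) -> bool:
    """Return True if every bracket in usin occurs in a properly matched pair."""
    stack = []
    for ch in usin:
        if reject_avoided and ch in AVOIDED:
            return False
        if ch in OPENERS.values():
            stack.append(ch)
        elif ch in OPENERS:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != OPENERS[ch]:
                return False
    return not stack  # nothing may be left open
```

A string that passes this check can be placed inside \cite{...} or a parenthesized reference without unbalancing the enclosing structure.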

When USINs are used as elements in ordinary discourse, they may often occur at the end of a sentence or phrase. Punctuation (periods, commas, semicolons, and so on) added at this point should not be a source of confusion. The presence or absence of white space (blanks, tabs or line breaks) after such a punctuation symbol may be used to discriminate, that is, a period, comma or other punctuation may be used within the USIN syntax only if it is immediately followed by a non-blank character. Any of these punctuation marks followed by white space should always denote the end of a sentence or phrase.
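The white-space rule suggests a simple way to recover USINs from running text: split on white space, then discard any punctuation left at the end of each token. The sketch below assumes, purely for illustration, that a token containing a slash is a candidate USIN; a real recognizer would rely on the registered syntax instead.

```python
SENTENCE_PUNCTUATION = ".,;:!?"


def usins_in_text(text: str):
    """Extract candidate USINs from a sentence, dropping sentence punctuation."""
    found = []
    for token in text.split():
        # Punctuation followed by white space ends a phrase, so it can never
        # belong to the USIN; punctuation inside the token is preserved.
        candidate = token.rstrip(SENTENCE_PUNCTUATION)
        if "/" in candidate:  # hypothetical recognizer, for illustration only
            found.append(candidate)
    return found


sample = "See CA.SFU.CMPT/TR:97-3, which extends S.ACM/TOPLAS:12:45."
extracted = usins_in_text(sample)
```

The periods inside CA.SFU.CMPT survive because each is immediately followed by a non-blank character, while the trailing comma and full stop are discarded.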

2.5 Requirement #5: permanence of USIN designation

A necessary requirement for the USIN system is that USINs, once assigned and validated, remain permanently unambiguous identifiers of their documents. This applies to both canonical and non-canonical USINs. In 300 years a scholar may come across a USIN designation in an obsolete form of print media. The user may highlight it with a data-capture pen and expect to see instantly the resolution of it to a full bibliographic reference on the user's electronic work area. This requirement implies the need for a global registry system and a set of protocols for ensuring that USINs, once assigned, are never reused.

It need not be required that canonical USINs always remain canonical, however, at least in the initial development of the USIN system. Initially, the canonical USIN forms for many serials will include serial designation by ISSN. As globally unique mnemonic designations for these serials are gradually registered and accepted, those forms may become canonical. It may also be the case that changes in the canonical form of serial numbering become desirable, particularly for those aspects of numbering that are not directly reflected in publication numbering (for example, position of an article on a page).

It may be useful to impose constraints on how frequently canonical forms may be varied and/or on how results of USIN processing may be combined. For example, new canonical forms might be allowed to be registered at any time, but taking effect only at certain designated times. When such a time is reached, an updating process might:

  • temporarily disallow new USIN processing requests,
  • allow current requests to complete or time out,
  • perform global updating of canonical form information,
  • and allow USIN processing requests to resume.
Any application that needs to ensure the completeness of USIN matching could use the simple device of requiring that all USIN processing requests are initiated and completed in the same time frame.

2.6 Requirement #6: accommodating serial evolution

Serials evolve. Changes in title, publisher or publication frequency are common. Serials may merge or split. Serials may suspend publication and then resume at a later date. Serial publishers also change: renaming, relocation, reorganization, and so on. There is no doubt that accommodation of change must be an important design goal for USIN development.

Two issues involving particular forms of change deserve special attention in the development of USIN syntax. The first is that title changes should not necessarily require changes in the USIN code for a serial. This is at odds with the ISSN convention, which requires new ISSNs to be issued when there is any significant change in title. However, in considering mnemonic abbreviations of serial titles, various changes in title may be accommodated with the same mnemonic. If the publisher and readers of a journal wish to retain a particular mnemonic by which the journal is known, the USIN system should respect this. The second issue is that the syntax for identifying components of a particular serial should be flexible and changeable. For example, if a serial starts with sequentially numbered issues, its USIN syntax should nevertheless accommodate a later reorganization to number the publication by volume. Similarly, if a traditional print journal identifies articles by volume and page number, the USIN syntax should accommodate a later change to an electronic format in which articles are identified by volume and article number.

2.7 Requirement #7: version discrimination

Articles evolve. Draft versions may be initially circulated in a working paper series, followed by revised versions in conference papers and further revised versions in journals. At various stages an author may circulate intermediate versions to limited groups for review and comment. Post-publication revision of journal articles is also becoming possible with novel e-journal policies such as those of Living Reviews in Relativity (Wheary and Schutz 1998).

The USIN scheme generates distinct identifiers for each separately published version of an article. One possible view of this is that each of these identifiers is an alternative identifier of the same article, with one of them (presumably the most recent) being the canonical form. However, this approach has several serious problems. First, there is no good basis for saying when two versions of an article should be treated as the same. How many insertions and/or deletions of text may be accommodated? What about changes in title or authorship? It is difficult to imagine any set of rules that could provide a satisfactory and implementable decision procedure. It is also difficult to imagine any mechanism that could ensure that publishers actually identify these equivalent versions so that the correct mappings to canonical form can be made automatically. Beyond these concerns, there is also a problem with such equivalences automatically being applied to citations: changes in the content of an article between versions may render a citation apparently irrelevant or incorrect. This should not be considered a failure on the part of the citing author. In essence, it is a misrepresentation to map the author's citation of a particular version to any other version than the author intended.

Philosophically, then, USINs are names for particular versions of articles, not names for the more abstract notion of an article that maintains its identity through various versions over time. Systems to support this more abstract notion, at least at the coarse-grain level of publication versioning, might well be built on top of a USIN system, using USINs to identify particular published versions of articles. Finer-grained versioning concepts, such as those of Augment/NLS (Englebart 1984) or Xanadu (Nelson 1987), might also use USINs to interoperate with conventional bibliographic databases.

The sharp reader may notice an apparent contradiction between the USIN requirements with respect to changes to serials and changes to articles. The USIN requirement for serial codes does represent the more abstract notion of a serial publication as it goes through various changes rather than the serial as it exists at a single point in time. However, this distinction between the treatment of serial and article identifications reflects a fundamental philosophical view. In this view, serials are like timelines and articles are like points on those lines. The timeline may go through the twists and turns of changes in publisher, title or numbering scheme and still retain its identity. Each point on each line is a separate entity with a separate identity. There may be relationships between points such as 'version-of' and 'cites', but the separate identities of the points should be maintained in the USIN approach.

3 Global naming of serial publications

3.1 Hierarchical naming using the DNS model

The Domain Name System (DNS) of the Internet is a successful model of a hierarchical, globally-unique naming system using distributed authority (Mockapetris 1987). Under DNS, a number of global domains such as 'edu' (educational institutions, primarily U.S.), 'org' (organizations, primarily non-profit), 'ca' (Canadian sites), have been established by common agreement. Each domain is managed by an independent domain authority. Each domain authority assigns unique identifiers within its domain to create subdomains and/or to specify particular computer systems. When a subdomain is created, authority for assigning further identifiers within the subdomain is often passed to a responsible organization. Subdomains may be further divided into sub-subdomains, and so on.

Consider a USIN scheme that adopts the hierarchical naming idea of DNS, but with a focus on naming serial publications and publishing organizations, not computer resources. The distinction between naming publications and naming computer resources is critical; the failure to make it may be one of the underlying problems of the URN concept. Notations such as the following may be contemplated:

  • S.ACM/TOPLAS as a designation for ACM Transactions on Programming Languages and Systems within a global domain for scholarly societies,
  • S.ACM.SIGPLAN/Notices for SIGPLAN Notices of the ACM's Special Interest Group on Programming Languages,
  • CA.SFU.CMPT/TR for the Technical Report series of the School of Computing Science of Simon Fraser University,
  • and AU.NLA.ABN.SC/Papers for papers of the Standards Committee of the Australian Bibliographic Network of the National Library of Australia within a global domain for Australia.

These examples are for illustrative purposes only; the actual development of a domain structure and names for serials and their publishers requires a process of international consultation and consensus.

In the USIN scheme, then, serial publications are given identifiers which must be unique in the context of a particular publication domain. Thus d1.d2.d3 is interpreted to specify a subdomain d3 within domain d1.d2, which is itself hierarchically specified as a subdomain d2 within the global domain d1. In general, domains will denote publishing organizations, administrative divisions of such organizations or collectives for identifying organizations or publications.

The USIN syntax shown in this paper is intended to be illustrative rather than prescriptive of the final form of USINs. Thus the choice of periods and slash marks as separators is somewhat arbitrary. One could also argue that the distinction between slash marks and periods is artificial, i.e. that S.ACM.TOPLAS would do as well as S.ACM/TOPLAS. However, distinguished punctuation allows us to infer directly from the form of a specification that S.ACM/TOPLAS is a serial publication of the ACM, while S.ACM.SIGPLAN is an administrative division thereof. One could also question the decision to reverse the right-to-left structuring of domains under DNS; the reason for this is to use a consistent left-to-right hierarchical structuring within all levels of the USIN notation. Last, the final syntax of domain, subdomain and series identifiers is left as an area for further work. However, allowance for case-sensitivity in such identifiers seems reasonable, e.g. CaS and CAS could denote separate items.

3.2 Three initial domains

Prior to international agreements to develop a full domain structure for USINs, it is still possible to initialize the scheme by building on existing global identification standards. Given this paper's focus on scholarly literature, three initial USIN domains can be identified: ISSN, ISBN and RDNS. The ISSN and ISBN domains use the international standard numbering systems directly for serials and books. For example, ISSN/0164-0925 is an initial USIN designation for ACM TOPLAS. Over time the notation S.ACM/TOPLAS might be adopted as the canonical designation of this journal, but ISSN/0164-0925 will always be acceptable. Similarly, ISBN is identified as a global domain based on International Standard Book Numbers.

Assigned DNS names are the basis for the third leg of the initial tripod supporting the USIN scheme. Whenever a DNS domain name or host name is clearly associated with a particular publishing organization, it may be used as a component of the RDNS (restricted DNS) domain of the USIN scheme. For example, acm.org is a DNS domain identified with the Association for Computing Machinery, so RDNS."acm.org"/TOPLAS denotes ACM TOPLAS. Similarly, sfu.ca is a DNS domain for Simon Fraser University, so RDNS."sfu.ca".CMPT/TR denotes the Technical Report series of the School of Computing Science at SFU. In this last example, one might consider instead basing the USIN specification on the cs.sfu.ca domain, that is, RDNS."cs.sfu.ca"/TR. This form might be allowed, but the form based on the CMPT designation may be preferred (canonical), because that designation has been specifically chosen by SFU in a system of unambiguous codes for its departments.

The syntactic convention of enclosing a DNS name in double quotes when used as an RDNS domain serves two purposes. First, it emphasizes that the hierarchical structure of the DNS name plays no role in the interpretation of that name as an RDNS subdomain. In essence, DNS names are being cited as atomic identifiers for publishing organizations. Second, the quote marks delimit the scope of a DNS name, within which the '.' separator is understood not as a part of the USIN syntax, but simply as a character in a quoted DNS name.
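To make the role of the quote marks concrete, the following sketch splits a serial designation into its domain path and serial name, treating a quoted DNS name as a single atomic component. It assumes the illustrative syntax used in this paper (periods between domain levels, one slash before the serial name); the function name parse_serial_designation is hypothetical, and the sketch handles serial designations only, not the item numbering that may follow them.

```python
import re

# A component is either a double-quoted DNS name (kept atomic) or a run of
# characters containing no '.', '/' or '"'.
_COMPONENT = re.compile(r'"([^"]*)"|([^./"]+)')


def parse_serial_designation(text: str):
    """Split e.g. RDNS."sfu.ca".CMPT/TR into (['RDNS', 'sfu.ca', 'CMPT'], 'TR')."""
    parts, separators, pos = [], [], 0
    while True:
        m = _COMPONENT.match(text, pos)
        if m is None:
            raise ValueError(f"expected a component at offset {pos}")
        # group(1) is the content of a quoted name, group(2) a plain identifier.
        parts.append(m.group(1) if m.group(1) is not None else m.group(2))
        pos = m.end()
        if pos == len(text):
            break
        if text[pos] not in "./":
            raise ValueError(f"unexpected character at offset {pos}")
        separators.append(text[pos])
        pos += 1
    # Exactly one '/' is expected, and it must introduce the final component.
    if separators.count("/") != 1 or separators[-1] != "/":
        raise ValueError("expected exactly one '/' before the serial name")
    return parts[:-1], parts[-1]
```

Under these assumptions the period inside "sfu.ca" is never seen as a USIN separator, which is the second purpose of the quoting convention described above.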

Unfortunately, there is no constraint within the DNS system that DNS domains are permanently unique designations of organizations or their successors. Under DNS, the essential requirement is that domains are unique at any particular point in time, but it is quite conceivable that a naming authority at some level may reuse or reassign a name. Furthermore, the association between DNS names and organizations breaks down as one descends into the hierarchy of subdomains, sub-subdomains and so on. To avoid these problems, the USIN standardization process could include the publication of a list of acceptable DNS names and their associated organizations for use within the RDNS domain of the USIN scheme. These designations should be permanent; the interpretation of a designation within the RDNS domain should be derived from this list, even if that designation is later reassigned to some other purpose within DNS itself. The intention of the list should be to identify all and only those DNS domains that may be clearly identified with publishing organizations.

The astute reader will note that designations such as RDNS."acm.org"/TOPLAS and RDNS."sfu.ca".CMPT/TR seem unnecessarily awkward compared to the earlier examples S.ACM/TOPLAS and CA.SFU.CMPT/TR. We should hope that forms such as the latter ultimately become canonical under the USIN system. One might ask, then, why not just skip the RDNS prefix, reverse the order of DNS domain names and use those reversed names directly at the top-level of the USIN hierarchy in the initial instance? The answer is that the top-level domain structure of the USIN system should not be prematurely constrained. Once established for a particular use, USIN designations are intended to be reserved permanently for that use. The RDNS prefix allows existing DNS names to be used as a way of initializing the USIN system, giving time for an orderly process of developing an internationally-acceptable top-level domain structure.

Within the RDNS domain for a particular publishing organization, the identification of administrative divisions and publication series should use codes specified by that organization. In many cases, clear coding schemes are already in place. In the important case of universities, a system of unambiguous mnemonic codes for the academic departments is typically available in the university calendar. Codes to denote a publication series of a university department (e.g. TR for Technical Report, TN for Technical Note, and so on) are often included on publication lists produced by the department or may be found on the documents themselves. Wherever possible, the use of existing naming schemes should be accommodated in this way, to maximize the scholar-friendliness of USIN designations.

Occasionally, one finds a DNS domain that directly corresponds to a particular serial publication. For example, the electronic journal First Monday has an associated DNS domain firstmonday.dk. In this case, the DNS name can be used as a serial publication name directly within RDNS. Assuming that the Internet domain for First Monday is registered on the list of acceptable RDNS domains, it has the USIN RDNS/"firstmonday.dk".

To ensure the robustness and permanence of USIN designations, one should expect that certain adaptations and accommodations of historical naming schemes will be required. Thus, the USIN system must include a method for describing naming schemes and rules for maintaining consistency. To make the greatest use of historical naming schemes, the rules should be designed to accommodate a great deal of variability. Nevertheless, some modifications of historical naming schemes should be expected in order to comply with USIN requirements.

The three initial domains ISSN, ISBN and RDNS provide a plausible initial basis for unified, permanent and globally-unique designations of archivable serial, book and institutional publications. There are undoubtedly many cases in which the coding of USIN specifications will initially be unclear, especially in the case of institutional publications. However, it is certainly a common practice for the serial publications of an institution to be identified using a numbering scheme that serves to unambiguously denote those publications in the local context of an institution. It is also the case that the vast majority of publishing institutions in the industrialized world can now be identified by an appropriate DNS domain. These conditions suggest that it is feasible to initiate a USIN system.

3.3 Evolution of the USIN system: towards scholar-friendly names

Although the ISSN, ISBN and RDNS domains may serve to initialize a USIN system, they will not generally provide a satisfactory basis for the scholar-friendly canonical designations that meet USIN Requirement #4.2. The development of an internationally acceptable domain structure is beyond the scope of this paper. However, to stimulate discussion, the References section of this paper includes, for each of the cited references, discussion of possible initial USIN designations and forms that may evolve over time.

4 Hierarchical identification of serial items

This section focusses on the problem of identifying articles and other components within the context of a particular serial. For concreteness, the first subsection starts with a proposed USIN syntax for citing journal articles. Following this, a general model for serial item identification by hierarchical numbering of items within a series is presented. The final subsection returns to the exploration of some additional design ideas for USIN syntax.

4.1 Example: journal article citation

The following examples illustrate a proposed syntax for citation of traditional (print) journal articles.

S.ACM/TOPLAS:16@1811
Assuming that S.ACM does become the code for the Association for Computing Machinery in the global domain for scholarly societies, this is the canonical USIN in the proposed syntax for the article 'A Behavioral Notion of Subtyping' by Barbara H. Liskov and Jeannette M. Wing appearing in ACM Transactions on Programming Languages and Systems, volume 16, number 6 (November 1994), pages 1811-1841.
S.ACM/TOPLAS:16(6)@1811
This is an acceptable alternative USIN for the same journal article, specifying the issue number.
S.ACM.SIGPLAN/Notices:32(1)@66
This denotes the position paper 'Global Computation' by Luca Cardelli, published in ACM SIGPLAN Notices, volume 32, number 1, January 1997, pp. 66-68. In this case the issue number is required, because pages are renumbered from 1 with each issue of SIGPLAN Notices.

The syntax is intended to be scholar-friendly: each symbol is a mnemonic for the role of its component in the numbering. Volumes are emphasized as the first numbering component, issues are enclosed in parentheses consistent with many standard citation formats, and the '@' indicates the page number at which the article starts.
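The three examples above can be decoded with a single pattern. The sketch below is one possible reading of the proposed syntax, assuming that volumes, issues and pages are plain arabic numerals and that the serial designation contains no colon; the names ArticleRef and parse_article_usin are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ArticleRef(NamedTuple):
    serial: str
    volume: int
    issue: Optional[int]  # None when the issue number is omitted
    page: int


# serial ':' volume [ '(' issue ')' ] '@' page
_ARTICLE = re.compile(
    r"(?P<serial>[^:]+):(?P<volume>\d+)(?:\((?P<issue>\d+)\))?@(?P<page>\d+)"
)


def parse_article_usin(usin: str) -> ArticleRef:
    """Decode a print-journal article USIN in the illustrative syntax."""
    m = _ARTICLE.fullmatch(usin)
    if m is None:
        raise ValueError(f"not a journal article USIN: {usin!r}")
    issue = m.group("issue")
    return ArticleRef(
        serial=m.group("serial"),
        volume=int(m.group("volume")),
        issue=int(issue) if issue is not None else None,
        page=int(m.group("page")),
    )
```

With this reading, S.ACM/TOPLAS:16@1811 and S.ACM/TOPLAS:16(6)@1811 decode to the same serial, volume and page, differing only in whether the optional issue is present.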

It is possible to contemplate a generic syntax for the numbering of serial items, avoiding specialized syntax for each type of item. For example, the conventions of the Web's Universal Resource Identifiers (URIs) (Berners-Lee 1994) might be adopted to use the '/' punctuation for separation of all elements within the hierarchical numbering of a serial item. The designation of the TOPLAS example might become S.ACM/TOPLAS/16/6/1811. Unfortunately, there are a number of disadvantages to a generic syntax for hierarchical numbering. First, with respect to journal numbering, optional issue numbers are not easily accommodated. For example, how is S.ACM/TOPLAS/16/1811 as an article denotation reconciled with S.ACM/TOPLAS/16/6 as an issue denotation? Second, the mnemonic value of associating specific symbols (e.g. '@') with specific concepts (e.g. 'at page number') is lost. Finally, there may be syntactic conflicts between the universal syntax and existing syntaxes for publishers' numbering schemes. For example, the '/' separator for URI syntax conflicts with the combined-issue designations such as 3/4 that are frequently used by journals such as The Serials Librarian. For these reasons, it seems preferable to avoid specifying a generic universal syntax for serial numbering and instead allow series-dependent syntax. Nevertheless, the number of alternative syntactic schemes should be limited to avoid cognitive burdens for the scholar.

4.1.1 Multiple articles per page.

Some journals start more than one article on a particular page. For example, these might be items of technical correspondence. One solution to this ambiguity is to use sequential denotations with lower case letters. For example, S.ACM/CACM:38(1)@43a and S.ACM/CACM:38(1)@43b could respectively denote the two short articles 'Women and Computing in the UK' by Alison Adam and 'Announcing a New Resource: The WCAR List' by Laura L. Downey, both appearing on page 43 of Communications of the ACM, volume 38, number 1 (January 1995).

There are three small problems with this scheme that may be quite rare but are theoretically possible. First, there may potentially be more than 26 articles on a page. However, the scheme easily extends so that designations such as aa for the 27th article and aaa for the 703rd article (after the 676 two-letter codes aa through zz) may be used. Second, there may be an ambiguity in determining the ordering of articles; pages are two-dimensional while orderings are one-dimensional. The most scholar-friendly way to resolve this is to follow the natural text ordering. For publications in English and similar languages, this is column-major numbering: articles in column 1 always precede articles in column 2, and so on, while articles within columns are numbered top to bottom. Finally, note that page numbers themselves might include lower case letters. An example is preface material in a journal volume numbered using lower case roman numerals. To handle this case, the USIN scheme might specify that the underscore (_) character can be used as a separator.

In practice, scholars will not want to learn the details of how to distinguish multiple articles on a page until it becomes a problem. They may not even be aware of the problem if they are entering a citation from its written form in a reference list. In such a case, the user will likely omit the required lower case code when entering the citation. Interactive USIN processing software should notify the user of the ambiguity and query him or her for its resolution. Batch-oriented software could return the set of all articles on the page and issue a warning report through an appropriate message or log file.
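The batch-oriented behaviour might be sketched as follows. The in-memory index, its layout and the function name resolve_on_page are all hypothetical stand-ins for whatever database a real USIN processor would consult; the two entries are the Communications of the ACM articles cited above.

```python
from typing import Dict, List, Tuple

# (serial, volume, issue, page) identifies a page; hypothetical index layout.
PageKey = Tuple[str, int, int, int]

ARTICLES_BY_PAGE: Dict[PageKey, Dict[str, str]] = {
    ("S.ACM/CACM", 38, 1, 43): {
        "a": "Women and Computing in the UK",
        "b": "Announcing a New Resource: The WCAR List",
    },
}


def resolve_on_page(page: PageKey, item: str = "") -> List[str]:
    """Return the titles matching a page and an optional lower case item code.

    When the item code is omitted, every article starting on the page is
    returned, so the caller can detect the ambiguity from the list length.
    """
    items = ARTICLES_BY_PAGE.get(page, {})
    if item:
        return [items[item]] if item in items else []
    return [items[code] for code in sorted(items)]


matches = resolve_on_page(("S.ACM/CACM", 38, 1, 43))
if len(matches) > 1:
    # A batch processor would write this to its message or log file.
    warning = f"ambiguous citation: {len(matches)} articles start on this page"
```

An interactive processor could use the same result to present the candidate titles to the user and ask which one was intended.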

4.1.2 Unpaginated e-journals

When a journal is not printed in pages, one might expect that article identification by page number is no longer appropriate. Although many e-journals have retained page-oriented formatting and numbering, many others have chosen not to do so. In particular, there is a growing trend to use the logical document markup capabilities of SGML (Coombs et al. 1987) and HTML in e-journals. One advantage is that formatting may be left to the reader's software; articles can be viewed and printed in a variety of different formats (with a variety of different paginations) depending on hardware capability and reader preference. In view of this, it seems reasonable to expect that the trend towards unpaginated e-journals will continue.

Consider a variation on the standard USIN journal syntax that accommodates unpaginated e-journals by replacing the @page syntax with $article-number. (An earlier version of this paper used the more mnemonic # to denote article numbers, but the $ is easier to use when USINs may be encoded as URLs.) Some e-journals have explicit article numbering by volume, e.g. the Chicago Journal of Theoretical Computer Science. Supposing that S.MITP/CJTCS identifies this journal, S.MITP/CJTCS:1995$3 then denotes article 3 in volume 1995, entitled 'Rabin Measures' by Nils Klarlund and Dexter Kozen. In other cases, articles may be numbered within issues. Thus ISSN/1201-2459:2(3)$4 would denote the article 'Reflections on Milton and Ariosto' by Roy Flannagan, published as article 4 in Early Modern Literary Studies (ISSN 1201-2459), volume 2, number 3.
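The difficulty with # in URLs is that it introduces the fragment part, so an article number written with # is separated from the rest of the USIN when the URL is split. The short sketch below shows this with Python's standard urllib.parse module; the resolver address is a hypothetical placeholder, not an actual USIN service.

```python
from urllib.parse import urlsplit

# Hypothetical resolver prefix, used only for illustration.
BASE = "http://resolver.example/usin/"

with_hash = urlsplit(BASE + "S.MITP/CJTCS:1995#3")
with_dollar = urlsplit(BASE + "S.MITP/CJTCS:1995$3")

# With '#', the article number is split off as a URL fragment.
hash_path, hash_fragment = with_hash.path, with_hash.fragment

# With '$', the whole USIN stays together in the path.
dollar_path, dollar_fragment = with_dollar.path, with_dollar.fragment
```

Since fragments are normally interpreted by the client rather than sent to the server, a resolver receiving the first form would see only the volume designation.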

When no explicit numbering is provided, there are several reasonable alternatives. Often a journal provides a table of contents with each issue; article numbers could be determined by counting. Alternatively, a journal may issue articles individually, with incremental updates to a cumulative contents list. So long as the journal maintains a consistent policy of adding new articles to the end of the contents list, counting can serve for unambiguous article number determination. However, there are cases in which counting may be ambiguous, for example, when contents lists include mixed items such as regular articles, short notes, corrigenda, and so on, or when a journal is published following an 'article database' model for which no canonical article ordering is defined. In this event, definitive article numbers might only be established upon article registration in a USIN global database.

An alternative to article numbering is to use publisher-defined symbolic article tags that are often found in article URLs. For example, this article may be referenced by the tag 'Cameron' in the context of volume 1, issue 3 of the Journal of Digital Information, so a potential USIN for this article might be S.BCS/JoDI:1(3)$Cameron.

Ultimately, it may be that no single convention suffices for article identification within unpaginated e-journals. Although minimizing the number of identification conventions is helpful to scholars, the USIN system contemplates the possibility of defining article identification schemes on a publication-by-publication basis within the overall framework of an hierarchical numbering model.

4.2 General model for identification by hierarchical numbering

The scheme just illustrated for journal citation is an example of a general concept for serial item identification: the use of a hierarchical numbering system. Abstractly, serial items are identified in the context of their serials by specifying hierarchical numbering tuples. For example, (volume, page) 2-tuples serve to identify articles in some print journals, while (volume, issue, page, item-count) 4-tuples may be required for magazines. In some cases, the hierarchy may be quite deep; items in a particular newspaper may be identified by a 7-level numbering (volume, issue, edition, section, page, column, item-count). This is the essence of serial identification: although the particular scheme employed may vary from serial to serial, every item within every serial may be abstractly identified by some form of hierarchical numbering tuple.

It is interesting to note that a hierarchical enumeration system ('tumbler addressing') was also used as the basis of universal document identification in the proposals for the Xanadu Docuverse (Nelson 1987). However, those identifications were based on a server/user/document/version/content hierarchy rather than the pure publication numbering hierarchy considered here. In essence, the Xanadu address system attempted to develop a new numbering system to apply to all documents, whereas the USIN approach is to characterize and use existing publication numbering hierarchies within a common framework.

4.2.1 Scope

One defining characteristic of the USIN hierarchical numbering model is that every counter within every numbering tuple has a scope that defines the context of its numbering. Issues of a journal are typically numbered from 1 within each volume; they are said to have volume scope. Page numbers may have volume scope or issue scope, depending on the particular serial. An 'item-count' for distinguishing multiple articles per page has page scope. The first, or principal, numbering component of a serial is said to have global scope; it is numbered consecutively in perpetuity.

Numbering scope is correlated with, but not synonymous with, hierarchical level. For example, volume scope for page numbers is often used even when volumes are divided into issues. Similarly, although issues are usually given volume scope when volumes exist, they may sometimes be given global scope.

4.2.2 Scope-dependent numbering

Another important aspect of the model is the use of scope-dependent numbering. In general, this reflects the fact that some properties of a counter at a particular level may depend on the actual values of counters at superior scope levels. Some of the scope dependencies may be relatively minor. For example, a quarterly journal that changes to a bimonthly journal starting with volume 23 exhibits a scope-dependency: issues are numbered 1 through 4 for volumes 1 through 22, and are numbered 1 through 6 thereafter. Scope-dependency may even affect the need for a particular counter in serial item identification. For example, the item-counter for multiple articles per page is not needed for those pages that have only one article starting on a page. Scope-dependencies may even affect the entire numbering system. For example, a print journal may switch to electronic publication with a corresponding switch from a (volume, issue, page) numbering scheme to a (volume, article-number) scheme.
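The quarterly-to-bimonthly example can be written down directly as a rule in which the valid range of the issue counter depends on the value of the volume counter. This is a minimal sketch with hypothetical function names, not a proposed representation for such rules.

```python
def issues_in_volume(volume: int) -> int:
    """Number of issues in a volume of the example journal.

    Quarterly (4 issues) for volumes 1 through 22, bimonthly (6 issues)
    from volume 23 onwards.
    """
    if volume < 1:
        raise ValueError("volumes are numbered from 1")
    return 4 if volume <= 22 else 6


def is_valid_issue(volume: int, issue: int) -> bool:
    """Check an issue number against the range allowed for its volume."""
    return 1 <= issue <= issues_in_volume(volume)
```

Issue 5 is thus rejected for volume 22 but accepted for volume 23, although the syntax of the two designations is identical.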

4.2.3 Syntactic representation

In general, the numbering scheme for every serial has a syntactic representation that may be generated by mapping rules from the abstract representation as a hierarchical numbering tuple. In the suggested standard journal article syntax, the (volume, page, item-number) tuple of (12, 135, 2) maps to the syntactic representation 12@135b. Each number in a hierarchical numbering tuple is first mapped to a numeral in some encoding system, such as arabic numerals, roman numerals or 'alphabetic' numerals (a, b, c, ..., aa, ab, ...). Then a syntactic string for the entire structure may be constructed by concatenation with appropriate mnemonic operator symbols as punctuation. An essential goal of this process is that the syntactic encoding be uniquely decodable. Operator symbols must be carefully chosen both to have mnemonic value and to ensure unambiguous interpretation of the syntactic forms. In principle, the order of appearance of numbering elements may also be considered a design choice, but for simplicity and to avoid confusion it may be desirable to enforce a strict left-to-right ordering of elements according to the numbering hierarchy.
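The mapping just described can be sketched for the (volume, page, item-number) case. The code below assumes arabic numerals for volume and page and 'alphabetic' numerals for the item number; the function names are hypothetical. The decoder illustrates the unique-decodability goal for this particular scheme only.

```python
import re
from typing import Tuple


def to_alphabetic(n: int) -> str:
    """Encode 1, 2, ..., 26, 27, ... as a, b, ..., z, aa, ..."""
    if n < 1:
        raise ValueError("alphabetic numerals start at 1")
    letters = []
    while n > 0:
        # There is no zero digit, so shift by one before dividing.
        n, rem = divmod(n - 1, 26)
        letters.append(chr(ord("a") + rem))
    return "".join(reversed(letters))


def from_alphabetic(s: str) -> int:
    """Inverse of to_alphabetic."""
    n = 0
    for ch in s:
        n = n * 26 + (ord(ch) - ord("a") + 1)
    return n


def encode(volume: int, page: int, item: int) -> str:
    """Map a (volume, page, item-number) tuple to its syntactic form."""
    return f"{volume}@{page}{to_alphabetic(item)}"


def decode(text: str) -> Tuple[int, int, int]:
    """Recover the numbering tuple from its syntactic form."""
    m = re.fullmatch(r"(\d+)@(\d+)([a-z]+)", text)
    if m is None:
        raise ValueError(f"cannot decode {text!r}")
    return int(m.group(1)), int(m.group(2)), from_alphabetic(m.group(3))
```

For the tuple (12, 135, 2) this yields 12@135b, and decoding that string returns the original tuple.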

4.2.4 Parallel numbering hierarchies

A fourth aspect of the hierarchical numbering model is that a serial may have parallel numbering hierarchies for different purposes. In general, these hierarchies have a common numbering prefix consisting of one or more of their uppermost numbering levels, with divergence of numbering below these level(s). The simplest example is that of the article-identification and issue-identification hierarchies of journals that are paginated with volume scope. In this case, the (volume, page) and (volume, issue) hierarchies may be considered parallel. In general, syntactic devices are necessary to distinguish which hierarchy is intended in any particular coding; the (volume, page) and (volume, issue) hierarchies are distinguished by the @ and () syntax notations given previously. Other examples of parallel numbering are given in the later subsection on secondary component notation.

4.2.5 Chronology

Finally, chronology is the fifth general property associated with the hierarchical numbering model for serials. Chronology is the association of a date and/or time of publication with a particular serial numbering component. In general, chronology is a fundamental aspect of serial publication and should be defined for all hierarchical numbering components down to some level at which all further structure is considered simultaneously published. For example, traditional print journals have chronology specified to the issue level, while electronic journals may have chronology specified to the article level. In general, chronology is scope-dependent; for example, when a quarterly journal becomes monthly, the chronology associated with issue 3 in each volume may change from 'Fall' to 'March'. Chronology may also be irregular and possibly out-of-sequence, that is, with publication numbers assigned out of order of actual publication dates. Chronology itself is also an instance of hierarchical numbering, for example, using (year, month, day) 3-tuples or (year, season) 2-tuples.
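Scope-dependent chronology can be illustrated with the quarterly-to-monthly example. In the sketch below, the volume at which the journal becomes monthly and the ordering of the seasons are assumptions made purely for illustration (the seasons are ordered so that issue 3 falls in 'Fall', as in the example above); the function name is hypothetical.

```python
# Assumed ordering, chosen so that issue 3 of a quarterly volume is 'Fall'.
SEASONS = ["Spring", "Summer", "Fall", "Winter"]
MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Hypothetical: the first volume published monthly.
FIRST_MONTHLY_VOLUME = 10


def chronology_label(volume: int, issue: int) -> str:
    """Return the chronology label for an issue, which depends on the volume."""
    labels = MONTHS if volume >= FIRST_MONTHLY_VOLUME else SEASONS
    if not 1 <= issue <= len(labels):
        raise ValueError(f"volume {volume} has no issue {issue}")
    return labels[issue - 1]
```

The same issue number 3 is labelled 'Fall' in volume 9 and 'March' in volume 10 under these assumptions.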

4.2.6 Further work: hierarchical numbering theory

One direction for further development is to consider formalization of the model to become a theory of hierarchical numbering. Such a theory would have as its purpose the establishment of certain important properties, such as ensuring that every published item is denotable by a hierarchical numbering tuple, every tuple has a syntactic representation and every syntactic representation is unambiguously decodable. In particular, careful attention should be given to the formulation of arithmetic operations to avoid problems such as the 'paradoxes of tumbler arithmetic' in the Xanadu scheme (Nelson 1987). The theory should also account for the particular properties of hierarchical chronological numbering. In this regard, the theory should be informed by the extensive work of Dershowitz and Reingold (1997) in developing the mathematics of many of the world's important calendar systems.

4.3 Additional design ideas for hierarchical numbering

The following subsections present a number of additional design ideas for the identification of serial items by hierarchical numbering. Although many of the ideas are illustrated using examples related to journals, they are intended to apply to other types of serial as well.

4.3.1 Syntax for holdings description

Beyond article identification, the next most important application area for USINs may be in the description of library holdings or document delivery service coverage. A single volume or issue of a journal is simple to identify by including numbering only to the desired level. For example, S.ACM/TOPLAS:16 denotes volume 16 of TOPLAS, while S.ACM/TOPLAS:16(6) denotes issue 6. But holdings are more often described as volume ranges. In cases where issues are missing, or subscriptions have been cancelled and then reinstated, or miscellaneous holdings have been received by donation, the holdings may be broken up into a list of individually held items or ranges. To accommodate these requirements, it seems reasonable to reserve the comma (,) to separate elements of a holdings list and the double hyphen (--) to serve as a range operator.

Consider a holdings pattern for ACM TOPLAS consisting of volumes 2 through 12 and 16 forward, except for the missing issues 2 and 4 of volume 10. The following USIN holdings specification could be descriptive.

S.ACM/TOPLAS:2--10(1),10(3),11--12,16--ff

Here the serial code is specified only once. Commas separate individually held items or ranges. The start and end of a range are indicated by enumeration to the required level of specificity. An end range of 'ff' indicates a continuing subscription. As a syntactic constraint to aid in error detection, holdings should be listed in strictly ascending order.
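The conventions just described can be sketched as a small parser. The Python below is an illustration only: it assumes journal-style enumeration of the form V or V(I), and the function names and the tuple representation are hypothetical. It splits the serial code from the holdings list, separates elements at commas and ranges at the double hyphen, treats 'ff' as an open end, and rejects lists that are not in strictly ascending order.

```python
import math
import re

# Assumed element shape for a journal: "V" (whole volume) or "V(I)" (one issue).
_POINT = re.compile(r"^(\d+)(?:\((\d+)\))?$")


def parse_point(text):
    """Turn '10' into (10,) and '10(3)' into (10, 3)."""
    m = _POINT.match(text)
    if m is None:
        raise ValueError(f"unrecognized enumeration: {text!r}")
    volume, issue = m.groups()
    return (int(volume),) if issue is None else (int(volume), int(issue))


def parse_holdings(spec):
    """Return (serial_code, [(start, end), ...]); end is None for 'ff'."""
    serial, sep, body = spec.rpartition(":")
    if not sep:
        raise ValueError("missing ':' between serial code and holdings list")
    items = []
    previous_end = ()  # the empty tuple sorts before every start tuple
    for element in body.split(","):
        first, dashes, last = element.partition("--")
        start = parse_point(first)
        if not dashes:
            end = start            # individually held item
        elif last == "ff":
            end = None             # continuing subscription
        else:
            end = parse_point(last)
        # Pad the end with infinity so an end of (12,) covers all of volume 12.
        end_key = (math.inf,) if end is None else end + (math.inf,)
        if not (previous_end < start < end_key):
            raise ValueError(f"holdings not strictly ascending at {element!r}")
        items.append((start, end))
        previous_end = end_key
    return serial, items


serial, items = parse_holdings("S.ACM/TOPLAS:2--10(1),10(3),11--12,16--ff")
print(serial)  # S.ACM/TOPLAS
print(items)   # [((2,), (10, 1)), ((10, 3), (10, 3)), ((11,), (12,)), ((16,), None)]
```

Representing each range end as a prefix tuple keeps the 'required level of specificity' explicit: (12,) means all of volume 12, whereas (10, 1) stops at issue 1 of volume 10.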

Only positive holdings data is shown, following the principle adopted by ANSI Serials Holding Statements (ANS 1986). Determination of missing items can be made by reference to either the USIN global database or an appropriate serial 'definition' (see the subsection on Serials Definition Language in the following section). For example, using the knowledge that TOPLAS was quarterly during volume 10 tells us that 10(2) and 10(4) are missing for these holdings while 10(5) is not (because it does not exist).
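To make the TOPLAS example concrete, here is a hedged Python sketch of how missing issues might be derived from positive holdings data. The holdings list is written out by hand as (start, end) tuples, and the number of issues in the volume (4, because TOPLAS was quarterly during volume 10) is passed in directly; in a real system that figure would presumably come from the serial definition or the global database. All names are hypothetical.

```python
import math

# The holdings S.ACM/TOPLAS:2--10(1),10(3),11--12,16--ff as (start, end) pairs.
# A one-element tuple denotes a whole volume; None denotes the open end 'ff'.
HOLDINGS = [((2,), (10, 1)), ((10, 3), (10, 3)), ((11,), (12,)), ((16,), None)]


def covers(item, volume, issue):
    """True if the holdings element includes the given issue."""
    start, end = item
    # Pad the end so that an end of (12,) includes every issue of volume 12.
    upper = (math.inf,) if end is None else end + (math.inf,)
    return start <= (volume, issue) < upper


def missing_issues(items, volume, issues_in_volume):
    """List issue numbers of a volume that exist but are not held."""
    return [
        issue
        for issue in range(1, issues_in_volume + 1)
        if not any(covers(item, volume, issue) for item in items)
    ]


print(missing_issues(HOLDINGS, 10, 4))  # [2, 4]
```

Issue 10(5) is never reported as missing because the enumeration stops at the number of issues the volume is known to contain.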

The conventions for serials holdings are intended to apply to serials with any form of hierarchical numbering and to any level of specificity. One implication is that the syntax of USINs generally must be structured to avoid conflicts with the ',' and '--' symbols of the holdings notation. Another implication is that coverage can be specified to a finer level of detail. For example, a document delivery service may wish to identify 'scanned holdings' to the article level, that is, the articles that have already been scanned or digitized and are hence available for short-turnaround delivery.

4.3.2 Secondary component notation

Secondary component notation is a proposed means of specifying abstracts of articles, tables of contents of issues, indexes of volumes and other secondary components of serials or their articles. In general, secondary component notation is introduced by a USIN for the relevant article, issue, volume or other component, followed by a vertical bar and a component specification. The component specification is typically a standardized mnemonic for the component, possibly followed by a parenthesized enumeration. The following examples are illustrative:

S.ACM/TOPLAS:16|index
The index of volume 16 of TOPLAS (found at the end of S.ACM/TOPLAS:16(6)).
S.ACM/TOPLAS:16(6)|contents
The table of contents of volume 16, issue 6 of TOPLAS.
S.ACM/TOPLAS:16@1811|abstract
The abstract of an example TOPLAS article.
S.ACM/TOPLAS:16@1811|sec(4.1)
Subsection 4.1 in the example article, entitled 'Type Specifications'.
S.ACM/TOPLAS:16@1811|fig(3)
Figure 3 in the example article, captioned 'Stack Type'.

The last two examples illustrate parallel (volume, page, section, subsection) and (volume, page, figure) numbering hierarchies respectively for sections and figures within articles.

It is anticipated that a standard set of mnemonics for standard components would be globally defined (index, abstract, section, figure, table, equation and so on) while others may be defined for individual publications. However, scope dependencies and numbering syntax for enumerated components will typically be defined on a serial-by-serial basis.
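A minimal sketch of how secondary component notation might be decoded follows. It assumes that the vertical bar occurs only once in a USIN, that mnemonics are alphabetic, and that the optional enumeration contains no nested parentheses; none of these constraints is stated in the proposal, and the function name is hypothetical.

```python
import re

# Assumed shape of a component specification: mnemonic, then optional "(...)".
_COMPONENT = re.compile(r"^([A-Za-z]+)(?:\(([^()]*)\))?$")


def split_secondary(usin):
    """Return (base_usin, mnemonic, enumeration); the last two may be None."""
    base, bar, component = usin.partition("|")
    if not bar:
        return base, None, None  # no secondary component present
    m = _COMPONENT.match(component)
    if m is None:
        raise ValueError(f"unrecognized component specification: {component!r}")
    return base, m.group(1), m.group(2)


print(split_secondary("S.ACM/TOPLAS:16@1811|sec(4.1)"))
# ('S.ACM/TOPLAS:16@1811', 'sec', '4.1')
print(split_secondary("S.ACM/TOPLAS:16|index"))
# ('S.ACM/TOPLAS:16', 'index', None)
```

The enumeration is returned as an uninterpreted string because, as noted above, its numbering syntax would typically be defined serial by serial.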

One may question the need for fine-grained identification of article components. Indeed it is reasonable to consider deployment of an initial USIN system that focusses on article identification. Nevertheless, for a scheme that is designed to serve for article identification and related purposes in perpetuity, it would seem foolhardy not to allow the extension of the scheme using a notation such as the secondary component notation presented here.

4.3.3 Reference notation

The reference notation is a particular application of the secondary component notation that would allow designation of an article or other contribution by indirect reference. For example, S.ACM/TOPLAS:16@1811|ref(17) denotes reference 17 of the article starting on page 1811 of volume 16 of TOPLAS. This reference is, in fact, to an article entitled 'A semantic database model' by Hammer and McLeod appearing in ACM Transactions on Database Systems, 6(3), pp. 351-386. Assuming that the appropriate citation database exists, the indirect reference in this case could map to the canonical form S.ACM/TODS:6@351.

One use of the reference notation is to guarantee that an acceptable USIN can be generated quickly for every reference in an article, providing that a USIN can be generated for the article itself. During creation of citation databases, it may be desirable to produce a full set of USINs for the reference lists of articles in a fairly expeditious fashion. If the resolution of some references to their direct USIN form is proving problematic, they may be left in indirect form during initial data entry. Later, the resolutions of indirect references may be entered either manually or by acquisition of an independently-developed citation set for the same article.

Another use of the reference notation is to serve as a unique canonical form for personal communications, unpublished works and other otherwise undenotable items. In this way, there would be no need to create a classification or coding scheme for such references. Furthermore, each such item would be automatically given a permanent and unique code. For example, if two authors each write articles citing "Famous Person, personal communication", those citations would be given distinct canonical identifiers. This would prevent false positives when doing co-reference searches (finding papers that have two or more references in common).

The reference notation is best supported by article styles with an explicitly numbered reference list at the back. If a reference list exists, but is not numbered, reference numbers may be determined by counting. Alternatively, if references are cited by symbolic tags, a possible design choice is to use the symbolic code itself in the reference notation. For example, the citation of the SICI standard referenced in an earlier version of this paper might be given the indirect reference RDNS."sfu.ca".CMPT/TR:97-16|ref(SICI). Another style may use numbered endnotes, with the possibility of more than one reference per note. In this case, enumeration with endnote number may use lower case letters: |ref(3c) would denote the third item cited in endnote 3 of a particular article. In general, each serial may define its own reference numbering conventions, but it is highly desirable that one of the standard forms be chosen.
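The three reference numbering styles mentioned here (plain numbers, endnote number plus letter, and symbolic tags) could be told apart mechanically, as in the following Python sketch. The classification rules and the returned tuples are assumptions made for illustration; a real serial would state its convention in its own definition.

```python
import re


def parse_ref(enumeration):
    """Classify the text inside ref(...) (hypothetical helper)."""
    if re.fullmatch(r"\d+", enumeration):
        # Explicitly numbered (or counted) reference list, e.g. ref(17).
        return ("number", int(enumeration))
    m = re.fullmatch(r"(\d+)([a-z])", enumeration)
    if m:
        # Endnote style: note number, then a letter for the item within the note.
        position = ord(m.group(2)) - ord("a") + 1
        return ("endnote", int(m.group(1)), position)
    # Anything else is treated as a symbolic tag, e.g. ref(SICI).
    return ("tag", enumeration)


print(parse_ref("17"))    # ('number', 17)
print(parse_ref("3c"))    # ('endnote', 3, 3)
print(parse_ref("SICI"))  # ('tag', 'SICI')
```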

4.3.4 Hyphenation notation

In some cases it may be desirable to break a long USIN over multiple lines. This can be accommodated by the following hyphenation convention. A line break may be inserted after any hyphen appearing in a USIN, without changing its meaning. Furthermore, any non-hyphenated USIN operator can be converted into a hyphenated equivalent of that operator by adding a hyphen to the end. Thus, the hyphenated equivalents of '.' and '/' and '--' are respectively '.-' and '/-' and '--' (no change). The following examples illustrate this convention in use:

RDNS."sfu.ca".CMPT/-
TR:97-16|ref(SICI)

S.ACM/TOPLAS:2--15(1),-
15(3),15(5)--17,20--ff

S.ACM/TOPLAS:2--15(1),15(3),15(5)--
17,20--ff

RDNS."sfu.ca".CMPT/-TR:97-16|ref(SICI)

The last example illustrates that a 'new line' character is not strictly required after a hyphenated operator. This accommodates reformatting operations that might eliminate an inserted 'new line' character but leave a vestigial hyphen in place. Conversion to canonical form eliminates any hyphenated operators and embedded new lines. USIN processing software should fully recognize the hyphenation convention in the event that a multi-line USIN is entered using a cut-and-paste operation.
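Conversion of a hyphenated USIN to canonical form can be sketched in a few lines of Python. The set of single-character operators below is an assumption (the text names only '.', '/' and '--', and the examples also use ','), as is the tolerance for blanks around the line break; the function name is hypothetical.

```python
import re

# Assumed single-character USIN operators that may carry a trailing hyphen.
_OPERATORS = "./,:|@"


def canonicalize_hyphenation(text):
    """Remove hyphenated operators and embedded line breaks from a USIN."""
    # Step 1: drop a line break (and surrounding blanks) that follows a hyphen.
    joined = re.sub(r"-[ \t]*\r?\n[ \t]*", "-", text)
    # Step 2: turn a hyphenated operator such as '/-' back into the plain one.
    # The range operator '--' and hyphens inside numbers (97-16) are untouched,
    # because neither is preceded by one of the assumed operators.
    return re.sub("([" + re.escape(_OPERATORS) + "])-", r"\1", joined)


print(canonicalize_hyphenation('RDNS."sfu.ca".CMPT/-\nTR:97-16|ref(SICI)'))
# RDNS."sfu.ca".CMPT/TR:97-16|ref(SICI)
print(canonicalize_hyphenation("S.ACM/TOPLAS:2--15(1),15(3),15(5)--\n17,20--ff"))
# S.ACM/TOPLAS:2--15(1),15(3),15(5)--17,20--ff
```

Because the second step does not depend on a line break being present, a vestigial hyphen left behind by reformatting is removed in the same way.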

5 USIN support technology

This section considers two important models of support technology for a USIN scheme: a USIN Global Registry and a USIN Global Database System. The USIN Global Registry is proposed as a system of institutions and technologies designed to preserve the knowledge of assigned USINs and their denotations for posterity and to support publishers and librarians in the assignment of new USINs for new and/or unassigned works. As differentiated from the Registry, a USIN Global Database System is not intended for USIN updating, but is intended to support the day-to-day needs of scholars for access to USIN information. This distinction is conceptually valuable in organizing requirements for the separate purposes of USIN registration and USIN-based information retrieval. It might ultimately be the case that the registry and database components are implemented in a single system, however.

In discussing these technologies, the goal is to present a vision of how USINs may be generated, verified and used in the day-to-day work of publishers, librarians and scholars. At this point in the development of the USIN concept, the focus should be more on the analysis of overall system requirements than on the implementation details of underlying mechanisms. Nevertheless, a number of design ideas are included to help give a more concrete picture of the possible operation of an integrated global USIN system.

5.1 USIN global registry

Consider a design for the USIN Global Registry based on four principal components. These are:

SDL: Serials Definition Language:
a language for specifying serial publications and their publication schemes.
UPP: USIN Publication Protocol:
a protocol for assigning USINs as part of the publication process and verifying that they meet global uniqueness and permanence of identification requirements.
SRP: Serial Registration Protocol:
a protocol for registering and revising serial codes and their SDL definitions.
PDP: Publication Domain Protocol:
a protocol for creating, modifying and deactivating publication domains.

These are the technologies that publishers and librarians could use on a daily basis in the assignment of USINs to serially published items.

5.1.1 SDL - Serials Definition Language

Fundamental to the USIN concept is the use of serial designations and numbering schemes for identification of articles and other serial components. To formally specify these schemes, consider the creation of a Serials Definition Language (SDL). Each SDL specification would define one serial, establishing its basic identity and publication scheme. In particular, this would include formal specification of the hierarchical numbering scheme of the serial including its abstract structure, scope-dependencies, chronology, and syntactic identification schemes for articles and other serial components. It would also include the specification of the canonical and allowable alternative forms for USIN designations.

In addition to its formal role in the USIN scheme, SDL should also be designed to serve a variety of related purposes. From a serials check-in and claiming perspective, the enumeration and chronology specifications of an SDL definition should also have predictive value as contemplated, for example, by the serial pattern scheme of McNellis (1996). The SDL definition of a serial should also provide a basis for evaluating and interpreting USIN holdings specifications and possibly converting them to MARC Holdings Format. Similarly, from a bibliographic database perspective, it should be possible to verify the enumeration and chronology recorded in a database entry against that specified in an SDL definition. It should also be possible to determine the comprehensiveness of database coverage: are there any issues or articles published that are not in the database, or is the database complete?

The requirements above relate to a fairly narrow definition of serials, namely, in terms of the logical schemes for enumeration, chronology and serial item identification. It is possible to define a language (say, SECIL) that would be limited to these requirements. Such a narrow approach would serve to support a USIN system, but it seems reasonable to consider serial definition from a broader perspective while the opportunity exists. In particular, the definition of a serial logically includes not only its numbering scheme, but also the title, publisher and publication format. Incorporation of such elements into the language would seem necessary to merit the term 'serials definition language'. Beyond this, one might wish to include additional information, notably classification and indexing information. This reflects a cataloguing perspective and suggests that a nomenclature of SCL (serials cataloguing language) might be appropriate. However, from the viewpoint of designing good modular systems, the SDL approach is arguably preferable, because it focusses on information deriving directly from its publication and relevant to the essence of what the serial is. Cataloguing information is essentially third-party information that may derive from a variety of sources and should be kept separate; it is information about the serial, not information defining it. Detailed exploration of these issues is an area for further work.

5.1.2 UPP: USIN Publication Protocol

When USIN-based bibliographic databases are in widespread use, publishers will find that the sooner an article is assigned a USIN, the sooner it is advertised to large communities of scholars. The USIN Publication Protocol (UPP) is therefore proposed to allow publishers to assign each article a USIN during the publication process, thereby updating the USIN databases automatically.

A major requirement for UPP is to ensure the integrity of assigned USINs from the standpoint of global uniqueness and consistency with the current SDL definitions of serials in question. One approach is to maintain within the USIN Global Registry a current publication state for each serial and to define acceptable UPP actions in terms of this state. In essence, the publication state identifies the last-issued USIN for the serial, plus a specification of which numbering levels in the hierarchical numbering scheme are currently open. This gives a basis for predicting the counter and date values for upcoming UPP requests.

For example, consider the publication state that might exist after registering the article 'Collecting Interpretations of Expressions' by Paul Hudak and Jonathan Young appearing in ACM TOPLAS, Volume 13, Number 2, April 1991, pages 269-290 with the USIN S.ACM/TOPLAS:13@269. The state may include volume and issue counters that are currently open with values 13 and 2, respectively. A page counter may be closed at page 290 (nothing more will appear on page 290). At this point, there may be two legal UPP actions: add another article in this issue or close it. As it happens, there is one more article in the issue. Based on the current publication state, an expectation may be generated that the next article will have USIN S.ACM/TOPLAS:13@291. If the publisher submits that USIN with the next UPP request, it can be accepted, otherwise an error can be reported.

After a 'close issue' request has been made, the SDL definition and publication state can be used to predict the next publication action and expected date. In the example, this is an 'open new issue' request for issue 3 of volume 13, July 1991. These may be verified when the actual request is made. When issue 4 of this volume is closed, the SDL definition should tell us that there are no more expected issues in this volume. The expected sequence of following UPP requests is then a 'close volume' request, followed by an 'open volume' request for volume 14, 1992, an 'open issue' request for issue 1 in January 1992 and an article publication request with USIN S.ACM/TOPLAS:14@1. Each of these expectations may in turn be verified against the actual UPP requests made.
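The publication-state idea can be illustrated with a deliberately small Python model. It is a sketch under strong assumptions: every article starts on the page after the previous one ends, pagination has volume scope, and volume rollover and the exception cases discussed below are left out. The class and method names are hypothetical and do not describe an actual UPP message format.

```python
from dataclasses import dataclass


@dataclass
class PublicationState:
    """Hypothetical registry-side state for one serial."""
    serial: str
    volume: int
    issue: int
    last_page: int          # page counter closed at this page
    issue_open: bool = True

    def expected_usin(self):
        # Assumes the next article starts on the next page of the same volume.
        return f"{self.serial}:{self.volume}@{self.last_page + 1}"

    def register_article(self, usin, last_page):
        """Accept an article request only if it matches the expectation."""
        if not self.issue_open:
            raise ValueError("no issue is currently open")
        expected = self.expected_usin()
        if usin != expected:
            raise ValueError(f"expected {expected}, got {usin}")
        if last_page <= self.last_page:
            raise ValueError("article must end after its first page is reached")
        self.last_page = last_page

    def close_issue(self):
        if not self.issue_open:
            raise ValueError("issue is already closed")
        self.issue_open = False

    def open_issue(self, issue):
        """Accept an 'open new issue' request for the next issue number."""
        if self.issue_open or issue != self.issue + 1:
            raise ValueError(f"unexpected request to open issue {issue}")
        self.issue = issue
        self.issue_open = True


# State after registering S.ACM/TOPLAS:13@269 (pages 269-290, issue 2).
state = PublicationState("S.ACM/TOPLAS", volume=13, issue=2, last_page=290)
print(state.expected_usin())  # S.ACM/TOPLAS:13@291
```

A fuller model would consult the SDL definition to decide how many issues a volume contains and which dates to expect, rather than hard-coding the "next issue" rule.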

Of course, mechanisms will be required to deal with various kinds of exceptions to the predicted publication pattern. For example, when a particular issue is expected, one may instead see a combined issue (with combined enumeration). Alternatively, an issue may be skipped altogether, or a special issue may be inserted into the publication stream between two regular issues. Publication numbering may also be out of order with respect to date of publication. For example, in a technical report series it is not uncommon for numbers to be assigned in advance of publication, with variable delays between the assignment of a number and actual publication. An apparent publication exception may also be the first indication of an actual change in publication pattern. In this case, the SDL definition should be corrected to reflect the updated publication pattern and reregistered with SRP, described below.

5.1.3 SRP: Serial Registration Protocol

Serial Registration Protocol (SRP) is the proposed service for registering a serial code and its accompanying SDL definition and for tracking changes over time. This includes registering:

  • changes in publication numbering or chronology
  • changes in publisher or publication domain
  • additions of alternative USIN codings
  • changes to the canonical USIN form
  • deactivations and reactivations.

In general, SRP requests would be made with respect to a particular publication-domain/serial-code combination.

Perhaps the most critical function under SRP is the creation of a new serial code within an existing publication domain. The code may be the initial code for a new or previously unregistered serial publication or it may be an alternative code for an existing publication. In either event, creation of a serial code should always be considered with care, because it creates, in the context of the given publication domain, a permanent USIN binding between that code and the serial in question. From this perspective, it is worth considering appropriate verification actions for creation of a new serial code. Of course, verification that the code is previously unassigned is an automatic function that should be implemented by the appropriate query to the USIN Global Registry. Beyond this, there should also be some manual verification to ensure that the code assignment is reasonably consistent with the USIN concept. One option is to use national serial registration centres analogous to those of the current international ISSN network. However, such a system is likely to be too cumbersome for the management of publications at the fine-grained level of, say, minutes of committee meetings of particular university departments. Also, it does not account for an institutional role in approving the serial codes chosen by administrative divisions within the institution.

An alternative for verifying serial code assignments that overcomes these problems is the following. SRP requests for new serial code creation must be approved by a USIN-certified cataloguing librarian. Certifications are awarded by an appropriate international standards body. Each authority for a publication domain may designate a certified librarian for that domain. When an SRP request to create a new serial code is issued, it is handled by the librarian registered for that domain, if such a librarian exists. Otherwise, verification of the creation request is attempted in the immediately superior publication domain, and so on. For example, a university may designate a single USIN-certified librarian to handle all institutional requests for new serial codes. Regardless of how deeply structured the administrative hierarchy within the university is, all serial code creation requests within the university are passed up the domain hierarchy to be handled by this individual.
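The delegation rule in this paragraph amounts to a walk up the domain hierarchy. The Python sketch below assumes that a publication domain can be represented as a tuple of components from the top level down and that the registry keeps a simple mapping from domains to designated librarians; the domain below CMPT and the librarian identifier are invented for the example.

```python
def find_verifier(domain, librarians):
    """Return the librarian for the nearest enclosing domain, or None.

    domain     -- tuple of domain components, top level first
    librarians -- dict mapping domain tuples to a librarian identifier
    """
    # Try the domain itself, then each immediately superior domain in turn,
    # ending with the empty tuple, which stands for the global USIN domain.
    for depth in range(len(domain), -1, -1):
        prefix = tuple(domain[:depth])
        if prefix in librarians:
            return librarians[prefix]
    return None


# One certified librarian designated for the whole (hypothetical) university.
registry = {("RDNS", "sfu.ca"): "librarian-1"}

print(find_verifier(("RDNS", "sfu.ca", "CMPT", "GradCommittee"), registry))
# librarian-1
```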

The second major function of the SRP protocol is to register the publication pattern of a serial and changes to that pattern as required from time to time. As described above, these publication patterns are specified as part of the serial's SDL definition. UPP can be used to check the consistency of the publication patterns against future publication attempts, that is, each time a USIN is specified in a future UPP request, it serves to check that the SDL definition is correctly predicting the actual publication numbering and chronology.

Whenever the publication pattern of a serial is changed, the SDL definition must be modified to account for both future and past publications. Future publications are checked by UPP. SRP is responsible for checking that the revised SDL definition correctly accounts for the USINs assigned to past publications. This checking may be done by formally re-evaluating the revised definition against the entire history of actual publication as recorded in the global registry. The checking should satisfy two conditions: (1) every USIN previously registered should be accounted for by the new SDL definition, and (2) the new SDL definition should not 'predict' any past publication that does not exist. Exhaustive checking or a provably equivalent alternative method should be used, that is, a reduced form of checking that puts at risk the consistency of the USIN system should not be justified on the basis of minor concerns of computer processing efficiency.
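The two conditions can be stated as a pair of set differences. In the Python sketch below, the revised SDL definition is represented only by the set of USINs it predicts for the period already published; how that set would be generated from an actual SDL definition is outside the scope of the example, and the function name is hypothetical.

```python
def check_revision(registered, predicted):
    """Compare registered USINs with those a revised definition predicts.

    Returns (unaccounted, phantom):
      unaccounted -- registered USINs the definition fails to predict (condition 1)
      phantom     -- predicted past USINs that were never registered (condition 2)
    The revision is consistent only if both lists are empty.
    """
    registered = set(registered)
    predicted = set(predicted)
    return sorted(registered - predicted), sorted(predicted - registered)


# TOPLAS was quarterly during volume 10, so four issues are on record.
registered = {f"S.ACM/TOPLAS:10({i})" for i in range(1, 5)}
# A faulty revision that predicts five issues for that volume.
predicted = {f"S.ACM/TOPLAS:10({i})" for i in range(1, 6)}

print(check_revision(registered, predicted))
# ([], ['S.ACM/TOPLAS:10(5)'])
```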

The third major function of SRP is to register canonical and alternative forms of USIN for a serial. When a serial is registered for the first time, the publication-domain/serial-code combination under which it is first registered is the canonical form of USIN. Subsequently, SRP may be used to create alternative USIN forms. When such an attempt is made, the SRP request must specify both the publication-domain/serial-code combination for the current canonical USIN and the new alternative publication-domain/serial-code combination. It may be reasonable to require that permission from the domain authority of both domains be obtained. Any number of alternative forms for a serial may be created in this way.

The SRP request to change the canonical form of a serial must specify the publication-domain/serial-code combination of both the current and proposed new canonical forms. The request is made by the authority for the new publication domain and must be verified by the authority for the current canonical publication domain. If approved, the change will be scheduled for the next global synchronization time for changes to USIN canonical forms, or for a later synchronization time specified in the change request. Once the change becomes effective, the canonical form is switched, but both forms remain acceptable.

SRP can also be used to deactivate or reactivate a serial. In essence, deactivation of a serial registers a new publication pattern in which no further publications are predicted. Reactivation requires a new SDL definition that may change the title and future publication pattern of a serial, but still requires consistency with the entire history of previously assigned USINs.

5.1.4 PDP: Publication Domain Protocol

Publication Domain Protocol is the final proposed service of the USIN Global Registry. This protocol is used to create and register new publication domains, transfer authority for domains, register the USIN-certified librarians for a domain and other related functions. In general, these actions will refer to subdomains of some existing publication domain; even top-level USIN domains such as ISSN and RDNS may be considered as subdomains of a global USIN publication domain.

Creation of a code for a new publication domain under PDP parallels the creation of a new serial code under SRP. In both cases, the proposed code must be checked to verify that it is previously unused in the context of the parent publication domain. Furthermore, as with serial codes, codes for new publication domains should be manually reviewed by a USIN-certified librarian. Ideally, this review should verify that the publication domain corresponds to an actual publishing institution, organization or administrative division thereof and is a scholar-friendly mnemonic designation of that unit consistent with historical practice wherever possible. Alternatively, the publication domain may represent a newly-formed collective or coalition expressly formed for the purpose of organizing the upper levels of the USIN domain structure.

A further parallel with SRP is to suggest that formal domain definitions be registered and revised as required from time to time. These definitions would specify the identity and organizational history of a publishing entity. From a domain definition it should be possible to determine the name of a particular publishing entity, its parent organization, its successors and predecessors, and so on. However, domain definitions would not have the complexity of serial definitions under SDL, because there are no corresponding requirements in publication domains for enumeration, chronology and other aspects of serial definitions.

PDP should also support the registration of alternative USINs and changes in canonical USIN for the publishing entities denoted by publishing domains. The registration of alternative USINs under PDP could parallel SRP in a straightforward fashion. However, registration of a new canonical USIN for a publishing domain is complicated by the implications for serials and subdomains within that domain. Consider a proposed change from RDNS."acm.org" to S.ACM as the canonical USIN for the Association for Computing Machinery. Normally, this should imply corresponding changes for all subordinate serials and subdomains recursively. Thus, changes in canonical USIN from RDNS."acm.org"/CACM to S.ACM/CACM, from RDNS."acm.org".SIGPLAN to S.ACM.SIGPLAN and from RDNS."acm.org".SIGPLAN/Notices to S.ACM.SIGPLAN/Notices should all be expected in the example. However, it may be unwise to make such changes automatically without review in every instance. Thus, under PDP, a change in canonical form for a publishing domain should be carried out by first registering all the appropriate changes for subordinate serials and subdomains. This may be enforced under PDP by permitting a registration of a new canonical form for a publication domain only when alternative canonical forms for all active subdomains and serials therein have been registered.
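The enforcement rule at the end of this paragraph can be sketched as a precondition check. The Python below assumes that subordinate serials and subdomains are named by simple prefix extension of the domain's USIN, as in the ACM example, and that the registry can list the forms already registered; the function name is hypothetical.

```python
def missing_alternatives(old_prefix, new_prefix, active, registered):
    """List the new-form USINs that must be registered before the change.

    old_prefix, new_prefix -- current and proposed canonical domain USINs
    active                 -- canonical USINs of active subdomains and serials
    registered             -- set of USIN forms already registered
    """
    missing = []
    for usin in active:
        if not usin.startswith(old_prefix):
            raise ValueError(f"{usin!r} is not subordinate to {old_prefix!r}")
        # Rewrite the domain prefix, keeping the subordinate part unchanged.
        candidate = new_prefix + usin[len(old_prefix):]
        if candidate not in registered:
            missing.append(candidate)
    return missing


active = [
    'RDNS."acm.org"/CACM',
    'RDNS."acm.org".SIGPLAN',
    'RDNS."acm.org".SIGPLAN/Notices',
]
print(missing_alternatives('RDNS."acm.org"', "S.ACM", active, {"S.ACM/CACM"}))
# ['S.ACM.SIGPLAN', 'S.ACM.SIGPLAN/Notices']
```

The change of canonical form for the domain would be permitted only when the returned list is empty, which leaves room for each subordinate change to be reviewed individually.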

Finally, PDP should also provide for the deactivation and possible reactivation of domains. Deactivation of a publication domain implies that no further publication activity is contemplated within that domain or its subdomains. Hence deactivation of a domain should only be permitted when all subordinate serials and subdomains have themselves been deactivated. Reactivation of a publication domain may occasionally be contemplated. However, to ensure the permanence of identification of USINs issued in the subdomain prior to its earlier deactivation, a reactivation request should not be automatically granted. Instead, a 'contract' may be first returned identifying previous use of the domain, assigned subdomains and serials and the requirement that new use will respect these. The proposed new domain authority should agree to these terms before the domain can be reactivated.

5.1.5 USIN global database system

Now consider how the day-to-day needs of scholars can be directly supported by a USIN Global Database System. Three basic needs can be identified:

  • to inquire about the article or other item denoted by a given USIN
  • to cite articles by USIN
  • to use USINs in literature research, both to denote search keys (citation indexing) and search results.

USIN Inquiry Protocol is the first proposed technology to assist users in this regard; it provides for both the interactive inquiry about USINs and for hypertext citation of USINs in Web documents. To support citation by USIN in other types of document formatting software, a Bibliographic Retrieval Protocol is proposed coupled with bibliographic formatting 'plug-ins' for standard word processing packages. The final subsection discusses the role of the USIN Global Database and USINs generally in literature research.

5.1.6 UIP: USIN Inquiry Protocol

One of the primary motivations underlying the USIN concept is to address the 'broken links' problem on the Web: citation of works by URL is prone to failure when the cited item is moved or removed. To solve this problem, it has long been suggested that names of resources rather than their locations should be the basis of citation, but none of the proposals for URNs has yet succeeded. A more successful approach may be to concentrate on an important subset of the general problem: links to serially-published documents. For this subset, consider the direct use of USINs as permanent, 'unbreakable' links and the development of USIN Inquiry Protocol (UIP) to enable this use. For example, a hypertext reference to a sample TOPLAS article could be coded using the following HTML markup:

<A HREF="uip:S.ACM/TOPLAS:16@1811">A Behavioral Notion of Subtyping</A>

Note that a hyperlink formed in this way makes no reference to any particular computer system. Thus, the requirements of URNs are satisfied; the target of a link is designated by naming what it is instead of where it is located.
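Generating such links programmatically is straightforward, with one caveat: USINs in the RDNS domain contain double quotes, which must be escaped inside an HTML attribute. The Python sketch below uses the standard library's html.escape for this; the helper name is hypothetical, and whether a deployed 'uip:' scheme would additionally require percent-encoding is left open here.

```python
from html import escape


def uip_link(usin, text):
    """Build an HTML anchor that cites a USIN through the proposed uip: scheme."""
    href = escape("uip:" + usin, quote=True)  # escapes &, <, > and quotes
    return f'<A HREF="{href}">{escape(text)}</A>'


print(uip_link("S.ACM/TOPLAS:16@1811", "A Behavioral Notion of Subtyping"))
# <A HREF="uip:S.ACM/TOPLAS:16@1811">A Behavioral Notion of Subtyping</A>
print(uip_link('RDNS."sfu.ca".CMPT/TR:97-16', "Technical report 97-16"))
# <A HREF="uip:RDNS.&quot;sfu.ca&quot;.CMPT/TR:97-16">Technical report 97-16</A>
```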

Apart from this use in Web-based documents, UIP also supports direct inquiries about a particular USIN. All the scholar need do is type uip:S.ACM/TOPLAS:16@1811 directly into the 'location' field of a Web browser (assuming that the browser has been updated to include the UIP client-side software).

Ignoring for the moment how it works, the critical issue from a user perspective is what you get when you make a UIP/USIN inquiry, either directly or by activating a hyperlink. One answer is that you retrieve a 'metadata' page, that is, an information page about a document, but not the document itself. In general, direct retrieval of documents cannot be guaranteed because many may not be electronically available. On the other hand, if a document is available online, it may be available from a variety of different sources with a variety of different formats and/or pricing structures. The purpose of a metadata page, then, is to provide a full bibliographic description of the article or other item denoted by the target USIN, and a set of links for making further inquiries about the article and/or retrieving a copy of it.

Consider an ambitious design goal for metadata pages: to provide a comprehensive information resource with respect to the cited items. In addition to basic bibliographic information and links for acquiring copies of articles, a number of other items could be provided. Each article metadata page could include direct links to information about the serial and its publisher. Using the USIN notation it should also be easy to include links for retrieval of contents pages for sibling articles in the same journal issue or volume. Links for exploring other publications by the authors of the article might be included. In particular, links for locating subsequently published corrigenda would be worth highlighting. Information on review articles that discuss the document of interest may be included. In conjunction with a citation database, links for retrieving the sets of articles that are respectively cited by and cite this article could also be considered. Finally, it may be reasonable to consider including links to search services that can locate similar articles by full-text searching using a document surrogate (keywords and other metadata that describe the current document).

It may be the case that the coded USIN in a UIP hyperreference does not refer to a single article, but instead denotes some other serial component or is ambiguous or erroneous. In each of these cases, the page returned through UIP should also strive to provide comprehensive information to the user. For example, in the case of a USIN reference by page number where more than one article starts on the specified page, a menu showing each possible article could be returned together with its correct canonical USIN.
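The three outcomes of a page-number inquiry (a single article, an ambiguous reference, an erroneous reference) might be distinguished along the following lines. This is only a sketch: the in-memory index, the serial code X.EXAMPLE/JNL, the placeholder titles and the use of item numbers as the canonical form are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Article:
    canonical_usin: str
    title: str


# Hypothetical index: (serial, volume, starting page) -> articles
# that start on that page. All entries are placeholders.
PAGE_INDEX: Dict[Tuple[str, str, int], List[Article]] = {
    ("X.EXAMPLE/JNL", "5", 10): [
        Article("X.EXAMPLE/JNL:5(1)$1", "First placeholder article"),
    ],
    ("X.EXAMPLE/JNL", "5", 42): [
        Article("X.EXAMPLE/JNL:5(1)$3", "Second placeholder article"),
        Article("X.EXAMPLE/JNL:5(1)$4", "Third placeholder article"),
    ],
}


def resolve_by_page(serial: str, volume: str, page: int) -> dict:
    """Decide what kind of page a UIP server would return."""
    candidates = PAGE_INDEX.get((serial, volume, page), [])
    if not candidates:
        # Erroneous reference: no article starts on this page.
        return {"kind": "error",
                "message": f"no item starts on page {page}"}
    if len(candidates) == 1:
        # Unambiguous reference: return the metadata page.
        return {"kind": "metadata", "article": candidates[0]}
    # Ambiguous reference: return a menu of canonical USINs.
    return {"kind": "menu",
            "choices": [a.canonical_usin for a in candidates]}
```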

These ambitious goals for the metadata pages returned by UIP servers need not represent an obstacle to server development. The initial implementations of UIP servers may focus on basic capabilities, allowing additional functionality to be added over time. In addition, many of the capabilities could be implemented in a fairly modular fashion. For example, if a particular document delivery service supports Web-based document ordering by USIN, then generating the appropriate document ordering link is a simple matter.

Returning to the issue of how UIP may be implemented, note that the syntax for UIP/USIN citations does not specify the actual server to be consulted in resolving the UIP request. Rather, it is reasonable to expect that the server would be specified by an appropriate client-side mechanism, such as a UIPSERVER browser parameter or environment variable. Typically, users might choose to set their UIPSERVER to specify a server operated by a major local research library or library consortium. In this way, the metadata pages returned can be formatted to emphasize local holdings of cited documents, even when the citing document is remotely located.
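A client-side mechanism of this kind could be as simple as the following sketch, which reads a UIPSERVER environment variable and forms a request to the chosen server. The /uip?usin= request path, the fallback server address and the function name are assumptions for illustration; no such conventions have been defined.

```python
import os
from typing import Mapping, Optional
from urllib.parse import quote

# Placeholder fallback used when the user has not set UIPSERVER.
DEFAULT_UIP_SERVER = "http://uip.example.org"


def uip_request_url(href: str,
                    environ: Optional[Mapping[str, str]] = None) -> str:
    """Map a uip: reference to a request on the user's chosen server."""
    env = os.environ if environ is None else environ
    scheme, _, usin = href.partition(":")
    if scheme.lower() != "uip" or not usin:
        raise ValueError(f"not a UIP reference: {href!r}")
    server = env.get("UIPSERVER", DEFAULT_UIP_SERVER).rstrip("/")
    # The USIN is percent-encoded in full so that '/', ':' and '@'
    # are not misread as URL structure (hypothetical request format).
    return f"{server}/uip?usin={quote(usin, safe='')}"
```

Because the server is chosen by the reader's environment rather than by the citing document, the same hyperlink can resolve to different local holdings information for different readers.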

5.1.7 Bibliographic retrieval and formatting

A key goal of the USIN scheme is to support authors of scholarly works in the preparation of bibliographic references. This may be achieved by bibliographic processing plug-ins or add-ons to standard word processing software that will allow authors to cite works by merely entering USINs at the appropriate citation points. The bibliographic processing modules could then take care of all the remaining details for resolving and formatting the citations:

  • retrieving the actual full bibliographic citations
  • assigning appropriate in-text reference numbers or labels
  • formatting the citations according to a chosen style guideline
  • sorting them according to a user- or style-specified ordering
  • incorporating the citations into the document as a reference list at the back or sequentially in footnotes.

As well as removing a considerable source of tedium in the preparation of scholarly works, the use of USINs in this way should also improve the accuracy and quality of citations by eliminating manual errors and inconsistencies. Finally, a serendipitous benefit of having the citations in a paper represented as USINs is that the citation set can then be made available as data; citation databases can thus be supported by citation data provision at the source (Cameron 1997).
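The processing steps listed above can be sketched in a few lines of Python. The record fields, the author-then-year ordering and the output style are assumptions chosen for illustration, and the lookup argument stands in for whatever retrieval mechanism is actually used; the two sample records are taken from the reference list of this paper.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]

# Two sample records drawn from this paper's reference list.
SAMPLE_DB: Dict[str, Record] = {
    'RDNS."isoc.org"/RFC:2141': {
        "author": "Moats, R", "year": "1997",
        "title": "URN Syntax", "source": "RFC 2141",
    },
    'RDNS."isoc.org"/RFC:1737': {
        "author": "Sollins, K, and Masinter, L", "year": "1994",
        "title": "Functional Requirements for Uniform Resource Names",
        "source": "RFC 1737",
    },
}


def build_reference_list(
    cited_usins: List[str],
    lookup: Callable[[str], Record],
) -> Tuple[Dict[str, int], List[str]]:
    """Resolve, sort, number and format the citations of a document."""
    # Retrieve one record per distinct USIN, in order of first citation.
    distinct = list(dict.fromkeys(cited_usins))
    records = [dict(lookup(usin), usin=usin) for usin in distinct]
    # Sort by author, then year (one possible style-specified ordering).
    records.sort(key=lambda r: (r["author"], r["year"]))
    labels: Dict[str, int] = {}
    entries: List[str] = []
    for number, rec in enumerate(records, start=1):
        # Assign the in-text reference number and format the entry.
        labels[rec["usin"]] = number
        entries.append(
            f'[{number}] {rec["author"]} ({rec["year"]}) '
            f'"{rec["title"]}". {rec["source"]}'
        )
    return labels, entries
```

Calling build_reference_list with the USINs in citation order and SAMPLE_DB.__getitem__ as the lookup returns the in-text labels together with the formatted reference list; the author types only the USINs.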

A modular design for a USIN-based bibliographic processing system is to allow many different bibliographic formatting tools to retrieve data from the USIN Global Database using a common retrieval protocol (say BRP: Bibliographic Retrieval Protocol) and citation representation format (say BDF: Bibliographic Data Format). This would allow the development of competing bibliographic formatting tools that might cater to different user preferences and to different types of document processing system. BRP could be designed to work with locally-mounted copies of the USIN database for access to the bulk of historic bibliographic data, coupled with direct Internet access to the USIN Global Database for access to the latest references. BDF should provide a highly-structured logical format for citation data, to allow various transformations on that data to be easily implemented. Ideally, UPP (USIN Publication Protocol) and BDF should be designed together so that the bibliographic data in the correct format is gathered directly during the USIN registration process.
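As a rough illustration of this division of labour, the sketch below models a BDF record as a structured type and a BRP client that consults a locally-mounted copy first and falls back to the global database only for references not found locally. Neither BRP nor BDF has been specified, so the class names, field names and the fetch_remote callable are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class BdfRecord:
    """Hypothetical structured citation record (fields are assumptions)."""
    usin: str
    authors: Tuple[str, ...]
    title: str
    serial_name: str
    enumeration: str
    publication_date: str


class BrpClient:
    """Looks in the local copy first, then asks the global database."""

    def __init__(
        self,
        local_copy: Dict[str, BdfRecord],
        fetch_remote: Callable[[str], Optional[BdfRecord]],
    ) -> None:
        self._local = dict(local_copy)
        self._fetch_remote = fetch_remote

    def retrieve(self, usin: str) -> Optional[BdfRecord]:
        record = self._local.get(usin)
        if record is None:
            # Not in the local copy: likely a recent reference, so
            # consult the global database over the network.
            record = self._fetch_remote(usin)
            if record is not None:
                self._local[usin] = record  # keep for later lookups
        return record
```

Keeping each authors entry and the enumeration as separate fields, rather than as one preformatted string, is what would allow different formatting tools to apply their own styles to the same record.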

5.1.8 USINs, the USIN global database and literature research

To support bibliographic inquiry, retrieval and formatting, the USIN Global Database is designed to provide a comprehensive solution when starting with a set of citations represented as USINs. Consider also the literature research task, that is, the need to find citations of potential interest using various search methods. In this case, USINs are not known in advance, but may represent the results of the search process. In support of literature research, then, what role should USINs in general, and the USIN Global Database in particular, play?

One possible approach is to expand the requirements for the USIN Global Database to provide comprehensive support for literature research activities. After all, the USIN Global Database is intended to be comprehensive in its coverage of the citeable works and must provide the basic bibliographic data (author, title, serial name, serial enumeration, publication date) for each archived item. With the extension of the database to include abstracts, keywords and classification data for each item, it is possible to contemplate comprehensive support for literature research.

An alternative approach, however, is to support multiple alternative literature databases, each of which provides its own methods of augmenting the basic bibliographic data available from the USIN Global Database. USINs themselves could form the basis of interoperability between the databases, i.e. distinct results from different databases could easily be combined by USIN sorting and matching operations. Such an approach would support:

  • different classification schemes that might be appropriate in different subject areas,
  • competition between different full-text searching techniques based on article abstracts and/or article full text,
  • selective databases that target sources relevant to a particular topic or type of material,
  • experimentation with filtering schemes that grade the level or nature of materials,
  • alternative language databases that support searching in languages other than English, and so on.
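The sorting and matching operations referred to above amount to little more than a keyed merge, as the following sketch suggests. It assumes that every database reports results as canonical USIN strings, so that plain string equality identifies the same item; the function name and database names are hypothetical.

```python
from typing import Dict, Iterable, List, Set


def combine_results(
    results_by_database: Dict[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Map each USIN to the sorted list of databases that returned it."""
    found_in: Dict[str, Set[str]] = {}
    for database, usins in results_by_database.items():
        for usin in usins:
            # Matching: identical canonical USINs denote the same item.
            found_in.setdefault(usin, set()).add(database)
    # Sorting: present the merged result set in USIN order.
    return {usin: sorted(found_in[usin]) for usin in sorted(found_in)}
```

For instance, if one database returns S.ACM/CACM:30@933 and S.ACM/TOPLAS:16@1811 while another returns S.ACM/CACM:30@933 alone, the combined result lists the CACM article once, attributed to both sources.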

From the standpoint of good modular system design, it can also be argued that the USIN Global Database should deal only with the basic bibliographic data that derive from the publication process. Classification, evaluation and review materials should be considered third-party metadata that may come from a variety of sources. Without any agreed method for standardizing what types of metadata should be provided and who should provide it, it would be a poor choice to impose de facto standardization by incorporating a particular third-party metadata scheme into the USIN Global Database.

Nevertheless, it is reasonable to consider a limited extension of the USIN Global Database to support one additional form of metadata: citation metadata. A requirement of UPP could be that the USINs of cited references be supplied as part of the publication process. If, as suggested previously, scholars use USINs in writing their documents, it should not be difficult to provide them in the publication process. If this were done, it could support the development of a universal citation database that would in turn be a valuable tool for literature research and a potential catalyst for reform in scholarly communication (Cameron 1997).
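If cited USINs were supplied at registration time, the citation database could be maintained incrementally, as in the following sketch. The CitationIndex class and its register method are hypothetical stand-ins for whatever UPP would actually define, and the USINs shown in the usage note are placeholders.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class CitationIndex:
    """Records 'cites' and 'cited by' relations between USINs."""

    def __init__(self) -> None:
        self._cites: Dict[str, List[str]] = {}
        self._cited_by: Dict[str, Set[str]] = defaultdict(set)

    def register(self, usin: str, cited_usins: Iterable[str]) -> None:
        """Called once per publication (hypothetical UPP hook)."""
        cited = list(dict.fromkeys(cited_usins))  # drop repeats
        self._cites[usin] = cited
        for target in cited:
            self._cited_by[target].add(usin)

    def cites(self, usin: str) -> List[str]:
        """USINs that the given item cites, in reference-list order."""
        return list(self._cites.get(usin, []))

    def cited_by(self, usin: str) -> List[str]:
        """USINs of registered items that cite the given item."""
        return sorted(self._cited_by.get(usin, set()))
```

After registering X.EXAMPLE/JNL:2$1 with the single cited reference X.EXAMPLE/JNL:1$1, a cited_by inquiry on the earlier item returns the later one, which is the forward link that a metadata page would need.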

6 Conclusion

The USIN scheme is proposed for the global and persistent identification of publications in organized serial collections. Ultimately some global identification scheme is likely to be developed for interoperation of various article citation applications. Scholars should seize the opportunity that now exists to ensure that the scheme that succeeds is the one that is designed primarily to meet the long-term needs of people (authors and readers), not the short-term needs of particular present-day computer systems belonging to vendors, libraries or document delivery services.

This paper has presented:

  • a vision for a scholar-friendly universal identification system for serially published works
  • a number of concrete design proposals for USIN syntax and technological components that can support a global USIN system
  • a uniform naming model based on hierarchical naming of serial publications and hierarchical numbering of serial items.

Two important systems in support of the USIN concept have been proposed:

  • a USIN Global Registry
  • a USIN Global Database

Designs for each of these systems have been presented at a level that illustrates how specific architectural features can interact to meet the requirements of publishers, librarians and scholars.

There is a great deal more work required to fully realize the USIN concept. The author would be most appreciative of your help.

Acknowledgements

Andrew Walenstein has helped greatly by providing valuable feedback on several drafts of this paper. Jim Cole, while still questioning some issues from a serials cataloguing perspective, has been a source of considerable encouragement. I am also grateful to the anonymous referees for many constructive criticisms and helpful suggestions.

References

American National Standards Committee on Library and Information Sciences and Related Publishing Practices, Z39, Subcommittee E: Serials Holding Statements (1986) American National Standard for Information Sciences - Serial Holdings Statements. ANSI Z39.44-1986, approved August 14, 1985 (American National Standards Institute: New York)
Suggested initial USIN: ISSN.8756-0860/Z39.44-1986. Possible eventual form US.ANSI/ANS:Z39.44-1986.
Anon. (1997) Publisher Item Identifier as a means of document identification, updated October 9 http://www.elsevier.nl/inca/homepage/about/pii/
Archived publication unknown. With no other formal denotation known for this work, it might only be denotable by reference to this paper. Possible eventual USIN: S.BCS/JoDI:1(3)$Cameron|ref(2). This assumes that BCS becomes assigned to the British Computer Society in the international domain of scholarly societies, and that JoDI is reserved by BCS to denote the Journal of Digital Information.
Berners-Lee, T (1994) "Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web". RFC 1630, RFC Editor, Internet Society, June http://ds.internic.net/rfc/rfc1630.txt
Suggested initial USIN: RDNS."isoc.org"/RFC:1630. Possible eventual form I.ISOC/RFC:1630
Berners-Lee, T, Masinter, L, and McCahill, M (eds.) (1994) "Uniform Resource Locators". RFC 1738, RFC Editor, Internet Society, December http://ds.internic.net/rfc/rfc1738.txt
Suggested initial USIN: RDNS."isoc.org"/RFC:1738. Possible eventual form I.ISOC/RFC:1738, where ISOC might uniquely denote the Internet Society in a domain I of International organizations. Here, RFCs are identified in the domain for the Internet Society, the principal sponsor of the series. Technically, the "RFC Editor", chartered by the Internet Society, is said to be the publisher. However, it seems clear enough that RFC will remain an unambiguous code for this series in the context of Internet Society sponsored publications.
Cameron, R D (1994) "To Link or To Copy? Four Principles for Materials Acquisition in Internet Electronic Libraries". Technical Report TR 94-08, School of Computing Science, Simon Fraser University, December http://elib.cs.sfu.ca/project/papers/e-lib-links.html
Suggested initial USIN: RDNS."sfu.ca".CMPT/TR:94-08. Possible eventual form CA.SFU.CMPT/TR:94-08.
Cameron, R D (1997) "A Universal Citation Database as a Catalyst for Reform in Scholarly Communication". First Monday 2(4), April http://www.firstmonday.dk/issues/issue2_4/cameron/index.html
Suggested initial USIN: RDNS/"firstmonday.dk":2(4)$4. Here, the article number ($4) is determined by counting. Eventually, the form P.Munksgaard/FirstMonday:2(4)$4 may be used, where Munksgaard is the code for Munksgaard International Publishers in an international publishers domain. Another possibility is J.FirstMonday:2(4)$4 based on the concept of a global journal domain J operated by a publisher consortium.
Coombs, J H, Renear, A H, and DeRose, S J (1987) "Markup Systems and the Future of Scholarly Text Processing". Communications of the ACM, 30(11), November, 933-947 http://www.sil.org/sgml/coombs.html
Suggested initial USINs: ISSN/0001-0782:30@933, RDNS."acm.org"/CACM:30@933. Possible eventual form S.ACM/CACM:30@933. An interesting point to note is that issue numbers are not required for CACM prior to volume 33.
Daniel, R (1997) "A Trivial Convention for using HTTP in URN Resolution". RFC 2169, RFC Editor, Internet Society, June http://ds.internic.net/rfc/rfc2169.txt
Suggested initial USIN: RDNS."isoc.org"/RFC:2169. Possible eventual form I.ISOC/RFC:2169.
Daniel, R, and Mealling, M (1997) "Resolution of Uniform Resource Identifiers using the Domain Name System". RFC 2168, RFC Editor, Internet Society, June http://ds.internic.net/rfc/rfc2168.txt
Suggested initial USIN: RDNS."isoc.org"/RFC:2168. Possible eventual form I.ISOC/RFC:2168.
Dershowitz, N, and Reingold, E M (1997) Calendrical Calculations (Cambridge University Press: Cambridge, UK)
Suggested USINs: ISBN/0-521-56413-1 and ISBN/0-521-56474-3. These codes use ISBNs for the hardback and paperback versions, respectively. Choosing the code for the hardback version as canonical may be appropriate.
DOI Foundation (1997) A Guide to Using Digital Object Identifiers, October 10 http://www.doi.org/guidebook/guidebook.html
Archived publication unknown. Possible eventual USIN: S.BCS/JoDI:1(3)$Cameron|ref(11).
Engelbart, D C (1984) "Authorship Provisions in AUGMENT". Digest of Papers - Compcon Spring 84 - Twenty-Eighth IEEE Computer Society International Conference, San Francisco, February-March, pp. 465-472
Initial USINs: ISBN/0-8186-0525-1@465 (paper), ISBN/0-8186-4525-3@465 (microfiche), ISBN/0-8186-8525-5@465 (casebound). Possible eventual form I.IEEE/Compcon:28@465.
Fielding, R T (1994) "Maintaining Distributed Hypertext Infostructures: Welcome to MOMspider's Web". Computer Networks and ISDN Systems, 27(2), November, pp. 193-204 http://www.ics.uci.edu/WebSoft/MOMspider/
Suggested initial USIN: ISSN/0169-7552:27@193. Possible eventual form P.Elsevier/COMNET:27@193. Here, the code COMNET is used by Elsevier for this journal.
Green, B, and Bide, M (1997) Unique Identifiers: A Brief Introduction, Book Industry Communication, London, 1997 http://www.bic.org.uk/bic/uniquid
Possible eventual USIN: S.BCS/JoDI:1(3)$Cameron|ref(14).
Halasz, F, and Schwartz, M (1994) "The Dexter Hypertext Reference Model". Communications of the ACM, 37(2), February, 30-39
Suggested initial USIN: ISSN/0001-0782:37(2)@30. Possible eventual form S.ACM/CACM:37(2)@30.
McNellis, C H (1996) "A Serial Pattern Scheme for a Value-Based Predictive Check-in System". Serials Review, 22(4), Winter, 1-11
Suggested initial USIN: ISSN/0098-7913:22(4)@1, RDNS."jaipress.com"/SR:22(4)@1. The code SR is speculative. Possible eventual form P.JAI/SR:22(4)@1.
Moats, R (1997) "URN Syntax". RFC 2141, RFC Editor, Internet Society, May http://ds.internic.net/rfc/rfc2141.txt
Suggested initial USIN: RDNS."isoc.org"/RFC:2141. Possible eventual form I.ISOC/RFC:2141.
Mockapetris, P (1987) "Domain Names: Concepts and Facilities". RFC 1034, RFC Editor, Internet Society, November http://ds.internic.net/rfc/rfc1034.txt
Suggested initial USIN: RDNS."isoc.org"/RFC:1034. Possible eventual form I.ISOC/RFC:1034.
National Information Standards Organization (1996) "Serial Item and Contribution Identifier (SICI)". An American National Standard Developed by the National Information Standards Organization: approved August 14, 1996 by the American National Standards Institute. National Information Standards series ANSI/NISO Z39.56-1996, version 2 (NISO Press: Bethesda, MD) http://sunsite.Berkeley.EDU/SICI/
This is an interesting case which is published in the National Information Standards series (ISSN 1041-5653) of NISO. It has also been given an ISBN. But the code Z39.56-1996 represents its numbering as an American National Standard. Suggested initial USIN: ISSN.1041-5653/Z39.56-1996. Possible eventual form US.ANSI/ANS:Z39.56-1996.
Nelson, T H (1987) Literary Machines, edition 87.1
Initial USIN: ISBN/0-89347-055-4.
Paskin, N (1997) "Information Identifiers". Learned Publishing, 10(2), April, 135-156 http://www.elsevier.com/inca/homepage/about/infoident/Menu.shtml
Suggested initial USIN: ISSN/0953-1513:10@135. Learned Publishing is published by the Association of Learned and Professional Society Publishers. On the path towards mnemonic identification, the USIN form RDNS."alpsp.org.uk"/LP:10@135 may temporarily be used before an international domain structure is in place. Eventually, the canonical form may become S.ALPSP/LP:10@135 based on a domain S of scholarly societies.
Schwarz, F, and Hepfer, C (1996) "Changes to the Serial Item and Contribution Identifier and the Effects of Those on Publishers and Libraries". The Serials Librarian, 28(3/4), 367-70
Suggested initial USINs: ISSN/0361-526X:28@367 and RDNS."haworth.com"/SL:28@367. Possible eventual form P.Haworth/SL:28@367.
Sollins, K, and Masinter, L (1994) "Functional Requirements for Uniform Resource Names". RFC 1737, RFC Editor, Internet Society, December http://ds.internic.net/rfc/rfc1737.txt
Suggested initial USIN: RDNS."isoc.org"/RFC:1737. Possible eventual form I.ISOC/RFC:1737.
Wheary, J, and Schutz, B F (1998) "Living Reviews in Relativity: Making an Electronic Journal Live". The Journal of Electronic Publishing, 3(1) http://www.press.umich.edu:80/jep/03-01/LR.html
Suggested initial USIN: ISSN/1080-2711:3(1)$5. Possible eventual form EDU.UMICH.PRESS/JEP:3(1)$5.