XML Functionality for Digital Dictionaries: Muller and Beddow: JoDI

Moving into XML Functionality: The Combined Digital Dictionaries of Buddhism and East Asian Literary Terms

Charles Muller and Michael Beddow*
Toyo Gakuen University, Chiba, Japan
Email: acmuller@gol.com
*Leeds University, UK (retired)

A note on reading this text. Given the prevalence of Asian characters in this work, the authors recommend it is viewed with Netscape 6.2 or Mozilla 1.0, because these browsers support Unicode, and come equipped with all the necessary fonts. MSIE 6.0 and some earlier versions support Unicode, but the user needs to set up the Asian fonts: in the menu bar select View > Encoding > Unicode (UTF-8).


A report on the new developments in the online Digital Dictionary of Buddhism and CJK-English Dictionary, focusing on their implementation in XML. The paper is in two parts:
  1. Project Manager's Report, by Charles Muller
  2. Delivering CJK Dictionaries from Pure XML Sources: A Developer's Perspective, by Michael Beddow

1 Project Manager's Report, by Charles Muller

1.1 Technical Review

Compilation of the Digital Dictionary of Buddhism (DDB) began with the realization of the dearth of adequate lexicographical and other reference works in the English language for the textual scholar of East Asian Buddhism in particular, and East Asian philosophy and religion in general. The CJK-English (Chinese, Japanese, Korean) Dictionary (CJK-E) began soon after. I decided, during my first Buddhist and Confucian/Taoist texts readings courses, to save everything I looked up, and have continued that practice to the present, through the course of studying scores of classical texts. Although the content of these two lexicons is presently being supplemented by other interested parties, the terms that I have been compiling serve as the major portion of the work.

At the beginning I could not have dreamed of the Internet, or even thought of the possibility of having this material available as a digital database. I simply envisioned the eventual publication of a newer, larger and more useful printed work. As IT developments progressed, the potential gradually began to dawn. The first Web version was uploaded in the summer of 1995. It was not long after that Christian Wittern discovered the DDB and applied a basic SGML structure, which is the ancestor of the XML markup system used today.

Due to limitations of popular browsers, among other things, this framework languished for a couple of years, during which time new HTML versions of the dictionary were periodically regenerated with an array of Word macros. Even slight changes usually necessitated a complete re-tooling of the macro system. Also, access to the data in the dictionaries was limited to hyperlinking, through an array of index files also generated from Word macros.

The most important tool for making a dictionary really useful, a search engine, was lacking. What was needed was to keep the data in a stable, validated SGML/XML format and to present it to users through a style sheet or some other database-retrieval technique. Once XML support had been included in MS IE5 for a year or so, I went back and experimented further, but found that many aspects of XSLT were still not adequately supported. Without XLink/XPointer support, not much could be done beyond the level of support provided by popular browsers.

The XML solution first appeared in the summer of 2000, when Christian Wittern developed an experimental version of the DDB using the Zope system. This was the first attempt to use the data in a form close to the original XML, and also the first time a search engine had been applied to either of the dictionaries. For me as the maintainer of the dictionaries, however, this system presented difficulties: the data needed to be converted into thousands of small files, which made the dictionary difficult to maintain locally. There was also the problem of a lack of full native support for XSLT. Nonetheless, Wittern's work marked the first time a version of the combined dictionaries had been generated more or less directly from the XML source.

A major turning point for the CJK-E/DDB came in January 2001. While browsing a Japanese magazine on Palm computing, I noticed that Jim Breen's Japanese dictionary was becoming a sort of standard for inclusion on DoCoMo portable telephones. It occurred to me that although my data had always been publicly available, since the time of finalization and validation of the XML format, no effort had been made to let IT people know these data were freely available to download, as Breen's were. The availability of the data files was announced on some major XML lists. Not long after, I was contacted by Michael Beddow.

1.2 Recent Evolution: XML Comes Alive

Michael Beddow, a scholar of German Studies, had a strong interest in using XML as a means of storage and delivery of literary and lexicographical documents. He was sure that he could add XSLT and XLink functionality to the latest versions of the standard browsers. Based on the markup structure of the CJK-E, he generated an array of indexes that used XPointers to call single-entry data units from large files, each of which contained hundreds of entries. This was a landmark event for the project: up to this time, calling a single-entry-sized unit required either that the data files be created at that size in advance or that HTML anchors point to a location in a larger file. With the new system the data could be plugged into a system as is, and function like a real digital dictionary.

It looked likely that the Perl-XLinking system could be applied to provide a search engine. Devising a search engine that can deal with mixed Western/CJK text in UTF-8 encoding had been difficult, as the software has trouble parsing the divisions in the character codes. A prototype CJK UTF-8 search engine was developed.
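The difficulty with character boundaries can be illustrated with a minimal Python sketch. This is not the project's Perl code: the function name `find_term` and the sample sentence are hypothetical, and the sketch assumes the text arrives as UTF-8 bytes. Decoding before matching means the search works on whole characters rather than on the individual bytes of a multi-byte sequence.

```python
def find_term(raw: bytes, term: str) -> list[int]:
    """Return the character offsets at which `term` occurs in UTF-8 `raw`."""
    # Decode first, so that each CJK character is a single element of the
    # string instead of a run of three bytes.
    text = raw.decode("utf-8")
    offsets = []
    start = text.find(term)
    while start != -1:
        offsets.append(start)
        start = text.find(term, start + 1)
    return offsets


# Hypothetical mixed Western/CJK sample text.
sample = "The term 夏末 marks the end of the retreat; 夏末 recurs.".encode("utf-8")
hits = find_term(sample, "夏末")
```

Offsets returned this way count characters, not bytes, so they stay valid however many bytes each character occupies in the encoded file.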

The present number of terms included in the DDB (15,000 at the time of writing) is not small, but it represents only a tiny fraction of the terms, names, places, temples, schools, texts, etc., that are included in the entire East Asian Buddhist corpus. Thus, a search for a term conducted by someone whose research interests are significantly different to those of the compilers is likely to draw a blank. A group of scholars of East Asian Buddhism has been developing a comprehensive, composite index drawn from the indexes of dozens of major East Asian Buddhist reference works, which now includes almost 300,000 entries (described in further detail below). The search engine was extended to cover this comprehensive index. In its present state, the DDB may be searched for a term, and if the term is not found the search continues on this comprehensive index. (Michael's view of these events is described below.)

This section has focused on developments in the DDB, but the same enhancements have been applied to the CJK-E, except for the search through a comprehensive index. Some concrete examples of Web page format and search functions are given below, but first consider some content developments.

1.3 Content Development

1.3.1 Digital Dictionary of Buddhism

By January 1999 content included 4,200 entries. That number (at March 2002) has jumped to 15,000 and continues to increase rapidly. Grant support from the Japan Society for the Promotion of Science has enabled content to be built in a number of ways:

(1) Development of the comprehensive index (contents described in the Appendix): This project used the International Research Institute for Zen Buddhism (IRIZ) Zendicts.dat file as a starting point (containing around 56,000 entries). To this we at Toyo Gakuen University, in collaboration with teams at the Chung-Hwa Institute of Buddhist Studies and at IRIZ, added the indexes from a large number of major East Asian Buddhist reference works, bringing the total of entries to almost 300,000.

(2) Digitization of East Asian reference works: such lexicons as the Fo Kuang Shan dictionary and the Ding Fubao have already been formally and professionally digitized. We are adding to this by digitizing other valuable print works whose copyrights have expired, such as Soothill's Dictionary of Chinese Buddhist Terms, and works where we have permission to digitize from the copyright holder, such as Lancaster's Descriptive Catalog of the Korean Buddhist Canon. Students paid by our grants are scanning, OCRing, and correcting this data.

(3) Research Input from graduate student assistants: while the volume of these materials has not been especially great, this has been a good way to stimulate interest in the project. The students also benefitted from the chance to learn the computing techniques we are using for input, and to learn about XML.

(4) Automated input technology: based on a set of indexes and tables, most of the assistants are able to use our system of MS-Word macros to add new entries rapidly. The macros create a ready-made entry structure, along with suggested readings of the entries for Chinese, Korean and Japanese pronunciation. We are developing the necessary indexes to include Vietnamese as well. The system is limited to MS-Word, but since the indexes upon which the system is based are saved in Unicode text format, the development of an open platform input system which emulates our present Word system is feasible.

(5) Input from interested scholars: sizeable personal research glossaries have been received, and it is hoped that the continued increase in use of the DDB will encourage more scholars to contribute.

1.3.2 CJK-E Dictionary

These efforts have, for the past few years, been directed primarily at the development of the DDB, somewhat to the neglect of the CJK-E. Nonetheless, that collection now has almost 6000 compound words. Also, all 20,902 single character headwords in Unicode 2.0 have been made available for browsing, even though only about 8000 of these contain complete phonetic and semantic information.

1.4 XML Browsing Environment of the Combined Dictionaries

The home of the dictionaries (since February 2001) offers a choice between entering the DDB and the CJK-E. Upon entering the DDB table of contents page, the user is presented with the entire menu for the dictionary, including (1) the search engine and the various topic indexes; (2) the front matter and other explanatory materials for the dictionary, and (3) a small list of seminal resources for the study of classical East Asian Buddhist texts (Figure 1).


Figure 1. Table of contents page for the Digital Dictionary of Buddhism

By presenting the entire dictionary menu, plus the most important scholarly sites for those doing research in East Asian canonical texts, this page becomes a useful one-stop portal for specialists in our area. Also, all links to areas within the site use absolute, rather than relative, URLs. Thus, if you save this page to your desktop, you have ready access to all these materials via Internet connection.

Most serious researchers and translators are likely to use the search engine for basic access. For those who are not sure what they are looking for, or who do not have a Unicode supporting browser, or who simply want to browse, the indexes remain useful.

The search engine interface is shown in Figure 2. When activated, the search will yield a menu of matches, containing headword hits, and instances occurring in the explanatory body of other entries, as in Figure 3. Selecting, for instance, the headword match, the term in question can be browsed (Figure 4).


Figure 2. Search interface


Figure 3. Headword and text matches


Figure 4. Headword retrieval

For the user, this all looks and feels pretty much the same as it did in the earlier HTML versions of the DDB, but what is happening is fundamentally different, as this HTML text is being generated on the fly by Perl, XSLT, and XLinking protocols.

The menu above provides standard links for returning to important places within the site, and also allows the user to view the XML source (Figure 5). This source view provides access to the names of those responsible for the various content areas.


Figure 5. XML Source Code Display

Those who have been watching the development of the DDB over time may notice the addition of a new field at the top of the <sense> area, called <trans>. This tag is borrowed from the Text Encoding Initiative (TEI), meaning "translation", but here it refers strictly to the word or short phrase as the direct common rendering that translators would use when rendering this term into English.

1.5 Inclusion of the Allindex Files

As mentioned above, one of the most important developments of the DDB is the integration of the comprehensive composite index of East Asian Buddhological reference works. When a user's search does not find the required term, the allindex files are searched, rendering a list of sources which might contain information on the searched term. For example, at the time of writing, the term hamal 夏末 ("end of the summer retreat") was not contained within the DDB, but a search gives the information in Figure 6. The Allindex project is discussed in the Appendix.


Figure 6. Alternative references in the Allindex files

As can be seen, we are finally reaching a point where many of the impediments to full implementation have been overcome. Most importantly, we are starting to be able to handle Unicode-encoded documents and take direct advantage of XML.

2 Delivering CJK Dictionaries from Pure XML Sources: A Developer's Perspective, by Michael Beddow

Probably the most important thing to stress about the collaboration between Charles Muller and myself on an XML-based delivery platform for the DDB and the CJK-E is that no more than six weeks elapsed between our first contact and the announcement of a fully-functional system (and indeed one that had more functions than either of us had envisaged at the start). Perhaps even more noteworthy is that I had the core of the system up and running (in so far as individual entries were being retrieved from the larger files) within a single day of first downloading the data.

I say this not to praise myself as a lightning-speed programmer, but to bring out what it is that makes XML such a hugely important force for changing the way we in the Humanities work with digital data. Years of effort had gone into Charles Muller's collection and markup of the data, and months of work had gone into my development of the modules from which I built a delivery platform tailored to those data; but because data marked up in XML really does describe its own structure, and because software that follows the recommendations for processors issued by the W3C is intrinsically adaptable to any sort of well-formed XML, none of our earlier independent work had to be redone to get the data online. When recoding his data in XML, Charles had been focusing on the scholarly content and the abstract structure, with relatively few detailed ideas about how it would eventually be delivered to users (who in the meantime continued to access his work via the conventional HTML site). For my part, I had been working on techniques of retrieving fragments of larger XML documents and rendering them into HTML on demand, with no substantial experience either of handling CJK data or of the problems specific to lexicographical applications. Yet, the required retrieval, delivery and rendering system more or less sprang into life of its own accord. "Self-describing data" pretty much engendered a self-creating delivery system. It was an exciting, if slightly uncanny, experience.

2.1 The Old and the New

One of the chief benefits Charles had foreseen when moving to XML encoding of his material was the possibility of using XLink and XPointer[1] technologies to allow users to retrieve selected fragments of larger documents. In an HTML implementation, either the editors have to maintain a very large number of small documents, with all the version management problems that entails, or users have to accept that the results of their queries are large documents of which only a small portion may be relevant to what they were looking for. Anticipating the removal of this serious limitation implicit in HTML, Charles had been marking up internal and external links in the XML version of his materials using a basic form of XLink/XPointer notation, but had believed that these links would only be used as intended once browser (and server) support for XLinking was widely implemented.

I was able to show that by using some simple cgi scripts in combination with server-side XSLT transformations, it is possible to implement a small but useful subset of the (still not finalised) XPointer and XLink proposals that can be used with present day browsers and servers. I originally developed these techniques, based on freely-available open source models, to allow the online publication of a long monograph of mine from a single canonical and easily-maintained XML file, while enabling users to request and receive portions of this single file as small as a single (printed) page, transformed on demand from the TEI-conformant XML into HTML.[2]

Like the HTML-based system that preceded it, the XML-based platform involves the creation of many thousands of files, largely because of the caching and indexation facilities it uses to speed retrieval and delivery. But there is an immensely significant difference from the editorial point of view. The thousands of HTML files had to be maintained by the editors themselves; in my system, all the editors need concern themselves with are the core XML files into which they enter their data. There are many other supporting files; but they are invisible to both the end user and the resource authors, and are generated and maintained transparently by the underlying system. The authors create and maintain XML files of whatever size best suits their methods of working, and whose structure is determined by their scholarly analysis of the material. The system validates, partitions and indexes those files, allows users to locate the items within them that they need, and renders the retrieved items into Web pages for delivery, creating hyperlinks for any internal or external cross-references as it does so.

About half way through my work on automatically creating the existing indices, Charles asked whether it would be feasible to create a free-text search engine that would supplement these indices as a means of access, and for some classes of user maybe even replace them. This free-text search engine is only half-way usable. Some of the problems lie in my own coding, which needs, and will in due course receive, much more work. Other problems stem from aspects of the underlying system libraries which only come to light when complex regular expressions involving UTF-8 encoded characters from across the entire Unicode range are let loose on multilingual texts. There is also an irritating bug, alluded to by Muller in section 1, which occurs only on the (FreeBSD) hosting server but cannot be reproduced on my (Linux) development system, and which causes the initial failure of some search attempts. I hope users will not find it distractingly flippant that, as a much-needed caveat-cum-apology, I have cited on the query form the remark I suspect the father of Anglophone lexicography might have made about my efforts, had he encountered them betwixt his observations of women preachers and dogs walking on their hind legs. Given Dr Johnson's place in scholarship, this seemed more appropriate than the other citation which also springs to mind in my defence, G.K. Chesterton's observation that "if a thing's worth doing, it's worth doing badly".

Aside from the search engine, the other thing the new system brings from a user's perspective is a more commodious display of the data. The layout, ordering and indeed the contents of the delivered HTML can easily be changed by editing a single controlling XSL style sheet, without touching the XML data, so it is easy to act on user comments (or editorial second thoughts) about the presentation of the material which previously might have required thousands of separate HTML pages be recreated. In other words, the separation of visual design from logical structure that XML allows for is here given full scope.

The nature of XML markup has also allowed a significant extension of what the user, specifically of the Digital Dictionary of Buddhism, can be offered. As the very large set of references to Buddhist CJK terms in printed or other digital dictionaries which Muller and his associates have assembled were also marked up in XML, the DDB's facilities could be greatly expanded with little programming effort. If a user looks up a term in the search engine which is not in the DDB (or if s/he follows a provisional cross-reference in the DDB where the reference target has not yet been edited into place), a secondary lookup is performed on the external references data. If the term concerned is found there, the user is offered a listing of the locations in those external sources where the term is defined or explained. Given the very large number of entries in this secondary data collection (c.300,000 and rising), lookups are assisted by a Berkeley db database (itself automatically built from the core XML) interposed between the client and the XML sources: this is the only instance in the current implementation where information is not located by a direct parse of the core XML files.
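The two-stage lookup described above might be sketched as follows. This is an illustration in Python under stated assumptions, not the production code: the `<allindex>` root element, the function names, and the placeholder DDB content are hypothetical, and an in-memory dictionary stands in for the Berkeley db database that the real system builds from the core XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical extract of the external references data; the <allindex>
# wrapper element is an assumption made so the sample parses on its own.
ALLINDEX_SAMPLE = """<allindex>
  <entry ID="b4e00">
    <hdwd>一</hdwd>
    <dict name="ZGD">28a</dict>
    <dict name="Naka">45a</dict>
    <dict name="FKS">1111</dict>
  </entry>
</allindex>"""


def build_reference_index(allindex_xml: str) -> dict:
    """Map each headword to its list of (source name, location) pairs."""
    index = {}
    for entry in ET.fromstring(allindex_xml).iter("entry"):
        headword = (entry.findtext("hdwd") or "").strip()
        index[headword] = [(d.get("name"), d.text or "") for d in entry.iter("dict")]
    return index


def look_up(term: str, ddb_entries: dict, reference_index: dict):
    """Try the dictionary proper first, then fall back to the references."""
    if term in ddb_entries:
        return ("entry", ddb_entries[term])
    if term in reference_index:
        return ("references", reference_index[term])
    return ("not found", None)


reference_index = build_reference_index(ALLINDEX_SAMPLE)
ddb_entries = {"佛": "<entry>placeholder</entry>"}  # hypothetical DDB content
result = look_up("一", ddb_entries, reference_index)
```

Building the index once and consulting it by key avoids parsing a very large XML collection on every failed search, which is the role the text assigns to the interposed database.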

2.2 System in Operation

Each headword in the dictionary has a unique identifier as part of the markup. This ID is derived algorithmically from the name of the dictionary plus the Unicode numerical representation of the characters in the term. When a term is requested, either from one of the various user-accessible indices or as a result of a search engine query, the relevant ID is passed to a cgi script on the server. That script parses the appropriate XML file[3], locates the entry by its ID and extracts it, then passes the resulting XML fragment on to an XSLT processor[4], which converts it into HTML while building the necessary hyperlinks for any cross references the entry contains.
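As a rough illustration of the first two steps, the following Python sketch derives an ID and extracts the matching entry. It rests on several assumptions: the "b" prefix and lowercase hexadecimal form are taken from the sample ID "b4e00" in the Appendix (U+4E00 is the character 一), the hyphen joining the code points of multi-character terms is a guess, and the `<dictionary>` root element and function names are hypothetical. Python's ElementTree stands in for the expat-based script, and the XSLT step is omitted.

```python
import xml.etree.ElementTree as ET


def make_entry_id(prefix: str, term: str) -> str:
    """Build an ID from a dictionary prefix plus hexadecimal code points.

    The hyphen separator for multi-character terms is an assumption.
    """
    return prefix + "-".join(f"{ord(ch):x}" for ch in term)


def extract_entry(xml_text: str, entry_id: str):
    """Return the <entry> element whose ID attribute matches, or None."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        if entry.get("ID") == entry_id:
            return entry
    return None


# Hypothetical data file holding more than one entry.
SAMPLE = """<dictionary>
  <entry ID="b4e00"><hdwd>一</hdwd><pron lang="zh" system="py">yī</pron></entry>
  <entry ID="b590f-672b"><hdwd>夏末</hdwd></entry>
</dictionary>"""

entry_id = make_entry_id("b", "一")
fragment = extract_entry(SAMPLE, entry_id)
# The serialized fragment is what would be handed to the XSLT processor.
fragment_xml = ET.tostring(fragment, encoding="unicode")
```

Because the ID can be recomputed from the term itself, an index or search result only needs to carry the term and the dictionary name to locate the entry.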

In practice, this process is complicated (but also accelerated from a user perspective) by a system of caching, by which both XML fragments and the corresponding HTML version, once created by an initial request, are stored so that future requests can be met without further parsing or transforming, until the editors alter the items concerned in the XML (which automatically invalidates any cached copies of the altered material), or alter the XSLT style sheet that controls presentation (upon which all cached HTML is marked invalid so that it will be regenerated with the changed presentation next time the XML is retrieved).
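The invalidation rules can be sketched in Python as below. This is a simplified model rather than the system's actual mechanism: it caches only the rendered HTML, the class and parameter names are hypothetical, and integer version stamps stand in for whatever change detection the real system uses. Where the text describes cached copies being marked invalid when a change is made, this sketch instead compares stamps at request time, which has the same visible effect.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FragmentCache:
    """Cache rendered HTML per entry ID, keyed to source and style versions."""
    render: Callable[[str, int], str]   # hypothetical parse-and-transform step
    style_version: int = 0              # bumped when the style sheet changes
    _html: dict = field(default_factory=dict)

    def get(self, entry_id: str, source_version: int) -> str:
        cached = self._html.get(entry_id)
        # Reuse the cached page only if neither the XML source nor the
        # style sheet has changed since it was generated.
        if cached and cached[0] == source_version and cached[1] == self.style_version:
            return cached[2]
        html = self.render(entry_id, self.style_version)
        self._html[entry_id] = (source_version, self.style_version, html)
        return html


calls = []

def fake_render(entry_id: str, style_version: int) -> str:
    """Stand-in for the expensive parse and XSLT transformation."""
    calls.append(entry_id)
    return f"<p>{entry_id} (style {style_version})</p>"


cache = FragmentCache(render=fake_render)
first = cache.get("b4e00", source_version=1)
second = cache.get("b4e00", source_version=1)   # served from the cache
```

Editing an entry corresponds to a new `source_version` for that ID alone, while changing the style sheet corresponds to a new `style_version`, which makes every cached page stale at once.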

2.3 Platform Requirements

Though earlier work on XML fragment retrieval and rendering in real time was done on University servers which I specified and managed, giving me complete control of the hardware and software, the programs that deliver these dictionaries can run on servers which offer only the limited configuration facilities found at the inexpensive end of the commercial hosting market. No privileged access to the machine is needed to install or maintain them. There is, of course, a performance penalty: the whole thing would run faster and handle more simultaneous users without performance deterioration if it could be moved "in process" with the Web server, so that the script handling system did not have to be loaded and initialized for every single request, as happens at the moment.  But performance is broadly satisfactory for the present size of the datasets and should cope with their planned expansion. My modified methods mean that other scholars who would like to deploy a version of this system adapted to their particular data have the prospect of getting it to run without excessive dependence on the cooperation or expertise of their local server administrators. One indispensable requirement, for CJK applications at any rate, is the presence of up-to-date system libraries for handling Unicode, and experience suggests that these are more commonly found on commercial sector servers than on campus facilities.

2.4 Moral of this Tale

Charles Muller and I were, within a matter of days, able to pool our knowledge and interests and work together across nine time zones as effectively as if we had been in neighbouring offices. Humanities scholars who still insist that computers are no more than glorified but temperamental typewriters, and campus finance officers who believe only scientists need decent computer hardware or network connections, might like to consider revising their views. Anyone who thinks XML is either just a fad or tomorrow's technology can see its enabling power at work.



[1] Of the many explanations of XLinks and XPointers available online, the one that to my mind strikes the best balance between comprehensibility and depth of coverage is at http://www.javacommerce.com/tutorial/xml/linking.html. The current W3C proposals (not easy reading) are
for XLink http://www.w3.org/TR/xlink/
for XPointer http://www.w3.org/TR/xptr.

[2] This work may be seen at http://www.mbeddow.net/foh/.

[3] The parser is expat, originally by James Clark, now maintained by Clark Cooper and Fred L Drake, Jr, available from http://sourceforge.net/projects/expat/.

[4] The processor used is Xalan C++ Version 1.1 from http://xml.apache.org/xalan-j/index.html.

Appendix: The Allindex Database

Composite Index of East Asian Buddhist Lexicographical Sources ("allindex.xml")

Primary Compilers: Urs APP, Christian WITTERN, Charles MULLER, Michel MOHR, HUR In-Sub

Initial Release Date: 4/26/99

Updated: 3/1/2001

The "allindex" file is an ongoing compilation of the indexes of East Asian dictionaries of Buddhism. It was initiated in the form of the Zendics.dat file published on the ZenBase CD-ROM, by the International Research Institute for Zen Buddhism (IRIZ), developed by Urs App, Christian Wittern, and their staff. That file contained complete index information for the sources listed in the IRIZ bibliography below (58,563 entries). Using this file as a basis, we have been continuing to add indexes from other lexicons, among the most significant of which are the index to Nakamura Hajime's Bukkyōgo daijiten, the Fo Kuang Shan Dictionary, Ding Fubao, Hirakawa's Buddhist Chinese-Sanskrit Dictionary, the Bussho kaisetsu daijiten, the Oda and Mochizuki dictionaries, as well as many other smaller Buddhist dictionaries. This file will be further supplemented by the digitization of other East Asian Buddhist lexical resources, presently in progress.

Each entry contains a field for the headword (Chinese characters), the Chinese, Korean, and Japanese readings of these characters, an ID number, and a list of sources for the word that have been identified.

<entry ID="b4e00">
  <hdwd>一</hdwd>
  <pron lang="zh" system="py">yī</pron>
  <pron lang="ko" system="hg">일</pron>
  <pron lang="ja" system="hi">いち</pron>
  <dict name="ZGD">28a</dict>
  <dict name="Ina-Z">75</dict>
  <dict name="ZD">269</dict>
  <dict name="Naka">45a</dict>
  <dict name="FKS">1111</dict>
  <dict name="BCS">0001</dict>
  <dict name="YBhI"/>
</entry>



Indexed works

1. From IRIZ

a. Urs APP and Christian WITTERN

Daitō shuppansha. 1979 (rev. edition 1994). Japanese-English Buddhist Dictionary 日英仏教辞典. Tokyo (page numbers and terms of both editions are included). [JE]

Genkyō Zenji 元恭禅師. 1908. Zengaku zokugokai 禪學俗語解. Tokyo: Kaiunji 海雲寺 (republished in 1991 by the Zenbunka kenkyūjo as part of the Zengo jisho ruiju fu sakuin 禪語辭書類聚 付索引. Kyoto: Zenbunka kenkyūjo 禅文化研究所).

Inagaki, Hisao 稲垣久雄. 1991. A Glossary of Zen Terms. Kyoto: Nagata Bunshōdō 永田文昌堂. [Ina]

Iriya Yoshitaka 入矢義高, trans. Baso no goroku 馬祖の語録. Footnotes to Iriya's Japanese translation of the Mazu yulu. Kyoto: Zenbunka kenkyūjo 禅文化研究所, 1984.

Iriya, Yoshitaka 入矢義高, and Koga, Hidehiko 古賀英彦. 1991. Zengo jiten 禅語辞典. Kyoto: Shibunkaku 思文閣. [ZGo]

Komazawa daigaku nai Zengaku daijiten hensansho 駒澤大学内禅学大辞典編纂所. 1977. Zengaku daijiten 禪學大辭典. Tokyo: Taishūkan shoten. [ZDG]

Miura, Isshū, and Fuller Sasaki, Ruth. 1966. Zen Dust. Kyoto: The First Zen Institute of America in Japan (Out of print).

Mujaku Dōchū 無著道忠. 1979. Kattō gosen 葛藤語箋. Kyoto: Chūbun shuppansha 中文出版社 (pp. 868-1100 of volume 9 of the Zengaku sōsho 禪學叢書, edited by Yanagida Seizan 柳田聖山). Our index also contains the page numbers of another edition: Kattō gosen 葛藤語箋. Tokyo: Komazawa University's Compiling Office of the Zen Dictionary, 1959.

Mujaku Dōchū 無著道忠. 1979. Zenrin shōkisen 禪林象器箋. Kyoto: Chūbun shuppansha 中文出版社 (volume 9 上 of the Zengaku sōsho 禪學叢書, edited by Yanagida Seizan 柳田聖山). Our index also contains the page numbers of another edition: Zenrin shōkisen. Kyoto: Seishin shobō 誠信書房, 1963.

Nakamura, Hajime et al. 中村元など. 1989. Iwanami Bukkyōjiten 岩波仏教辞典. Tokyo: Iwanami. [Iwa]

Shibayama Zenkei 柴山全慶. 1972. Teihon zenrin kushū 定本禪林句集. Kyoto: Kichūdō 其中堂.

Yanagida Seizan 柳田聖山, trans. Rinzairoku 臨済録. Footnotes to Yanagida's Japanese translation of the Linji lu. Tokyo: Daizō shuppansha, 1972.

Yokoi, Yūhō 横井雄峯. 1991. The Japanese-English Zen Buddhist Dictionary 日英禪語辭典. Tokyo: Sankibō Buddhist Bookstore. [Yo]

Yuan Bin 袁賓. Chanzong zhuzuo ciyu huishi 禪宗著作詞語. Shanghai: Jiangsu guji chubanshe 江蘇古籍出版社, 1990.

Zen no goroku 禅の語録. The footnotes to all 17 vols of the series published by Chikuma shobō: Tokyo.

b. At IRIZ; ABE Rie, Urs APP and Michel MOHR

Mochizuki Shinkō. Bukkyō Daijiten. [MZ]

Oda Tokuno 織田得能. Bukkyō Daijiten 佛教大辭典. Tokyo: Daizō shuppan kabushiki kaisha, 1995. [Oda]

2. At Toyo Gakuen University; Charles Muller, et al.

Nakamura Hajime 中村元, ed. Bukkyōgo daijiten 佛教語大辭典. Tokyo: Tokyo shoseki, 1985. - {digitized by Charles Muller and Maki Miyaji} [Naka]

Dongguk University Research Center for Buddhist Culture, ed. Kankoku bukkyō kaidai jiten 韓國佛教解題辭典. Tokyo: Kokusho kankōkai, 1982. - [KBKJ] {digitized by Charles Muller}

Ono Gemmyō 小野玄妙, ed. Bussho kaisetsu daijiten 佛書解説大辭典. Tokyo: Daitō shuppansha, 1999. {digitized by Charles Muller and Maki Miyaji} [bsk] [index=bski]

Saito Akitoshi 齋藤昭俊 and Naruse Yoshinori 成瀬良徳, ed. Nihon bukkyō jinmei jiten 日本仏教人名辞典. Tokyo: Shinjinbutsu ōraisha, 1993. [NBJJ] {digitized by Charles Muller and Megumi Katahira}

Yi Chŏng 李政. Hanguk pulgyo inmyŏng sajŏn 韓國佛教人名辞典 (The Korean Buddhist Biographical Dictionary). Seoul: Pulgyo sidaesa, 1993. {digitized by Charles Muller and Asa Suzuki} [HPIS]

3. At the Chung-Hwa Institute for Buddhist Studies; Christian Wittern, et al.

Bukkyō kanbon daijiten 佛教漢梵大辭典 (Buddhist Chinese-Sanskrit Dictionary). Hirakawa Akira 平川彰. Tokyo: The Reiyukai, 1997. [BCS]

Ding Fubao (electronic version) - [DFB]

Fo Kuang Shan Dictionary - [FKS]

4. At the Research Institute of the Tripitaka Koreana; In-Sub Hur, et al.

Korean readings for over 100,000 entries.

At the time of the most recent update, this compilation contained approximately 290,000 terms, and is now included within the full-text search function of the Digital Dictionary of Buddhism. If you have a CJK Buddhist lexical index that you would like to add, please contact Charles Muller at <acmuller@gol.com>. You can download this data collection.

Wherever possible, missing characters are encoded with Mojikyō numbers (&Mxxxxxx;). Kanjibase (&Cx-xxxx;) numbers that remain in the index have not yet been matched with their Mojikyō equivalent. Characters not yet contained in Mojikyō are encoded in algebraic format.