
Extract from the paper presented by the author at the Asia Pacific Association Medical Informatics and the Health Informatics Society of Australia (APAMI - HIC) 97 Joint Conference at Darling Harbour, Sydney, Australia on August 11-13, 1997


Docle - the coding scheme which comes with a free medical belief system

Any medical decision support system, from the most rudimentary to the most encompassing, requires a belief system comprising a basic set of assumptions. Based on this premise, one cannot achieve a useful global decision support system encompassing the fields of medical history taking, physical examination, investigations, diagnoses, surgical treatment and drug therapy unless a general medical belief system is constructed that spans all the domains mentioned. This presentation describes a coding, classification and general medical belief system that spans and embraces two separately defined fields of endeavour in medical informatics today. At one end of the spectrum, there is the field of endeavour relating to medical coding and standardisation of terminology. At the other end of the spectrum is the construction of medical decision support systems utilising IF-THEN rules or similar artificial intelligence (AI) techniques. The schism between coding and knowledge representation has resulted in a situation akin to the architect using drawings to represent his designs while the builder is using a text description to build the house.

The alphabetic Docle system unifies coding, classification and knowledge representation. Docle is different in that the coding part contains merely the keys to the medical belief system. Medical entities are classified as species and placed in a Linnean hierarchy much like a species such as Homo sapiens. For instance, the liver object at the level called ORDER knows all its associated diseases, symptoms and signs. So what is the big deal? A coding cum classification cum medical belief system appears to be an efficient approach for the construction of decision support computer programs, and may well save years of hack work for the implementor.

The coding system for medical data is the “glue” that makes health informatics happen; without this proper “glue”, projects will fall apart. Hitherto, numeric coding systems such as the International Classification of Primary Care (ICPC) and the International Classification of Diseases (ICD) have been designed for epidemiologists and statisticians. The next wave in medical informatics is to encode medical data for day to day patient care, clinical decision support, transportable medical records and intelligent medical systems. These next wave projects have severely strained the old numeric coding paradigm. Next wave coding schemes will most likely be alphabetic and will incorporate attributes of medical belief systems. A code for a symptom will be linked to associated organs. The Docle coding system is used by more than 2000 medical practitioners in Australia, making it the most used coding system in general practice in Australia today. The classification system comprises the following phyla or chapters: disease diagnoses, symptoms, signs, reasons for encounter, diagnostic imaging, diagnostic non-imaging, treatment procedures, and therapeutics. The Docle coding and classification system has been designed to solve the following problems:

a coding system in medical informatics

a method to achieve standardization of medical abbreviations

a medical belief system that parallels the Linnean model in biology suitable for the organization of medical knowledge

a medical belief system suitable for the design and implementation of sophisticated medical decision support systems.

The Docle system has drawn the two strands of biology and medicine together in that it follows the Linnean model of classification.

What is wrong with today's numeric coding systems? Six points, to start with.

1) Variable for victory
One of the biggest differences between Docle and the prior art of Read, ICD, SNOMED and ICPC is that Docle uses the computer variable concept while the rest of the field is implemented like computer constants. A variable is like a container. This container in Docle, called a Docle object, can be accessed via primary, secondary and tertiary keys. The three types of keys are equivalent in the sense that they all point to the same container with its stored methods and data. Inside the container is the ‘belief system’ about the object. While the name of the variable is fixed, the contents of the variable may vary over time. The variable may contain a pointer to another variable, and so on ad infinitum. The variable defined may be a huge data structure which may hold assorted variables and constants. Such a dynamic design gives maximum flexibility to cope with changes. As opposed to this framework, one can use the old method of, say, item 222 maps to diabetes mellitus, or a slight modification such as LK222 where L implies it is a symptom and K implies cardiovascular. Any classification based on symbolic constants will suffer the ravages of atherosclerosis. Docle makes use of the concept of separation of the belief system data from the key code itself. This deferment of data binding to the code key provides Docle with unparalleled flexibility to expand and mutate with the growth of medical knowledge. The key, be it primary, secondary or tertiary (see later), leads to the same medical object with its stored behaviour. Medical advance will lead to gradual adjustments to the behaviour of the medical object. It is hard to envisage the need to change species names such as rheumatoidArthritis or diabetesMellitus.
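The idea of several keys leading to one container can be sketched in a few lines of Python. This is only an illustration: the class names DocleObject and DocleRegistry, the methods register and lookup, and the beliefs dictionary are assumptions made for this sketch and are not part of Docle. The keys themselves are the diabetes mellitus keys given later in this paper.

```python
from dataclasses import dataclass, field


@dataclass
class DocleObject:
    """Container for the belief system of one medical entity (hypothetical layout)."""
    primary_key: str
    beliefs: dict = field(default_factory=dict)


class DocleRegistry:
    """Maps primary, secondary and tertiary keys onto the same container."""

    def __init__(self):
        self._by_key = {}

    def register(self, obj, secondary_key, tertiary_keys=()):
        # Every key, whatever its type, points at the one shared object.
        for key in (obj.primary_key, secondary_key, *tertiary_keys):
            self._by_key[key] = obj

    def lookup(self, key):
        return self._by_key[key]


registry = DocleRegistry()
diabetes = DocleObject("diabetesMellitus")
registry.register(diabetes, "diabm", ["diabetes", "sugar diabetes"])

# The contents of the container can change while the keys stay fixed.
# The slot name and value below are placeholders, not clinical content.
registry.lookup("diabm").beliefs["note"] = "revised as knowledge changes"
```

Because the belief data lives in the object rather than in the key, the update made through diabm is visible through every other key.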

2) Number code is not a viable belief system
The Linnean system in biology is a viable belief system that is alive, moving on with the advancement of biological knowledge. It is a framework or road map to the realm of biology. The gaps in the Linnean framework excite the imagination of biologists about missing links in their knowledge. It is a powerful method of cognating the knowledge that is being accumulated. This yearning for classifying and cognating medical knowledge was expressed in the preface of the ICD 9 manual, but it was just a yearning. Docle attempts to satisfy this yearning by tying the two strands of biology and medicine together. The Docle classification system is modelled on the Linnean system; entities are described as medical species and medical genuses. Lists of numbers mapped to diseases are suitable for statistical analysis but will not excite the imagination of the medical researcher.

3) Granularity problem and the Genus chunking solution.
The granularity problem is familiar to anyone attempting to write a decision support program in medicine. An instance of this problem is the flagging of the disease/drug interaction between the beta-blockers and diabetes mellitus. It would be tedious, inefficient and prone to error to try to pick up every specific type of beta-blocker interacting with every variation of diabetes mellitus. An example of the beneficial effects of chunking into genus level is the case of diabetes mellitus. Chunking up of the three variants of diabetesMellitus, namely diabetesMellitus@gestation, diabetesMellitus@insulinDependentDiabetesMellitus and diabetesMellitus@nonInsulinDependentDiabetesMellitus, into a genus called diabetesMellitus allows the common behaviour to be stored in the diabetesMellitus genus object. Likewise one can chunk up the therapeutic species of propranolol, atenolol and metoprolol into the medical genus betaBlocker. An adverse drug-disease interaction is flagged when the two genuses of betaBlocker and diabetesMellitus are combined. A new beta-blocker will inherit this interaction behaviour as soon as it is tagged as belonging to the genus of betaBlocker in its container holding its belief system.
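A minimal sketch of genus chunking, in Python, may make the saving concrete. The table names GENUS_OF and GENUS_INTERACTIONS, the functions genus and interacts, and the drug name newBetaBlocker are all hypothetical; only the species and genus names come from the text above.

```python
# Species -> genus memberships, using examples from the text.
GENUS_OF = {
    "atenolol": "betaBlocker",
    "metoprolol": "betaBlocker",
    "diabetesMellitus@gestation": "diabetesMellitus",
    "diabetesMellitus@nonInsulinDependentDiabetesMellitus": "diabetesMellitus",
}

# The interaction is stored once, at genus level, not once per species pair.
GENUS_INTERACTIONS = {frozenset({"betaBlocker", "diabetesMellitus"})}


def genus(key):
    """Return the genus of a species, or the key itself if it is already a genus."""
    return GENUS_OF.get(key, key)


def interacts(drug, disease):
    """Flag an interaction when the two genera are a known interacting pair."""
    return frozenset({genus(drug), genus(disease)}) in GENUS_INTERACTIONS


# A new beta-blocker inherits the interaction as soon as it is tagged.
GENUS_OF["newBetaBlocker"] = "betaBlocker"  # hypothetical drug name

print(interacts("atenolol", "diabetesMellitus@gestation"))
print(interacts("newBetaBlocker", "diabetesMellitus"))
```

One genus-level rule replaces what would otherwise be a rule for every drug and disease variant combination.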

4) Why choke on number codes when Docle is a feast in verse?
Up till now, SNOMED and all the predominantly numeric coding systems based on the multiaxial concept have looked promising. With SNOMED and all its derivatives, as much information as possible is loaded into the code. An example is the SNOMED coding for pulmonary tuberculosis with granuloma, which is T-2800 M-44060 E-2001 F-03003. Another example of the relentless drive to pack as much information into the code as possible is the Read code. The code G3011 stands for acute anteroseptal myocardial infarction; the G denotes the cardiovascular axis, the 3 denotes ischaemic heart disease and the 0 denotes acute. The modern numeric codes such as Read and ICPC have been heavily influenced by the SNOMED technique of cramming as much information as possible into their codes by using the multiaxial technique. In practice such a scheme goes awry as a condition could be both cardiovascular and infective, such as viral myocarditis. Such a scheme leads to a fixation on the codes rather than concentrating attention on the evaluation of the state of knowledge of the disease entities. It is like - there is a space here in our number scheme, let us see if we can fit in any more entities. By adopting the classification in vogue for a subspecialty, the scheme is then locked into a concrete code format. It may be effective for several years but leads to incongruities in the future as technology advances. With such a system, five years down the track the wish would be for an extra axis to cater for the explosion in knowledge about the genome.

Another incongruity detected in the ICD9 coding system used in hospitals is the duplication of codes. Tuberculous meningitis can be viewed from the tuberculosis angle or the meningitis angle, hence there are two different numeric codes, 013.0 and 320.4, describing one and the same entity. Such an incongruent event would not occur in Docle as the code/key for tuberculous meningitis leads to an object with inherited behaviour of tuberculosis and meningitis. The opportunity to save programming time in decision support construction is obvious. Instead of enumerating all the symptoms, signs and laboratory findings of tuberculous meningitis, one can cognate the belief systems of the tuberculosis and the meningitis objects, then overlay the belief system data and methods unique to tuberculous meningitis. Docle is intuitive and suitable for a unified medical abbreviation standard, for example carc.thyr means carcinoma located at thyroid. Currently there are 20,000 terms in the Docle dictionary. A further 40,000 terms were reported as unrecognised in feedback from 100 respondents in the 2000 user group; these are mainly tertiary type keys which will be incorporated. Compared to the constant tinkering with the structure of numeric systems, a word-based system is comparatively stable and easy to maintain. The stability is derived from the fact that the Docle term is a direct transcription of an entity that is real. For example, the code diabm is the computer-generated key derived from the primary key diabetesMellitus; there is no need to change the code. However, the behaviour of the diabm object may need tinkering with as knowledge evolves.
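The overlay of inherited belief systems can be illustrated with a short Python sketch. The class MedicalObject, its beliefs method, the slot name features and the key tuberculousMeningitis are assumptions for illustration, and the slot values are placeholders rather than clinical content.

```python
class MedicalObject:
    """A Docle-style object whose beliefs are inherited from parent objects (hypothetical)."""

    def __init__(self, key, parents=(), own_beliefs=None):
        self.key = key
        self.parents = list(parents)
        self.own_beliefs = dict(own_beliefs or {})

    def beliefs(self):
        """Merge the parents' beliefs first, then overlay what is unique to this object."""
        merged = {}
        for parent in self.parents:
            for slot, values in parent.beliefs().items():
                merged.setdefault(slot, set()).update(values)
        for slot, values in self.own_beliefs.items():
            merged.setdefault(slot, set()).update(values)
        return merged


# Placeholder slot values, not clinical content.
tuberculosis = MedicalObject("tuberculosis", own_beliefs={"features": {"tb-feature"}})
meningitis = MedicalObject("meningitis", own_beliefs={"features": {"meningitis-feature"}})
tb_meningitis = MedicalObject(
    "tuberculousMeningitis",  # illustrative key, not taken from the Docle dictionary
    parents=[tuberculosis, meningitis],
    own_beliefs={"features": {"tbm-specific-feature"}},
)

print(sorted(tb_meningitis.beliefs()["features"]))
```

Only the entries unique to the combined entity need to be written by hand; everything else is picked up from the two parent objects.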

5) Code shear technology.
Docle is built up of words joined by operators, much like an internet address. Coded entities are modified by aspects such as laterality, acute, chronic, simple, compound, complicated and male or female. The modifiers are added to the main code by the clicking of buttons. The & character is the shear operator. As an example, for the code fracture.femur&rightHandSide@simple, during processing the substring &rightHandSide can be sheared off to return the basic code fracture.femur.
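A minimal Python sketch of shearing follows. It assumes that a modifier begins at & and runs to the next operator or the end of the code; the text does not spell this out, so two readings are offered. The function names shear and base_code are hypothetical.

```python
import re

# Assumption: a modifier starts at "&" and runs to the next "&" or "@", or to the end.
_MODIFIER = re.compile(r"&[^&@]*")


def shear(code):
    """Remove every &modifier segment, keeping any @ qualifiers."""
    return _MODIFIER.sub("", code)


def base_code(code):
    """Drop everything from the first & or @ operator onward."""
    return re.split(r"[&@]", code, maxsplit=1)[0]


print(shear("fracture.femur&rightHandSide"))             # fracture.femur
print(shear("fracture.femur&rightHandSide@simple"))      # fracture.femur@simple
print(base_code("fracture.femur&rightHandSide@simple"))  # fracture.femur
```

Because the modifier is marked by its own operator, it can be stripped by plain string processing without consulting a lookup table.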

6) Best Practice by stealth
Evidence-based medicine, world best practice - that is the catchcry of modern-day practice. For the sake of efficiency and conformity with the world’s best practice, best practice can be encoded inside the Docle object: each disease Docle object can hold a ranked list of recommended treatments and a ranked list of investigations. All this rapidly changing knowledge is updated on the clinician’s desktop every three months. Adoption of a Docle-type coding system will achieve Cochrane-type objectives by stealth.

Overview of the Docle Linnean Classification System

The metamorphosis of Docle from a nomenclature to a classification system has been a slow evolution. The problem had been the mistaken belief that diseases were well classified by the ICD group under the World Health Organisation (WHO) auspices. While subspecialties have done neat jobs in their domains, there is no framework to tie things together. Classification was initially deemed outside the problem domain that Docle was designed to solve. Yet medical informatics has hit the wall with the lack of an efficient coding system, or so it is generally thought. Is it a coding problem that we have? No one talks of assigning number codes to species of plants or animals. There does not appear to be a coding problem in biology.

Then came the overwhelming realisation that maybe it is not merely a coding problem. The "medical coding problem" has not been properly defined. The problem is more profound than that. The problem is that a mature classification of diseases and related medical entities does not exist; not one exists that is as well developed and as disciplined as the biological classification system. We have failed to come up with a set of species names for disease entities. We have failed to define the concept of a medical species. There are no medical equivalents for the phylum, class, order, family and genus. There is no equivalent binomial nomenclature in Latin for rheumatoid arthritis or myocardial infarction. So instead of insisting on a genuine standard, like the Latin binomial nomenclature for biologists, the medical fraternity has sold out for several long lists of number codes. These purport to cover a complete list of diseases, whose links to reality become tenuous with the passage of time. The lack of emphasis on species identification, and the attendant lack of standardisation of species names, is of course due to the absence of a congruous framework. The rapid development of medical science and the critical lack of a decent classification framework for the medical field have resulted in a fragmented state of affairs. We have islands of information coded variously in SNOMED, ICD, Casemix/Diagnosis Related Groups (DRGs) and ICPC. The Unified Medical Language System (UMLS) identified the problem, but instead became subsumed by it.

Instead of coming up with more sets of numbers linked to medical entities, the challenge is to create an equivalent Linnean system. The proposed Docle framework is a classification system for the medical domain based on the Linnean model. We are not yet proposing Latin binomial nomenclature. It is unlikely the medical community will accept the wholesale renaming of medical terms in Latin, not that it would work. Classification by the direct transposition of the medical domain into the biological model of course does not work, as the framework is not fully compatible. For that to happen, there are three prerequisites. Firstly, we need the equivalent of the binomial nomenclature. This nomenclature must be a powerful and standard way of describing medical entities. Secondly, we need to completely rework the Linnean hierarchical levels and introduce new definitions for the various levels. Thirdly, we need to create new rules for the classification process. Instead of Latin names, we have a structured medical descriptive language called Docle. In the majority of cases, Docle names are names of medical entities that are straight out of the medical textbooks. Occasionally they may look like someone's internet address. These peculiar names are Docle expressions, first presented at the 1987 RACGP Computer Conference in Melbourne and subsequently at the APAMI94, HIC95 and HIC96 conferences. The epiphany for Docle happened in 1995 when it metamorphosed into a Linnean framework.

The direction that Docle has taken is to use the concept of the species name as the KEY to a medical object, also called a Docle object. Hence Docle is a classification of medical objects. This classification of medical objects is also called Objects Medica. The medical object holds information that refers to memberships of taxa, pointers to species in lower levels of hierarchy and its own level of hierarchy. That way, as medical science progresses, the medical object is updated but the key remains stable. As there is no need to assign numbers to entities that are not numbers, species names are alphabetic. The problem is therefore straightforward as detailed below:

identify all the species (or subspecies thereof) of medical objects

assign to each species named an object which is a data repository regarding its memberships of taxa and other information

classify them into a logical framework satisfying the requirements of all manner and types of health workers - this is important as previous coding systems were designed for the medical statisticians and certainly not for the information scientist who is developing applications for decision support in a clinical setting and for paperless medical records.

Docle Framework

Some ideas have been borrowed from the impressive edifice of biological classification started by Linnaeus in the 1750s. One of the central tenets of biological classification is the concept of the species; the other tenets are the hierarchies and the concept of the taxon (plural taxa). A taxon is a group with shared values in each hierarchy. Species identification is half the work, while the other half involves placing the species in the right taxon in the right hierarchy.

The system of classification in Docle is based on the above framework with major modifications. It would be fair to say that Docle is the offspring of Linnean classification and the object oriented language Smalltalk. Whilst the concepts discussed were first implemented in a computer system in Smalltalk, there is no problem whatsoever for Docle to be a manual system or written up in any standard database or high level computer language.

The main deviations from the Linnean model are:

there are more hierarchies defined below the species level

a species or any of its subclasses can have membership in any number of taxa at any level - this is the multiple inheritance feature of Docle

the corollary of the above is that a species may have no membership of any taxon at any level

as implemented in Docle, a taxon knows its membership; that is, a species knows who its subspecies, subsubspecies, subsubsubspecies and subsubsubsubspecies are, if there are any

the taxa at the next level down of the hierarchy do not need to be descendants of a taxon at the current level

the entity to be classified is held in a Docle object (also referred to as a medical object); the name of the object becomes the key to the object. There are three types of key to these Docle objects. The primary key is the complete key that can look like a textbook name or an expression that looks like an internet address. An example of a primary key is diabetesMellitus. Note the absence of a space between diabetes and mellitus. The secondary key is computer generated and is an abbreviated version of the primary key, produced using the Docle algorithm. Hence the secondary key is diabm. The abbreviation, which is the secondary key, is useful in that doctors like to communicate in a shorthand manner whenever possible. It is also a subtle method to get doctors to standardise on abbreviations. The tertiary keys are the nominated aliases of the entity. To summarise, in the case of diabetes mellitus, the primary key is diabetesMellitus, the secondary key is diabm and the tertiary keys are the aliases diabetes and sugar diabetes.

the American spelling has been elected - moans may be heard from the Commonwealth camps! Docle is about simplification. In many cases, a character saved, for example hemoglobin instead of haemoglobin, could be used for another aspect of the code.

Hierarchies in Docle

Kingdom - there is only one taxon located at this hierarchy. It is named Objects Medica. Objects Medica holds all medical objects and all objects of medical thought.

Phylum - the taxa are:

Medical Administration

Symptoms Signs

Diagnostic Non Imaging

Diagnostic Imaging

Procedures Process Of Care


Thinking About Medical Thinking And Practice (TAMTAP)

Reason for encounter

Clinical Domains

Class - the taxa are the various clinical fields in medicine. The groups are Adolescent Health, Blood, Cardiovascular etc.

Subclass - is reserved for the exciting frontier of genetic medicine. With the complete mapping of the human genome, gene locations/regions can be linked to specific medical syndromes. For example, the HLA class II genes are linked to IDDM. We can use taxa such as X-linked or Y-linked. We await a uniform nomenclature for gene maps.

Order - the taxa are named anatomical locations and organs.

Family - the taxa here are for the biochemical and physiological bases of disease. This can be at the molecular and cellular level. Examples of the groups here are:

disorders of lipid metabolism

disorders of the prostaglandins

disorders of nitric oxide metabolism

disorders of heat regulation

disorders of the mucopolysaccharides

disorders of cell membrane transport.

The subfamily hierarchy is reserved for taxa related to DRGs and Casemix.

Genus - a taxon at this level is a larger concept and holds from 11 to 200 species. Examples of taxa are:

valvular heart disease



benign neoplasm

malignant neoplasm

intermediate neoplasm.

Superspecies - a taxon at this level is a concept that holds 2 to 10 species. An example of a superspecies is fracture of the femur. It is not specific enough for treatment and prognostication, as it contains several species.

Species - the root word is the Latin specere which means to look at. At the species level the medical entity is real and can be looked at or experienced by the patient/clinician. A species belonging to the phylum Clinical Domains is a characteristic syndrome with clinical features generally known. Often there is knowledge about its aetiology. There is knowledge regarding diagnosis by clinical and/or non-clinical methods. There exists in many cases a knowledge of its natural history. There is an associated system of management of this syndrome and in many cases methods of prevention. A diagnosis at the species level or better is required for specific therapy. Examples of a species are diabetesMellitus, fracture.femur@neck and acidosis@metabolic. Species belonging to phyla other than Clinical Domains are non-abstract entities such as cough, chest X ray, or a swelling located at neck.

Subspecies - a differentiated type arising from species, it suggests more specific treatment and prognostication.

Subsubspecies - a more differentiated type arising from subspecies.

Subsubsubspecies - a differentiated type arising from subsubspecies.

Subsubsubsubspecies - a differentiated type arising from subsubsubspecies.


Case study - classification of fractures involving the neck of femur.

The primary key is followed by its secondary key. The production of these secondary keys is automated by the use of the Docle algorithm.


object medica


clinical domain






disorder of bone metabolism


fracture.femur - frac.femu, fracture - frac


fracture.femur@neck - frac.femu@neck 


fracture.femur@neck@pertrochanteric - frac.femu@neck@pert


fracture.femur@neck@pertrochanteric@avulsion - frac.femu@neck@pert@avul 
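The Docle algorithm itself is not described in this paper. Purely as an illustration, the Python sketch below applies a guessed rule that happens to reproduce the secondary keys quoted here: within each name, keep the first four letters of the first word and the initial letter of each following word, leaving the operators in place. The rule and the function names abbreviate_segment and guess_secondary_key are assumptions, and the real algorithm may well differ for other terms.

```python
import re

OPERATORS = {".", "@", "&"}


def abbreviate_segment(segment):
    """Abbreviate one camelCase name using the guessed rule (not the real Docle algorithm)."""
    # Split "diabetesMellitus" into ["diabetes", "Mellitus"].
    words = re.findall(r"[A-Za-z][a-z0-9]*", segment)
    if not words:
        return segment
    head, rest = words[0], words[1:]
    return head[:4].lower() + "".join(word[0].lower() for word in rest)


def guess_secondary_key(primary_key):
    """Keep the operators in place and abbreviate the names between them."""
    parts = re.split(r"([.@&])", primary_key)
    return "".join(p if p in OPERATORS else abbreviate_segment(p) for p in parts)


print(guess_secondary_key("diabetesMellitus"))     # diabm
print(guess_secondary_key("fracture.femur@neck"))  # frac.femu@neck
```

The sketch is checked only against the examples in this paper and should not be relied on to generate valid Docle keys.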

A Docle-type solution to medical coding is inevitable if we use the evolution of computer languages as a paradigm. Computer programming moved from an instruction set of ones and zeroes to assembly language to high level languages and 4GL. Likewise medical coding will jump from mainly numeric to alphabetic expressions. Docle is human readable and is more suited to input validation. For instance the dot operator means "located at"; validation routines will make sure that the referred site is a valid anatomical one. Being human readable, Docle is also more suited for mission critical tasks because the nurses and doctors can visually vet the correctness of computer data. Technology cycles are fuelled by marginal increases in utility. The Docle codes are both human and machine readable; these codes are actually embedded in the case notes in the encounter form as implemented in the Event Oriented Medical Record (EOMR) system. The explosion of medical knowledge in the past twenty years has been phenomenal. There was no knowledge of HIV infection or the nitric oxide mechanism in physiology back in 1975. It must be puzzling to the biologists that the medical sector needs so many coding systems - SNOMED, ICPC, Read, ICD etc. The existence of multiple coding schemes is a hint that a sweeping simplifying solution is called for. UMLS has not lived up to its promise. With the changing pattern of health care delivery and a highly networked society, the advantages of a unitary system that can span the various departments and specialities would be obvious. The statisticians can work at the genus level, the primary provider at the species level, and the specialists and research scientists can work lower down at the subspecies or subsubspecies level. The resident medical officer or the staff nurse applies a human-readable, character-based primary Docle code from a pick list in the patient electronic record. That piece of data capture meets the requirements for:

the day to day patient recording

medical decision support and

administrative purposes.
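The kind of input validation that the dot operator makes possible can be sketched in Python as follows. The set ANATOMICAL_SITES and the functions located_at and is_valid are hypothetical; a real system would presumably consult its anatomical objects rather than a hard-coded set.

```python
# Hypothetical list of valid anatomical sites, for illustration only.
ANATOMICAL_SITES = {"femur", "thyroid", "neck", "liver"}


def located_at(code):
    """Return the site named after the dot operator, or None if the code has no dot."""
    # Ignore anything after the first @ or & operator.
    head = code.split("@", 1)[0].split("&", 1)[0]
    if "." not in head:
        return None
    return head.split(".", 1)[1]


def is_valid(code):
    """Accept a code when it names no site, or names a known anatomical site."""
    site = located_at(code)
    return site is None or site in ANATOMICAL_SITES


print(is_valid("fracture.femur@neck"))  # True
print(is_valid("fracture.table"))       # False
```

A numeric code offers no comparable hook: the site is not visible in the code, so it cannot be checked at the point of entry.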

Docle objects are linked together to form a viable and congruous belief system. As an example of the congruity of the system, Docle has thrown up a previously unnamed body organ. Docle is fussy with its use of the dot operator and maps all body organs. It detected a gap in its anatomical hierarchy. The anatomical locations scrotum and testis have a missing superclass; Docle has christened this organ the tistum. The docle for tistum is tist, which has as its subclasses scrotum and testis. In one sense, Docle is the first medical coding system with balls.

The coding wars may be over, even before the first shots are fired. While government and quasi-governmental bodies thrash out the merits or otherwise of the various medical coding systems, the goal posts regarding the ideal medical coding system have, along with the computing juggernaut, inexorably moved forward. The internet standard called Common Gateway Interface (CGI) was accepted as an international standard, but overnight it became dated as Java and ActiveX were launched. Number-based medical coding systems will probably go the way of CGI due to the relentless pursuit of quality and utility. And why not?

The alphabetic Docle medical coding and classification system that is used by over two thousand doctors in Australia is not merely a coding system such as the ICD codes. Instead, the basis of Docle is a belief system modelled on the Linnean biological classification system. Medical entities are classified as species and placed in a hierarchy much like a species such as Homo sapiens. Every one of the currently ten thousand Docle medical objects is thus related in a congruous framework. For instance the liver object at the level called ORDER knows all its associated diseases, symptoms and signs.

The true tension in medical informatics is not a coding battle but a war between the electronic and intellectual representations of the current medical belief system. The Docle system has drawn the two strands of biology and medicine together with the common Linnean model. The fundamental issue in medical coding and classification today is the same as the biological classification conundrum in the days of Carolus Linnaeus in the 1750s; the solution is to define the concept of the medical species, identify the said species and give them names. Then stick with the names until they are proven to be inaccurate. This will impart order in the medical informatics domain and may well save years of hack work for the medical software implementor.

© Dr Y. K. Oon 1995-97


Ahmed T. and Silagy C. A. Evidence-based Medicine, Medical Journal of Australia, Volume 163, No. 2, pp 60-61, 1995.
Book of Abstracts, Asia Pacific Association Medical Informatics Conference, pp 103, Singapore, 1994.
Britt H. Which Code? Which Classification?, Informatics in Healthcare Australia, Volume 5, No. 4, pp 140-144, 1996.
Britt H., Beaton N. and Miller G. Coding and classification in computerised general practice medical records: Why code? Why classify?, Australian Family Physician, Volume 24, pp 612-615, 1995.
Cochrane A. L. A critical review, with particular reference to the medical profession, in Medicines in the year 2000, Office of Health Economics, London, 1979.
Goldberg A. and Robson D. Smalltalk-80: The Language and its Implementation, Addison Wesley, 1983.
ICHPPC-2-Defined: International Classification of Health Problems in Primary Care, Oxford University Press, 1983.
International Classification of Primary Care, H. Lamberts and M. Woods [Eds], Oxford University Press, 1987.
Oon Y. K. and Carson N. The Porta Language - a Portable Medical Record System, The Australian Computer Journal, Volume 17, No. 2, May 1985.
Oon Y. K. The Docle90 Medical Notation, Structured Language Resources, 1990.
Oon Y. K. Overcoming medical notation blues - the Docle90 method. Informatics In Healthcare Australia, Volume 2, No. 2, May 1993.
Oon Y. K. EOMR - Event Oriented Medical Record, Docle Systems, Melbourne, 1994.
Oon Y. K. EOMR - The Event Oriented Medical Record, Informatics in Healthcare Australia, Volume 4, No. 2, pp 73-78, 1995.
Oon Y. K. Docle95 Classification, Docle Systems, Melbourne, 1995.
Oon Y. K. The Linnean Model of Medical Classification, in B. McGuinness and T. Leeder [Eds] HIC96 Proceedings of the fourth National Health Informatics Conference, pp 153, Melbourne, 1996.
Read Clinical Classification. Read code release set, CAMS, Loughborough UK, 1994.
Rose A. Our Records are Inadequate or are We on the Wrong Track? RACGP fifth Computer Conference, 1987.
Silagy C. The Cochrane Collaboration: Informing health care decision-making with evidence, Informatics in Healthcare Australia, Volume 5, No. 1, pp 4-7, 1996.
Smalltalk/V Object Oriented Programming Systems, Digitalk Incorporation, 1988-1994.
Stuart-Buttle C. The Read Codes: Towards a common language of health, Informatics in Healthcare Australia, Volume 2, No. 3, pp 21-28, 1993.
Weed L. L. Managing Medicine, in J. S. Wakefield [Ed] Medical Communications and Services Association, Kirkland, 1983.
Weed L. L. Medical Records, Medical Education and Patient Care, Case Reserve Press, Cleveland, 1969.


Author’s details:

Dr Y. Kuang Oon
Project Director
Docle Systems
29 Darryl Street
Scoresby, Victoria 3179, Australia

Phone: 61 3 97638935
Fax: 61 3 97649788
Email: docle@compuserve.com
http://www.docle.com.au

Plum Medical Spreadsheet®, Plum® and Docle® are registered or pending trademarks.
© Docle Systems 1997
