Linguasphere Observatory
Founded: Quebec, 1983
Founder: David Dalby
Location: Wales, UK


The Linguasphere Observatory (French: Observatoire Linguistique; Welsh: Wylfa Ieithoedd) is an international network devoted to the study of world languages.


Léopold Sédar Senghor, the honorary president of the observatory.

The Linguasphere Observatory was created in 1983 in Quebec by David Dalby and was later established and registered in France as a non-profit association, under the honorary presidency of Léopold Sédar Senghor. The research center is currently located in Wales, United Kingdom. It has developed a scheme of philological classification built on a single referential framework that combines genetic and geographic categories of relationship, termed phylozones and geozones respectively.

Volume 2 of the Linguasphere Register of the World's Languages and Speech Communities.

In 1999 and 2000, the observatory published the two-volume Linguasphere Register of the World's Languages and Speech Communities (LSR1).[1] From 2001 to the end of 2005, the Linguasphere Observatory worked in partnership with the British Standards Institution (BSI) and ISO TC37 to design and develop an alpha-4 code (ISO 639-6) that can potentially cover every recorded variety of language in the world.

Since 2006, the Linguasphere Observatory has worked mainly on preparing an updated second edition of the Linguasphere Register (LSR2).

Language codes

The Linguasphere Observatory devised the Linguasphere language code, a reference system for world languages that is published in the Observatory's Linguasphere Register. It is an expansive, flexible system that places each language or dialect in relation to the others.

The first part of the code is a decimal classification consisting of two digits, from 00 to 99. This part of the code is fixed and provides a systematic framework for classifying the world's languages. Although the classification method resembles other coding schemes familiar to linguists, the Linguasphere Register uses its own terminology in its definitions. The first digit represents the sector into which the languages of the world are divided. A sector is either a phylosector, whose constituent languages are considered to be genetically related to one another, or a geosector, whose languages are grouped by geographical location rather than by genetic relationship.

The second digit represents the zone into which the sector is divided. Like sectors, zones are described as phylozones or geozones, according to whether their languages are grouped genetically or geographically.

The second part of the Linguasphere code consists of three capital letters, from AAA to ZZZ. Each zone is divided into one or more sets, represented by the first letter. Each set is divided into one or more chains, represented by the second letter, and each chain into one or more nets, represented by the third letter. The division into sets, chains, and nets is based on statistical analysis of linguistic similarity. A geozone is therefore usually divided into more sets than a phylozone, because the genetic relationships among the languages of a phylozone usually ensure greater similarity among its members.

The third and last part of the Linguasphere code consists of one to three lowercase letters (from a to zzz) that identify a specific language or dialect precisely. The first letter represents the outer language. The language varieties that make up an outer language are distinguished, on the basis of statistical analysis of linguistic similarity, by a second and sometimes a third letter.
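The three-part structure described above can be sketched as a small parser. This is only an illustration that assumes the format exactly as stated in this section (two digits, a hyphen, three capital letters, then an optional hyphen and one to three lowercase letters); the names `LinguasphereCode` and `parse_code` are hypothetical and are not part of any published Linguasphere tool.

```python
import re
from typing import NamedTuple, Optional

# Assumed shape, taken from the description in this section:
# two digits, "-", three capital letters, optional "-" plus 1-3 lowercase letters.
CODE_RE = re.compile(r"(\d)(\d)-([A-Z])([A-Z])([A-Z])(?:-([a-z]{1,3}))?")


class LinguasphereCode(NamedTuple):
    sector: str             # first digit
    zone: str               # both digits
    set_: str               # zone plus first capital letter
    chain: str              # zone plus first two capital letters
    net: str                # zone plus all three capital letters
    variety: Optional[str]  # 1-3 lowercase letters, or None if absent


def parse_code(code: str) -> LinguasphereCode:
    """Split a code such as '52-ABA-abb' into its hierarchical levels."""
    match = CODE_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a well-formed code: {code!r}")
    d1, d2, a, b, c, variety = match.groups()
    zone = d1 + d2
    return LinguasphereCode(
        sector=d1,
        zone=zone,
        set_=f"{zone}-{a}",
        chain=f"{zone}-{a}{b}",
        net=f"{zone}-{a}{b}{c}",
        variety=variety,
    )
```

With this sketch, `parse_code("52-ABA-abb")` yields sector `5`, zone `52`, set `52-A`, chain `52-AB`, net `52-ABA`, and variety `abb`; a code without the lowercase part, such as `52-ABA`, has `variety` set to `None`.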


This page or section incorporates Creative Commons licensed content from Wikipedia.

Understanding the Linguasphere language code is often easier given a few examples.

For example, the code for English is 52-ABA, where 5 represents the Indo-European languages, 52 the Germanic languages, 52-A the Norsk + Frysk set, 52-AB the English + Anglo-Creole chain, and 52-ABA the English net. Within this net, the outer languages include:

  • 52-ABA-a – Scots + Northumbrian
  • 52-ABA-b – Anglo-English (traditional English spoken in southern England)
  • 52-ABA-c – Global English (English spoken around the world)

Some more specific examples of English dialects are:

  • 52-ABA-abb – Geordie; belongs to the outer language 52-ABA-a (Scots + Northumbrian) and, within it, to 52-ABA-ab (Northumbrian).
  • 52-ABA-bco – Norfolk; belongs to the outer language 52-ABA-b (Anglo-English) and, within it, to 52-ABA-bc (Southern Anglo-English).
  • 52-ABA-cof – Nigerian English; belongs to the outer language 52-ABA-c (Global English) and, within it, to 52-ABA-co (West-African English).
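The nesting in the examples above (each dialect code extends the code of the group it belongs to) can be made explicit by listing every prefix level of a code. The function below is a minimal sketch; the name `lineage` is hypothetical, and the function assumes its input is already a well-formed code in the hyphenated form used in this article.

```python
from typing import List


def lineage(code: str) -> List[str]:
    """List every level a code belongs to, from sector down to the code itself.

    Assumes a well-formed code such as '52-ABA-abb'; no validation is done.
    """
    parts = code.split("-")
    key = parts[0]                  # the two-digit part, e.g. "52"
    levels = [key[:1], key]         # sector, then zone
    if len(parts) > 1:
        upper = parts[1]            # capital letters: set, chain, net
        for i in range(1, len(upper) + 1):
            levels.append(f"{key}-{upper[:i]}")
    if len(parts) > 2:
        lower = parts[2]            # lowercase letters: outer language and below
        for i in range(1, len(lower) + 1):
            levels.append(f"{key}-{parts[1]}-{lower[:i]}")
    return levels


print(lineage("52-ABA-abb"))
# ['5', '52', '52-A', '52-AB', '52-ABA', '52-ABA-a', '52-ABA-ab', '52-ABA-abb']
```

Read against the examples, the output for Geordie passes through 52-ABA-a (Scots + Northumbrian) and 52-ABA-ab (Northumbrian) before reaching 52-ABA-abb.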


  1. David Dalby, Linguasphere Register of the World's Languages and Speech Communities (2 vols.), Hebron, Wales, 1999-2000