DISCOWER: a discipline-oriented construction-based corpus of written English as a lingua franca

Statistics

The statistics presented below are based on technical and attribute-related data available in the core and expanded spreadsheets (see Corpus).

Statistics: technical data

The statistics in this section are based on the technical data, which document how the abstracts were recognized and selected.

Authors

 

Over half of the articles are by authors affiliated with institutions in Poland, but 75 other countries are represented too.
While most author affiliations are European, authors from five continents are represented.
Articles in linguistics (Polish-only vs other affiliations)
Articles in law (Polish-only vs other affiliations)
Articles in literary studies (Polish-only vs other affiliations)

 

Journals

 

In linguistics, there are 819 abstracts from 32 journals.
In law, there are 987 abstracts from 32 journals.
In literary studies, there are 330 abstracts from 21 journals.

 

Disciplines

 

The corpus contains abstracts from 2136 articles in three disciplines: linguistics, law, and literary studies.
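The corpus total can be cross-checked against the per-discipline counts given in the Journals section. The following is a minimal sketch of that arithmetic; the dictionary and variable names are illustrative assumptions and do not come from the DISCOWER spreadsheets.

```python
# Abstract counts per discipline, as reported in the Journals section.
# The names used here are hypothetical and only serve this illustration.
abstracts_per_discipline = {
    "linguistics": 819,
    "law": 987,
    "literary studies": 330,
}

# Sum of the three discipline counts.
total = sum(abstracts_per_discipline.values())

# Share of each discipline in the whole corpus, rounded to a whole percent.
shares = {
    name: round(100 * count / total)
    for name, count in abstracts_per_discipline.items()
}

print(total)
print(shares)
```

The sum matches the 2136 articles stated above, and the rounded shares show that law contributes the largest part of the corpus and literary studies the smallest.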

 

Dates

 

The corpus covers articles from 2018-2021, with 2021 being the least represented year.

 

Conventions

 

The abstracts mostly have the conventional "abstract" label.
The abstracts are mostly located above the articles.

 

Statistics: attribute-related data

Basic and elaborated abstract attributes

 

Elaborated abstracts: attributes by discipline.
Basic abstracts: attributes by discipline.

 

Attribute co-occurrence in basic and elaborated abstracts

 

The most common combination of attributes in elaborated abstracts differs between disciplines.
  1. In law, the most common (20%) elaborated abstracts are non-distinct, self-contained, non-closed, non-simple, synchronous, non-continuous, non-compact, homogeneous, neutral. Over 50% of the elaborated abstracts in law represent 4 top attribute combinations (describing 20%, 13%, 11% and 8% of the cases).
  2. In linguistics, the most common (29%) elaborated abstracts are non-distinct, self-contained, non-closed, non-simple, synchronous, continuous, non-compact, homogeneous, neutral. Over 50% of the elaborated abstracts in linguistics represent 3 top attribute combinations (describing 29%, 16%, and 14% of the cases).
  3. In literary studies, the most common (37%) elaborated abstracts are non-distinct, self-contained, non-closed, non-simple, synchronous, continuous, non-compact, homogeneous, neutral. This is the same as in linguistics, but in literary studies, over 50% of the elaborated abstracts represent just 2 top attribute combinations (describing 37% and 14% of the cases).
The most common combination of attributes in basic abstracts is the same in all disciplines.
  1. In all disciplines, the most common basic abstracts are non-distinct, self-contained, non-closed, non-simple, synchronous, continuous, non-compact, homogeneous, non-neutral.
  2. In law, these make up 24% of the corpus.
  3. In linguistics, they make up 29% of the corpus.
  4. In literary studies, they make up 45% of the corpus.
Elaborated abstracts: number of positive attribute values by discipline.
Basic abstracts: number of positive attribute values by discipline.

 

Basic and elaborated abstract compositions

 

In a vast majority of cases, elaborated abstracts contain more than the basic abstracts.
In a vast majority of cases, basic abstracts contain more than the abstract texts.
Elaborated abstracts range from 1 to 10 text type-based parts.
Basic abstracts range from 1 to 7 text type-based parts.

 

Paralinguistic objects in basic and elaborated abstract compositions

 

At part borders in elaborated abstracts there is usually extra white space.
Basic abstracts are more diverse in this respect.
Most elaborated abstracts do not feature paralinguistic objects at part borders.
Basic abstracts are more diverse in this respect.

 

Linguistic constructions in basic and elaborated abstract compositions

 

The most common linguistic construction appearing in elaborated abstracts is keywords.
The most common linguistic construction appearing in basic abstracts is title.

 

Sequences of linguistic constructions in basic and elaborated abstract compositions

 

Three sequences of linguistic constructions represent over 50% of the elaborated abstracts in the corpus.
  1. The abstract, keywords sequence is represented in 804 cases and makes up 38% of the corpus.
  2. The abstract sequence is represented in 195 cases and makes up 9% of the corpus.
  3. The author, title, abstract, keywords sequence is represented in 141 cases and makes up 7% of the corpus.
In law, three top sequences of linguistic constructions represent over 50% of elaborated abstracts.
  1. The abstract, keywords sequence is represented in 296 cases and makes up 30% of the law corpus.
  2. The author, title, abstract, keywords sequence is represented in 108 cases and makes up 11% of the law corpus.
  3. The abstract sequence is represented in 99 cases and makes up 10% of the law corpus.
In linguistics, four top sequences of linguistic constructions represent over 50% of elaborated abstracts.
  1. The abstract, keywords sequence is represented in 331 cases and makes up 40% of the linguistics corpus.
  2. The location, date, abstract, keywords sequence is represented in 113 cases and makes up 14% of the linguistics corpus.
  3. The keywords, abstract sequence is represented in 69 cases and makes up 8% of the linguistics corpus.
  4. The abstract sequence is represented in 65 cases and makes up 8% of the linguistics corpus.
In literary studies, just one top sequence of linguistic constructions represents over 50% of elaborated abstracts.
  1. The abstract, keywords sequence is represented in 187 cases and makes up 57% of the literary studies corpus.
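The "top sequences covering over 50%" figures above can be reproduced by sorting sequences by frequency and accumulating their shares. The sketch below illustrates this with the law counts listed above; the function name, its parameters and the dictionary are assumptions made for the example, and only the three sequences reported on this page are included, not the full set from the spreadsheets.

```python
def top_sequences_covering(counts, total, threshold=0.5):
    """Collect the most frequent sequences until their cumulative share exceeds threshold.

    Returns a list of (sequence, count, rounded percent) tuples and the
    cumulative share reached. If the threshold is never exceeded, all
    sequences in counts are returned.
    """
    selected = []
    covered = 0
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for name, count in ranked:
        selected.append((name, count, round(100 * count / total)))
        covered += count
        if covered / total > threshold:
            break
    return selected, covered / total


# Counts for law as listed above; 987 is the number of law abstracts.
law_counts = {
    "abstract, keywords": 296,
    "author, title, abstract, keywords": 108,
    "abstract": 99,
}

law_top, law_coverage = top_sequences_covering(law_counts, total=987)

for name, count, percent in law_top:
    print(f"{name}: {count} cases, {percent}%")
print(f"cumulative share: {law_coverage:.1%}")
```

With these inputs, all three law sequences are needed before the cumulative share passes one half, which agrees with the description of law above.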

 

For more information on Statistics, contact D. Guttfeld (contact details in the footer).
