DISCOWER: a discipline-oriented construction-based corpus of written English as a lingua franca

Attributes

The DISCOWER project recognizes three abstract strata (the abstract text, the basic abstract and the elaborated abstract) in two modes, linguistic and paralinguistic (see Foundations for details). It concentrates on the paralinguistic layer of basic and elaborated abstracts, i.e. two text types conceptualized as constructions whose meanings are described through (values of) attributes rooted in prägnant properties.

The system of attributes used in the project is developed through a conceptual-empirical loop of cognitive linguistics, i.e. an alternation of conceptual analyses and empirical testing. The theoretical framework presented in Foundations guides our general decisions, such as employing prägnant properties or distinguishing between their levels of granularity. Our precise decisions on how to operationalize these are made only after extensive work with the material and numerous discussions aimed at answering two questions: what are Prägnanz-based attributes (in basic and elaborated abstracts), and how do they manifest themselves in basic and elaborated abstracts?

The answers allow us to distinguish nine attributes, i.e. distinctness, self-containment, closure, simplicity, synchrony, continuity, compactness, homogeneity, and neutrality, each with two values, i.e. positive and negative. Positive values represent properties of a better Gestalt, while negative values represent the opposite. For instance, with reference to the distinctness attribute, an abstract can be evaluated as distinct or non-distinct by being assigned a “yes” or “no” value on this attribute. An abstract scoring “yes” on all its attributes (i.e. evaluated as distinct, self-contained, closed, simple, synchronous, continuous, compact, homogeneous and neutral) aligns most closely with Prägnanz (see Foundations for details).
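The yes/no scheme described above can be pictured as a small data structure. The following is only an illustrative sketch and not part of the project's tooling: the names ATTRIBUTES, annotate, is_maximally_praegnant and positive_count are hypothetical, and counting “yes” values is an assumption made here for illustration, since the description states only that an all-“yes” abstract aligns most closely with Prägnanz.

```python
# Illustrative sketch only: all names and the counting rule are assumptions.
ATTRIBUTES = (
    "distinctness", "self-containment", "closure",
    "simplicity", "synchrony", "continuity",
    "compactness", "homogeneity", "neutrality",
)


def annotate(positive):
    """Return a "yes"/"no" value for each of the nine attributes.

    `positive` is the collection of attributes evaluated as "yes";
    every other attribute receives "no".
    """
    unknown = set(positive) - set(ATTRIBUTES)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return {name: ("yes" if name in positive else "no") for name in ATTRIBUTES}


def is_maximally_praegnant(annotation):
    """True when every attribute carries the positive value."""
    return all(annotation[name] == "yes" for name in ATTRIBUTES)


def positive_count(annotation):
    """Number of attributes evaluated as "yes" (a hypothetical summary)."""
    return sum(value == "yes" for value in annotation.values())


# An abstract evaluated as positive on everything except neutrality.
example = annotate(set(ATTRIBUTES) - {"neutrality"})
```

Keeping the values as the strings “yes” and “no” mirrors the wording of the evaluation; booleans would work equally well.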

As signalled above, the choice of attributes results from balancing theory and data with a view to generating ideas for further research. This means two things. Firstly, some prägnant attributes, e.g. boundedness, are deliberately excluded from the list since they are present in our data by default with only one value, i.e. the corpus includes only abstracts which are bounded (fully visible on one PDF page) (see Principles for an illustration). Secondly, some prägnant attributes, e.g. specificity, are not taken into account since they seem to capitalize on the other attributes rather than provide unique specifications.

With reference to names of attributes, given the proliferation of overlapping terms and definitions used to describe often closely related prägnant properties (e.g. simplicity vs. singularity, naturalness vs. neutrality), we choose to prioritize those used by cognitive linguists inspired by Gestalt theory, e.g. simplicity and neutrality.

Below, the nine attributes developed in this project (and their values) are juxtaposed with their concise Prägnanz-based definitions.

Attribute | Value | Definition
DISTINCTNESS | distinct/non-distinct | An abstract is not of the same nature as its surroundings.
SELF-CONTAINMENT | self-contained/non-self-contained | An abstract does not involve an extrinsic region.
CLOSURE | closed/non-closed | An abstract reaches its potential extremes.
SIMPLICITY | simple/non-simple | An abstract has one unit.
SYNCHRONY | synchronous/non-synchronous | An abstract’s units go in one direction.
CONTINUITY | continuous/non-continuous | An abstract’s units are aligned.
COMPACTNESS | compact/non-compact | An abstract’s units are not interrupted.
HOMOGENEITY | homogeneous/non-homogeneous | An abstract’s units are of the same quality.
NEUTRALITY | neutral/non-neutral | An abstract’s units are not emphasized.

Table 1. Attributes

As indicated by the definitions above, attributes range from (more) extrinsic, i.e. related to the abstract’s context, to (more) intrinsic, i.e. related to the abstract itself (see Foundations for details). Finer distinctions along the extrinsic-intrinsic continuum can be introduced through sets of attributes. 

Sets of attributes

The nine attributes can be roughly grouped into three sets corresponding to their progressively more intrinsic nature, as evidenced by their physical manifestations. In other words, we can group attributes into those placed in the context of an abstract, located on the abstract’s contour and situated within the abstract’s contents. The division of attributes into sets is presented below.

Set | Attribute
Context | DISTINCTNESS
Contour | SELF-CONTAINMENT
Contour | CLOSURE
Contents | SIMPLICITY
Contents | SYNCHRONY
Contents | CONTINUITY
Contents | COMPACTNESS
Contents | HOMOGENEITY
Contents | NEUTRALITY

Table 2. Sets of attributes
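The grouping in Table 2 amounts to a simple mapping from attributes to sets. The sketch below is illustrative only; SET_OF, group_by_set and GROUPS are hypothetical names, while the memberships themselves are copied from the table.

```python
from collections import defaultdict

# Set membership as listed in Table 2 (names of the variables are assumptions).
SET_OF = {
    "DISTINCTNESS": "Context",
    "SELF-CONTAINMENT": "Contour",
    "CLOSURE": "Contour",
    "SIMPLICITY": "Contents",
    "SYNCHRONY": "Contents",
    "CONTINUITY": "Contents",
    "COMPACTNESS": "Contents",
    "HOMOGENEITY": "Contents",
    "NEUTRALITY": "Contents",
}


def group_by_set(set_of):
    """Invert the attribute-to-set mapping into set-to-attributes lists."""
    groups = defaultdict(list)
    for attribute, set_name in set_of.items():
        groups[set_name].append(attribute)
    return dict(groups)


# Dictionaries preserve insertion order, so the sets come out from the
# more extrinsic (Context) to the more intrinsic (Contents).
GROUPS = group_by_set(SET_OF)
```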

Forms of attributes

The division into sets is not merely theoretical. Once operationalized, it also supports a step-by-step approach to our analysis, enabling a gradual assessment of (values of) attributes in consonance with the specific nature of the material under analysis, i.e. written abstracts of research articles, and the focus adopted, i.e. the paralinguistic layer of basic and elaborated abstracts. In other words, operationalizations of context, contour and contents let us establish links between forms and meanings of basic and elaborated abstracts. Since the categories of context, contour and contents come at different levels of granularity, as if zooming in on the abstract, their successful operationalizations involve progressively more specific categories, as discussed below.

 

Context

In consonance with our layered approach, the basic abstract is set in the context of its elaborated abstract, which in turn is set within its vertically scanned PDF page (see Procedure for details). While in each case context is operationalized as the white space (and, optionally, linguistic and paralinguistic objects) above and below an abstract, it actually involves two manifestations: ideal (with the white space alone) and non-ideal (with linguistic or paralinguistic objects), as illustrated below.


Contour

Contour is operationalized as a superimposition of an ideal (i.e. maximally prägnant) shape and the real shape of an abstract. The ideal shape for what in our data is typically a horizontal block of text is a rectangle containing all the linguistic and paralinguistic objects in an abstract (and the white space between them).

An abstract’s real shape is defined by the contrast between white space and an abstract’s extreme (leftmost, rightmost, topmost and bottommost) points. These points can be roughly connected along the lines of the ideal shape, e.g. between aligned letters or between aligned lines, which means that, along some of its sections, the ideal shape coincides with the real shape.
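The ideal shape described above, a rectangle containing all objects of an abstract, can be derived from the extreme points of those objects. The following is a minimal sketch under stated assumptions: each object is represented by its own bounding box, page coordinates grow rightwards and downwards, and the names Box and ideal_shape are hypothetical rather than taken from the project.

```python
from typing import Iterable, NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle; y is assumed to grow downwards on the page."""
    left: float
    top: float
    right: float
    bottom: float


def ideal_shape(objects: Iterable[Box]) -> Box:
    """Smallest rectangle containing every linguistic or paralinguistic object.

    The rectangle is fixed by the leftmost, topmost, rightmost and
    bottommost points of the objects, so the white space between the
    objects is included automatically.
    """
    boxes = list(objects)
    if not boxes:
        raise ValueError("an abstract needs at least one object")
    return Box(
        left=min(b.left for b in boxes),
        top=min(b.top for b in boxes),
        right=max(b.right for b in boxes),
        bottom=max(b.bottom for b in boxes),
    )


# Two text lines of different length: the second line ends earlier,
# so the real shape departs from the rectangle at the lower-right corner.
lines = [Box(10, 20, 200, 30), Box(10, 35, 180, 45)]
rectangle = ideal_shape(lines)
```

Comparing such a rectangle with the positions of individual lines is one conceivable way of locating the sections where the real shape and the ideal shape coincide.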

While in most cases the real shape is intrinsic, i.e. formed by the text itself, an abstract’s shape can also be extrinsic, i.e. formed by paralinguistic objects alone.

Despite the intrinsic/extrinsic distinction, which implies a kind of “double contour” (see Figures 9 and 11 above), each abstract has in fact only one outermost real shape, i.e. one contour. Still, its different quality means that some abstracts are self-contained (with the text region alone), while others are non-self-contained (with the text region and the paralinguistic region) (see Figures 8-11 above).

Qualitative differences can also be spotted in abstracts’ corners, two of which, i.e. upper left and bottom right, are of relevance here due to general cultural constraints (see Foundations for details). Possible variants of basic and elaborated abstracts’ corners, both ideal, or closed, i.e. filled with linguistic or paralinguistic objects, and non-ideal, or non-closed, i.e. filled with white space, are provided below.

Contents

Contents are operationalized as text type-based parts (see Constructions for details), obligatorily including linguistic objects, e.g. title, author and keywords (for elaborated abstracts) and title, author, abstract label and abstract text (for basic abstracts), and, optionally, paralinguistic objects, e.g. rules (for elaborated abstracts) and rules, colons and dots (for basic abstracts). While parts are identified primarily through the linguistic mode (i.e. seen and read), relations between parts are described solely through the paralinguistic mode (i.e. seen).

Part-based relations involve various levels of granularity, i.e. low (in establishing synchrony and compactness), medium (in detecting continuity), and high (in establishing homogeneity and neutrality). Expectedly, then, forms which capture these relations can be found at borders, along sides or within interiors, respectively.

Since the category of contents yields continua along as many as six dimensions, examples are provided for extreme cases alone, while further details about parts and relations between them (particularly those at the lowest level of granularity, i.e. synchrony and compactness) can be found in Composition (see Constructions).

 

Form-attribute pairings

Although for expository reasons attributes, or rather attribute-based meanings, and their forms are discussed separately above, they are in fact closely, or even iconically, related (see Foundations for details). In other words, attributes and their indicators, presented in Table 3 below, are form-meaning pairings, i.e. paralinguistic and linguistic constructions (see Constructions for details).

Attribute-based meaning | Form (elaborated) | Form (basic)
DISTINCTNESS | It has no objects (linguistic or paralinguistic) above and below within the PDF page. | It has no objects (linguistic or paralinguistic) above and below within the elaborated abstract.
SELF-CONTAINMENT | It has no paralinguistic contour, e.g. a box or a rule. | It has no paralinguistic contour, e.g. a box or a rule.
CLOSURE | It has linguistic or paralinguistic objects in upper-left and lower-right corners. | It has linguistic or paralinguistic objects in upper-left and lower-right corners.
SIMPLICITY | It has one text type-based part. | It has one text type-based part.
SYNCHRONY | All its parts scan vertically or horizontally. | All its parts scan vertically or horizontally.
CONTINUITY | Beginnings of all its parts are in one vertical or horizontal line. | Beginnings of all its parts are in one vertical or horizontal line.
COMPACTNESS | It has no extra white space or paralinguistic objects, e.g. rules, at borders between parts. | It has no extra white space or paralinguistic objects, e.g. rules, at borders between parts.
HOMOGENEITY | All its parts are similar in, e.g., font size and background/font colour. | All its parts are similar in, e.g., font size and background/font colour.
NEUTRALITY | None of its parts use, e.g., all-caps, bold, italic or spaced-out font. | None of its parts use, e.g., all-caps, bold, italic or spaced-out font.

Table 3. Form-attribute pairings
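To make the pairings in Table 3 more tangible, the sketch below turns each form into a yes/no check over a simplified description of an abstract's layout. Everything in it is an assumption made for illustration: the classes Part and AbstractLayout, their fields and the function evaluate are hypothetical, “similar” is reduced to exact equality of font size and colour, and no claim is made that the project's evaluation is automated in this way.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Part:
    """One text type-based part, e.g. a title (hypothetical description)."""
    direction: str            # scanning direction: "horizontal" or "vertical"
    start_x: float            # position of the beginning of the part
    start_y: float
    font_size: float
    colour: str
    emphasized: bool = False  # all-caps, bold, italic or spaced-out font


@dataclass
class AbstractLayout:
    parts: List[Part]
    objects_above_or_below: bool = False   # anything but white space around it
    paralinguistic_contour: bool = False   # e.g. a box or a rule
    upper_left_filled: bool = True
    lower_right_filled: bool = True
    extra_separation: bool = False         # extra white space or rules between parts


def same(values):
    """True when all values are identical (or there are no values)."""
    return len(set(values)) <= 1


def evaluate(a: AbstractLayout) -> dict:
    """Map each attribute to "yes" or "no" following the forms in Table 3."""
    p = a.parts
    checks = {
        "DISTINCTNESS": not a.objects_above_or_below,
        "SELF-CONTAINMENT": not a.paralinguistic_contour,
        "CLOSURE": a.upper_left_filled and a.lower_right_filled,
        "SIMPLICITY": len(p) == 1,
        "SYNCHRONY": same(x.direction for x in p),
        "CONTINUITY": same(x.start_x for x in p) or same(x.start_y for x in p),
        "COMPACTNESS": not a.extra_separation,
        "HOMOGENEITY": same((x.font_size, x.colour) for x in p),
        "NEUTRALITY": not any(x.emphasized for x in p),
    }
    return {name: ("yes" if ok else "no") for name, ok in checks.items()}


# A two-part abstract with a bold title above left-aligned body text.
title = Part("horizontal", 50, 100, 12, "black", emphasized=True)
body = Part("horizontal", 50, 120, 12, "black")
result = evaluate(AbstractLayout(parts=[title, body]))
```

In this example only SIMPLICITY and NEUTRALITY come out as “no”. With measurements taken from real PDF pages, the exact comparisons of positions and font sizes would presumably need a tolerance.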

Application of form-attribute pairings

Click here to see typical examples of abstracts evaluated as having positive and negative values of particular attributes. For easier reference, the extent of the (basic and elaborated) abstract is indicated wherever relevant. 

For more information on Attributes, contact A. Strugielska, D. Watkowska or D. Guttfeld (see contact details in the footer).
