Tuple Construction

From linguistic annotations to (C,E,I) tuples

Overview

Tuple construction transforms qualitative linguistic annotations (indicators, markers) into quantitative (C, E, I) values through a systematic three-step process:

  1. Entity identification: Extract C and E based on syntactic projection
  2. Polarity determination: Calculate the sign of I from indicator class and negation
  3. Salience calculation: Calculate the magnitude |I| from morphological and syntactic markers

The output is a fully specified triple (C, E, I) where I \in [-1, +1] represents:

I = \text{sign}(\text{polarity}) \times |\text{salience}|
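As a minimal sketch of the output format, the triple can be modelled as a small immutable record that enforces the range of I. The names `CausalTuple` and `make_tuple` are hypothetical and not part of the framework, and the encoding of polarity as -1, 0, or +1 is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CausalTuple:
    """Hypothetical container for one (C, E, I) triple."""
    cause: str
    effect: str
    influence: float  # I, expected to lie in [-1, +1]

    def __post_init__(self):
        if not -1.0 <= self.influence <= 1.0:
            raise ValueError(f"I must lie in [-1, +1], got {self.influence}")


def make_tuple(cause, effect, polarity, salience):
    """Combine a sign (-1, 0, +1) and a magnitude in [0, 1] into I."""
    if polarity not in (-1, 0, 1):
        raise ValueError("polarity must be -1, 0, or +1")
    if not 0.0 <= salience <= 1.0:
        raise ValueError("salience must lie in [0, 1]")
    return CausalTuple(cause, effect, polarity * salience)


example = make_tuple("pesticides", "insect mortality", +1, 1.0)
```

Keeping sign and magnitude as separate arguments mirrors the split between Step 2 and Step 3.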

Step 1: Entity Identification

Input: Annotated causal relation with indicator and syntactic dependencies
Output: Entity pair (C, E)
Method: Syntactic projection according to indicator-specific patterns

Causal indicators project their arguments through predictable syntactic patterns. The most frequent patterns:

Transitive-Causative Verbs

Indicators: cause, trigger, produce, stop, prevent

Projection: Subject → Cause, Direct object → Effect

Example

Pesticides cause insect mortality.

  • Indicator: cause (transitive-causative)
  • Subject (pesticides) = Cause
  • Direct object (insect mortality) = Effect
  • Result: (C=\text{pesticides}, E=\text{insect mortality})

Copula Constructions

Indicators: cause (noun), consequence, reason

Projection: Subject → Cause, Prepositional object (for/of) → Effect

Example

Climate change is the cause of species extinction.

  • Indicator: cause (copula construction)
  • Subject (climate change) = Cause
  • Prepositional object (of species extinction) = Effect
  • Result: (C=\text{climate change}, E=\text{species extinction})

Prepositional Markers

Indicators: due to, because of, through

Projection: Prepositional object → Cause, Matrix clause subject/object → Effect

Example

Species die out due to habitat loss.

  • Indicator: due to (prepositional)
  • Prepositional object (habitat loss) = Cause
  • Matrix subject (species) combined with verb (die out) = Effect
  • Result: (C=\text{habitat loss}, E=\text{species die out})
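The three projection patterns can be sketched as a lookup from indicator type to a pair of syntactic roles. The role names (`subject`, `direct_object`, `prepositional_object`, `matrix_clause`), the type labels, and the function `identify_entities` are hypothetical; the sketch assumes that a parser or annotator has already filled the role dictionary.

```python
# Hypothetical mapping: indicator type -> (role that fills C, role that fills E)
PROJECTION = {
    "transitive_causative": ("subject", "direct_object"),
    "copula": ("subject", "prepositional_object"),
    "prepositional": ("prepositional_object", "matrix_clause"),
}


def identify_entities(indicator_type, roles):
    """Return (C, E) by projecting the annotated roles of one relation."""
    cause_role, effect_role = PROJECTION[indicator_type]
    return roles[cause_role], roles[effect_role]


# "Species die out due to habitat loss."
pair = identify_entities(
    "prepositional",
    {"prepositional_object": "habitat loss", "matrix_clause": "species die out"},
)
```

A table-driven design keeps each additional indicator pattern to a single new entry.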

Entity Minimization

Extracted entities follow the token minimization principle: attributive modifiers are extracted as separate coefficients, leaving only head tokens as entities.

Tip: Why minimization?

Minimal entities enable better aggregation. Instead of treating “industrial pesticides” and “agricultural pesticides” as separate causes, we extract:

  • Entity: pesticides
  • Coefficient: industrial / agricultural

This allows aggregating evidence about pesticides as a general cause while preserving modifier information for detailed analysis.
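A minimal sketch of minimization and the aggregation it enables, assuming each phrase arrives pre-tokenized with the index of its head token already annotated. The functions `minimize` and `group_by_entity` are hypothetical helpers, not part of the framework.

```python
from collections import defaultdict


def minimize(tokens, head_index):
    """Split a phrase into its head entity and its modifier coefficients."""
    head = tokens[head_index]
    coefficients = [tok for i, tok in enumerate(tokens) if i != head_index]
    return head, coefficients


def group_by_entity(phrases):
    """Collect the coefficients of all phrases that share a head entity."""
    grouped = defaultdict(list)
    for tokens, head_index in phrases:
        head, coefficients = minimize(tokens, head_index)
        grouped[head].extend(coefficients)
    return dict(grouped)


grouped = group_by_entity([
    (["industrial", "pesticides"], 1),
    (["agricultural", "pesticides"], 1),
])
```

Both phrases end up under the single entity `pesticides`, with the modifiers preserved alongside it.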

Step 2: Polarity Determination

Input: Entity pair (C, E) and annotated indicator with optional negation markers
Output: Sign of I (+ or −)
Method: Base polarity from indicator class, modified by negation

Base Polarity from Indicator Class

Each indicator family has an inherent polarity:

Promoting indicators (I_{\text{default}} > 0):

  • Verbs: cause, trigger, lead to, produce, strengthen
  • Nouns: cause, reason, consequence
  • Prepositions: due to, because of, through

Inhibiting indicators (I_{\text{default}} < 0):

  • Verbs: stop, prevent, reduce, block, curb
  • Nouns: prevention, barrier, protection against
  • Prepositions: against, despite

Example

Measures stop insect mortality.

  • Indicator: stop ∈ STOP family (inhibiting)
  • Base polarity: I_{\text{default}} < 0
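The base polarity lookup can be sketched with two sets built from the indicator lists in this section. The function `base_polarity` is hypothetical and assumes the indicator has already been lemmatized (for example `causes` reduced to `cause`).

```python
# Indicator lemmas taken from the lists in this section.
PROMOTING = {
    "cause", "trigger", "lead to", "produce", "strengthen",
    "reason", "consequence", "due to", "because of", "through",
}
INHIBITING = {
    "stop", "prevent", "reduce", "block", "curb",
    "prevention", "barrier", "protection against", "against", "despite",
}


def base_polarity(indicator_lemma):
    """Return +1 for promoting and -1 for inhibiting indicators."""
    lemma = indicator_lemma.lower()
    if lemma in PROMOTING:
        return 1
    if lemma in INHIBITING:
        return -1
    raise KeyError(f"unknown indicator: {indicator_lemma!r}")
```

Raising on unknown indicators, rather than guessing a default sign, is a design choice of this sketch.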

Negation Modification

Contextual negation markers modify base polarity through two mechanisms:

Object-Based Negation

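Each negative nominal (loss, decline, absence) attached to the Cause or the Effect flips the polarity once, so an odd number of negations inverts the base polarity and an even number preserves it: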

\begin{align*} \text{1 negation:} \quad &I_{\text{final}} = -I_{\text{default}} \\ \text{2 negations:} \quad &I_{\text{final}} = I_{\text{default}} \\ \text{3 negations:} \quad &I_{\text{final}} = -I_{\text{default}} \end{align*}

Example: Single negation

Loss of habitats causes bee mortality.

  • Indicator: causes → I_{\text{default}} > 0 (promoting)
  • Object negation on Cause (loss): 1×
  • Polarity inverted: I_{\text{final}} < 0 (inhibiting)
  • Interpretation: Less habitat leads to more bee mortality (inhibiting relation)

Example: Double negation

Loss of pesticides prevents loss of bees.

  • Indicator: prevents → I_{\text{default}} < 0 (inhibiting)
  • Object negations: loss (Cause) + loss (Effect) = 2×
  • Polarity preserved: I_{\text{final}} < 0 (inhibiting)
  • Interpretation: Fewer pesticides lead to fewer bee deaths (inhibiting relation)
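The parity rule generalizes to any number n of object negations as I_final = (-1)^n × I_default. A minimal sketch, with the hypothetical function name `apply_object_negation` and signs encoded as +1 or -1:

```python
def apply_object_negation(base_sign, negation_count):
    """Flip the sign once per negative nominal; only the parity matters."""
    if negation_count % 2 == 1:
        return -base_sign
    return base_sign


# "Loss of habitats causes bee mortality.": promoting base, one negation
single = apply_object_negation(+1, 1)
# "Loss of pesticides prevents loss of bees.": inhibiting base, two negations
double = apply_object_negation(-1, 2)
```

Both examples end up with a negative sign, matching the inhibiting readings given for them.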

Propositional Negation

Propositional negation (not cause, doesn’t prevent) neutralizes the relation:

Example

Pesticides do not cause bee mortality.

  • Indicator: cause → I_{\text{default}} > 0 (promoting)
  • Verbal negation: not
  • Influence neutralized: I_{\text{final}} = 0

Warning: Complex Negation

The framework currently doesn’t differentiate between neutralized positive (e.g. not causing) and neutralized negative (e.g. not preventing) relationships, as both result in 0.
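Both negation mechanisms can be combined in one sketch, which also makes the limitation visible: once propositional negation applies, the base sign no longer affects the result. The function `final_polarity` and its parameter names are hypothetical.

```python
def final_polarity(base_sign, object_negations=0, propositional_negation=False):
    """Return the final sign: 0 if neutralized, else the parity-adjusted base."""
    if propositional_negation:
        return 0  # neutralized, regardless of the base polarity
    return base_sign * (-1) ** object_negations


not_cause = final_polarity(+1, propositional_negation=True)    # "do not cause"
not_prevent = final_polarity(-1, propositional_negation=True)  # "does not prevent"
```

Here `not_cause` and `not_prevent` are indistinguishable, which is exactly the loss of information described above.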

Step 3: Salience Calculation

Input: Entity pair (C, E) with annotated markers
Output: Magnitude |I| \in [0,1]
Method: Combine explicit markers and structural distribution

Salience emerges from two factors:

Explicit Lexical Markers

Markers directly specify relative weight:

Monocausal (|I| = 1.0):

  • Determination: the cause (not a cause)
  • Exclusivity: responsible for, the reason
  • No competing causes mentioned

Prioritized (|I| = 0.75):

  • Emphasis: mainly, primarily, above all
  • Composition: main cause, key factor

Distributed (|I| = 0.5):

  • Contribution: contributes to, plays a role
  • Composition: partial cause, one factor
  • Distribution: among other things, also
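The marker classes can be sketched as a lookup table from marker to base salience. The function `explicit_salience` is hypothetical and handles a single marker only, since this section does not state how several conflicting markers in one sentence are resolved. Returning 1.0 when no marker is given follows the monocausal default of this framework.

```python
# Markers and values taken from the three classes in this section.
SALIENCE_BY_MARKER = {
    # Monocausal
    "the cause": 1.0, "responsible for": 1.0, "the reason": 1.0,
    # Prioritized
    "mainly": 0.75, "primarily": 0.75, "above all": 0.75,
    "main cause": 0.75, "key factor": 0.75,
    # Distributed
    "contributes to": 0.5, "plays a role": 0.5, "partial cause": 0.5,
    "one factor": 0.5, "among other things": 0.5, "also": 0.5,
}


def explicit_salience(marker=None):
    """Return the base salience for one marker, or 1.0 if there is none."""
    if marker is None:
        return 1.0
    return SALIENCE_BY_MARKER[marker]
```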

Structural Distribution

Multiple coordinated causes distribute salience proportionally:

Construction             Each cause gets
X causes Z (alone)       |I| = 1.0
X and Y cause Z          |I| = 0.5
A, B, and C cause Z      |I| = 0.33

Example: Explicit + Structural

X and Y are two main causes of Z.

  1. Explicit marker: main causes → base salience = 0.75
  2. Structural distribution: 2 causes → divide by 2
  3. Final salience: |I| = 0.75 \div 2 = 0.375 per cause

(In practice, we round to the nearest conventional value: 0.5)
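The division and rounding steps can be sketched as follows. The set of conventional values is an assumption: this section does not enumerate it, so the sketch uses the three lexical marker levels (0.5, 0.75, 1.0). Note that rounding a purely structural value such as 1/3 to these levels would contradict the 0.33 in the table, so the sketch keeps rounding as a separate, optional step. Both function names are hypothetical.

```python
# Assumed conventional levels, taken from the lexical marker classes.
CONVENTIONAL_LEVELS = (0.5, 0.75, 1.0)


def distribute(base_salience, n_causes):
    """Split the base salience evenly across coordinated causes."""
    return base_salience / n_causes


def round_to_conventional(value, levels=CONVENTIONAL_LEVELS):
    """Return the level closest to value."""
    return min(levels, key=lambda level: abs(level - value))


# "X and Y are two main causes of Z."
raw = distribute(0.75, 2)
rounded = round_to_conventional(raw)
```

For the example sentence, `raw` is 0.375 and `rounded` is 0.5.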

Default Assumption

If there are no markers and no competing causes, assume monocausal attribution (|I| = 1.0).

This reflects the discourse convention that unmarked causal statements present causes as primary factors unless explicitly qualified.

Integration: Computing Final I

Combining all components:

I = \text{sign}(\text{polarity after negation}) \times |\text{salience}|

Complete Example

“Mainly pesticides and habitat loss contribute to bee mortality.”

Step 1: Entities

  • Indicator: contribute to (transitive)
  • C_1 = \text{pesticides}, C_2 = \text{habitat loss}
  • E = \text{bee mortality}

Step 2: Polarity

  • Indicator: contribute → I_{\text{default}} > 0 (promoting)
  • No negation markers
  • Final polarity: +

Step 3: Salience

  • Explicit marker: mainly → base = 0.75
  • Structural: 2 causes → divide by 2
  • Salience per cause: |I| = 0.75 \div 2 = 0.375, rounded to the conventional value 0.5

Result:

  • (C=\text{pesticides}, E=\text{bee mortality}, I=+0.5)
  • (C=\text{habitat loss}, E=\text{bee mortality}, I=+0.5)
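The three steps can be sketched end to end in a single function that reproduces this result. The function `construct_tuples`, its parameters, and the rounding levels (0.5, 0.75, 1.0) are assumptions made for illustration. The sketch starts from already-extracted entities, sign, and base salience rather than from raw text.

```python
def construct_tuples(causes, effect, base_sign, base_salience=1.0,
                     object_negations=0, propositional_negation=False,
                     round_levels=None):
    """Build one (C, E, I) tuple per coordinated cause."""
    # Step 2: polarity after negation
    if propositional_negation:
        sign = 0
    else:
        sign = base_sign * (-1) ** object_negations
    # Step 3: structural distribution, then optional rounding
    salience = base_salience / len(causes)
    if round_levels is not None:
        salience = min(round_levels, key=lambda level: abs(level - salience))
    # Integration: I = sign x salience
    return [(cause, effect, sign * salience) for cause in causes]


tuples = construct_tuples(
    causes=["pesticides", "habitat loss"],  # Step 1: entities
    effect="bee mortality",
    base_sign=+1,                           # "contribute to" is promoting
    base_salience=0.75,                     # "mainly"
    round_levels=(0.5, 0.75, 1.0),          # assumed conventional values
)
```

Leaving `round_levels` unset keeps the raw value, so three unmarked coordinated causes each receive one third, as in the structural distribution table.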

Next Steps

Once tuples are constructed, they can be:

  1. Aggregated across multiple texts to build evidence for causal relations
  2. Integrated into ACGs for graph-based discourse analysis
  3. Analyzed for discourse patterns, temporal dynamics, and argumentation structures

The systematic transformation from qualitative annotations to quantitative tuples bridges interpretive linguistics and computational analysis, enabling scalable yet semantically rich causal extraction.