Tuple Construction
From linguistic annotations to (C, E, I) tuples
Overview
Tuple construction transforms qualitative linguistic annotations (indicators, markers) into quantitative (C, E, I) values through a systematic three-step process:
- Entity identification: Extract C and E based on syntactic projection
- Polarity determination: Calculate the sign of I from indicator class and negation
- Salience calculation: Calculate the magnitude |I| from morphological and syntactic markers
The output is a fully specified triple (C, E, I) where I \in [-1, +1] represents:
I = \pm(\text{polarity}) \times |\text{salience}|
Step 1: Entity Identification
Input: Annotated causal relation with indicator and syntactic dependencies
Output: Entity pair (C, E)
Method: Syntactic projection according to indicator-specific patterns
Causal indicators project their arguments through predictable syntactic patterns. The most frequent patterns:
Transitive-Causative Verbs
Indicators: cause, trigger, produce, stop, prevent
Projection: Subject → Cause, Direct object → Effect
Pesticides cause insect mortality.
- Indicator: cause (transitive-causative)
- Subject (pesticides) = Cause
- Direct object (insect mortality) = Effect
- Result: (C=\text{pesticides}, E=\text{insect mortality})
Copula Constructions
Indicators: cause (noun), consequence, reason
Projection: Subject → Cause, Prepositional object (for/of) → Effect
Climate change is the cause of species extinction.
- Indicator: cause (copula construction)
- Subject (climate change) = Cause
- Prepositional object (of species extinction) = Effect
- Result: (C=\text{climate change}, E=\text{species extinction})
Prepositional Markers
Indicators: due to, because of, through
Projection: Prepositional object → Cause, Matrix clause subject/object → Effect
Species die out due to habitat loss.
- Indicator: due to (prepositional)
- Prepositional object (habitat loss) = Cause
- Matrix subject (species) combined with verb (die out) = Effect
- Result: (C=\text{habitat loss}, E=\text{species die out})
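The three projection patterns above can be read as a lookup from indicator pattern to syntactic slots. The following is a minimal sketch, not the framework's implementation: the pattern labels, the slot names, and the function `project_entities` are hypothetical, and the syntactic arguments are assumed to come from an upstream annotation or parsing step.

```python
# Hypothetical pattern labels and slot names, chosen for illustration only.
PROJECTION_PATTERNS = {
    "transitive_causative": {"cause": "subject", "effect": "direct_object"},
    "copula": {"cause": "subject", "effect": "prepositional_object"},
    "prepositional": {"cause": "prepositional_object", "effect": "matrix_clause"},
}


def project_entities(pattern, arguments):
    """Return the pair (C, E) by reading the slots named by the pattern."""
    roles = PROJECTION_PATTERNS[pattern]
    return arguments[roles["cause"]], arguments[roles["effect"]]


# "Species die out due to habitat loss."
pair = project_entities(
    "prepositional",
    {"prepositional_object": "habitat loss", "matrix_clause": "species die out"},
)
print(pair)  # ('habitat loss', 'species die out')
```

Keeping the patterns in a table rather than in branching code makes it easy to add further indicator families without touching the projection logic.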
Entity Minimization
Extracted entities follow the token minimization principle: attributive modifiers are extracted as separate coefficients, leaving only head tokens as entities.
Minimal entities enable better aggregation. Instead of treating “industrial pesticides” and “agricultural pesticides” as separate causes, we extract:
- Entity: pesticides
- Coefficient: industrial / agricultural
This allows aggregating evidence about pesticides as a general cause while preserving modifier information for detailed analysis.
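To illustrate why minimal entities aggregate better, the sketch below groups mentions by their head token and keeps the modifiers as coefficients. It assumes that head and modifiers have already been separated by the annotation; the helper names `minimize` and `group_by_entity` are hypothetical.

```python
from collections import defaultdict


def minimize(head, modifiers=()):
    """Represent an annotated phrase as a head entity plus its coefficients."""
    return {"entity": head, "coefficients": list(modifiers)}


def group_by_entity(mentions):
    """Collect the coefficients observed for each minimal entity."""
    grouped = defaultdict(list)
    for mention in mentions:
        grouped[mention["entity"]].extend(mention["coefficients"])
    return dict(grouped)


mentions = [
    minimize("pesticides", ["industrial"]),
    minimize("pesticides", ["agricultural"]),
]
print(group_by_entity(mentions))  # {'pesticides': ['industrial', 'agricultural']}
```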
Step 2: Polarity Determination
Input: Entity pair (C, E) and annotated indicator with optional negation markers
Output: Sign of I (+ or −)
Method: Base polarity from indicator class, modified by negation
Base Polarity from Indicator Class
Each indicator family has an inherent polarity:
Promoting indicators (I_{\text{default}} > 0):
- Verbs: cause, trigger, lead to, produce, strengthen
- Nouns: cause, reason, consequence
- Prepositions: due to, because of, through
Inhibiting indicators (I_{\text{default}} < 0):
- Verbs: stop, prevent, reduce, block, curb
- Nouns: prevention, barrier, protection against
- Prepositions: against, despite
Measures stop insect mortality.
- Indicator: stop ∈ STOP family (inhibiting)
- Base polarity: I_{\text{default}} < 0
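Base polarity amounts to a lookup on the indicator. A minimal sketch, assuming indicators arrive in lemmatized form; the set names and the function `base_polarity` are hypothetical, and the word lists simply repeat the examples given above.

```python
# Indicator lists copied from the examples in this section (lemmatized forms assumed).
PROMOTING = {
    "cause", "trigger", "lead to", "produce", "strengthen",
    "reason", "consequence", "due to", "because of", "through",
}
INHIBITING = {
    "stop", "prevent", "reduce", "block", "curb",
    "prevention", "barrier", "protection against", "against", "despite",
}


def base_polarity(indicator):
    """Return +1 for promoting indicators and -1 for inhibiting ones."""
    if indicator in PROMOTING:
        return 1
    if indicator in INHIBITING:
        return -1
    raise KeyError(f"unknown indicator: {indicator!r}")


print(base_polarity("stop"))  # -1
```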
Negation Modification
Contextual negation markers modify base polarity through two mechanisms:
Object-Based Negation
Negative nominals (loss, decline, absence) invert polarity. An odd number of negations flips the sign; an even number restores it:
\begin{align*} \text{1 negation:} \quad &I_{\text{final}} = -I_{\text{default}} \\ \text{2 negations:} \quad &I_{\text{final}} = I_{\text{default}} \\ \text{3 negations:} \quad &I_{\text{final}} = -I_{\text{default}} \end{align*}
Loss of habitats causes bee mortality.
- Indicator: causes → I_{\text{default}} > 0 (promoting)
- Object negation on Cause (loss): 1×
- Polarity inverted: I_{\text{final}} < 0 (inhibiting)
- Interpretation: Less habitat leads to more bee mortality (inhibiting relation)
Loss of pesticides prevents loss of bees.
- Indicator: prevents → I_{\text{default}} < 0 (inhibiting)
- Object negations: loss (Cause) + loss (Effect) = 2×
- Polarity preserved: I_{\text{final}} < 0 (inhibiting)
- Interpretation: Fewer pesticides lead to fewer bee deaths (inhibiting relation)
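The parity rule for object-based negation can be written as a single multiplication. This is a sketch under the assumption that the default polarity is encoded as +1 or -1; the function name `apply_object_negation` is hypothetical.

```python
def apply_object_negation(default_sign, negation_count):
    """Flip the sign once per negative nominal; an even count restores it."""
    return default_sign * (-1) ** negation_count


# "Loss of habitats causes bee mortality.": promoting indicator, 1 negation
print(apply_object_negation(+1, 1))  # -1
# "Loss of pesticides prevents loss of bees.": inhibiting indicator, 2 negations
print(apply_object_negation(-1, 2))  # -1
```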
Propositional Negation
Propositional negation (not cause, doesn’t prevent) neutralizes the relation:
Pesticides do not cause bee mortality.
- Indicator: cause → I_{\text{default}} > 0 (promoting)
- Verbal negation: not
- Influence neutralized: I_{\text{final}} = 0
The framework currently doesn’t differentiate between neutralized positive (e.g. not causing) and neutralized negative (e.g. not preventing) relationships, as both result in 0.
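A sketch of how the two negation mechanisms might be combined. It assumes that propositional negation takes precedence over object-based negation, which the text does not state explicitly; the function `final_polarity` and its parameters are hypothetical.

```python
def final_polarity(default_sign, object_negations=0, propositional_negation=False):
    """Return the final sign: 0 if neutralized, otherwise the parity-adjusted sign."""
    if propositional_negation:
        # Assumption: verbal negation overrides any object-based negation.
        return 0
    return default_sign * (-1) ** object_negations


# "Pesticides do not cause bee mortality."
print(final_polarity(+1, propositional_negation=True))  # 0
# A negated inhibiting indicator ("does not prevent") yields the same value.
print(final_polarity(-1, propositional_negation=True))  # 0
```

Both calls return 0, which makes the loss of information described above visible: the original direction of the indicator cannot be recovered from the result.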
Step 3: Salience Calculation
Input: Entity pair (C, E) with annotated markers
Output: Magnitude |I| \in [0,1]
Method: Combine explicit markers and structural distribution
Salience emerges from two factors:
Explicit Lexical Markers
Markers directly specify relative weight:
Monocausal (|I| = 1.0):
- Determination: the cause (not a cause)
- Exclusivity: responsible for, the reason
- No competing causes mentioned
Prioritized (|I| = 0.75):
- Emphasis: mainly, primarily, above all
- Composition: main cause, key factor
Distributed (|I| = 0.5):
- Contribution: contributes to, plays a role
- Composition: partial cause, one factor
- Distribution: among other things, also
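The three marker classes can be sketched as a two-level lookup from marker to class to salience value. The dictionary and function names are hypothetical, the marker strings repeat the examples listed above, and an absent marker is mapped to 1.0 in line with the monocausal default this section adopts for unmarked statements.

```python
SALIENCE_BY_CLASS = {"monocausal": 1.0, "prioritized": 0.75, "distributed": 0.5}

# Marker strings taken from the lists above; exact string matching is an assumption.
MARKER_CLASS = {
    "the cause": "monocausal",
    "responsible for": "monocausal",
    "the reason": "monocausal",
    "mainly": "prioritized",
    "primarily": "prioritized",
    "above all": "prioritized",
    "main cause": "prioritized",
    "key factor": "prioritized",
    "contributes to": "distributed",
    "plays a role": "distributed",
    "partial cause": "distributed",
    "one factor": "distributed",
    "among other things": "distributed",
    "also": "distributed",
}


def base_salience(marker=None):
    """Return the base salience for an explicit marker (1.0 if there is none)."""
    if marker is None:
        return SALIENCE_BY_CLASS["monocausal"]
    return SALIENCE_BY_CLASS[MARKER_CLASS[marker]]


print(base_salience("mainly"))  # 0.75
```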
Structural Distribution
Multiple coordinated causes distribute salience proportionally:
| Construction | Each cause gets |
|---|---|
| X causes Z (alone) | \|I\| = 1.0 |
| X and Y cause Z | \|I\| = 0.5 |
| A, B, and C cause Z | \|I\| = 0.33 |
X and Y are two main causes of Z.
- Explicit marker: main causes → base salience = 0.75
- Structural distribution: 2 causes → divide by 2
- Final salience: |I| = 0.75 \div 2 = 0.375 per cause
(In practice, we round to the nearest conventional salience value, here 0.5.)
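The arithmetic of this example, including the rounding step, can be sketched as follows. The set of conventional values used for rounding, (1.0, 0.75, 0.5), is an assumption taken from the three marker classes, and both function names are hypothetical.

```python
def salience_per_cause(base, n_causes):
    """Divide the base salience evenly among coordinated causes."""
    if n_causes < 1:
        raise ValueError("at least one cause is required")
    return base / n_causes


def snap(value, grid=(1.0, 0.75, 0.5)):
    """Round to the closest value in the grid (the grid itself is an assumption)."""
    return min(grid, key=lambda g: abs(g - value))


raw = salience_per_cause(0.75, 2)
print(raw)        # 0.375
print(snap(raw))  # 0.5
```

The choice of grid matters: if 0.33 from the table above were included, 0.375 would snap to 0.33 instead of 0.5.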
Default Assumption
If there are no markers and no competing causes, assume monocausal attribution (|I| = 1.0).
This reflects the discourse convention that unmarked causal statements present causes as primary factors unless explicitly qualified.
Integration: Computing Final I
Combining all components:
I = \text{sign}(\text{polarity after negation}) \times |\text{salience}|
Complete Example
“Mainly pesticides and habitat loss contribute to bee mortality.”
Step 1: Entities
- Indicator: contribute to (transitive)
- C_1 = \text{pesticides}, C_2 = \text{habitat loss}
- E = \text{bee mortality}
Step 2: Polarity
- Indicator contribute → I_{\text{default}} > 0 (promoting)
- No negation markers
- Final polarity: +
Step 3: Salience
- Explicit marker: mainly → base = 0.75
- Structural: 2 causes → divide by 2
- Salience per cause: |I| = 0.75 \div 2 = 0.375, rounded to the conventional value 0.5
Result:
- (C=\text{pesticides}, E=\text{bee mortality}, I=+0.5)
- (C=\text{habitat loss}, E=\text{bee mortality}, I=+0.5)
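The complete example can be reproduced end to end with a short sketch. It is an illustration under stated assumptions, not the framework's implementation: `build_tuples` and its parameters are hypothetical, inputs are assumed to be already annotated, propositional negation is assumed to override object negation, and rounding is applied only when a grid of conventional values is passed in.

```python
def build_tuples(causes, effect, default_sign, base_salience=1.0,
                 object_negations=0, propositional_negation=False, grid=None):
    """Return one (C, E, I) tuple per cause from pre-annotated inputs."""
    # Step 2: polarity (assumption: propositional negation overrides the rest)
    sign = 0 if propositional_negation else default_sign * (-1) ** object_negations
    # Step 3: salience, shared evenly among coordinated causes
    magnitude = base_salience / len(causes)
    if grid is not None:
        # Optional rounding to the closest conventional value
        magnitude = min(grid, key=lambda g: abs(g - magnitude))
    return [(cause, effect, sign * magnitude) for cause in causes]


# "Mainly pesticides and habitat loss contribute to bee mortality."
tuples = build_tuples(
    ["pesticides", "habitat loss"],
    "bee mortality",
    default_sign=+1,
    base_salience=0.75,
    grid=(1.0, 0.75, 0.5),
)
print(tuples)
# [('pesticides', 'bee mortality', 0.5), ('habitat loss', 'bee mortality', 0.5)]
```

Leaving `grid` unset returns the unrounded value 0.375 per cause, which may be preferable when tuples are aggregated later.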
Next Steps
Once tuples are constructed, they can be:
- Aggregated across multiple texts to build evidence for causal relations
- Integrated into ACGs for graph-based discourse analysis
- Analyzed for discourse patterns, temporal dynamics, and argumentation structures
The systematic transformation from qualitative annotations to quantitative tuples bridges interpretive linguistics and computational analysis, enabling scalable yet semantically rich causal extraction.