NLP Notes: #4 (from RTE 1) Summary

Wednesday, July 19, 2006

#4 (from RTE 1) Summary

They entered the challenge with two different systems:

System 1:

- The system processes both H and T using:

MITRE-built tokenizer
sentence segmenter
the Ratnaparkhi (1996) POS tagger
the University of Sussex's Morph morphological analyzer
the CMU Link Grammar parser
a MITRE-built dependency analyzer
Davidsonian logic generator

- T and H are compared using:

the University of Rochester's EPILOG event-oriented probabilistic inference engine

Score:
- Total pairs processed: 800
- Correctly labeled T: 11/285, F: 292/302; accuracy: 0.52; cws: 0.5
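For reference, cws is the confidence-weighted score used in RTE 1: judgments are sorted by decreasing confidence, and the running accuracy at each rank is averaged. A minimal sketch, assuming each judgment is a (confidence, is_correct) pair; the function name and the toy data are made up for illustration and are not taken from the paper.

```python
def confidence_weighted_score(judgments):
    """judgments: list of (confidence, is_correct) pairs (hypothetical input shape)."""
    # Rank judgments from most to least confident.
    ranked = sorted(judgments, key=lambda j: j[0], reverse=True)
    correct_so_far = 0
    total = 0.0
    for rank, (_, is_correct) in enumerate(ranked, start=1):
        if is_correct:
            correct_so_far += 1
        # Running accuracy over the top `rank` judgments.
        total += correct_so_far / rank
    return total / len(ranked) if ranked else 0.0

# Toy data, not from the paper.
toy = [(0.9, True), (0.8, True), (0.6, False), (0.5, True)]
print(confidence_weighted_score(toy))
```

Because early ranks contribute to every later running average, a system is rewarded for being correct on the pairs it is most confident about.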

System 2:

The system is inspired by statistical machine translation models.
- Machine Translation (MT) evaluation
- some string matching algorithms (Gusfield, 1997)
- MT alignment score
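To make the MT-evaluation idea concrete, one could treat H as a "translation" of T and measure how many of H's n-grams also occur in T, in the style of BLEU's clipped n-gram precision. The sketch below is only an assumption about what such a feature could look like, not the paper's actual metric; `ngram_precision` and the whitespace tokenizer are hypothetical, and the sentences are the Floods training pair from this post.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(text, hypothesis, n=1):
    """Fraction of H's n-grams that also appear in T (counts clipped by T)."""
    t_counts = Counter(ngrams(text.lower().split(), n))
    h_counts = Counter(ngrams(hypothesis.lower().split(), n))
    total = sum(h_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(c, t_counts[g]) for g, c in h_counts.items())
    return matched / total

t = ("Floods caused by Monday's torrential rains surrounded two villages "
     "in the southern part of Bushehr province today")
h = "Floods engulf two villages in southern Iran"
print(ngram_precision(t, h, n=1))  # unigram precision
print(ngram_precision(t, h, n=2))  # bigram precision
```

Note that a purely string-based score like this gives no credit for surrounded/engulf or Bushehr/Iran, which is where a learned alignment model could help.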

EXAMPLE: (GIZA++ alignment for a training pair)

T: Floods caused by Monday's torrential rains surrounded two villages in the southern part of Bushehr province today ...
H: Floods engulf two villages in southern Iran

Alignment links (T -> H):
Floods -> Floods
surrounded -> engulf
two -> two
villages -> villages
in -> in
southern -> southern
Bushehr -> Iran
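One way to hold such an alignment in code is as a list of (T index, H index) links. The sketch below hand-codes the links of the example above (it does not call GIZA++) and computes the fraction of H tokens linked to some T token. That coverage figure is an illustrative feature chosen for this sketch, not necessarily the paper's alignment score.

```python
t_tokens = ("Floods caused by Monday's torrential rains surrounded two "
            "villages in the southern part of Bushehr province today").split()
h_tokens = "Floods engulf two villages in southern Iran".split()

# Hand-coded (T word, H word) links read off the example; not GIZA++ output.
word_links = [("Floods", "Floods"), ("surrounded", "engulf"), ("two", "two"),
              ("villages", "villages"), ("in", "in"),
              ("southern", "southern"), ("Bushehr", "Iran")]

# Convert word pairs to (T index, H index) pairs; each word here is unique.
links = [(t_tokens.index(t), h_tokens.index(h)) for t, h in word_links]

def h_coverage(links, h_len):
    """Fraction of H positions that are linked to at least one T position."""
    return len({h for _, h in links}) / h_len

print(links)
print(h_coverage(links, len(h_tokens)))
```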

Because the development set was insufficient, they made use of the Gigaword newswire corpus (Graff, 2003).

Their hypothesis: the headline of a news story can be entailed from its first paragraph (the lead).

To test their hypothesis:
- 1000 lead-headline pairs are judged manually.
- Based on these judgments, the hypothesis is estimated to hold for 60% of Gigaword.
- The data is then refined using SVMlight (Joachims, 2002). "Like those classifiers used to predict genre or topic, this training included the entire articles with bag-of-words features."
- Finally, they derive a 100,000-document subset of Gigaword with approximately 75% lead-entails-headline purity.
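The bag-of-words step can be sketched as follows: each article becomes a sparse vector of word counts, written in SVMlight's input format (`<target> <feature>:<value> ...`, with feature indices in increasing order). The vocabulary construction, the function names, and the toy documents are assumptions for illustration; the notes above do not say how the authors built their features.

```python
from collections import Counter

def build_vocab(docs):
    """Map each word to a 1-based feature index (hypothetical helper)."""
    vocab = {}
    for doc in docs:
        for word in doc.lower().split():
            vocab.setdefault(word, len(vocab) + 1)
    return vocab

def to_svmlight_line(label, doc, vocab):
    """Format one document as '<target> <index>:<count> ...', indices ascending."""
    counts = Counter(vocab[w] for w in doc.lower().split() if w in vocab)
    feats = " ".join(f"{idx}:{cnt}" for idx, cnt in sorted(counts.items()))
    return f"{label:+d} {feats}"

# Toy documents and labels, not from Gigaword.
docs = ["floods engulf two villages", "markets rally as markets reopen"]
labels = [1, -1]  # +1: lead entails headline, -1: it does not (assumed coding)
vocab = build_vocab(docs)
lines = [to_svmlight_line(y, d, vocab) for y, d in zip(labels, docs)]
print("\n".join(lines))
```

Lines in this form could then be written to a training file and passed to SVMlight's learner.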

However, the training data turns out to be nearly useless, because negative instances in RTE have substantial conceptual overlap between T and H, whereas negative Gigaword pairs have very little overlap.

The paper criticizes the RTE dataset. The agreement rate on entailment judgments between their own human judges and the RTE annotators is 91%.

They noticed that 94% of the TRUE entailments are merely simple paraphrases:
EXAMPLE:
John murdered Bill -> Bill was killed by John (as opposed to a classic entailment such as "Bill is dead").

"During this process, we uncovered many cases where we disagreed with the given truth value on the grounds of synonymy (in bloody clothes -> covered in blood)"

"We also identified potential disagreements about the extent to which world knowledge is allowed to play a role. For instance, pair 102 (domestic threat -> threat of attack) is more convincing if one understands the implications of al Qaeda and September 11, 2001 mentioned in the text."

(The datasets cannot be downloaded; I emailed the person who hosts them, but I get a failure notice from the Mail Delivery System.)

Score:
- Total pairs processed: 800
- Correctly labeled T: 231/400, F: 238/400; accuracy: 0.59; precision: 0.59; cws: 0.62
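As a sanity check, the accuracy and precision can be recomputed from the counts above. This assumes the T and F figures mean correctly labeled pairs out of the gold TRUE and gold FALSE pairs respectively, which is an interpretation of the notation rather than something the notes state explicitly; the function name is hypothetical.

```python
def accuracy_and_precision(true_correct, true_total, false_correct, false_total):
    """Recompute accuracy and precision (for the TRUE class) from per-class counts."""
    tp = true_correct
    fp = false_total - false_correct  # gold FALSE pairs that were labeled TRUE
    accuracy = (true_correct + false_correct) / (true_total + false_total)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, precision

# System 2 counts from the score line above.
acc2, prec2 = accuracy_and_precision(231, 400, 238, 400)
print(round(acc2, 2), round(prec2, 2))

# System 1 counts (only accuracy was reported for it).
acc1, _ = accuracy_and_precision(11, 285, 292, 302)
print(round(acc1, 2))
```

Under that reading, both systems' reported accuracies and System 2's precision are reproduced after rounding to two decimals.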

The paper: http://www.cs.biu.ac.il/~glikmao/rte05/bayer_et_al.pdf

# posted by Bengi Mizrahi @ 9:08 PM
