Project "Syntactically analyzed and disambiguated text corpus"
Financed by the Ministry of Education and Research of Estonia under the programme "Estonian language and national culture"
Duration: 2002-2003
- The aim of the project is to extend the test corpus for the Constraint Grammar shallow syntactic parser developed by K. Müürisep and K. Puolakainen, in order to improve the parser's precision and recall. The corpus consists of fiction (107 000 words¹), newspaper texts (10 000 words) and legal texts (6 000 words), 123 000 words in total. In 2003, we plan to annotate a further 70 000 words of fiction, so that by the end of 2003 the test corpus will reach approximately 200 000 words.
- Project leader: Tiit Roosmaa, PhD
Kaili Müürisep, PhD (computer science)
Kadri Muischnek, M.A. (general linguistics)
Heli Uibo, M.Sc. (computer science)
Heili Orav, M.A. (general linguistics)
Andriela Rääbis, M.A. (Estonian language)
- Junior members (students of computational linguistics):
Siiri Pärkson
Helen Nigol
Kadri Kerner
Birge Talve
- Useful links for syntactic annotators:
¹ As of January 2003.
This page is maintained by Heli Uibo (heli_u@ut.ee). Last modified Feb 27, 2004.