English/Digitizing Estonian Literature

From Wikimedia Eesti

Digitizing Estonian Literature is a project of Wikimedia Eesti.

The objective of the project is to digitize copyright-free Estonian literature in Estonian Wikisource (Vikitekstid).

This is a short overview of the current situation as of March 3, 2012.

Current phase

At the start of 2012, Wikimedia Eesti bought a laptop and a scanner for this project. At the moment, this equipment is used by the only Wikipedian working on digitization (Kasutaja:Avjoska); so far, the only result is the text of Eduard Bornhöhe's "Villu võitlused" ("Villu's Struggles").

Most of our efforts are still directed toward planning and toward cooperation with Estonian memory institutions that have already started digitizing material as part of their own projects. Currently, Kasutaja:Oop and Kasutaja:Kruusamägi are working on this.

Our partners

The main digitizers, and therefore, we hope, our main partners, are:

We have discussed our plans with the first two, and further discussions are in order.

General situation

As mentioned, all these institutions have their own digitization projects. Consequently, they use different data formats and build their own databases, which are often more closed than open and rank poorly in internet search results. One has to know where to look, and sometimes has to register or create an account. A large part of the digitized material has not even been processed with OCR. Yet for the end user, things are simple: if I can't find a text on Google, it is of no use to me.

In comparison, Wikisource is an unbeatable channel that has everything we need. Texts on Wikisource rise to the top of internet search results: they can be found on Google, even by searching for approximate quotations, and they can be downloaded, reformatted, or printed according to each reader's needs. And if we continue expanding Wikipedia's coverage of Estonian literature, we can add links to Wikisource in many articles, improving every text's chances of meeting a happy reader.


This is going to be a long project and has to be planned carefully.

We cannot publish copyrighted texts on Wikisource. This means we have to concentrate on older texts by authors who died before 1940. Estonian literature is relatively young and there are not many such authors, yet this corpus should suffice for quite some time, and it grows every year as more works pass into the public domain.

It seems most useful to start with the texts that are required reading in the school curriculum, as there is a high and steady demand for them. People look for these texts all the time, so they will bring a lot of traffic to Estonian Wikisource. Later, we can expand to the less popular texts of the classic authors, and to other authors.

In the long run, however, we can hope to persuade living authors, or the relatives of deceased authors, to release some texts under suitable CC licenses. We could also ask foundations for grants to pay translators to produce CC-licensed translations of Estonian authors for publication on Wikisource. This could significantly help spread Estonian literature in other languages. But that's another story.

For now, we are collecting information on each digitizer's work, needs, and policies that could form a basis for our cooperation, and mapping all the necessary resources and possible funding (some funds could be useful to motivate volunteers, and there may also be expenses for equipment and premises).

The hard part is finding and organizing volunteers to proofread the whole text corpus, as Estonian Wikipedia constantly struggles with a shortage of contributors. We are a small nation, and the ratio of active Wikipedians to speakers of the language is already among the highest of all Wikipedia language editions. But we think we can still improve this situation.

A longer overview in Estonian.