
How Can AI Help Read Old Newspapers?

  • Apr 5
  • 2 min read

You are no doubt familiar with the problem: you come across an interesting article in a historical newspaper, but it is printed in old type and orthography and written in historical language full of archaic expressions. Reading it is not easy, but AI can help.


[Image: a robot magician transforms old newspapers]

This post was inspired by a question in a Facebook group, where my suggestion to use AI for this problem was met with skepticism. Therefore, I decided to demonstrate that AI (artificial intelligence) can indeed be a helpful assistant.


The website periodika.lv automatically provides electronic versions of the printed texts of old newspapers, but often this version isn’t much easier to read—the words are still in old orthography, and OCR errors are common. Below is an example.


[Screenshot from periodika.lv: a historical newspaper]

But I've learned to use AI to make this process easier. I'll walk you through it step by step:


1. I copy the electronic text.

In this example, I used a speech delivered at the opening of the beekeeping society “Drava” on August 9, 1898. The transcribed text is full of errors: for instance, the computer most often reads “w” as “m.” Correcting the entire text manually would require a great deal of time and patience. Moreover, it is not easy to immediately “get the hang of it” and correctly read, for example, “ee” as “ie,” or to know when “s” should be understood as “z,” and so on.


2. I use the paid version of ChatGPT (GPT-4).

Other AI tools could certainly help as well, but at the moment I’m working with this one. I ask it to transcribe the text into modern orthography without changing the words or sentence structure.


Of course, one could also ask AI to rewrite the text in modern language—allowing changes to words and sentence structure—but then it becomes harder to spot mistakes or compare with the original.


My prompt for AI in this case was: "Here is a transcript of a historical newspaper text. The computer recognized the letters, but a large part of them inaccurately. Write the text in modern orthography, but do not change the structure of the words or sentences." In my example I worked in Latvian. You can test with an English prompt too, but you have to point out that your


[Screenshot: the input text in ChatGPT]
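If you process many articles, the prompt from step 2 can be put together with a few lines of Python before pasting it into the chat window. This is only a minimal sketch: the function name `build_prompt` and the layout (instruction, blank line, then the copied text) are my own assumptions for illustration, and nothing is sent to ChatGPT here.

```python
# The instruction is the English version of the prompt quoted above.
PROMPT = (
    "Here is a transcript of a historical newspaper text. "
    "The computer recognized the letters, but a large part of them inaccurately. "
    "Write the text in modern orthography, but do not change the structure "
    "of the words or sentences."
)


def build_prompt(ocr_text: str) -> str:
    """Combine the fixed instruction with the text copied from the website.

    Hypothetical helper: it only builds a string to paste into the chat.
    """
    cleaned = ocr_text.strip()
    if not cleaned:
        raise ValueError("OCR text is empty")
    # Assumed layout: instruction first, then a blank line, then the text.
    return f"{PROMPT}\n\n{cleaned}"
```

Keeping the instruction in one place means every article gets exactly the same wording, which makes the results easier to compare.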

3. I check the result.

Once I have the first version, I compare it with the original. AI does make mistakes. At this stage, it’s not really possible for an English speaker to verify the transformed text, but you can assume that most of it will be accurate.


[Screenshot: the text generated by ChatGPT]
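The comparison in step 3 can be made less tedious by listing only the words that differ between the copied text and the AI's version. Below is a small sketch using Python's standard `difflib` module. The function name `changed_words` and the sample strings are invented for illustration, and it assumes the AI kept the word order, as the prompt asks. It does not judge whether a change is correct; it only shows where to look.

```python
import difflib


def changed_words(ocr_text: str, modern_text: str) -> list[tuple[str, str]]:
    """Return (original, rewritten) pairs for every run of words that differs."""
    old_words = ocr_text.split()
    new_words = modern_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Join each differing run so multi-word changes stay together.
            changes.append(
                (" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes


# Invented sample strings, for illustration only.
ocr = "beedriba tika atklata"
modern = "biedrība tika atklāta"
for before, after in changed_words(ocr, modern):
    print(f"{before!r} -> {after!r}")
```

Since the prompt forbids changing words or sentence structure, most pairs should differ only in spelling. A pair where a whole word was replaced or dropped deserves a closer look against the original scan.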


4. You can then proceed as needed.

Once the text has been reviewed and corrected, I can ask AI to rewrite it in modern language, translate it, or summarize it—depending on what’s needed in the specific case.


For English speakers, I wouldn’t skip step 3 anyway, because AI might not translate the text correctly at first. However, once it has been converted into modern orthography, the translation is usually much more accurate.
