
Center for Uncertainty Studies Blog


Digital Academy 2023: Exploring Uncertainty in Toponyms within the British Colonial Corpus

Published on 2 May 2024

From September 25 to 28, 2023, the Digital History Working Group at Bielefeld University welcomed participants to the Digital Academy, themed "From Uncertainty to Action: Advancing Research with Digital Data." This event delved into the complexities of data-based research, exploring strategies to navigate uncertainties within the Digital Humanities. In a series of blog posts, four attendees of the workshop program share insights into their work on data collections and analysis and reflect on the knowledge gained from the interdisciplinary discussions at the Digital Academy. Learn more about the event by visiting the Digital Academy website.

Exploring Uncertainty in Toponyms within the British Colonial Corpus

by  Shanmugapriya T

My research project aims to extract toponyms from the British India colonial corpus to create a historical gazetteer. The primary challenge in this work revolves around the toponyms themselves, as they exhibit a high degree of fuzziness and inconsistency, particularly in their spellings. Historically, mapping, documenting, and surveying have been recognized as essential tools employed by colonial powers to demarcate, expand, and exert control over their colonial subjects. These activities enabled the colonial administration to establish governance over land and streamline revenue collection during the British colonial period. As time progressed, surveys expanded beyond their initial military and geographical purposes, evolving into comprehensive sources of information encompassing geography, political economy, and natural history. The British colonial India corpus is, therefore, intricate, marked by non-standard formatting, and plagued by inconsistencies in the spelling of Indian toponyms. This intricacy adds an extra layer of complexity to the task of extracting and organizing these toponyms for the creation of a historical gazetteer. The recognition of these challenges underscores the importance of using advanced techniques and tools to handle the uncertainty inherent in this historical data.

Digital Humanities methods and tools

Dealing with fuzzy toponyms requires the application of specific and advanced techniques. In this context, I utilize digital humanities methods and tools to identify and extract these toponyms from the British India colonial corpus. Indian toponyms in the British colonial corpus often exhibit various spellings, such as "Noil", "Noyal", "Noyyal", "Bawani", "Bhawani" and "Bowani", representing different variations of river and place names in the southern region of India. To address this challenge, I conducted an exploration of the corpus. My approach involved leveraging an English word database, employing regular expressions, using the natural language processing library spaCy for customized entities, and utilizing other relevant Python libraries to extract transliterated words from the corpus. Additionally, I developed a user interface using HTML, CSS and JavaScript, used the open-source database MySQL to store the data, and used PHP for interactive access to and management of the data. Finally, I employed the Geographic Information System (GIS) tool ArcGIS to filter, map, and tag the toponyms and other entities within the dataset. While these initial experiments contributed to theoretical considerations and raised awareness of the complexities inherent in studying the British colonial corpus, the method did not entirely resolve the challenge of extracting toponyms: it also inadvertently filtered out misspelled and non-contemporary English words along with the targeted toponyms.
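
As a rough illustration of this dictionary-based filtering idea (not the project's actual pipeline), the Python sketch below tokenizes a sample sentence with a regular expression and keeps capitalized tokens that are absent from an English word list, treating them as candidate transliterated toponyms. The word-list file name and the sample sentence are invented placeholders.

```python
# Minimal sketch: flag capitalised tokens that are not ordinary English words
# as candidate toponyms. "english_words.txt" is a placeholder for whatever
# English word database is used; the sample sentence is invented.
import re

with open("english_words.txt", encoding="utf-8") as f:
    english_words = {line.strip().lower() for line in f}

text = "The Noyyal and the Bhawani join the larger river below the town."

tokens = re.findall(r"[A-Za-z]+", text)          # alphabetic tokens only
candidates = [
    tok for tok in tokens
    if tok[0].isupper() and tok.lower() not in english_words
]
print(candidates)   # e.g. ['Noyyal', 'Bhawani'] if those words are not in the list
```

As noted above, such a filter also catches misspelled and archaic English words, which is one reason a learned approach is needed.
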
The new method I propose involves three distinct stages. The first stage centers on the identification of entities using a BERT-based Named Entity Recognition (NER) model (Devlin et al. 2019) to create a trained dataset of place names. This NER system is instrumental in locating hidden toponyms and learning from contextual information. The second stage is dedicated to the extraction of fuzzy toponyms, for which I employ DeezyMatch (Hosseini et al. 2020), a natural language processing library specifically designed for fuzzy string matching and toponym extraction. To generate the training dataset of string pairs, I also collect alternate names of places in South India. By learning the same kinds of transformations present in the training set, DeezyMatch should be capable of applying this knowledge to unseen variations of toponyms. Subsequently, I use the cleaned dataset to determine optimal hyperparameters for specific scenarios, such as finding the ideal thresholds for matching. In the final stage, I create a database for the historical gazetteer and integrate it with the World Historical Gazetteer. This integration is significant as it offers a wide range of content and services that empower global historians, their students, and the general public to engage in spatial and temporal analysis and visualization within a data-rich environment, spanning global and trans-regional scales ("Introducing the World Historical Gazetteer"). This enhances the accessibility and utility of the historical toponym data for a broad audience.
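
For the first stage, a hedged sketch of what BERT-based NER could look like with the Hugging Face transformers pipeline is shown below. The model name is a publicly available general-purpose checkpoint used only as a stand-in for a model fine-tuned on Indian place names, and the sentence is invented.

```python
# Sketch of stage one: identify location entities with a BERT NER model.
# "dslim/bert-base-NER" is an off-the-shelf checkpoint standing in for a
# model fine-tuned on Indian place names; the sentence is invented.
from transformers import pipeline

ner = pipeline(
    "ner",
    model="dslim/bert-base-NER",
    aggregation_strategy="simple",   # merge word pieces into whole entities
)

sentence = "The Bhawani rises in the hills and joins the Cauvery near Erode."
for entity in ner(sentence):
    if entity["entity_group"] == "LOC":          # keep only location entities
        print(entity["word"], round(float(entity["score"]), 3))
```
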
Main challenges

The first and foremost challenge is the absence of a trained dataset of Indian place names. I therefore need to focus on creating one using Named Entity Recognition and external open-access resources such as Wikipedia. The second challenge pertains to the advanced programming techniques I am experimenting with. The initial experiment with BERT NER for identifying toponym entities shows that the algorithm performs well compared to other NER libraries. However, it also labeled a few non-toponym words as place names and failed to recognize toponyms that were broken across tokens. The extracted place name entities will therefore require manual verification to confirm their accuracy. I anticipate encountering additional challenges when I begin exploring DeezyMatch, as I am currently in the initial stages of my research.

Digital Academy workshop on uncertainty 

The Digital Academy workshop presented a fantastic opportunity for scholars like myself to convene and discuss a wide array of challenges, approaches, methods, and tools for addressing uncertainty. The inclusion of experts in the field of uncertainty was a valuable aspect of this workshop, enabling attendees to solicit advice and feedback on the challenges they face in their research. Although I was not able to attend the entire workshop, its theme serves as a motivating factor for me to persist in my research despite the numerous challenges I've encountered. I believe that ongoing discussions and collaboration within the academic community will be instrumental in finding effective solutions to these challenges and in further advancing the field.

Questions remain open

The open questions revolve around the ideal size of the corpus required for applying the aforementioned advanced techniques and the expected effectiveness of the trained dataset. However, I am hopeful that I will be able to find answers to these questions in the near future. 

References

World Historical Gazetteer. “Introducing the World Historical Gazetteer.” Accessed October 10, 2023. https://whgazetteer.org/about/.

Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (2019). Accessed October 5, 2023. https://arxiv.org/pdf/1810.04805v2.

Hosseini, Kasra, Federico Nanni, and Mariona Coll Ardanuy. "DeezyMatch: A Flexible Deep Learning Approach to Fuzzy String Matching." Paper presented at the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online, October 2020. Accessed October 5, 2023. https://aclanthology.org/2020.emnlp-demos.9.

Biographical note

Shanmugapriya T is an Assistant Professor in the Department of Humanities and Social Sciences at the Indian Institute of Technology (Indian School of Mines) Dhanbad. She was a Digital Humanities Postdoctoral Scholar in the Department of Historical and Cultural Studies (HCS) at the University of Toronto Scarborough. Her expertise centers around the development and application of digital humanities methods and tools for historical and literary research in South Asia, particularly within the realms of colonial and postcolonial studies. She has a specific focus on areas such as text mining, digital mapping, and the creation of digital creative visualizations.
Visit the personal website: https://www.shanmugapriya.com/

Posted by AStrothotte in Digital Academy

Digital Academy 2023: Catrina Langenegger about Swiss Military Refugee Camps

Published on 5 April 2024

From September 25 to 28, 2023, the Digital History Working Group at Bielefeld University welcomed participants to the Digital Academy, themed "From Uncertainty to Action: Advancing Research with Digital Data." This event delved into the complexities of data-based research, exploring strategies to navigate uncertainties within the Digital Humanities. In a series of blog posts, four attendees of the workshop program share insights into their work on data collections and analysis and reflect on the knowledge gained from the interdisciplinary discussions at the Digital Academy. Learn more about the event by visiting the Digital Academy website.

 

 Historical Map of Switzerland.

Swiss military refugee camps

by Catrina Langenegger 

In my research project I examine the Swiss policy of asylum and the military camps for refugees during the Second World War. In this blog post, I focus on the data I collected on these refugee camps and on questions of uncertainty in my work with the data. I encountered uncertainty primarily in the areas of incomplete data, the standardisation process and differing data quality. I will first give a short introduction to my research topic and will then discuss the sources and data I collected. I will thereafter focus on my work with the data, the challenges I encountered when dealing with uncertainty and the benefits I took away from the Digital Academy.
Refugee aid is a civil task. As I focus on military support, I consequently deal with a temporary, exceptional phenomenon. In Switzerland, first the private refugee aid organisations and then the department of police were responsible for the refugees. From 1940 onwards the department of police opened camps to house the refugees and emigrants who sought protection in Switzerland. In the late summer of 1942 the number of refugees rose constantly, and the civil administration became more and more overstrained. It could provide neither enough space for housing, nor enough financial support, food and staff. Briefly put, the system of civil refugee camps was in danger of collapsing. In this situation, the military was asked to step in. The army was considered to be the only institution that could acquire enough buildings, recruit enough personnel and provide a sufficient system for replenishment.
In September 1942 the first reception camp led by the military was established in Geneva. The army took over the initial care of the refugees, providing food, clothing and accommodation. From that point on, a new system of three different camp types led by the military was established, which every refugee had to pass through before being placed permanently in a refugee camp under civil administration. Collection camps were placed next to the border. Due to hygiene concerns, the refugees were obliged to spend three weeks in a quarantine camp. After the quarantine, the refugees could theoretically move to civil camps, but most of them had to stay in reception camps because there was no space for them under the civil administration. Some of the refugees had to stay only for a few days or weeks, others spent months in reception camps. These military refugee camps are the topic of my research. They operated until after the end of the war.

Serial sources as data

Besides administrative sources such as commands and instructions, protocols of inspections and meetings as well as weekly reports from the camps are stored in the Swiss Federal Archives. These serial sources are the basis of my data analysis. I found them in eleven different archive collections and extracted the information from the reports into a database. All in all, I found reports on 168 weeks, from October 1942 to July 1946. Nevertheless, the combined collection contains gaps: for at least eleven weeks no reports could be found. It is at least eleven because the first report dates from 18 October 1942, although the first camps were opened in September 1942. I am not aware of earlier reports, as I could not find any, but it is also possible that the standardised reporting only started in the middle of October. These gaps are one aspect of uncertainty I will focus on in this blog post. I aim to be transparent about them and to make them visible at all stages of processing.
During the process of data cleaning, I decided to work only with data that refers to one or more refugees in a camp. Data with no refugees, or camps that had been emptied and were only kept on reserve, are therefore not included in the dataset. All in all, I have a dataset with more than 6’000 observations on refugees in the camps. These observations show not only how many refugees were housed, but also which type of refugees they were (civilian or military) and which type of camp it was (quarantine, collection, or reception camp). Reflecting on these categories is part of my data critique and also leads into the field of uncertainty.
The next step was data cleaning and standardisation. I corrected obvious typing errors during data extraction in order to reduce the number of variables. Then I standardised the camp names. As a subject librarian, dealing with data and metadata as well as standardising it is part of my daily tasks. Here are some examples of standardisation where names changed: the camp name "Grand Verger" refers to the same camp as "Signal". Similarly, the names "Geisshubel" and "Rothrist" refer to the same camp. I put a lot of effort into the standardisation and in the end identified 221 camps. Since one aim of my research project is to depict and analyse the refugee camp system over time, it was important to have a dataset that was as clean and reliable as possible as a basis for the analysis. The various standardisation steps were important because the quality of the entire analysis depends on the quality of the data.
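
A minimal sketch, with invented column names and refugee counts, of how such standardisation and the removal of empty observations could look in Python/pandas:

```python
# Sketch: map variant camp names to one canonical name and drop observations
# without refugees. Column names and numbers are invented for illustration.
import pandas as pd

df = pd.DataFrame({
    "camp":     ["Grand Verger", "Signal", "Geisshubel", "Rothrist"],
    "refugees": [120, 95, 0, 310],
})

canonical = {
    "Grand Verger": "Signal",    # same camp under a changed name
    "Geisshubel":   "Rothrist",  # likewise; the choice of canonical form is arbitrary here
}
df["camp"] = df["camp"].replace(canonical)

df = df[df["refugees"] >= 1]                     # keep only observations with refugees
print(df.groupby("camp")["refugees"].sum())
```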

Handling data and uncertainty

To take a step further and to focus on questions about living in the camps during the analysis, I enriched my data with information about the building type and the exact georeference. When collecting geodata for every camp, so that I could analyse and visualise the geographical distribution in a geographic information system (GIS), my approach to dealing with the uncertainty I encountered was triangulation with other source types. Sources that contained the necessary information were reports, protocols, autobiographies and so on. I also used historical maps provided by swisstopo1 to localise the camps. In many cases the information was good: "factory building 500 metres outside the village" or "hotel up on a hill between this village and the other". I could then add the exact geodata. For other camps the information was not as precise as I had hoped for, and I had only the name of the village. In other cases, most of them hospitals, prisons, or camps that were only open for a short time, the information was even vaguer; but the location was always within the borders of the territorial district, so I made a sound decision for these camps at the district level. For one camp without any information, not even the district, I decided not to georeference it at all.
As I am working as a librarian, I am used to the convention of coding the quality of metadata. In a library catalogue you can check the level of cataloguing, for example whether the book was catalogued by a librarian or by a machine. Having varying qualities of data in my set, I aimed to qualify it. I therefore went for three different categories, A, B and C, to make a statement about the accuracy of my data. If someone wants to use my data later, the uncertainty is made transparent through this code. A stands for the best quality, i.e. information about the address at the level of the building. B stands for medium quality: the information is correct at the village or town level. C stands for the most uncertain category: the information is only located within the territorial district and is based on varying indications.
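
To illustrate, here is a small sketch of how such an A/B/C quality code could be assigned programmatically. The field names are invented for the example and are not taken from the actual database.

```python
# Sketch of the A/B/C georeference quality code described above.
from dataclasses import dataclass

@dataclass
class CampLocation:
    name: str
    has_building_address: bool   # e.g. "factory building 500 metres outside the village"
    has_village_or_town: bool    # only the village or town is known
    has_district_only: bool      # only the territorial district is known

def quality_code(loc: CampLocation) -> str:
    """A = building level, B = village/town level, C = district level only."""
    if loc.has_building_address:
        return "A"
    if loc.has_village_or_town:
        return "B"
    if loc.has_district_only:
        return "C"
    return "not georeferenced"   # the one entity without any location information

print(quality_code(CampLocation("Signal", False, True, False)))   # -> "B"
```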

I now come back to the missing reports mentioned above. My goal is to be transparent about this gap. However, making it visible in statistics and visualisations is one of the greatest challenges when dealing with uncertainty. Statistics and visualisations are positivistic: they only show what is there. In the first statistics, the gaps were not visible. I therefore added artificial observations with a value of zero to my dataset to mark the gaps; in other words, I made the missing weekly reports visible by creating an observation for each of these dates and labelling these artificial observations as such. My data model now provides a field that marks whether there is a report for a given week or not. Nevertheless, it is almost impossible to visualise the weeks without information: although I have made artificial entries in my dataset, these are not displayed in the visualisations because they do not contain a value.
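
A minimal sketch, with invented dates and numbers, of how the missing report weeks can be made explicit and flagged in pandas:

```python
# Sketch: reindex the weekly reports onto a complete weekly grid so that weeks
# without a report become visible, flag them, and add artificial zero values.
import pandas as pd

reports = pd.DataFrame({
    "week":     pd.to_datetime(["1942-10-18", "1942-10-25", "1942-11-08"]),
    "refugees": [450, 480, 510],                 # invented counts
})

full_range = pd.date_range(reports["week"].min(), reports["week"].max(), freq="7D")
full = reports.set_index("week").reindex(full_range)

full["has_report"] = full["refugees"].notna()    # the extra field in the data model
full["refugees"]   = full["refugees"].fillna(0)  # artificial zero-value observations
print(full)
```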

 

fig. 1: Timeline with missing data

fig. 2: Auto-corrected timeline

The software I use calculates the uncertain data out and provides the average instead. I found a way to work around this by using only the edit mode, even for my visualisations, because in the viewing mode the observations I inserted to show the uncertainty are removed. In both examples, I was able to incorporate the uncertainty into the data via a categorisation in my data model. In this way, I also hope that my data can be reused more easily, as it makes transparent statements about its own quality.

The workshop of the Digital Academy 2023 gave me the impetus to take a closer look at the subject of uncertainty. The opportunity to exchange ideas with other researchers was very enriching. I was also able to present how I deal with uncertainty and to develop an even clearer definition of my categories and my approach based on the discussions and comments in the workshop.

Biographical note

Catrina Langenegger recently submitted her PhD thesis on refugee camps under military control in Switzerland during the Second World War. She conducts her research at the Centre for Jewish Studies at the University of Basel. As a historian with a focus on digital humanities she exercises her passion for data also in her role as subject librarian with a background in library and information sciences.

References:

1. Cf. Karten der Schweiz - Schweizerische Eidgenossenschaft - map.geo.admin.ch, https://map.geo.admin.ch/?topic=swisstopo&lang=de&bgLayer=ch.swisstopo.pixelkarte-farbe&catalogNodes=1392&layers=ch.swisstopo.zeitreihen&time=1864&layers_timestamp=18641231.

Posted by AStrothotte in Digital Academy

Meet ... Jens Zinn

Published on 6 December 2023

 

Jens Zinn is TR Ashworth Associate Professor in Sociology, Social and Political Sciences, at The University of Melbourne and a CeUS member.

What connects you to Bielefeld University? 

I am connected to Bielefeld University personally and professionally. After my undergraduate studies in Saarbrücken I was attracted by the large and only Faculty of Sociology in Europe at Bielefeld University, which offered a wide variety of approaches taught by outstanding sociology scholars. It was also a formative experience, since I learned about concepts such as ‘time’ and ‘risk’ which became influential in sociological debate (Beck 1986, 1988; Luhmann 1985, 1991; Douglas & Wildavsky 1982). Amongst the many scholars, it was in particular the analytical sharpness of Niklas Luhmann and Franz Xaver Kaufmann, but also the historical work of Reinhart Koselleck, that influenced my work and my approach to risk and uncertainty as analytical concepts as well as to discourse semantic changes.
I am therefore still connected to the scholarship in systems theory and the Institute for World Society Studies, as well as to Historical Semantics and the corpus-based/computational analysis of social change (cf. SFB 1288).
Indeed, having lived and worked at Bielefeld University, I am also emotionally attached to the central university building (which I think of as "the starship") that provides everything needed to focus on research. As a research assistant, I had a good view from my office of the Center for Interdisciplinary Research (ZiF), which even in the early days was an indication of the innovative interdisciplinary research culture at Bielefeld University. With the university located close to the "Teuto" (Teutoburger Wald), I still enjoy walking through the woods whenever I find the time during a visit.
 
What role does Uncertainty play in your research?
 
Uncertainty is a key concept in my research. When I initiated two research networks on risk studies within the European Sociological Association (2005) and the International Sociological Association (2006), I was keen to find the key concepts which could hold together the complex scholarship on risk studies and would characterise a broader sociological, rather than a psychological, economic or technological, approach to the future. At this point the Sociology of Risk and Uncertainty (SoRU) was born: it sees risk in the context of uncertainty, and uncertainty in the context of its social relevance when something of value is at stake (this includes possible harm as well as gains, but also the recognition of the relevance of the unknown for the present). In this way risky uncertainty characterises the decision-making situations I am interested in. These contrast with people following worn-out paths of routine without further consideration.
 
It is here where my recent work on everyday life engagement with risky uncertainty connects with uncertainty studies. In the social realm the modernisation process contributed to a significant shift in the ways uncertainty is understood and managed. A key element has been the development of calculative technologies and, most recently, the advancement of computer technology and advancing social digitisation. At the same time ‘hope’, ‘faith’ and ‘ideology’ remain powerful resources of social enchantment which also seem necessary for managing risky uncertainty. The comparatively abstract forms of reasoning related to social rationalisation and enchantment are not sufficient to understand people’s engagement with risky uncertainty in everyday life. Here the subjectivation of detached forms of knowledge is required to understand lifeworld forms of reasoning represented by concepts such as ‘trust’, ‘intuition’ and ‘emotions’ (Schulz & Zinn 2023).
 
What would you like to accomplish in a Center for Uncertainty Studies?
 
The Center for Uncertainty Studies is an exciting hub which opens up opportunities for interdisciplinary collaboration and innovative conceptual advancement. There are three areas of research I expect to advance in the Center for Uncertainty Studies.
 
(1)   In many ways the different disciplines involved in CeUS represent different understandings of uncertainty which are influential in public debates. I am interested in the imaginaries and research practices through which my colleagues construct uncertainty as a research object as well as a research reality to be managed. On this basis I would like to further develop an outline of the sociology of uncertainty and risk, which helps to specify and understand how social forces combine or amalgamate in the social navigation of uncertainty.
 
(2)   In a more concrete conceptual enterprise, I want to further develop a phenomenology of uncertainty and risk, which is capable of making sense of the processes of the individual and institutional engagement with risky uncertainties. This would follow developments in social science disciplines which not only study modes of engagement with uncertainty such as trust, intuition, emotions, and hope but how such modes inform the research process itself.
 
(3)   The broad methodological expertise within CeUS allows developing digital resources and methods to analyse societal understandings of, and responses to, risky uncertainties such as pandemics and new social challenges related to climate change (e.g., heat waves and other extreme weather events). I would like to advance the collaboration between different disciplines, such as linguistics, sociology, history and the digital humanities more broadly, and mathematically trained modellers, to develop powerful research instruments (conceptually and technically) to better understand historical developments as well as the meaning and effects of increasing societal digitisation.
 
To what extent is interdisciplinarity important in your work? 
 
The social management of risk and uncertainty relies on interdisciplinary and transdisciplinary approaches. These support collaborative learning, which is crucial for producing good and socially acceptable outcomes. In this context my sociological approach to uncertainty also profits from connecting to other research, such as in psychology, media studies, health studies, history, linguistics, and philosophy. Insights from risk perception studies as well as decision-making research have informed my studies, as have linguistic research instruments for the analysis of discourse semantic changes of risk. Conceptual insights from philosophy inform my theoretical work on a phenomenology of risk and uncertainty, as do empirical insights from environmental sociology, science and technology studies, disaster research, media studies, health studies, youth studies and many more. Thus, while strongly rooted in sociology, my research and theorising is informed by and connects to other disciplines, showing its relevance across disciplines as much as being inspired by related work from different disciplinary perspectives.
 
The first CeUS conference ("Navigating Uncertainty: Preparing Society for the Future") took place in Bielefeld at the beginning of June - which moments were particularly exciting for you? What do you take away?
 
The conference became quite exciting when I realised to what extent my own conceptual work on ‘rational’, ‘non-rational’ and ‘in-between’ modes of engaging with risk and uncertainty can connect to the empirical work presented by many of the participants considering trust, emotions, hope and other ways of engaging with uncertainty.
Admittedly, I was not able to connect to every contribution in the same way. However, I was surprised and thrilled by the large number of disciplines I could connect to such as conflict studies and historical studies. 
 
To sum it up: Do you have specific strategies in your personal or professional life to deal with uncertainty?
 
My approach, as well as my professional interest, relates to what Greek philosophers might have positioned in the realm of “phronesis”. This seems to me a decent way to engage with risky uncertainties which cannot easily be mastered by the application of techniques or differentiated knowledge systems but require practical wisdom that considers ethical and normative standards as well as different forms of (non-)knowledge in research, professional decision making and the lifeworld.

Posted by AStrothotte in Digital Academy

