Language Matters: Wikipedia in Hindi, Urdu, and English during an evolving regional conflict

 

Motivation

  • How do editing practices differ between languages?

  • How quickly do editors respond to significant events of relevance to an article?

  • To what degree do Wikipedia articles in three languages (Hindi, Urdu, and English) achieve Wikipedia’s mission of making neutrally presented (NPOV), reliable information on a polarizing, controversial topic available to people around the globe?

  • We chose the topic of the recent revocation of Article 370 of the Constitution of India, which, along with other recent events in and concerning the region of Jammu and Kashmir, has drawn attention to related articles on Wikipedia.

  • How do Wikipedia practices of editors in the three languages vary in general, and how do they uphold the NPOV principle? We approached this question from the following perspectives.

  • How do the Wikipedia communities and editors’ practices vary across the Hindi, Urdu, and English Wikipedias?

  • To what extent has a neutral point of view (NPOV) been sought on a topic that invites edits with an agenda?

  • Wikipedia defines itself as a multilingual online encyclopedia, created and maintained as an open collaboration project using a wiki-based editing system.

  • One of its core principles is to maintain consensus around a neutral point of view in Wikipedia articles [29, 43].

  • When information is not shared between different language editions of Wikipedia, it prevents access to a larger variety of content for monolingual users [29].

  • Information seekers who are in proximity to a conflict, and who only access media coverage of said conflict in their local languages, may get an incomplete or biased picture of events [28].

  • Adversarial actors have been known to exploit Wikipedia pages and to manipulate search engine results to present a biased, false, or purposefully misleading version of events, thereby disseminating their propaganda to the broader internet [24, 26].

[Figure: kashmir.png]


Hindi and Urdu are the two standardized varieties of Hindustani, a lingua franca of Jammu and Kashmir (J&K). J&K is a union territory administered and claimed by India. Pakistan and India each only control a part of the former princely state, but both claim J&K in its entirety. Article 370 of the Constitution of India was passed in 1954 and made temporary provisions for J&K that partly exempted the territory from the Constitution. On August 5, 2019, Article 370 was revoked, which drew attention to the articles in our corpus.

Methods

Our focus is on comparing Wikipedia editors’ behavior, motivations, and reactions to significant events related to the polarizing conflict in J&K between the Hindi, Urdu, and English Wikipedias.

Page selections:

(A1) Article 370 of the Constitution of India

(A2) Pulwama attack

(A3) Insurgency in Jammu and Kashmir

(A4) Kashmir conflict

(A5) Jammu and Kashmir Reorganization Act, 2019

We gathered data on revisions and page views for each article via the MediaWiki API. For page view data, we used the Python library pageviewapi (https://pypi.org/project/pageviewapi/), and for reversions, we relied on the mwreverts library (https://github.com/mediawiki-utilities/python-mwreverts).
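As a rough illustration of the page view collection step, the sketch below builds a request URL for the public Wikimedia per-article page views REST endpoint and reduces a decoded response to a day-to-views mapping. It uses only the standard library rather than pageviewapi; the helper names `pageviews_url` and `daily_views`, the `all-agents` setting, the date range, and the sample payload are assumptions made for illustration, not details taken from the study.

```python
import json
from urllib.parse import quote

# Public Wikimedia REST endpoint for per-article page views.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def pageviews_url(project, article, start, end,
                  access="all-access", agent="all-agents",
                  granularity="daily"):
    """Build the request URL; start and end are YYYYMMDD strings."""
    # Article titles use underscores instead of spaces and must be URL-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}"


def daily_views(payload):
    """Map 'YYYYMMDD' -> view count from a decoded API response."""
    return {item["timestamp"][:8]: item["views"] for item in payload["items"]}


# Hypothetical response fragment with made-up counts (not data from the study).
sample = json.loads(
    '{"items": ['
    '{"timestamp": "2019080400", "views": 120},'
    '{"timestamp": "2019080500", "views": 950}]}'
)

url = pageviews_url("en.wikipedia",
                    "Article 370 of the Constitution of India",
                    "20190801", "20190831")
views = daily_views(sample)
```

Fetching `url` (for example with `urllib.request.urlopen`) and passing the decoded JSON to `daily_views` would yield one series per article and language edition; swapping `en.wikipedia` for `hi.wikipedia` or `ur.wikipedia` selects the other editions.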

Quantitative

We analyzed page view and revision data for three Wikipedia articles to gauge popularity of the pages in our corpus, and responsiveness of editors to breaking news events and problematic edits.
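One way to make "responsiveness to problematic edits" concrete is to measure how long an edit survives before a later revision restores earlier content. The sketch below illustrates the general idea of identity reverts (a revision whose content checksum matches a recent earlier revision undoes everything in between). It is a simplified standard-library illustration, not the mwreverts implementation; the function names, the `radius` default, and the sample history are assumptions.

```python
from datetime import datetime


def find_identity_reverts(revisions, radius=15):
    """Detect identity reverts in a chronological revision history.

    revisions: list of (timestamp, sha1) tuples, oldest first.
    Returns a list of (reverting_index, reverted_indices) pairs.
    """
    reverts = []
    for i, (_, sha1) in enumerate(revisions):
        # Look back at most `radius` revisions for identical content.
        for j in range(i - 1, max(i - radius, 0) - 1, -1):
            if revisions[j][1] == sha1:
                if i - j > 1:
                    # Everything strictly between j and i was undone.
                    reverts.append((i, list(range(j + 1, i))))
                break
    return reverts


def revert_delays(revisions, radius=15):
    """Seconds between each reverted edit and the revision that undid it."""
    delays = []
    for i, reverted in find_identity_reverts(revisions, radius):
        for j in reverted:
            delta = revisions[i][0] - revisions[j][0]
            delays.append(delta.total_seconds())
    return delays


# Made-up history for illustration only.
history = [
    (datetime(2019, 8, 5, 10, 0), "aaa"),
    (datetime(2019, 8, 5, 10, 5), "bbb"),  # edit that gets undone
    (datetime(2019, 8, 5, 10, 9), "aaa"),  # restores the earlier content
]

reverts = find_identity_reverts(history)
delays = revert_delays(history)
```

Comparing the distribution of such delays across the English, Hindi, and Urdu histories of the same article is one possible way to quantify differences in editor response.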

Additionally, we interviewed editors from all three Wikipedias to learn about differences in editing processes and motivations, and we compared the text of the articles across languages, as they appeared shortly after the revocation of Article 370.

To these ends, we conducted time series analyses of page view and revision data for each article. In addition, we interviewed editors to learn how their writing practices vary from what we know from previous works, and to learn how they maintain a neutral point of view (NPOV) on a topic that is prone to bias.
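A minimal sketch of one building block of such a time series analysis is shown below: locating the peak day in a daily series and comparing it with the median of the remaining days. The function names, the choice of the median as a baseline, and the sample counts are assumptions for illustration; the study's actual analysis may differ.

```python
from statistics import median


def peak_day(series):
    """series: dict mapping 'YYYY-MM-DD' -> count. Returns (day, count)."""
    day = max(series, key=series.get)
    return day, series[day]


def spike_ratio(series):
    """Ratio of the peak count to the median of all other days.

    Assumes the series has at least two days; with a single day there
    is no baseline and statistics.median raises StatisticsError.
    """
    day, top = peak_day(series)
    baseline = median(v for d, v in series.items() if d != day)
    return top / baseline if baseline else float("inf")


# Made-up daily counts for illustration only.
example = {
    "2019-08-03": 100,
    "2019-08-04": 120,
    "2019-08-05": 2400,
    "2019-08-06": 900,
}

day, count = peak_day(example)
ratio = spike_ratio(example)
```

The same two functions apply equally to daily revision counts, which makes it possible to check whether editing activity peaks on the same day as readership.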

The page views on Article 370 (A1) in English, Hindi, and Urdu peaked at 2.53 million, 380,000, and a mere 382 views, respectively, on the day Article 370 was revoked. Given that Urdu and Hindi speakers are frequently also proficient or fluent in English, they may contribute

For nearly every article in our corpus (A1–A5), the length of the article (measured in bytes) was greatest in the English edition, followed by the Hindi and then the Urdu editions.

This does not mirror the difference in the total number of articles in each edition (a measure of a Wikipedia’s maturity): 139,246 in Hindi, 154,057 in Urdu, and 6,087,693 in English, as of this writing (May 29, 2020). The Urdu edition has roughly 10% more articles than the Hindi edition. The Hindi and Urdu Wikipedias were created within a year of each other, in July 2003 and January 2004, respectively, so the difference in volume is not due to one having had more time to develop. By contrast, there were 28 unique contributors, excluding anonymous IP edits, to the Urdu articles in our corpus and 137 contributors to the Hindi articles (A1–A5).
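The figures above can be checked, and the contributor count reproduced in outline, with a short sketch. It assumes that anonymous edits are attributed to the editor's IP address in the revision history, so a username that parses as an IP address is treated as anonymous; the function names and the sample usernames are hypothetical.

```python
import ipaddress


def is_ip(username):
    """True if the username parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(username)
        return True
    except ValueError:
        return False


def unique_registered(usernames):
    """Set of distinct contributors, excluding anonymous IP editors."""
    return {u for u in usernames if not is_ip(u)}


# Hypothetical revision authors for illustration only.
authors = ["EditorA", "203.0.113.7", "EditorB", "EditorA", "2001:db8::1"]
registered = unique_registered(authors)

# Relative difference in article counts, using the totals quoted in the text.
hindi_articles = 139_246
urdu_articles = 154_057
urdu_surplus = urdu_articles / hindi_articles - 1  # fraction above Hindi

# Ratio of Hindi to Urdu contributors in the corpus (137 versus 28).
contributor_ratio = 137 / 28
```

With the totals quoted above, `urdu_surplus` comes out slightly above 0.10 and `contributor_ratio` is just under 5, which is the contrast the paragraph draws: the Urdu edition is larger overall, yet far fewer registered editors worked on its corpus articles.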

Article talk pages are meant for discussing controversial edits. They ebb and flow in size as issues are raised and resolved; that is, if editors use them. There were several articles in our corpus (A4-UR, A2-UR, A5-HI, A5-UR) where the talk page was empty for the entirety of 2019, either because the page was never initiated (no one had ever used it in the history of the article) or because all pre-2019 issues had been removed and no new issues were brought up in 2019. Those Hindi and Urdu talk pages in our corpus that did have content in 2019 had only one contributor each, with one exception, A1-HI, which had six contributors.

RESPONSIVENESS

Editors Work Across Articles, but Rarely Across Languages

Structural Similarity

Point of View in Non-English Articles

Discrepancies in Information

UNDERSTANDING WIKIPEDIA PRACTICES

Motivation: Hobby of Being a Good Global (and Local) Citizen

Process: How They “Present Only the Facts”

Collaboration: Off-site Communication for Non-English Editors

NPOV: Dedicated to Neutrality

While activity on South Asian language editions of Wikipedia is growing, at the time of writing, the Hindi and Urdu editions are still in their nascency. In Hindi and Urdu, as well as English, editors predominantly try to adhere to the principle of neutral point of view (NPOV), and for the most part, the editors quash attempts by other editors to push political agendas.

We found many ways in which Hindi and Urdu editors differed from English editors in the way they collaborate, as well as one key similarity: editors in all three languages were devoted to presenting neutral accounts of the conflict.

DISCUSSION

Given Enough Editors, All Issues Are Noticed

Good intentions. We see that the Hindi and Urdu editions have small but mighty armies of editors who are devoted to growing their respective editions and presenting information as neutrally as possible—but to a certain extent, the communities are too small to be neutral, as we

Data voids. With polarizing topics like J&K, there is a danger of adversarial actors planting false or misleading information, especially when news has just broken and reliable sources may not have had time to react. At such times, adversarial actors exploit the “data void,” the absence of high-quality information about the new event [26]. The one case we observed where a whole article was blatantly POV (A3-HI) may be an example of a data void being exploited: Insurgency in Jammu and Kashmir does not appear to be

The Arc of the Hindi and Urdu Editions Bends Toward Resilience

Our findings are a testament to the power of Wikipedia to influence public opinion. It has earned its reputation as a reliable, encyclopedic source and as the go-to place for information on all branches of knowledge, not least breaking news and polarizing topics such as Black Lives Matter [42] and the status of J&K. The very NPOV characteristic that makes Wikipedia a reliable asset also makes it a target for adversarial editors. Well-meaning regular editors have to deal with vandals who introduce POV or biased information.

Despite efforts by adversarial actors, the dedicated editors from all three language editions in this study strove to uphold a neutral point of view (NPOV). All the editors we interviewed shared a passion for Wikipedia and considered themselves stewards of neutral information on this unique peer production platform: making accurate information available, keeping articles current, and, in the case of the Hindi and Urdu editors, growing their language edition (section 6). Despite editors’ best efforts, NPOV was hard to achieve for the Hindi and Urdu editions because the editing communities are so small (section 7.1). The response times to new