The Challenge
How might we gather information from hard-to-access areas to prevent mass violence against civilians?
CrisisTracker: Real-time Social Media Curation
CrisisTracker is a web platform that extracts situational awareness reports from public tweets during humanitarian disasters. It combines automated processing with crowdsourcing to quickly detect new events and bring together related evidence.
=== OPEN QUESTIONS ===
The proposed open-source system (see below) has been used in practice for conflict monitoring, but several research, design and development challenges remain. If you want to contribute, please post your thoughts below regarding the following questions.
I (Jakob) would also love to brainstorm about these challenges in a Skype call (jakob.rogstadius), in particular if you have decision making or analyst experience related to conflict monitoring or intervention.
What information actually leads to action in the domain of conflict monitoring and prevention?
Neither technology nor information automatically leads to action and there are several cases where atrocities during civil wars have been well known, but no action was taken to prevent them (e.g. Rwanda). I can also imagine cases where a summary of raw reports with limited context or explanation can actually trigger new violence. What information should a system like this provide to be meaningful and to lead to positive change?
Can open-access information management systems improve the safety of regular citizens or help them contribute to peace?
Conflict monitoring and atrocity prevention are traditionally approached from a top-down perspective. The role of information management systems targeted at expert analysts is well established, but can such systems also be used to empower bottom-up efforts? Rather than discussing what information should be hidden from the public, is there any information that can help improve individual safety, or promote mindsets that lead to conflict reduction and long-term stability in conflict zones?
What decision making processes should be supported?
I am primarily a software engineer and I need to know more about the specific decisions that are made by decision makers in peacekeeping and conflict monitoring situations. What decisions need to be taken, when, and what information is required to make those decisions? This knowledge is extremely helpful to make design trade-offs and to prioritize different features in the system.
How can a crowd assist decision makers with meaningful analysis?
Evaluation of the system has shown that both volunteers who curate content and decision makers who wish to consume the information would prefer that volunteers work with more complex reasoning tasks, rather than just data annotation. What decisions are frequent and important enough that it would be efficient to offload the required data analysis to a skilled or semi-skilled crowd? How can volunteers sufficiently share their evidence, reasoning and conclusions with decision makers for their work to be trusted?
What quantitative indicators are needed?
Good quantitative indicators (things that can be measured numerically) are required to provide meaningful time series, and to rank content by 'importance'. However, when the raw data is social media content, it is very difficult to extract traditional quantitative indicators such as the number of affected unemployed women in rural areas. Other quantitative metrics such as the number of messages or the number of unique people discussing the event are readily available, but are far less meaningful. It's clear that some form of quantifier needs to be extracted, but what low-hanging fruit should we aim for to still provide helpful time series? A rough estimate of the number of people affected (1s, 10s, 100s, 1000s) per event? Number of new events of type X per day? Number of people discussing any event of type X per day?
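To make two of the candidate indicators above concrete, here is a minimal Python sketch, assuming curated events are available as (date, event type) pairs and that a curator has supplied a rough headcount. The function names and sample data are hypothetical and are not part of CrisisTracker.

```python
from collections import Counter
from datetime import date


def magnitude_bucket(estimate: int) -> str:
    """Map a rough headcount to an order-of-magnitude label (1s, 10s, 100s, ...)."""
    if estimate < 1:
        return "0"
    bucket = 1
    # Grow the bucket by powers of ten while it still fits under the estimate.
    while bucket * 10 <= estimate:
        bucket *= 10
    return f"{bucket}s"


def events_per_day(events):
    """Count events per (day, event type); `events` holds (date, type) pairs."""
    return Counter((day, event_type) for day, event_type in events)


# Hypothetical curated events, for illustration only.
events = [
    (date(2013, 1, 5), "shelling"),
    (date(2013, 1, 5), "shelling"),
    (date(2013, 1, 5), "protest"),
    (date(2013, 1, 6), "shelling"),
]
counts = events_per_day(events)
print(counts[(date(2013, 1, 5), "shelling")])
print(magnitude_bucket(7), magnitude_bucket(42), magnitude_bucket(1500))
```

Coarse buckets like these deliberately avoid implying more precision than social media reports can support.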
What are the ethical implications of a system like this, in particular in conflict situations?
I believe sources are sufficiently protected, but do others agree with me? What if the system collects information that mostly benefits one side in the conflict? Are there any (new) risks that this tool introduces into decision making processes, or does this tool simply require the same skepticism as any other source?
=== CONCEPT DESCRIPTION - CRISIS TRACKER ===
During conflicts in recent years, online social media (mainly Twitter, Facebook and YouTube) has emerged as a means for conflict affected local populations to communicate their experiences to the world. With increasing technology adoption and free access to posted messages, online social media can now be used to leverage the reporting capacity of thousands or millions of people on the ground for large-scale real-time distributed sensing.
The Twitter microblogging service saw 500 million tweets being posted daily in October 2012, by over 200 million active users. Unlike, for instance, Facebook posts and SMS messages, the vast majority of these tweets are shared publicly and can be accessed in real time through an application programming interface (API). The challenge, however, is sense-making. With so much content being generated, maintaining overview and history, and detecting patterns and actionable information, requires specialized information management tools.
CrisisTracker is an open-source web platform, developed primarily by me during my PhD studies, which adds structure to the millions of reports already available on Twitter. This additional layer of structure helps reduce information overload, making it much easier to use social media as a rich source of real-time situational awareness.
CrisisTracker infers structure by making use of the repetition that occurs when multiple people independently report impactful events, in two ways. First, the greater the number of people that talk about an event, the more likely that event is to be of interest to a system user. This is not a perfect indicator, but with far more information being collected than can be consumed, having such a metric is critical. Second, the CrisisTracker platform uses an automated real-time clustering algorithm to group together tweets that are textually very similar. A cluster of messages (a “story”) typically refers to a single well-defined event, such as an attack on a protected object, artillery shelling of a location, a bombing, etc. Although individual tweets are both extremely brief (up to 140 characters) and difficult to verify independently, stories in CrisisTracker capture the event from multiple viewpoints and provide a real-time index of published evidence in the form of images, video and news articles.
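The two ideas above (grouping textually similar tweets into stories, then ranking stories by how many distinct people report them) can be illustrated with a small Python sketch. It assumes a greedy one-pass clustering with Jaccard similarity on word sets and an arbitrary threshold of 0.5. This is a simplified stand-in chosen for readability, not the clustering algorithm CrisisTracker actually uses, and all names and sample tweets are hypothetical.

```python
import re


def tokens(text: str) -> frozenset:
    """Lowercased word set of a tweet, with URLs removed."""
    text = re.sub(r"https?://\S+", "", text.lower())
    return frozenset(re.findall(r"[a-z0-9#@']+", text))


def jaccard(a, b) -> float:
    """Share of words the two sets have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class Story:
    """A cluster of similar tweets, seeded by the first tweet seen."""

    def __init__(self, author, text):
        self.seed = tokens(text)
        self.tweets = [text]
        self.authors = {author}

    def add(self, author, text):
        self.tweets.append(text)
        self.authors.add(author)


def cluster(stream, threshold=0.5):
    """Assign each (author, text) to the most similar story, or start a new one."""
    stories = []
    for author, text in stream:
        words = tokens(text)
        best = max(stories, key=lambda s: jaccard(words, s.seed), default=None)
        if best is not None and jaccard(words, best.seed) >= threshold:
            best.add(author, text)
        else:
            stories.append(Story(author, text))
    # Rank stories by the number of distinct people reporting them.
    return sorted(stories, key=lambda s: len(s.authors), reverse=True)


# Hypothetical tweets, for illustration only.
stream = [
    ("a", "Shelling reported in Homs city centre this morning"),
    ("b", "RT shelling reported in Homs city centre this morning http://example.org/x"),
    ("c", "Protest march in Aleppo after Friday prayers"),
    ("d", "Shelling reported in Homs city centre, many injured"),
]
stories = cluster(stream)
for story in stories:
    print(len(story.authors), story.tweets[0])
```

Comparing every tweet against every story would not keep up with millions of tweets, so a real-time deployment would need an approximate similarity search instead of this exhaustive loop.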
After reports have been clustered, the platform uses crowdsourcing techniques to extract structured metadata (type of event, geographic location and named entities) from the stories, which improves the quality of search and filtering in the system.
How could you begin prototyping this idea in a simple way to begin testing and refining it? Who would use your idea and/or who is using it now? Is your idea technically easy, medium or hard to implement?
Due to limited availability of human curators, Syria Tracker has however only been using the fully automated features in the system. This is why learning classifiers and increased automation are such important extensions.
An incident commander who tested the Syria deployment of the system stated that “I feel very confidently that those reports will come out ahead of CNN and BBC and that they will have the central nuggets of who, what, when, where, why. For an incident commander, it is the difference between learning something in 2-3 hours versus learning it in 6-8.” Furthermore, a GIS expert said that “you can see over a period of time where people are moving, how that relates to conflict areas. Water shortage, or food, you can almost anticipate where needs are going to be based on what you are seeing.”
For more information about this evaluation, please see http://hci.uma.pt/~jakob/files/Rogstadius_2013_CrisisTracker_Crowdsourced_Social_Media_Curation_for_Disaster_Awareness.pdf
Try the system at http://ufn.virtues.fi/crisistracker/
How does your idea gather AND verify information? How does your idea keep those who use it safe?
CrisisTracker cannot automatically verify information. However, as the system clusters information into stories, it becomes possible to compare different tweets that talk about the same event. Disaster response experts who used the system have described how this makes it much easier to compare different versions of a story to make a more nuanced assessment of a situation, and to compare the available evidence for or against each claim.
The idea is not to replace existing conflict monitoring techniques with CrisisTracker, but rather to use the system as a complement. For instance, by providing cheap real-time country-wide monitoring of activities, the system can enable more accurate and earlier allocation of scarce resources such as trained observers and high-resolution satellites. The system also offers visibility into areas where no organizational presence can be maintained on the ground.
How is your idea adapted for conditions in hard-to-access areas, such as lack of internet and mobile access? Can users adopt it without much behavior change?
The technology behind CrisisTracker could likely be used with content from other online social networks or even SMS data, but I am not currently aware of any comparable source of openly available reports.
How might your idea be designed to scale and spread to help as many people as possible?
If a pool of 10-100 human curators can be maintained, each working around 30 minutes per day, then the platform can also support search and filtering by geographic location, event type and named entities.
Although the system is capable of directing human curators to work on the most important stories, curator availability is an issue during prolonged crises such as conflicts. I am therefore currently working on extending the platform with supervised machine learning algorithms (using the AIDR tool that I've helped develop at QCRI), so that the system can generalize the human curation behavior into event-specific rules. Such rules can then be applied instantly to each newly detected story to classify information at much greater scale. I also hope to integrate existing algorithms for automatic location extraction, to further reduce the dependence on human curators.
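As a rough illustration of how curator labels could be generalized to new stories, the following Python sketch trains a small multinomial naive Bayes classifier with add-one smoothing on a handful of labelled texts. It is a toy stand-in under stated assumptions: it does not use AIDR or reflect how AIDR works, and the class name, labels and training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes text classifier with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of examples
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.label_counts[label] += 1
        self.vocab.update(words)

    def predict(self, text):
        words = text.lower().split()
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(n / total)
            for word in words:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical curator-labelled stories, for illustration only.
model = NaiveBayes()
model.train("artillery shelling hit the market", "shelling")
model.train("heavy shelling in the old city", "shelling")
model.train("protest march after friday prayers", "protest")
model.train("students protest outside the university", "protest")

print(model.predict("shelling near the university hospital"))
print(model.predict("students march in protest"))
```

With so few examples the predictions are fragile; the point is only that each new curator label updates the counts, so labelling effort carries over to stories no human has seen.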
Once topic extraction and geo-location are both in place, the next step is to transform the data to work with structured events rather than clusters of textually similar messages. This will make it possible for example to graph the number of clashes between protesters and security forces in different areas over time, the number of people killed, or to alert when previously unseen types of events are detected in new locations. Current funding is however exhausted and I now rely on my current employer to let me work on projects that I can then integrate into the platform. With independent funding, progress towards this goal would be more direct.
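Once data is available as structured events, the alerting idea above is simple to express. The Python sketch below assumes each event is a (day, event type, location) tuple and raises an alert the first time a given event type appears in a given location. The function name and sample events are hypothetical.

```python
def new_event_alerts(events):
    """Yield each event whose (type, location) pair has not been seen before."""
    seen = set()
    for day, event_type, location in events:
        key = (event_type, location)
        if key not in seen:
            seen.add(key)
            yield day, event_type, location


# Hypothetical structured events in chronological order, for illustration only.
events = [
    ("2013-01-05", "clash", "Homs"),
    ("2013-01-06", "clash", "Homs"),
    ("2013-01-06", "clash", "Aleppo"),
    ("2013-01-07", "bombing", "Homs"),
]
alerts = list(new_event_alerts(events))
for alert in alerts:
    print(alert)
```

The repeated clash in Homs produces no second alert, while the first clash in Aleppo and the first bombing in Homs both do.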
Evaluation Results: 16 people have evaluated this idea
How scalable would this idea be across regions and cultures?
|Looks like it’d be easy to spread across multiple regions and cultures|
|This idea could scale but it might need further iteration to make it widely relevant|
|Seems that this idea would best be suited for a single region/population|
Would a lot of resources be required to create a pilot for this idea? (think time, capacity, money, etc)
|This idea looks easy to pilot with minimal resources being invested|
|Feels like this idea could take a moderate amount of resources to pilot|
|Seems like piloting this idea would take a lot of resources|
How suitable is this idea for various challenges on the ground such as lack of internet or mobile access?
|Yep, it feels like it could work easily beyond internet or mobile access|
|Not so sure – it looks like it would require online or mobile connectivity|
|This idea definitely seems to rely on internet or mobile access|
Could this idea put users or others at risk?
|Nope, it looks like everyone would be safe|
|There are some potential concerns, but these could be addressed with further iteration|
|I can imagine some people being put at risk with this idea|
Overall, how do you feel about this concept?
|This idea rocked my world|
|I liked it but preferred others|
|It didn't get me overly excited|