The Challenge


How might we gather information from hard-to-access areas to prevent mass violence against civilians?

Winning idea

CrisisTracker: Real-time Social Media Curation

CrisisTracker is a web platform that extracts situation awareness reports from public tweets during humanitarian disasters. It combines automated processing with crowdsourcing to quickly detect new events and bring together related evidence.

=== OPEN QUESTIONS ===

The proposed open-source system (see below) has been used in practice for conflict monitoring, but several research, design and development challenges remain. If you want to contribute, please post your thoughts below on the following questions.

I (Jakob) would also love to brainstorm about these challenges in a Skype call (jakob.rogstadius), in particular if you have decision making or analyst experience related to conflict monitoring or intervention.

What information actually leads to action in the domain of conflict monitoring and prevention?

Neither technology nor information automatically leads to action, and there are several cases where atrocities during civil wars were well known but no action was taken to prevent them (e.g. Rwanda). I can also imagine cases where a summary of raw reports with limited context or explanation could actually trigger new violence. What information should a system like this provide to be meaningful and to lead to positive change?

Can open-access information management systems improve the safety of regular citizens or help them contribute to peace?

Conflict monitoring and atrocity prevention are traditionally approached from a top-down perspective. The role of information management systems targeted at expert analysts is well established, but can such systems also be used to empower bottom-up efforts? Rather than discussing what information should be hidden from the public, is there any information that can help improve individual safety, or promote mindsets that lead to conflict reduction and long-term stability in conflict zones?

What decision making processes should be supported?

I am primarily a software engineer and I need to know more about the specific decisions that are made by decision makers in peacekeeping and conflict monitoring situations. What decisions need to be taken, when, and what information is required to make those decisions? This knowledge is extremely helpful to make design trade-offs and to prioritize different features in the system.

How can a crowd assist decision makers with meaningful analysis?

Evaluation of the system has shown that both volunteers who curate content and decision makers who wish to consume the information would prefer that volunteers work with more complex reasoning tasks, rather than just data annotation. What decisions are frequent and important enough that it would be efficient to offload the required data analysis to a skilled or semi-skilled crowd? How can volunteers sufficiently share their evidence, reasoning and conclusions with decision makers for their work to be trusted?

What quantitative indicators are needed?

Good quantitative indicators (things that can be measured numerically) are required to provide meaningful time series, and to rank content by 'importance'. However, when the raw data is social media content, it is very difficult to extract traditional quantitative indicators such as the number of affected unemployed women in rural areas. Other quantitative metrics such as the number of messages or the number of unique people discussing the event are readily available, but are far less meaningful. It's clear that some form of quantifier needs to be extracted, but what low-hanging fruit should we aim for to still provide helpful time series? A rough estimate of the number of people affected (1s, 10s, 100s, 1000s) per event? Number of new events of type X per day? Number of people discussing any event of type X per day?
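The cheaper indicators listed above are straightforward to compute once events carry a type and a detection date. The sketch below is a minimal illustration, assuming a hypothetical `Event` record with `event_type` and `first_seen` fields (not CrisisTracker's actual data model). It maps a rough count onto the 1s/10s/100s/1000s scale and builds a per-day series of new events of one type.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    # Hypothetical event record; the field names are illustrative assumptions.
    event_type: str   # e.g. "shelling", assigned by curators or a classifier
    first_seen: date  # day the event was first detected


BUCKETS = ["1s", "10s", "100s", "1000s"]


def magnitude_bucket(count: int) -> str:
    """Map a rough count of affected people onto an order-of-magnitude label."""
    if count < 1:
        return "none"
    # Number of decimal digits minus one is the power of ten (integer-only, no float rounding).
    exponent = len(str(count)) - 1
    return BUCKETS[min(exponent, len(BUCKETS) - 1)]  # cap at the 1000s bucket


def new_events_per_day(events, event_type: str) -> Counter:
    """Time series: number of newly detected events of one type per day."""
    return Counter(e.first_seen for e in events if e.event_type == event_type)


events = [
    Event("shelling", date(2013, 5, 1)),
    Event("shelling", date(2013, 5, 1)),
    Event("protest", date(2013, 5, 1)),
    Event("shelling", date(2013, 5, 2)),
]
print(magnitude_bucket(250))  # 100s
print(new_events_per_day(events, "shelling"))
```

Because the series counts events rather than tweets, a single heavily retweeted incident contributes one data point instead of thousands.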

What are the ethical implications of a system like this, in particular in conflict situations?

I believe sources are sufficiently protected, but do others agree with me? What if the system collects information that mostly benefits one side in the conflict? Are there any (new) risks that this tool introduces into decision making processes, or does this tool simply require the same skepticism as any other source?


=== CONCEPT DESCRIPTION - CRISIS TRACKER ===

During conflicts in recent years, online social media (mainly Twitter, Facebook and YouTube) has emerged as a means for conflict-affected local populations to communicate their experiences to the world. With increasing technology adoption and free access to posted messages, online social media can now be used to leverage the reporting capacity of thousands or millions of people on the ground for large-scale real-time distributed sensing.

The Twitter microblogging service saw 500 million tweets being posted daily in October 2012, by over 200 million active users. Unlike, for instance, Facebook posts and SMS, the vast majority of these tweets are shared publicly and can be accessed in real time through an application programming interface (API). The challenge, however, is sense-making. With so much content being generated, maintaining overview and history, and detecting patterns and actionable information, requires specialized information management tools.

CrisisTracker is an open-source web platform, developed primarily by me during my PhD studies, that adds structure to millions of reports already available on Twitter. This additional layer of structure helps reduce information overload, making it much easier to use social media as a rich source of real-time situational awareness.

CrisisTracker infers structure by making use of the repetition that occurs when multiple people independently report impactful events, in two ways. First, the greater the number of people that talk about an event, the more likely that event is to be of interest to a system user. This is not a perfect indicator, but with far more information being collected than what can be consumed, having such a metric is critical. Second, the CrisisTracker platform uses an automated real-time clustering algorithm to group together tweets that are textually very similar. A cluster of messages (a “story”) typically refers to a single well-defined event, such as an attack on a protected object, artillery shelling of a location, a bombing, etc. Although individual tweets are both extremely brief (up to 140 characters) and difficult to verify independently, stories in CrisisTracker capture the event from multiple viewpoints and provide a real-time index of published evidence in the form of images, video and news articles.
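The two mechanisms described above, grouping textually similar tweets into stories and ranking stories by how many people report them, can be illustrated with a toy online clusterer. This is a minimal sketch, not CrisisTracker's actual algorithm: it assumes bag-of-words cosine similarity against each story's accumulated word counts and a hypothetical similarity threshold of 0.5. It also compares every new tweet against every story, which a real-time system at Twitter scale would have to replace with some form of approximate index.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def bag_of_words(text: str) -> Counter:
    # \w is Unicode-aware, so non-Latin scripts are tokenized as well.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Story:
    centroid: Counter = field(default_factory=Counter)  # summed word counts of member tweets
    tweets: list = field(default_factory=list)
    users: set = field(default_factory=set)  # distinct reporting accounts


class StoryClusterer:
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold  # hypothetical cut-off; would need tuning on real data
        self.stories = []

    def add(self, user: str, text: str) -> Story:
        """Assign a tweet to the most similar story, or start a new story."""
        vec = bag_of_words(text)
        best, best_sim = None, 0.0
        for story in self.stories:
            sim = cosine(vec, story.centroid)
            if sim > best_sim:
                best, best_sim = story, sim
        if best is None or best_sim < self.threshold:
            best = Story()
            self.stories.append(best)
        best.centroid.update(vec)
        best.tweets.append(text)
        best.users.add(user)
        return best

    def ranked(self) -> list:
        """Stories ordered by number of distinct accounts reporting them."""
        return sorted(self.stories, key=lambda s: len(s.users), reverse=True)
```

Ranking by `len(story.users)` rather than by tweet count follows the reasoning in the text: independent reports from many people signal importance, while an account repeating itself adds tweets but not weight.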

After reports have been clustered, the platform uses crowdsourcing techniques to extract structured meta-data (type of event, geographic location and named entities) from the stories, which improves the quality of search and filtering in the system.

How does your idea gather AND verify information? How does your idea keep those who use it safe?
CrisisTracker currently uses only publicly available information posted on the Twitter microblogging service. Thus, as the information producers publish their reports knowing that the content will be accessible to anyone, they already need to take necessary precautions to not reveal information which they themselves consider sensitive. Unlike for instance Facebook, it is also easy to use Twitter anonymously. CrisisTracker cannot automatically verify information. However, as the system clusters information into stories, it becomes possible to compare different tweets that talk about the same event. Disaster response experts who used the system have described how this makes it much easier to compare different versions of a story to make a more nuanced assessment of a situation, and to compare the available evidence for or against each claim. The idea is not to replace existing conflict monitoring techniques with CrisisTracker, but rather to use the system as a complement. For instance, by providing cheap real-time country-wide monitoring of activities, the system can enable more accurate and earlier allocation of scarce resources such as trained observers and high-resolution satellites. The system also offers visibility into areas where no organizational presence can be maintained on the ground.
How might your idea be designed to scale and spread to help as many people as possible?
As CrisisTracker taps into information that is already being produced by affected populations, the data collection itself is inherently scalable. Detection and prioritization of stories (tweet clusters) is fully automated, but if the system is not powered by a crowd of volunteer curators, search and filtering are limited to keywords and time. If a pool of 10-100 human curators can be maintained, each working around 30 minutes per day, then the platform also allows search and filtering by geographic location, event type and named entities. Although the system is capable of directing human curators to work on the most important stories, curator availability is an issue during prolonged crises such as conflicts. I am therefore currently working on extending the platform with supervised machine learning algorithms (using the AIDR tool that I've helped develop at QCRI), so that the system can generalize the human curation behavior into event-specific rules. Such rules can then be applied instantly to each newly detected story to classify information at much greater scale. I also hope to integrate existing algorithms for automatic location extraction, to further reduce the dependence on human curators. Once topic extraction and geolocation are both in place, the next step is to transform the data to work with structured events rather than clusters of textually similar messages. This will make it possible, for example, to graph the number of clashes between protesters and security forces in different areas over time or the number of people killed, or to raise an alert when previously unseen types of events are detected in new locations. However, current funding is exhausted, and I now rely on my employer letting me work on projects that I can then integrate into the platform. With independent funding, progress towards this goal would be more direct.
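Generalizing curator behavior into rules is a standard supervised text-classification problem. As a minimal sketch, the block below uses a generic multinomial Naive Bayes classifier written from scratch; it is not AIDR's algorithm or API, and the class name `NaiveBayesCurator` and the example labels are hypothetical. Each curator-labelled story becomes a training example, and new stories are then labelled automatically.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text: str) -> list:
    # Lowercased Unicode word tokens, so Arabic or French text is split too.
    return re.findall(r"\w+", text.lower())


class NaiveBayesCurator:
    """Multinomial Naive Bayes with Laplace smoothing, trained on curator-labelled stories."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                       # smoothing for words unseen in a class
        self.class_counts = Counter()            # label -> number of training stories
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.vocab = set()

    def train(self, text: str, label: str) -> None:
        """Record one human curation decision as a training example."""
        words = tokens(text)
        self.class_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def predict(self, text: str) -> str:
        """Return the most probable label for a newly detected story."""
        if not self.class_counts:
            raise ValueError("no curator labels yet")
        words = tokens(text)
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            # Work in log space to avoid underflow on long stories.
            score = math.log(n_docs / total)
            for w in words:
                score += math.log((counts[w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In this arrangement each curator annotation becomes a `train` call, and every newly detected story a `predict` call, so curators shift from labelling every story to supplying training data.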
How could you begin prototyping this idea in a simple way to begin testing and refining it? Who would use your idea and/or who is using it now? Is your idea technically easy, medium or hard to implement?
An early version of the system has been deployed since April 2012 to track the Syrian civil war, and is now in daily use by Syria Tracker to complement their network of eyewitnesses and their monitoring of mainstream media. According to Syria Tracker, this is the first system that successfully gives them a sense of overview of the social media space. Qualitative and quantitative evaluation has revealed that the system is capable of directing users’ attention to impactful events within 30 minutes of the first tweet being posted, whereas mainstream media often take several hours. This is a median time, with instantaneous events such as bombings being detected more quickly, and armed clashes and political events gathering momentum more slowly. Due to the limited availability of human curators, however, Syria Tracker has only been using the fully automated features of the system. This is why learning classifiers and increased automation are such important extensions. An incident commander who tested the Syria deployment of the system stated that “I feel very confidently that those reports will come out ahead of CNN and BBC and that they will have the central nuggets of who, what, when, where, why. For an incident commander, it is the difference between learning something in 2-3 hours versus learning it in 6-8.” Furthermore, a GIS expert said that “you can see over a period of time where people are moving, how that relates to conflict areas. Water shortage, or food, you can almost anticipate where needs are going to be based on what you are seeing.” For more information about this evaluation, please see http://hci.uma.pt/~jakob/files/Rogstadius_2013_CrisisTracker_Crowdsourced_Social_Media_Curation_for_Disaster_Awareness.pdf and try the system at http://ufn.virtues.fi/crisistracker/
How is your idea adapted for conditions in hard-to-access areas, such as lack of internet and mobile access? Can users adopt it without much behavior change?
The system is only applicable when affected populations and the global community are actively using Twitter to discuss the conflict, as has been the case in particular in recent conflicts in North Africa and the Middle East. If Twitter has established a market presence, no further change of behavior is required in the monitored community. Moreover, any marketing needed to make the system work is already being handled indirectly by a major corporation in the social media space. The technology behind CrisisTracker could likely be used with content from other online social networks or even SMS data, but I am not currently aware of any comparable source of openly available reports.


Team


Christophe Billen

A note from Jakob about Christophe's participation in this Team:
I like your comment about a system that drives the collection of new information and I would love to hear more of your thoughts on this topic.

Comments


Sidd Maini

February 12, 2014, 19:42PM
The concept sounds good. The idea is great but needs some more thought. It seems to rest on some assumptions that may not be entirely true. Are you presuming that people in times of violence will have access to a desktop computer, or even a technology such as a smartphone, in these areas of violence? Who is going to end up using it?

There is too much information. I would recommend funneling the idea down into something as robust as Twitter. Twitter is simple: it allows you to tweet using a phone. In your concept, however, I fail to see what a real user would actually do. Do you have real use cases and users who have tested your system?

Do you think this tracker can be adapted for any crisis such as Human Trafficking?

Great work though! This is an amazing start.

Nat Manning

December 12, 2013, 03:48AM
Great stuff. I am curious whether you looked at Ushahidi's SwiftRiver tool at all? https://github.com/ushahidi/SwiftRiver or at http://next.swiftapp.com/

Do you pull from the firehose or from the free Twitter API?

Jakob Rogstadius

December 12, 2013, 05:19AM
Hi Nat. Thanks for your comment. I have only used CrisisTracker with Twitter's free API, which lets you specify keywords to track. To my knowledge it's not well documented exactly how much of the total content is returned this way, but from what I've been able to gather, it's almost everything that matches the search.

Regarding SwiftRiver, I've been curious about the project ever since it started, but I have so far not come across any serious deployment or formal evaluation. If you know of any, I would be interested in a link.

Karoline K

May 14, 2013, 11:24AM
Hi. This concept is great, and links nicely with our 'People's Radio' (PR) concept, which is radio made up of 'spoken tweets', a somewhat analogue version of social media. I've linked to you and 'raise a red flag' in the concept description, as the data PR will gather could easily feed into both CrisisTracker and 'raise a red flag'. All your discussions on curation are really informed and interesting; there have been similar but perhaps less in-depth discussions on our side. The fact that CT "assigns a unique number to the account tweeting the information and tracks its credibility over time, but never publishes the name or the location of the person [or entity] tweeting" sounds like a good way of verifying information, and I've thought about incorporating it into PR, as one of our main challenges is making sure content is relevant and informed. Just letting you know about the wonderful interweaving of concepts that's going on :-)

Ryan Donnell

May 12, 2013, 02:36AM
Hi Jakob,

I encourage the use of social media by any means, but do you think that curating this data will be more of an after-effect than real-time? Unless there are people on the ground in those areas who can make a difference, I feel this might just be a tool for argument's sake rather than one of action.

However, I liked the use of Twitter during the Arab Spring uprisings. I feel that by the time we get the people we need to react to this, it might be too late. I am looking at Syria as an example: atrocities could have taken place there while Twitter goes back and forth. Analysis from CrisisTracker might show the after-effects of a mass attack rather than preventing it.

Jakob Rogstadius

May 12, 2013, 05:48AM
Hi Ryan,

You raise a very valid point. In my experience, social media in general, and Twitter in particular, tend to focus on the immediate present. Based on my own research and that of many others, I also believe that for some types of real-time information, Twitter can be a very fast, accurate and relatively rich source of live reports, as well as an accurate historical index of when different pieces of information became available.

Regarding content curation, you also raise a fair point that human-based curation is always associated with some level of latency. In the current version of CrisisTracker, this affects meta-data extraction, but keyword search and event detection (report clustering) are essentially immediate, as these are automated processes. Current development efforts are therefore focused on shifting the workload of human curators from direct annotation of content to training different types of machine classifiers. If these efforts work out, the system will also be able to generate meta-data in real time.

Your question thus ultimately boils down to whether real-time situational awareness (SA) enables action or not. I believe SA is a fundamental component of most decision making, but I would greatly appreciate more feedback on the specific real-time information that is needed to make response decisions in conflict prevention. Perhaps some vital information isn't easily found on Twitter? Perhaps the information is there, but the user interface doesn't quite make it possible to find particular content of interest quickly enough? Perhaps much information is useful, but even more is useless, so better classifiers are needed to filter out the noise?

Ryan Donnell

May 12, 2013, 14:06PM
Hmm yes, it boils down to whether those using this program and having the situational awareness will be able to take action. Will it be possible to curate across platforms if Twitter is not the most popular? I am thinking of places like the PRC, where Twitter is outlawed.

Jakob Rogstadius

May 12, 2013, 14:17PM
Others have asked similar questions about collecting reports from other sources. While it should be technically possible, I think in practice it's not going to happen unless there is a significant increase in funding. It's an open source project though, so if others want to re-purpose the system, they are welcome to do so.

Souraya Tafrah

May 08, 2013, 06:50AM
Regarding Q #5, "What are the ethical implications of a system like this, in particular in conflict situations?": At Syria Tracker (https://syriatracker.crowdmap.com), a project of Humanitarian Tracker (http://www.humanitariantracker.org), we have used the information to cover all sides. Many of the tweets or social media posts detected by CT cover all sides of the conflict, with very little noise in the data. We have also been able to identify "propaganda" accounts; these look like real accounts, but they behave very suspiciously. So in essence, CT has helped us further weed out noise and misinformation. The neat thing about CT is that it's language-agnostic, which is a great help, especially for mining the Arabic and French tweets that seem to dominate Twitter-covered stories about Syria. As for the safety and privacy of the reports, CT assigns a unique number to the account tweeting the information and tracks its credibility over time, but never publishes the name or the location of the person [or entity] tweeting. We found this to be incredibly helpful as well, as it protects the identity of, for example, the eyewitness reporter. While selection bias remains with respect to population coverage and who is tweeting versus who isn't (in particular inside Syria), CT remains a solid source for verifying and validating other sources we mine, such as the news or blogs, in addition to augmenting Syria Tracker's eyewitness reports.
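One way a system could give each account a stable number without storing or publishing the handle is a keyed hash. The sketch below is an assumption for illustration, not a description of how CrisisTracker identifies accounts. It uses HMAC-SHA256 from Python's standard library with a hypothetical server-side secret key, so nobody without the key can recompute the mapping or test guessed handles against it.

```python
import hashlib
import hmac


def account_number(handle: str, secret_key: bytes) -> int:
    """Stable pseudonymous number for an account, derived with a keyed hash."""
    # Twitter handles are case-insensitive, so "@Name" and "name" map to the same number.
    normalized = handle.strip().lstrip("@").lower()
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).digest()
    # Keep 64 bits: collisions stay negligible even for millions of accounts.
    return int.from_bytes(digest[:8], "big")


SECRET = b"hypothetical-server-side-key"  # assumption: kept private by the platform operator
print(account_number("@SomeReporter", SECRET) == account_number("somereporter", SECRET))  # True
```

Since handles can be renamed, hashing Twitter's numeric account ID instead of the handle would keep the number stable across renames.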

Jakob Rogstadius

May 08, 2013, 07:28AM
Thanks,
Just to clarify, CrisisTracker doesn't yet have automated credibility ranking of sources. In practice, we have managed to keep spam down by running separate analysis based on the clusters in CrisisTracker to identify and blacklist around 100 spammer accounts. I hope that this analysis can be automated later, as the clustering really helps in the process. Essentially it becomes possible to start with a small seed of identified spam stories and from there broaden the search to include other accounts that have shared the same content as the sources of the first few spam stories.
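The seed-and-broaden process described above amounts to a breadth-first walk over a bipartite graph of stories and the accounts that posted into them. Below is a minimal sketch, assuming story membership is available as a hypothetical mapping from story IDs to sets of account IDs. Because a spam account may also post into legitimate stories, the output is a candidate list for human review rather than an automatic blacklist, and `max_rounds` limits how far the search drifts from the seed.

```python
from collections import defaultdict


def expand_spam_candidates(story_accounts: dict, seed_stories: set, max_rounds: int = 2):
    """Grow sets of suspected spam accounts and stories from human-flagged seed stories.

    story_accounts maps story id -> set of account ids that posted into that story.
    Returns (candidate_accounts, candidate_stories) for human review.
    """
    # Invert the mapping so we can go from an account to all stories it posted in.
    account_stories = defaultdict(set)
    for story, accounts in story_accounts.items():
        for account in accounts:
            account_stories[account].add(story)

    spam_stories = set(seed_stories)
    spam_accounts = set()
    frontier = set(seed_stories)
    for _ in range(max_rounds):
        # Accounts that shared content in the current frontier stories.
        new_accounts = set()
        for story in frontier:
            new_accounts |= story_accounts.get(story, set()) - spam_accounts
        spam_accounts |= new_accounts
        # Other stories those newly found accounts also posted into.
        frontier = set()
        for account in new_accounts:
            frontier |= account_stories[account] - spam_stories
        spam_stories |= frontier
        if not frontier:
            break
    return spam_accounts, spam_stories
```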

Christophe Billen

May 08, 2013, 08:26AM
Sounds excellent. Could these curation methods be applied to SMS instead of tweets?

Jakob Rogstadius

May 08, 2013, 08:35AM
It should be possible from a technical standpoint, but I don't know how to make it work in practice. While tweets can be publicly mined, SMS is normally considered confidential and/or owned by the network providers. This means that there is no public chatter to tap into, so it requires active participation from sources in the same way that Ushahidi does (for SMS). The Syria deployment of CrisisTracker works with hundreds of thousands of daily tweets, so if SMS were used as the main reporting technology, information coverage would likely drop by a couple of orders of magnitude.

Christophe Billen

May 08, 2013, 08:45AM
Yes, the idea is to have SMS messages sent to the platform on top of other communication channels such as Twitter. SMS provides a means to communicate back and forth with areas where Twitter is not widely used, since Twitter requires Internet access. The SMS content could be analyzed for relevance and credibility against other information that the platform deals with, such as tweets, but also media reports crawled via RSS, for example. Typically, from an analysis perspective, more information from different sources is a good thing.

Jakob Rogstadius

May 08, 2013, 09:00AM
Souraya Tafrah:
Do you feel that CrisisTracker's revealing of Twitter handles is problematic? Twitter accounts (unlike for instance Facebook accounts) are often kept anonymous in the sense that the person behind the account does not reveal their true identity. I would hope that people in conflict situations also take measures to protect themselves if necessary, since Twitter itself already provides search capabilities both for users and tweets.

In discussions with a couple of crisis managers in the past, I have been told both that "sources of information must be listed for verification purposes" and that "affected people must be kept anonymous to not expose them to risks". These two goals are obviously conflicting for citizen reporting, so CrisisTracker's take on this is to trust that sources have made their own risk-benefit assessment when they decide to post certain content publicly on Twitter.

Logically, this assumption of self-censorship implies that reports that are sensitive by their very nature will be difficult to gather from Twitter (perhaps anonymized SMS reports would be better for these?). I also don't think I have ever seen a first-hand report of rape, or of people saying their children were injured, or in general anything that is stigmatized to admit in society. In cases where people are injured and post calls for help, I would assume that their risk-benefit analysis at that very moment calls for maximum publicity rather than anonymity.

What is your take on the issue?

Souraya Tafrah

May 08, 2013, 06:10AM
Crisis Tracker is scalable for use during mass disasters and conflicts for tracking events as they unfold, which makes it unique compared to other similar platforms, such as: a) Sahana (http://sahanafoundation.org) and VirtualAgility OPS Center (VOC) (http://www.virtualagility.com). These two systems often integrate raw social media feeds, but lack capabilities for distilling information and for handling situations when activity is exceptionally high; b) Ushahidi, whose effectiveness depends entirely on the size, coordination and motivation of crowds; it adapts well to the needs of specific disasters, but is difficult to scale to match information inflow rates during very large events; c) Twitcident (http://twitcident.com), which works only with geo-tagged tweets (~1 percent of all posted messages) and employs classification algorithms (which are specific to a spoken language and require training every time a new concept is introduced) to extract situation awareness information during small-scale crisis response, such as music festivals or factory fires. This system, however, is not built to monitor large and complex events with multiple parallel storylines, or emerging events or threats such as a novel disease outbreak; and d) EMM NewsBrief (http://emm.newsbrief.eu), which mines and clusters mainstream news media from predetermined sources in a wide range of spoken languages, with new summaries updated every ten minutes, but has not been extended to handle social media.

We at Syria Tracker (https://syriatracker.crowdmap.com), a project of Humanitarian Tracker (http://www.humanitariantracker.org), have been collaborating with Crisis Tracker on mining social media, which has helped us 1) corroborate eyewitness reports and 2) maintain timely situation awareness of the events in Syria. To date, Syria Tracker has documented over 65,000 verified civilian deaths in Syria, and Crisis Tracker has been instrumental to the verification process.

Meena Kadri

May 08, 2013, 21:50PM
Cheers for chiming in with such comprehensive insights, Souraya. Great reading about all the amazing initiatives via Humanitarian Tracker. We hope you might find some time to give feedback on other shortlisted concepts here as well: http://bit.ly/endatrocity-test Your valuable perspectives are welcome across our challenge.

Karoline K

November 13, 2013, 11:09AM
Hi guys, just wanted to pass on some information about the opportunity to apply for small grants through Humanity United. They've got a pool designed to facilitate innovation and scale in humanitarian and emergency assistance worldwide. The small grants given would range up to £20,000 (or approx. US$32,000). More info about the fund and application process here: http://bit.ly/17T9Nzf It looks like a great opportunity to bring some of the ideas in here to life. Unfortunately, I'm still in school and unable to take my idea 'People's Radio' further or apply for grants, but anyone from the community who might want to is super welcome to do so. Jakob, if you're in a similar position, you might want to reach out to members of your virtual team or others you know to see if they're interested in taking the idea further. Exciting!
Cheers, Karoline

Christophe Billen

May 06, 2013, 07:52AM
Hi Jacob,
you might wish to look at my concept once more (http://www.openideo.com/open/usaid-humanity-united/ideas/how-to-get-relevant-information-and-verify-it-with-low-cost-technology/), as I updated it quite a bit and it might provide some ideas and ways to upgrade your platform. As for your call to speak to an analyst or people who do conflict monitoring, I might be in a position to help, as I am an analyst, have been working in conflict settings and have been monitoring them for years, so I may have an idea or two about how to monitor these with analysis in mind.
Cheers,
Christophe

Jakob Rogstadius

May 08, 2013, 05:51AM
Hi Christophe,
Your concept seems to be more about finding different ideas and working technologies that together can be used in a holistic approach to information gathering. The goal of the CrisisTracker project is to solve the technical and design challenges involved with one of those components, so the scope is quite different.

With regards to the five questions I posted at the top of this page, would it be possible for you to share your thoughts here on any of them? Based on your LinkedIn profile, I am guessing that the questions that are closest to your professional background are what information leads to positive action in conflict prevention, and what 'cheap' quantitative indicators we can use to provide meaningful time trends.

Would it also be possible for you to share some of the mapping and analysis products you have produced at ICC, or are those confidential?

Christophe Billen

May 08, 2013, 08:23AM
Hi Jacob. In essence, yes, you are right, although the primary idea is to communicate via SMS back to the provider of information, to get it structured and to get in touch with other contacts on the ground to help verify it. My concept is about possible methods that can be used to achieve just that. And since I'm not a software engineer ;-) but a social scientist with some IT awareness, I do have to rely on people like you with this kind of knowledge to get it implemented.
Also, when developing my concept, I thought it would be useful to link with other concepts and ideas, including yours, as I believe from what I read that your platform could be adapted to manage and analyse SMS messages to and from people in hard-to-access locations. And I'm sure you would agree that ideally a platform should be holistic in the ways it gathers information. In this case, SMS communications back and forth would just be an additional communication channel.

As for the questions above, I have some ideas (some of which I already discussed in my concept) that I'm happy to share with you. We can discuss them on Skype over the weekend.

Wishing you a very nice day.

Christophe Billen

May 08, 2013, 08:24AM
Sorry, I keep misspelling your name, Jakob (I'm a native French speaker, if that can serve as an excuse). Apologies.

Christophe Billen

May 01, 2013, 08:23AM
Hi Jacob. Congrats on being shortlisted.

Straight to the point, as I have little time right now: speaking of making use of other communication channels, and especially given that "in regions where connectivity is low even before the crisis and information is scarce, the value of any system built primarily to reduce information overload would be questionable", you may want to look at integrating my concept into your platform.

Have a look here http://www.openideo.com/open/usaid-humanity-united/ideas/how-to-get-relevant-information-and-verify-it-with-low-cost-technology/gallery/relevant-and-verified-infomation-v1-billen-290313-1.pdf/ and here http://www.openideo.com/open/usaid-humanity-united/ideas/how-to-get-relevant-information-and-verify-it-with-low-cost-technology/ and let me know what you think.

Also, I think that Annie's concept might be worth looking at: http://www.openideo.com/open/usaid-humanity-united/ideas/thread-a-way-to-thread-together-information-to-take-appropoiate-action

As for outlets, what about adding people's radio on top of current web-based interfaces? http://www.openideo.com/open/usaid-humanity-united/ideas/people-s-radio

Cheers,
Christophe

Jakob Rogstadius

May 02, 2013, 05:54AM
I have been pondering ideas similar to yours, where a system that collects information basically tries to fill in a template of structured information and automatically requests missing details from people who are likely to be able to answer. The same concept can also be turned into a cheap means for primary data collection. A system could fairly easily be built to let an analyst specify a geographic region of interest, so that the system can automatically contact people who have posted geolocated tweets in that area to ask them to fill in an online survey regarding the conditions in their local surroundings.
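The template-filling and region-targeting idea above can be sketched roughly as follows. This is a minimal illustration, not part of CrisisTracker: the field names in `REQUIRED_FIELDS`, the question wording, and the tweet dictionaries with `user` and `coords` keys are all hypothetical.

```python
# Hypothetical template: fields a structured situation report should contain.
REQUIRED_FIELDS = ("event_type", "location", "time", "people_affected")

# Hypothetical follow-up question for each template field.
QUESTIONS = {
    "event_type": "What happened?",
    "location": "Where exactly did this happen?",
    "time": "When did this happen?",
    "people_affected": "Roughly how many people were affected?",
}


def missing_fields(report: dict) -> list:
    """Template fields that are absent or empty in a partially filled report."""
    return [f for f in REQUIRED_FIELDS if not report.get(f)]


def follow_up_questions(report: dict) -> list:
    """Questions to send back to a source to complete the report."""
    return [QUESTIONS[f] for f in missing_fields(report)]


def accounts_in_region(tweets, south, west, north, east):
    """Unique authors of geotagged tweets inside a lat/lon bounding box.

    Each tweet is a hypothetical dict: {"user": str, "coords": (lat, lon) or None}.
    """
    return sorted({
        t["user"] for t in tweets
        if t.get("coords") is not None
        and south <= t["coords"][0] <= north
        and west <= t["coords"][1] <= east
    })
```

As the next paragraph points out, a real deployment would also need careful limits on who gets contacted and how often, or it risks behaving like a spambot.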

It's tricky to get it right though. Such a system runs the risk of becoming a spambot unless its information extraction performance is very high. I suspect though that if the sender has enough credibility (e.g. the Red Cross or OCHA), then public acceptance could be fairly high.

Christophe Billen

May 06, 2013, 08:02AM
Hi Jacob,
I completely agree that those running the platform should be legitimate and neutral actors such as the ICRC or the UN. Another potential actor that may achieve this credibility would be AVAAZ, but I'm not sure they have the expertise and experience for this, although they would surely have the legitimacy. Others could be HRW or Amnesty International.
To prevent this platform from turning into a spambot, you would only send verification requests to registered users, using geofencing, for example, to limit messages to registered users in a particular area. You would of course provide users with an easy way to register and unregister (the latter might also come in handy from a security perspective).
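A geofenced verification request, limited to users who registered and have not since unregistered, could be sketched as follows. The `User` record, its `contact` field and the circular fence are illustrative assumptions, not part of either proposal; the haversine formula gives great-circle distance on a spherical Earth, which is precise enough for this purpose.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class User:
    contact: str            # hypothetical identifier used to reach the user
    lat: float
    lon: float
    registered: bool = True  # set to False when the user unregisters


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def recipients_in_geofence(users, center_lat, center_lon, radius_km):
    """Registered users whose last known position lies inside a circular geofence."""
    return [
        u.contact for u in users
        if u.registered and haversine_km(u.lat, u.lon, center_lat, center_lon) <= radius_km
    ]
```

Unregistered users are filtered out before any distance check, so opting out always wins over location.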
As mentioned in a comment above, you may wish to check my updated concept as it may answer some more issues.
Cheers,
Christophe

Jakob Rogstadius

May 08, 2013, 06:04AM
The idea I had in mind (though it may be a bad idea) was to have a system that can help with 'aggressive' primary data collection in the very early comprehension stages of a disaster (the first 24-72 hours). Its use case would be similar to dispatching people on the ground to conduct situational assessment surveys with members of the affected community. In the traditional survey procedure the contact with respondents is always initiated by the collecting party, so I believe it would be difficult to get respondents to initiate the first contact (a registration step).

Since we can ask Twitter for (a ~1% subset of) all accounts that have recently posted information in a geographic area, the idea was that a system can be built that approaches community members to ask for participation in a survey, instead of requiring a human to be sent out. Like with most social media-based technologies, this would probably not be a replacement for other data collection methods, but rather a complement with very different cost-benefit trade-offs.

Christophe Billen

May 08, 2013, 08:34AM
Hi Jakob.
Not per se. It depends on your sensitization capabilities. As explained in my concept, you could use enabler cards airdropped over crisis zones, radio sensitization campaigns asking people to SMS or call a free number, etc. When it comes to verification, you could also send SMS messages to people in a certain area (using geofencing, for example) with a request for assistance. Of course, this should be tailored to the political context and the associated security threats and risks. But after a tsunami or an earthquake, if the info is disaster-related, such as where people are still trapped, it could probably be done without requiring people to undergo a formal registration process. Happy to discuss this further.

Meena Kadri

April 30, 2013, 21:34PM
We thought you might also like to check out this idea: http://www.openideo.com/open/usaid-humanity-united/ideas/crisis-mapping-and-conflict-network-analysis/ and reach out for collaboration as you proceed through our Prototyping phase.

Hanna

April 30, 2013, 19:55PM
Jakob-

As many posts on Facebook and Twitter include media (both videos and pictures), is there a way to view these materials within the CrisisTracker platform? Or are you linking out to a specific user's page/profile to view the full "post", thus leaving the CrisisTracker platform?

Jakob Rogstadius

May 01, 2013, 05:41AM
Hi Hanna. So far the platform only links out, but it's on the to-do list to include frequently shared media content in the stories.

OpenIDEO

April 23, 2013, 22:23PM
Congrats on being shortlisted for our Atrocity Prevention Challenge, Jakob!

Our challenge sponsors loved how your idea verifies by collecting similar stories. One area to consider is how does this idea work in regions with less connectivity to start with: e.g. lacking smartphones or internet access. Might this work well in concert with Speak to Tweet http://www.openideo.com/open/usaid-humanity-united/inspiration/speak-to-tweet-/ or People's Radio http://www.openideo.com/open/usaid-humanity-united/ideas/people-s-radio/? Also, is the preference to move towards machine processing, or are there ways for this to support and complement the role of human curators?

Read more on how to get involved with prototyping and refinement: http://bit.ly/oi_refine And here are some tips on prototyping specifically for this challenge: http://bit.ly/endatrocity-proto Ready, steady, refine!

Jakob Rogstadius

April 26, 2013, 18:45PM
The current version of CrisisTracker has been developed specifically to address the challenge of information overload in regions of high connectivity. The system relies on volume and repetition to identify important information in torrents of unstructured content, and the number of independent sources ensures variety in the coverage. In regions where connectivity is low even before the crisis and information is scarce, the value of any system built primarily to reduce information overload would be questionable.
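The reliance on volume and repetition can be illustrated by scoring each story by how many distinct accounts contributed to it. This is a hypothetical ranking rule, assuming tweets have already been clustered into stories (the `(story_id, author)` pairs are made up); it is not a description of CrisisTracker's actual scoring.

```python
from collections import defaultdict


def rank_stories(tweets):
    """Rank story clusters by the number of distinct accounts that contributed.

    `tweets` is an iterable of (story_id, author) pairs. Counting distinct
    authors rather than raw tweets means one account repeating the same claim
    many times does not inflate a story's importance.
    """
    authors = defaultdict(set)
    for story_id, author in tweets:
        authors[story_id].add(author)
    # Most independent sources first; ties broken by story id for stable output.
    return sorted(authors, key=lambda s: (-len(authors[s]), s))
```

This also shows why the approach weakens when information is scarce: with only one or two sources per story, the ranking carries little signal.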

That said, as the project moves forward the goal is to have a system that can maintain a real-time database of events, each defined by its event type(s), time and geographic location. For each event there can be a range of evidence, similar to the current stories, but the important difference is that repetition is no longer critical to infer information structure. With structured event data in place, the next step is to build visualizations that show how event types are distributed in space and time, and alert when new event types are detected.
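The event-based data model described above might look roughly like this: each event carries its type(s), time, location and a list of evidence from any channel, and the store reports event types it has not seen before so that an alert can be raised. Class and field names are hypothetical, and a real system would use a persistent database rather than an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Evidence:
    url: str
    source: str  # hypothetical channel label, e.g. "twitter", "sms", "news"


@dataclass
class Event:
    event_types: set
    time: datetime
    lat: float
    lon: float
    evidence: list = field(default_factory=list)


class EventStore:
    """In-memory event database that flags event types it has not seen before."""

    def __init__(self):
        self.events = []
        self.known_types = set()

    def add(self, event: Event) -> set:
        """Store an event and return the event types that are new to the store."""
        new_types = event.event_types - self.known_types
        self.known_types |= event.event_types
        self.events.append(event)
        return new_types

    def in_window(self, start: datetime, end: datetime) -> list:
        """Events between start and end (inclusive), e.g. for a space-time view."""
        return [e for e in self.events if start <= e.time <= end]
```

Because evidence is attached to an event rather than defining it, reports from different channels can be fused onto the same record.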

In an event-based data model it is also far easier to fuse data from different communication channels. The idea to combine voice-based reporting with crowd-sourced annotation and/or automated speech-to-text is intriguing, and certainly something that will be kept in mind as the project proceeds. I think it would be particularly interesting if it can be combined with automated visualization.

Finally, the preference is to move towards a system where humans set the standard for content curation, but where the system quickly learns to generalize and automate the curation behavior. There are three reasons for this. First, compared to other information domains, each humanitarian crisis is relatively unique. Therefore unsupervised machine learning algorithms trained for one event perform very poorly when used in future events. Second, human volunteer curators are available at short notice, for instance through the Stand-By Task Force. Human curation can be very accurate, but burn-out is a significant problem and there is no way to sustain large workforces for crises that last for weeks, months or even years. Third, no matter how many volunteers are recruited, they can never keep up with the rate at which social media content is generated. Even though the clustering in CrisisTracker reduces information inflow rates by several orders of magnitude, humans can still only keep up with around 1% of the stories. By learning from the human curators, the system can be both accurate and scalable.
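One simple way to "learn from the human curators" is a text classifier trained on their keep/discard decisions. The multinomial naive Bayes sketch below is a stand-in technique chosen for brevity, not the method CrisisTracker uses; the keep/discard labels and the training examples are hypothetical.

```python
import math
from collections import Counter


class CuratorModel:
    """Multinomial naive Bayes trained on curator decisions (keep vs. discard)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter()

    def learn(self, text: str, keep: bool) -> None:
        """Update the model with one curator judgement."""
        self.word_counts[keep].update(text.lower().split())
        self.doc_counts[keep] += 1

    def _log_score(self, words, label):
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total = sum(self.word_counts[label].values())
        # Smoothed class prior, so a class with no examples yet is not impossible.
        prior = (self.doc_counts[label] + 1) / (sum(self.doc_counts.values()) + 2)
        score = math.log(prior)
        for w in words:
            # Laplace (add-one) smoothing so unseen words don't zero the score.
            score += math.log((self.word_counts[label][w] + 1) / (total + len(vocab)))
        return score

    def predict_keep(self, text: str) -> bool:
        """True if the model expects a curator would keep this item."""
        words = text.lower().split()
        return self._log_score(words, True) > self._log_score(words, False)
```

Because each crisis differs, a model like this would be retrained from scratch on the curators' labels for each new event rather than reused across events.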

Meena Kadri

April 29, 2013, 22:14PM
Fascinating detail yet again, Jakob. Be sure to add the most important details and any collaborative builds from the discussion here to your actual post ahead of our Evaluation phase. We've just extended the Prototyping phase by a week, so we're looking forward to seeing how folks' ideas iterate over that time.

Jakob Rogstadius

May 01, 2013, 05:56AM
I will make sure to update the post closer to the deadline. But I wonder: since this idea already includes a prototype, what should the prototyping phase consist of in this case? It's quite a complex technical research project, so it's not really feasible to implement the additional features suggested here in just a few weeks' time.

Meena Kadri

May 06, 2013, 02:55AM
We hear you, Jakob, and know that you've been prototyping in earnest for some time now. Given we're a collaborative community, we're digging that you've been joining conversations on other shortlisted ideas with your insightful perspectives, and hope you keep that up.

Are there other mapping domain specialists you could reach out to for collaborative input here? We're sure you know many of the folks at Ushahidi, etc. personally, but are there others in their networks you could invite to this conversation to get feedback (perhaps via social media, given that's central to your concept)? One of the challenges for NGOs, social enterprises, etc. is sharing perspectives, skills and insights. We'd love to think that we might provide a space on OpenIDEO for these conversations to start and for connections to form.

Jakob Rogstadius

May 06, 2013, 07:37AM
I only have one contact (I think) who has any actual experience in conflict intervention or analysis, which is why I have so many unanswered questions regarding decision-making processes and indicators.

I just updated the main post with a list of open questions (top of the page), so anyone reading this with a relevant background can greatly contribute to this project by sharing their thoughts.

Meena Kadri

May 06, 2013, 07:57AM
Great idea on your open questions & suggestion of Skype calls to tease out further. I've started putting the word out via our OpenIDEO Twitter account, using your #CrisisMapper hashtag. Do let us know if you end up getting contacted by anyone and follow up on Skype – always good to know if we get traction from this kind of outreach.

Christophe Billen

May 06, 2013, 08:04AM
Feel free to contact me, as I might be able to help: I have the experience you are looking for, having worked in conflict settings and having monitored them as well. My Skype ID: cryptosaure

Meena Kadri

May 06, 2013, 08:17AM
Way to go Christophe!

Meena Kadri

May 06, 2013, 22:09PM
And Jakob – you can learn more about Christophe's professional experience here: http://www.billen.ws/cv_billen.html (I did a bit of Googling to find that. Tip: it helps with collaboration for folks to add links like this to their OpenIDEO profile pages)

Christophe Billen

May 07, 2013, 07:21AM
Thanks Meena, but that's pretty old and needs updating ;-) Here is a more up-to-date CV: http://nl.linkedin.com/in/christophebillen. I will add it to my profile page as kindly recommended.

Meena Kadri

May 07, 2013, 07:50AM
Oops – your link is broken. This one worked for me: http://www.linkedin.com/in/christophebillen Great having you onboard with our collaborative community!

Christophe Billen

May 07, 2013, 08:23AM
Ok thanks. I've updated my profile page with a working link. Sorry about that.

Joshua Bress

April 23, 2013, 02:33AM
Jakob, I really enjoyed reading your proposal, and I like that it poses no additional risk to users beyond decisions they have already made. I wonder whether Twitter would be used more in conflicts outside the Middle East and North Africa if people knew their tweets had a chance of being seen by the public. How does the system account for foreign languages?

Jakob Rogstadius

April 23, 2013, 05:31AM
Thank you Joshua. Currently none of the processing in the system is language-specific, but I would assume clustering performance drops for languages that have very complex grammar. Basically two tweets are considered similar if they have a lot of words in common, so if there are many forms of the same word, then those forms will all be considered separate unless the system is extended with a stemmer (an algorithm for reducing words to their common base form) for that specific language.
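To make the word-overlap point concrete, the sketch below compares two tweets with Jaccard similarity over their word sets, with and without stemming. Jaccard is a stand-in measure chosen for simplicity, and `crude_stem` is a hypothetical toy suffix stripper; a real system would use a proper language-specific stemmer (for example, one of the Snowball stemmers).

```python
import re


def tokens(text: str) -> set:
    """Lower-cased word set of a tweet."""
    return set(re.findall(r"\w+", text.lower()))


def crude_stem(word: str) -> str:
    """Toy English suffix stripper, for illustration only."""
    for suffix in ("ings", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def jaccard(a: set, b: set) -> float:
    """Shared words divided by all distinct words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(t1: str, t2: str, stem: bool = False) -> float:
    w1, w2 = tokens(t1), tokens(t2)
    if stem:
        w1 = {crude_stem(w) for w in w1}
        w2 = {crude_stem(w) for w in w2}
    return jaccard(w1, w2)
```

With the made-up tweets "explosions reported in Homs" and "explosion reports from Homs", only "homs" matches without stemming (similarity 1/7), while the stemmed forms share three of five words (0.6).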

The goal is to stay away from English-only algorithms as the processing is increasingly automated, but I am unsure whether that will be possible for inferring geographic locations from text. If anyone knows of high-performing language-independent unsupervised or supervised algorithms for geo-inferencing, I would be very interested in hearing about them.

Arjan Tupan

April 21, 2013, 07:46AM
Impressive, Jakob!
Regarding the need for a Twitter presence in the affected area, and with it the need for mobile data networks to be up and running (neither of which is always the case), I was wondering the following:
Would it be possible for a curating team outside a certain area to feed the system with tweets that have a coded geolocation in them, so that the system uses the tweeted location instead of the geotag of the tweet itself?
To explain this a bit, consider the following scenario:
Something is happening in Syria, but at the time, all internet traffic is blocked. Through other channels, a curation team located in Switzerland has a good view of what's happening. They relay reports they get via a specific twitter account, using the hash-tag #locationDamascus. Their tweets get into the system, and appear on the map in Damascus.
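Arjan's scenario can be sketched as a small parser that looks for a location hashtag and resolves it against a gazetteer, falling back to the tweet's own geotag. The `#location<Place>` convention comes from the scenario above; the two-entry `GAZETTEER` and the coordinates used are illustrative assumptions only.

```python
import re

# Hypothetical mini-gazetteer; a real deployment would use a full place-name database.
GAZETTEER = {
    "damascus": (33.51, 36.29),
    "aleppo": (36.20, 37.16),
}

LOCATION_TAG = re.compile(r"#location(\w+)", re.IGNORECASE)


def reported_location(tweet_text, geotag=None):
    """Prefer a location coded in the text (e.g. #locationDamascus) over the
    tweet's geotag, which only reflects where the relaying sender is."""
    match = LOCATION_TAG.search(tweet_text)
    if match:
        coords = GAZETTEER.get(match.group(1).lower())
        if coords is not None:
            return coords
    # No usable tag: fall back to the geotag (which may itself be None).
    return geotag
```

A tweet relayed from Switzerland with the tag would then be mapped to Damascus rather than to the sender's position.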

What I am trying to achieve here is related to my own idea (WINTbase), which looks into doing something similar to CrisisTracker, but less in real time, by looking at information available around the area of conflict when communication from that area is nearly impossible, internet and mobile networks are down, etc.

Jakob Rogstadius

April 21, 2013, 08:04AM
Regardless of whether mobile data networks are available, it is rare that more than 1% of tweets are geotagged. Even the data I have for the Boston Marathon bombings indicates the ratio was around 0.5% for that event.

Geo-filters are supported, but the primary way that we have used CrisisTracker for data collection is to filter the Twitter stream by keywords. These are insensitive to location, so as long as information trickles out of an affected location and somehow makes it to Twitter, it will be picked up. The main difference is message volume (which does affect quality of clustering) and timeliness.
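A keyword filter of this kind ignores where a tweet was sent from, which is why relayed reports still get through. The predicate below is a local sketch of the matching step only, with made-up keywords; it is not how Twitter's streaming API implements keyword tracking.

```python
import re


def make_keyword_filter(keywords):
    """Return a predicate that is True when a tweet contains any tracked keyword.

    Matching looks only at the text, never at a geotag, so reports relayed from
    outside an affected area are picked up just like local ones.
    """
    alternatives = "|".join(re.escape(k) for k in keywords)
    # (?<!\w) and (?!\w) require whole-word matches and also work for hashtags.
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)
    return lambda text: pattern.search(text) is not None
```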

Currently curators using the system are instructed to geolocate the content of the messages rather than their origin, and this practice will continue also as the platform is extended with automated extraction of structured location information.

That said, it is not yet clear to me if Twitter has any advantages as an information source in cases when telecommunications are completely unavailable. It is possible that in these cases information will only make it onto Twitter after it is already known through other channels, but I believe further testing is required to know this for sure. I guess it depends a lot on how large the area is where the disruption occurs, and Twitter keeps surprising me with how resilient it is to technology disruptions.

Anonymous Conflict Mapper

May 15, 2013, 22:41PM
Hi Jakob,

I'd like to add a response to your second question (Can open-access information management systems improve the safety of regular citizens or help them contribute to peace?)...

I do think that open access information systems can improve the safety of regular citizens, but I would be very cautious as to what information is shared via the platform as more information can just as easily lead to more violence as prevent it.

With respect to what information can promote mindsets that lead to conflict reduction and long-term stability, I think you're onto something there... I've been doing a fair amount of research into conflict networks and how many online forums isolate individuals and prevent them from ever seeing "the other." If an information system made available to people in a conflict zone were truly diverse, I think it would have an enormously positive effect on the conflict and prospects for peace in the future. That is the type of bottom-up result I think you can expect from such a system.

Hope that helps,

Anon.

Jakob Rogstadius

May 16, 2013, 05:39AM
Hi Anon,

Thank you for joining the conversation! Can you elaborate a little on what information you think can lead to increased safety, with low risk that it leads to more violence?

Also, can you clarify what you mean when you say that an information system that is "truly diverse" could have positive effects? Diversity in terms of picking up a wide range of viewpoints for each detected event? Diversity in detecting "bad" actions by each side in the conflict, and how those actions affect the lives of civilians? Diversity in the user base? So far I haven't figured out any good way to address why-questions with information extraction from Twitter, but the collected reports could still act as starting points for forum discussions within the platform.

Anonymous Conflict Mapper

May 16, 2013, 16:42PM
Sure thing...

I think the best way to illustrate my concerns would be to give an example... After a lot of work, I managed to find the locations of a large number of activist videos uploaded during the Syrian conflict (some from geotags, most from locations referenced and landmarks visible in the videos). Putting all this information together, it became very clear that these videos were originating from a few locations. It showed where the protests were (by neighborhood), where armed groups were forming, and had a lot of information related to where the activists were as well. This type of information in the hands of the regime would be a map of targets. Similarly, information about roadblocks, troop deployments, and the like could lead to further violence.

Information that would lead to increased safety, however, would be information related to conflict events, water or food shortages, and the like. This type of information would improve people's knowledge of what's happening around them (in the absence of free media), and would enable them to respond accordingly.

My comments regarding diversity were with respect to the diversity of viewpoints available on the server. Because the algorithms used by Google and Twitter are designed to direct people to other people or things that are similar to themselves, most individuals aren't exposed to "the other" in conflict zones. While I'm still analyzing the effects of this phenomenon where I see it taking place, my initial assumption is that it leads to further entrenchment. Your system, by showing users a diverse array of conversation clusters, would help prevent people from becoming stuck in their own cluster of like-minded people. *Of course, a system that draws only on conversations that take place online is inherently biased, but as the power of online conversations grows, recognizing and responding to the negative externalities of such a forum becomes even more important.

Jakob Rogstadius

May 18, 2013, 18:35PM
Thanks.

If a software system is capable of collecting both information that can be used for good and for bad, mostly depending on configuration, do you see a problem with it being made available as open source? I often get told that certain information should be kept hidden from the public, often for good reasons. However, no matter how many security barriers one adds to an information management system that uses publicly available sources, these efforts are basically moot as long as anyone can deploy their own instance and run their own fine-tuned collection on their own servers.

Karoline K

November 13, 2013, 11:10AM
Hi guys, just wanted to pass on some information about the opportunity to apply for small grants through Humanity United. They've got a pool designed to facilitate innovation and scale in humanitarian and emergency assistance worldwide. The small grants would range up to £20,000 (approx. US$32,000). More info about the fund and the application process here: http://bit.ly/17T9Nzf It looks like a great opportunity to bring some of the ideas in here to life. Unfortunately, I'm still in school and unable to take my idea 'People's Radio' further or apply for grants, but anyone from the community who wants to is very welcome to do so. Jakob, if you're in a similar position, you might want to reach out to members of your virtual team or others you know to see if they're interested in taking the idea further. Exciting!
Cheers, Karoline

Meena Kadri

April 19, 2013, 20:43PM
Really comprehensive stuff, Jakob! During our OpenIDEO Ideas phase we hope that our community will collaborate to strengthen and build ideas, together. Do you have any challenges or areas of opportunity for CrisisTracker that you'd like to share to help folks understand where they might join the conversation? Perhaps you'd like to add those at the end of your post description. (Tip: to edit your post, hit the Update Entry button on the right.) Looking forward to seeing more of you in conversations across this challenge...

Jakob Rogstadius

April 20, 2013, 11:27AM
I would love to hear ideas from others on this concept! First of all, I want to make clear that I am a computer scientist by training. I thus lack an in-depth understanding of the conflict management domain, although I have been reading about disaster management in general for the past few years.

A big question I have is what information actually leads to action in the domain of conflict monitoring and prevention? Neither technology nor information automatically leads to action and there are several cases where atrocities during civil wars have been well known, but no action was taken to prevent them (e.g. Rwanda). I can also imagine cases where a summary of raw reports with limited context or explanation can actually trigger new violence. What information should a system like this provide to be meaningful and to lead to positive change?

Second, what are the ethical implications of a system like this, in particular in conflict situations? I believe sources are sufficiently protected, but do others agree with me? What if the system collects information that mostly benefits one side in the conflict? Are there any (new) risks that this tool introduces into decision making processes, or does this tool simply require the same skepticism as any other source?

Finally, as I am primarily a software engineer, I would like to know more about the specific decisions that are made by decision makers in peacekeeping and conflict monitoring situations. What decisions need to be taken, when, and what information is required to make those decisions? This knowledge is extremely helpful to make design tradeoffs and to prioritize different features in the system.

Pranati

April 20, 2013, 14:56PM
A very comprehensive article indeed, thank you for sharing. I have been reading about live mapping during crises for some time now and I really enjoyed reading about this too.
Taking a cue from your article, I have a few questions. First, I am interested in knowing about means of verifying online information, particularly in a hostile environment. Second, can you tell me about a few of the most effective real-time clustering algorithms available, or the one that you have used (if you don't mind, of course)?

Jakob Rogstadius

April 21, 2013, 06:20AM
Hi Pranati. Regarding verification of information, I believe Syria Tracker has used the tool in the best way possible. They do not use any single source for verification, but rather triangulate reports across eyewitness reports from trusted sources on the ground, social media and mainstream media. None of these channels is good enough on its own: trusted eyewitnesses are too few to cover a country, social media is brief and a bit sensational, and mainstream media is slow and has selective coverage. They also try to verify reports on a cause-and-effect basis, for instance by matching YouTube videos of missile launches with reports of impacts near the alleged time and place of the launch.
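The cause-and-effect check described above (matching a launch video with impact reports near the alleged time and place) can be sketched as a simple space-time window query. The 30-minute and 100 km thresholds, the record layout and the example coordinates are hypothetical, and any match is only a lead for further triangulation, not a verification in itself.

```python
import math
from datetime import datetime, timedelta


def km_between(a, b):
    """Haversine distance in km between two (lat, lon) pairs."""
    (lat1, lon1), (lat2, lon2) = a, b
    p1, p2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def corroborating_impacts(launch, impacts, max_delay=timedelta(minutes=30), max_km=100.0):
    """Impact reports that occur shortly after a launch and within range of it.

    Records are hypothetical dicts with "time" (datetime) and "pos" (lat, lon).
    """
    return [
        imp for imp in impacts
        if timedelta(0) <= imp["time"] - launch["time"] <= max_delay
        and km_between(launch["pos"], imp["pos"]) <= max_km
    ]
```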

CrisisTracker's role in this is mainly to:
1) provide cheap coverage of public citizen communication at country scale, and direct attention to current central topics;
2) bring together scattered reports about the same event, so that a single cluster (or a few) forms a constantly updating index of the most shared images, videos, news articles and views from the different sides in a conflict;
3) maintain a history of social media communication, to enable triangulation of past events as well. This is difficult to do using Twitter's own search, which is limited to recent content.

Regarding clustering algorithms, CrisisTracker uses Locality Sensitive Hashing (LSH), which is a very fast algorithm for kNN clustering in high-dimensional space. Words in the tweets are used as features, weighted by their inverse global frequency, with dynamic stop-word removal also based on frequency. There is more information in the attached PDF and the references, and feel free to drop me an email if you want more information about how the algorithm was implemented.
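For readers curious about the LSH step, here is a generic random-hyperplane LSH sketch for cosine similarity, combined with inverse-frequency word weighting and frequency-based stop-word removal. It is not CrisisTracker's code: the signature length, the 50% stop-word threshold and the single hash table are arbitrary assumptions (practical LSH setups typically use several tables to improve recall), and the attached PDF remains the reference for the actual implementation.

```python
import math
import random
import re
from collections import Counter, defaultdict


class HyperplaneLSH:
    """Random-hyperplane LSH over IDF-weighted bag-of-words tweet vectors."""

    def __init__(self, num_bits=16, stopword_ratio=0.5, seed=0):
        self.num_bits = num_bits              # hypothetical: bits per signature
        self.stopword_ratio = stopword_ratio  # hypothetical: drop words in >50% of tweets
        self.rng = random.Random(seed)
        self.planes = {}                      # word -> its component in each hyperplane
        self.doc_freq = Counter()             # word -> number of tweets containing it
        self.num_docs = 0
        self.buckets = defaultdict(list)      # signature -> ids of tweets in that bucket

    @staticmethod
    def words(text):
        return set(re.findall(r"\w+", text.lower()))

    def vector(self, words):
        """IDF weights; very frequent words are skipped as dynamic stop words."""
        vec = {}
        for w in words:
            df = self.doc_freq[w]
            if df == 0 or df / self.num_docs > self.stopword_ratio:
                continue
            vec[w] = math.log(self.num_docs / df)
        return vec

    def _plane_components(self, word):
        # Lazily draw one Gaussian component per hyperplane for each new word.
        if word not in self.planes:
            self.planes[word] = [self.rng.gauss(0.0, 1.0) for _ in range(self.num_bits)]
        return self.planes[word]

    def signature(self, vec):
        """One bit per hyperplane: the sign of the vector's dot product with it."""
        return tuple(
            sum(weight * self._plane_components(w)[i] for w, weight in vec.items()) >= 0.0
            for i in range(self.num_bits)
        )

    def add(self, tweet_id, text):
        words = self.words(text)
        self.doc_freq.update(words)
        self.num_docs += 1
        vec = self.vector(words)
        if vec:  # tweets made only of stop words are not indexed
            self.buckets[self.signature(vec)].append(tweet_id)

    def candidates(self, text):
        """Indexed tweets sharing a bucket with `text`: candidate nearest neighbours."""
        vec = self.vector(self.words(text))
        return list(self.buckets.get(self.signature(vec), [])) if vec else []
```

Vectors pointing in similar directions tend to fall on the same side of each random hyperplane, so near-duplicate tweets tend to share a bucket; the candidates would then be compared exactly (e.g. by cosine similarity) to pick the true nearest neighbours.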

Chris S

April 21, 2013, 21:46PM
Wow! This is an incredible piece of technology that you have developed, Jakob! I really enjoyed reading your proposal.

“First of all, I want to make clear that I am a computer scientist by training. I thus lack an in-depth understanding of the conflict management domain”

Don’t sell yourself short! You pose incredibly thought-provoking questions that are on-point, provocative, and poignant! You've already developed and deployed your technology. I found these quotes to be especially interesting:

“For an incident commander, it is the difference between learning something in 2-3 hours versus learning it in 6-8”

“You can see over a period of time where people are moving, how that relates to conflict areas. Water shortage, or food, you can almost anticipate where needs are going to be based on what you are seeing.”

You put forth an incredibly thoughtful analysis of this tool’s impact.

“What information actually leads to action in the domain of conflict monitoring and prevention?”

This is a great question! There is a large body of literature surrounding ethno-sectarian conflict and humanitarian intervention. Why states choose to intervene in some humanitarian crises and not others has been a central question in contemporary international relations theory. Your question about the type of data that states use to intervene in humanitarian crises is novel, and would make for a great dissertation!

“I can also imagine cases where a summary of raw reports with limited context or explanation can actually trigger new violence. What information should a system like this provide to be meaningful and to lead to positive change?”

This is a very nuanced and empathetic question! I look forward to discussing this question in more detail during the prototyping phase.

"What are the ethical implications of a system like this, in particular in conflict situations? What if the system collects information that mostly benefits one side in the conflict? Are there any (new) risks that this tool introduces into decision making processes, or does this tool simply require the same skepticism as any other source? What decisions need to be taken, when, and what information is required to make those decisions?"

You’re asking all of the right questions. I’ll mull these questions over and send you my thoughts in the prototyping phase.

Jakob Rogstadius

April 22, 2013, 05:28AM
Thank you Chris for your kind words. I look forward to your reflections later on.

Nathan Maton

May 02, 2013, 18:38PM
Hey Chris, did you have more thoughts on Jakob's idea here? He definitely asks some good questions. I'd be keen to dig deeper into even just one of them, like what leads to more action rather than just more information. How could Jakob test this out in the Prototyping phase in a really lightweight and ethical way? Or how could we join him to plan a test that takes place later if our time in this phase doesn't allow it? It seems to me that the information needs to a) get to an organization that specializes in preventative action, or b) get enough public attention that global authorities feel compelled to act. Option b sounds potentially dangerous, although it may be a good route; a sounds more promising. How can each region that uses CrisisTracker (CT) find relevant information in it? I could see an awesome prototype being to research the organizations that take preventative action in a region where CT works, interview them to find out what type of info they act on and whether CT could be tweaked to fit that category (or already does), and then report the results back here. It wouldn't take more than 2 days of work in total, and would test that hypothesis well.