2016 CCICADA RETREAT Abstracts – April 28-30, 2016
Abstracts
Abena Adusei and Ahlam Tannouri, Morgan State University
The Economic Impact of Malaria on Sub-Saharan African Countries
Malaria is a mosquito-borne disease that wreaks extensive damage in many Sub-Saharan African countries, as well as in South and Southeast Asia. It causes substantial economic losses and reduces quality of life in malaria-endemic countries. Malaria morbidity reduces a country’s GDP growth and affects the economic growth of an entire region. This study reviews the literature on the economic impact of malaria on African countries. The growing effect of climate change on the spread of malaria is also investigated. The impact of malaria on GDP is estimated with Cobb-Douglas production functions using several independent variables, such as capital stock (investment), labor, and other factors that affect production. Malaria is found to decrease a country’s long-term economic growth.
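For concreteness, the kind of log-linearized Cobb-Douglas specification typically used in this literature (shown here only as an illustration, not necessarily the authors’ exact model) is

\[ \ln Y_{it} = \ln A + \alpha \ln K_{it} + \beta \ln L_{it} + \gamma M_{it} + \varepsilon_{it}, \]

where Y_{it} is output (GDP) of country i in year t, K_{it} is capital stock, L_{it} is labor, and M_{it} is a malaria morbidity index; a negative estimate of \gamma indicates that a higher malaria burden is associated with lower output.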
N. Orkun Baycik, Department of Industrial and Systems Engineering, Rensselaer Polytechnic Institute
Interdicting Layered Networks: Applications and an Effective Reformulation Technique
We study a resource allocation problem that involves interdicting the components of physical flow and information flow networks. The objective of the defender is to maximize the physical flow, whereas the objective of the attacker is to minimize this maximum amount. Interdependencies between the networks lead to a network interdiction problem with a discrete inner problem. Exploiting the hierarchical structure of the network, we apply a novel multi-step dual-based reformulation technique to solve an equivalent single-level problem. We apply this reformulation technique to combating illegal drug trafficking and to protecting cyber infrastructure, and present computational results.
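For orientation, the classical single-network max-flow interdiction model that this layered, interdependent setting generalizes can be written as a bilevel program (illustrative notation, not the authors’ exact formulation):

\[ \min_{x \in \{0,1\}^{|A|},\; \sum_{a \in A} r_a x_a \le B} \;\; \max_{f \ge 0} \sum_{a \in \delta^{+}(s)} f_a \quad \text{s.t.} \quad f_a \le u_a (1 - x_a) \;\; \forall a \in A, \qquad \sum_{a \in \delta^{+}(v)} f_a = \sum_{a \in \delta^{-}(v)} f_a \;\; \forall v \neq s, t, \]

where x_a = 1 if arc a is interdicted, r_a is its interdiction cost, B is the attacker’s budget, and u_a is the arc capacity; dualizing the inner maximum-flow problem is what yields an equivalent single-level mixed-integer program.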
Stephen Dennis, Innovation Director, Homeland Security Advanced Research Projects Agency (HSARPA), Science & Technology Directorate, U.S. Dept. of Homeland Security
EmPowering Homeland Security Data Analytics
Recent innovations in analytic computations can be used to power innovations in the homeland security mission space. The HSARPA Data Analytics Engine is helping homeland security organizations translate these innovations into capabilities for high priority science and technology investments (known as APEX programs) including: Next Generation First Responder, Border Situation Awareness, Real Time Bio-threat Awareness, and Screening at Speed. Data analytics is a key component of these programs, and the Data Analytics Engine offers both subject matter expertise and a laboratory for characterizing data sets and computational analytics technologies. The HSARPA DAE also works directly with DHS components and stakeholders in order to understand business processes, operational data sets, and technologies that can significantly advance component missions in a variety of areas. In addition, the DAE collaborates with universities, government programs, and industry to shape solutions for homeland security applications. The presentation will provide an overview of data analytics activities in the DAE, discuss recent results, and highlight specific areas of interest for potential collaborations.
Kenneth C. Fletcher, Chief Risk Officer, Transportation Security Administration (TSA)
Evolution of Aviation Security
TSA’s approach to aviation security continues to evolve away from the one-size-fits-all security layers thinking that dominated the first decade following 9/11. In 2011, the agency began implementing an intelligence-driven, risk-based security approach for passenger screening. The RBS concept continues to evolve with two additional steps in development. The next evolution integrates passenger and checked baggage screening with airport security capabilities. Longer term, TSA intends to adopt a dynamic flight risk approach that integrates risk from across the aviation domain. The envisioned phases support improved allocation and use of limited security resources to better manage risk to commercial aviation. An overview of TSA’s evolution in thinking about the aviation security problem, the complexity of this problem, additional capabilities required, and the application of risk management principles will be discussed.
About the Speaker:
Kenneth Fletcher was named the Chief Risk Officer in February 2014. In this new position he is responsible for developing and driving the long-range strategic vision and objectives for TSA with respect to risk-based security and risk management activities, and implementing enterprise risk management across all areas of the agency.
Since joining TSA in January 2003, he has held a variety of field and headquarters positions including Deputy Assistant Administrator for the Office of Risk-Based Security, Senior Advisor to the Administrator, Assistant Federal Security Director of Screening at Baltimore-Washington International Thurgood Marshall Airport, Deputy Federal Security Director at Chicago’s O’Hare Airport and Chief of Staff for the Office of Training and Quality Performance. Prior to joining TSA in January 2003, he worked for five years at Motorola as a senior production supervisor, quality manager and senior program manager for new product introduction.
Fletcher is a veteran, having served in the U.S. Navy for 23 years before retiring in 1997. He is married and has three children and five grandchildren.
Lila Ghemri, Texas Southern University and Shengli Yuan, University of Houston-Downtown
Teaching Mobile Environment Security Using Modules
Mobile devices are fast becoming the dominant computing platform for an increasing number of people, especially the younger generation. Indeed, millions of people use their mobile phones as their main way to access the internet and social and entertainment media. This surge in mobile device usage has unfortunately been accompanied by an increase in malware specifically designed to infect mobile devices. From an educational standpoint, it is becoming imperative to inform students about the risks and threats of mobile devices, not only as users but also as developers of mobile software. We present three educational modules, each designed to cover a specific aspect of the mobile environment: Infrastructure Security, Devices Security, and Applications Security. These modules have been fully developed and tested with students. We believe that they offer students adequate coverage of the field of mobile security.
Georges Grinstein, University of Massachusetts – Lowell
Integrating Statistics, Algorithms, Visualization and Interaction into a Unified Theory
To understand and build complex human-machine symbiotic systems requires the integration of data analysis (statistics and algorithms), visualization, and interaction (human, machine). There are established theoretical foundations for statistics and algorithms (including machine-learned algorithms), and emerging theories for visualization and interaction. There is not yet a theoretical foundation for underpinning these components in a coherent manner. The Alan Turing Institute (ATI) has initiated the development of such a unified theoretical foundation.
Historically theoretical unification has always been a driving force in the advancement of the sciences. Alan Turing’s Universal Computer exemplifies such a great endeavour.
I will report on this first Symposium on Theoretical Foundation of Visual Analytics, whose objectives are to outline the scope of such a theoretical foundation; to identify the known theoretical components and assess their role in underpinning each of the above components; and to envision development paths in the coming years through the collective effort of different disciplines, including, for example but not limited to, computer science (visual analytics, HCI, artificial intelligence, …), mathematics, and the cognitive, engineering, and social sciences.
Key Scientific Questions to be Answered:
1. What would a theory of visual analytics be able to do? What phenomena may it explain, what measurements may it feature, what laws may it derive, what causal relationships may it model, and what outcomes may it predict?
Example: Why doesn’t entropy maximization usually result in the best visual design in visualization?
2. What existing theories in mathematics, computer science, cognitive sciences, and other disciplines may contribute to the theoretical development in visual analytics? What are their strengths and weaknesses in relation to the four components of visual analytics (i.e., statistics, algorithms, visualization and interaction), and to the requirements in (a) (i.e., explanation, measurement, laws, causality, and prediction)?
Example: How do gestalt effects benefit visual analytics, and how can such benefits be quantitatively measured?
3. What would be possible pathways that may lead to the establishment of such a theoretical foundation, and what would be the milestones for measuring success in research and development?
Example: What is the best way to utilise the capability of interactive visualization in breaking the conditions of the data processing inequality, which is a major theoretical and practical stumbling block in data intelligence?
Darby Hering, Dennis Egan, Paul Kantor, Christie Nelson, and Fred Roberts, Rutgers University
Options for Cyber Risk Management Information Sharing
Effective and timely sharing of cyber risk management information among all stakeholders in the Maritime Transportation System (MTS) is vital to maintaining a safe, secure and resilient MTS. To develop information sharing protocols across this complex system, we must consider the layers of cyber risk management, including communication and technology, economic, and legal and regulatory aspects.
Our research addresses the following questions: What is the most appropriate role for the U.S. Coast Guard (USCG), and how does guidance for physical security relate to cyber risk management needs? What organizational systems could best support the needed sharing? What kinds of incentives could be used to encourage participation, particularly from private industry? What information needs to be shared, and when? What technologies could be used to enable and safeguard the information sharing? In this talk, we discuss the approach taken by the CCICADA-Rutgers team to address these questions. We present our initial findings and recommendations related to each question.
Eduard Hovy, CCICADA Research Director-Carnegie Mellon University
Gathering and Applying Information from Social Media in Service of Homeland Security Applications
Over the lifetime of CCICADA, we have been investigating various aspects of the information available on Social Media, including Twitter, Wikipedia, web advertisements, and news. Different types of information are of interest to, and support the tasks of, different government agencies working for homeland security. For emergency response management, the discovery and tracking of events as they unfold can be facilitated by appropriate extraction and organization of tweets, which serve as an on-the-ground dynamic (and free) information source. Our ongoing work focuses on two aspects: disaster event evolution and the search for correlations (and causation/prediction!) between events of interest and generally related news and tweets in social media. To combat human trafficking, the selection and filtering of online advertisements and bulletin board notices regarding various forms of human exploitation has led to a successful project that was transitioned to federal law enforcement in 2015. To develop methods to better understand the social dynamics of potentially dangerous individuals we have been analyzing contentious discussions in online groups like Wikipedia Articles for Deletion, trying to identify the participants’ social personas (such as Leaders, Rebels, etc.). In this talk I briefly describe our work over the past 6 years and highlight some interesting avenues for future research.
Paul Kantor, CCICADA Research Director- University of Wisconsin and Rutgers University
Qualantative Analysis: An Emerging Approach to Reasoning with Soft Data
A growing number of CCICADA projects involve very soft data, which are gathered by interview and document analysis. Examples are the work on venue security (BPATS and BPATS-2) and projects on cybersecurity. The challenge has been to develop principled methods for eliciting information that can be combined in quantitative ways, and to develop clear methods for performing that combination. In several projects we have developed a pair of elicitation concepts that prove effective in communicating with expert practitioners. The subsequent quantitative analysis weights response rates, respondent experience, and the shape of the cumulative distribution of responses. The new designation, “Qualantative,” is very nearly a Google hapax legomenon, as the only two other instances appear to be typographical errors. (Joint work with Dennis Egan, CCICADA Assistant Director)
Mubbasir Kapadia, Rutgers University
Modeling, Validating and Optimizing Crowd Dynamics
This talk will explore the following critical areas of dynamic crowd modeling: (a) What model faithfully replicates the dynamics of a real crowd? (b) Can we automatically improve the behavior of a crowd simulation technique to optimize a given criterion? (c) Can we automatically adjust the environment layout to maximize crowd performance? (d) How can we extract meaningful information from the dynamics of a crowd? (e) How can we understand the influence of external stimuli on the aggregate dynamics of a crowd?
Pratik Koirala and Otis Tweneboah, Howard University
Challenges of Identifying Integer Sequences (Browsing the OEIS)
We studied integer sequences that carry the tag “more” in the Online Encyclopedia of Integer Sequences (OEIS). The purpose of our project was to find sequences (or variations of sequences) from this database that lack closed-form formulas or have very few known terms, to study those sequences, and to apply programming concepts to find the missing terms.
(Mentors: Dr. Eugene Fiorini, Nathan Fox, Dr. Brian Nakamura)
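As a small illustration of browsing the database programmatically, the sketch below queries the OEIS search endpoint for sequences carrying the “more” keyword. It assumes the public JSON interface at https://oeis.org/search with fmt=json and its usual fields (count, results, number, name, data), and should be adjusted if that interface differs.

```python
# A minimal sketch of browsing the OEIS for sequences tagged "more"
# (sequences that need more terms). Assumes the public OEIS JSON search
# endpoint https://oeis.org/search?q=...&fmt=json and its usual fields
# ("count", "results", "number", "name", "data"); adjust if the API differs.
import requests

def fetch_more_sequences(start=0):
    """Return (total count, one page of OEIS entries) for the 'more' keyword."""
    resp = requests.get(
        "https://oeis.org/search",
        params={"q": "keyword:more", "fmt": "json", "start": start},
        timeout=30,
    )
    resp.raise_for_status()
    payload = resp.json()
    return payload.get("count", 0), payload.get("results") or []

if __name__ == "__main__":
    total, page = fetch_more_sequences()
    print(f"{total} sequences are tagged 'more'; first page:")
    for entry in page:
        terms = entry["data"].split(",")
        print(f"A{entry['number']:06d}: {entry['name']} ({len(terms)} listed terms)")
```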
Chengrui Li, Min-ge Xie, and Ying Hung, Rutgers University
A Sequential Split-Conquer-Combine Approach for Analysis of Big Spatial Data
The task of analyzing massive spatial data is extremely challenging. In this paper we propose a sequential split-conquer-combine (SSCC) approach for the analysis of dependent big data and illustrate it using a Gaussian process model, along with theoretical support. The SSCC approach can substantially reduce computing time and computer memory requirements. We also show that the SSCC approach is oracle in the sense that the result obtained using the approach is asymptotically equivalent to the one obtained from performing the analysis on the entire data in a super-super computer. The methodology is illustrated numerically using both simulation and a real data example of a computer experiment on modeling a data center thermal profile, with the ultimate goal of determining the most efficient cooling mechanism.
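The following toy sketch conveys the general split-conquer-combine flavor for Gaussian-process prediction: fit independent GPs on data blocks, then combine the block-wise predictions by inverse-variance weighting. It is only an illustration of the divide-and-combine idea, not the authors’ sequential SSCC algorithm or its theoretical guarantees.

```python
# Toy split-conquer-combine illustration for Gaussian-process prediction:
# fit an independent GP on each data block, then combine the block-wise
# predictions by inverse-variance weighting. This is only a sketch of the
# general divide-and-combine idea, not the authors' sequential SSCC method.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(3000, 1))
y = np.sin(X).ravel() + 0.2 * rng.standard_normal(3000)
X_new = np.linspace(0, 10, 50).reshape(-1, 1)

blocks = np.array_split(rng.permutation(3000), 10)        # "split"
means, variances = [], []
for idx in blocks:                                        # "conquer"
    gp = GaussianProcessRegressor(kernel=RBF(1.0) + WhiteKernel(0.05),
                                  normalize_y=True)
    gp.fit(X[idx], y[idx])
    m, s = gp.predict(X_new, return_std=True)
    means.append(m)
    variances.append(s ** 2)

w = 1.0 / np.array(variances)                             # "combine": precision weights
combined = (w * np.array(means)).sum(axis=0) / w.sum(axis=0)
print(np.round(combined[:5], 3))
```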
Janne Lindqvist, Rutgers University
Smartphone Security: Why Doodling Trumps Text Passwords
We present free‐form gesture passwords, which provide significant advantages for smartphone authentication compared to existing and other proposed methods. They are particularly applicable for mobile use, and have shown great potential with respect to security, usability and memorability.
Free-form gesture passwords allow people to draw any shape, drawing, or word, with any number of fingers, as their password. Gesture passwords demand less effort and attention than typing; they are also faster to create and faster to use for logging in. These characteristics make them particularly suited for mobile use. In addition to being easy to use, free-form gesture passwords are easy to remember and secure.
This presentation will cover material published in several tier‐1 venues including MobiSys’14, IEEE Pervasive Computing and CHI’16. For more information, please see: http://securegestures.org/.
Christie Nelson, Rutgers University
Field Research on Walk-Through Metal Detectors at Stadiums
CCICADA has been involved in researching WTMDs almost since the beginning of their introduction to major league sports venues. This research has many diverse aspects. Although WTMDs are not new, and have been used successfully at airports, prisons, schools, and elsewhere, they are relatively new to large sports venues. These venues present a different sort of challenge for the machines than their previous use cases: a large throughput of people in a very short window (typically 30 minutes), often under harsh outdoor weather and environmental conditions. This research involved formal experiments to understand the functionality of these machines in this setting. It also involved less formal observations of their use in the field; best practices for the use of these machines; guidance given by the manufacturers, both in literature such as manuals and verbally; understanding what the government standards for the machines really mean in this setting and which ones are actually relevant; evaluating which security settings on the WTMDs are relevant to this environment; and collecting data from actual events on the patron screening process. We then used this data in part to validate a simulation tool showing how queue lengths and patron wait times may change under various scenarios.
CCICADA’s field observations took place at approximately ten different types of venues, which host very diverse types of events and are located in different places. During our first visits, we would typically take a security tour of the facility and talk with the security director to understand the venue itself. Afterwards, we would usually return to watch the crowds on an event day, observing the WTMDs in use, how the security staff used them, and the best practices they applied. We also held several interviews with security experts, round-table discussions with groups of experts, and a workshop with approximately 40 people to gain insight into their use of the WTMDs as well as other related stadium security topics. It is important to note that venues often have drastically different layouts, so best practices sometimes vary greatly from venue to venue. In addition, we attended a few training sessions that venues held for their staff, in order to understand what the venues expected the security personnel to actually do with the WTMDs.
While performing field observations of the WTMDs on an event day, we collected data. This data included the security setting of the WTMD, the number of patrons screened per minute (including secondary screening), the rate that the WTMDs would alarm, the object that caused the alarm, and if the screener saw the alarm (and rescreened the patron) or if the screener “missed” the alarm due to crowds, etc. As far as we know, this is the only data that has been collected in this sort of environment. We found some estimates of throughput in our literature search, but these estimates did not include secondary screening. So for a stadium environment, these estimates sometimes varied greatly from our observed data. We then used the data collected to validate a simulation tool showing how queue lengths and patron wait times may change based on various scenarios.
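For intuition, a minimal time-stepped sketch of a single WTMD screening lane is given below. All arrival rates, screening times, and alarm probabilities are hypothetical placeholders, and the code is only meant to illustrate the kind of queue-length and wait-time simulation described above, not CCICADA’s validated tool.

```python
# A minimal sketch of a single WTMD screening lane, meant only to illustrate
# the kind of queue-length / wait-time simulation described above (it is not
# CCICADA's validated tool). All rates and times below are hypothetical.
import random

def simulate_lane(minutes=30, arrivals_per_min=12, primary_secs=4,
                  alarm_rate=0.25, secondary_secs=20, seed=1):
    random.seed(seed)
    queue, results = 0, []
    for minute in range(minutes):
        queue += arrivals_per_min                  # gates-open surge of patrons
        budget = 60.0                              # screener-seconds available this minute
        served = 0
        while queue > 0 and budget > 0:
            cost = primary_secs
            if random.random() < alarm_rate:       # alarm triggers secondary screening
                cost += secondary_secs
            budget -= cost
            queue -= 1
            served += 1
        wait_min = queue / max(served, 1)          # rough wait estimate at current pace
        results.append((minute, served, queue, round(wait_min, 1)))
    return results

for minute, served, queue, wait in simulate_lane():
    print(f"min {minute:2d}: served {served:2d}, queue {queue:3d}, est. wait {wait} min")
```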
We also researched the WTMD literature and the information provided to the venues. A comparison was performed across various WTMD manufacturers’ manuals to determine the information provided. In addition, we spoke with various manufacturer personnel to gather additional insight on the WTMDs. We even helped a venue security director understand the security settings on the WTMD (prompting him to change the setting for the venue’s events).
Dan Roth, CCICADA Research Director- Computer Science and the Beckman Institute University of Illinois at Urbana-Champaign
Making Sense of (and TRUSTING) Unstructured Data
Studies have shown that over 85% of the information people and organizations deal with is unstructured, and that the vast majority of it is in text. A multitude of techniques has to be used in order to enable intelligent access to this information and support transforming it into forms that allow sensible use of the information. The fundamental issue that all these techniques have to address is that of semantics: there is a need to move toward understanding the text at an appropriate level of abstraction, beyond the word level, in order to support access, knowledge extraction, and synthesis. I will discuss briefly the key question of how to transform Data to Meaning and thus better facilitate access to information and the extraction of knowledge from unstructured text, but I will focus on a second dimension of this problem, the Trustworthiness of Information: while we can locate and extract information quite reliably, we lack ready means to determine whether we should actually believe it. I will describe some of our research in these directions and point to some of the key challenges, including those of learning models with indirect supervision, knowledge acquisition, and reasoning.
Thomas Sharkey, Rensselaer Polytechnic Institute
On the Impact of Information in Infrastructure Restoration
We consider the role of information in infrastructure restoration in two contexts and how optimization approaches can help address these contexts. The first context focuses on measuring how information-sharing can reduce the loss in effectiveness from decentralized decision-making across infrastructure networks after a large-scale disruptive event, such as Hurricane Sandy. The second context focuses on computing physical damage in an infrastructure network from outage reports, which may be necessary should the human-machine interface of the network be compromised from a cyber-attack.
Jieli Shen, Regina Liu and Min-ge Xie, Rutgers University
Individualized Fusion (iFusion) Learning with Statistical Guarantee: A Confidence Distribution Approach
Big data is often characterized by heterogeneity, especially when it is created by aggregation from multiple individual databases or sources. Naturally, one wonders how applicable an overall learning result is to a particular individual. Ignoring the heterogeneity and proceeding with the entire data set can lead to biased and over-precise conclusions for a particular individual; on the other hand, excluding all the other individuals reduces bias, but at the cost of a loss of efficiency. As such, it is crucial to retrofit statistical learning methods so as to adapt to individual needs. Important applications include, for example, personalized medicine, targeted advertising, and econometric forecasting.
In this article, we develop a general and flexible framework called “Individualized Fusion (iFusion) Learning” to learn efficiently about (any) individual of interest when multiple individual data sources are available. It begins by summarizing data and model information into confidence density functions independently for each individual, then forms a clique corresponding to a target individual, and adaptively combines the confidence density functions into an enhanced one. Drawing inference from this combined confidence density for the target achieves variance reduction by “borrowing strength” from individuals inside the clique, while it controls bias and retains statistical validity by downweighting or filtering individuals outside the clique. Additional benefits include: (i) it refrains from artificial distributional assumptions on the underlying individual-specific parameters (e.g., a normal distribution as in a random-effects model); (ii) it fits into the “divide-conquer-combine” framework and is scalable to big data; (iii) it utilizes the concept of confidence distributions and inherits many nice properties from them: it is informative, general, and flexible, and can be readily implemented when only individual summary statistics are available. Supporting theory is provided and illustrated through simulations in various settings and through two real datasets, on forecasting portfolio returns and on detecting landmines.
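As a point of reference (and not necessarily the authors’ exact combining rule), confidence densities are often combined multiplicatively with data-adaptive weights; for a target individual 0 with clique C, one such form is

\[ h^{(c)}_{0}(\theta) \;\propto\; h_{0}(\theta) \prod_{i \in C} h_{i}(\theta)^{w_i}, \qquad 0 \le w_i \le 1, \]

so that in the Gaussian case the combined estimate is a precision-weighted average of the individual estimates; setting w_i near zero for individuals outside the clique is what controls bias, while positive weights inside the clique provide the “borrowing of strength.”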
Sachin Shetty, Tennessee State University
Moving Target Defense for Distributed Systems
Distributed systems are complex systems, and cyber attacks targeting them have devastating consequences. Several cybersecurity solutions have failed to protect distributed systems, primarily due to asymmetric warfare with cyber adversaries. Most cybersecurity solutions have to grapple with the tradeoff between detecting one breach vs. blocking all possible breaches. Current cyber threats are sophisticated and comprise multiple attack vectors caused by organized attackers. Most current cyber defenses are blackbox or set-and-forget approaches that cannot protect against zero-day attacks and are ineffective against dynamic threats. The asymmetric conundrum is to determine which assets (software, embedded devices, routers, back-end infrastructure, dependencies between software components) need to be protected. Recently, Moving Target Defense (MTD) has been proposed as a strategy to protect distributed systems. MTD-based approaches take a leaf out of the adversaries’ book by not focusing on fortifying every asset, instead making the system move to the defender’s advantage. MTD is a game-changing capability to protect distributed systems by enabling defenders to change system/network behaviors, policies, or configurations automatically such that potential attack surfaces are moved in an unpredictable manner.
In this talk, I will present MTD techniques for determining the placement of virtual machines in cloud data centers. The techniques focus on security risk assessment of virtual machines and physical machines in cloud data centers, and on placement of virtual machines that takes security risk into account as a criterion while evaluating the cost of MTD. The talk will be organized as follows: I will provide an overview of MTD and the need for research on developing novel MTD schemes at several levels: program (instruction set), host (IP address, memory), cloud computing platform, network, and mobile systems. I will present an approach to perform security-aware Virtual Machine (VM) migration in cloud data centers. Next, I will present an approach to develop MTD-based network diversity models to evaluate the robustness of cloud data centers against potential zero-day attacks. I will present a network-aware VM placement scheme in cloud data centers. Finally, I will present a cost model to evaluate the cost of MTD in cloud data centers.
Bio:
Sachin Shetty is an Associate Professor in the ECE Department at Tennessee State University. He received his Ph.D. in Modeling and Simulation from Old Dominion University in 2007. His research interests lie at the intersection of computer networking, network security, and machine learning. Recently, he has been working on security issues in cloud computing and cognitive radio networks. AFOSR, AFRL, DHS, DOE, and NSF have provided funding for several of his cybersecurity research and education projects. He has authored and coauthored over 80 technical refereed and non-refereed papers in various conferences, international journal articles, and book chapters. He received the 2010 and 2011 DHS Scientific Leadership Awards and received Research Mentorship Awards at Tennessee State University from 2012 to 2015.
Ahlam Tannouri and Sam Tannouri, Morgan State University
Biometrics Identification Images: PCA and ICA Techniques for Feature Extraction
At a time when identity theft is becoming rampant, a Social Security number and a date of birth should be accompanied by an individual’s unique biometrics.
The technologies of pattern recognition, composition, classification, performance, and evaluation all have their challenges.
We will present a small study comparing PCA and ICA over different biometric attributes.
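A minimal sketch of such a comparison is given below, using scikit-learn’s PCA and FastICA as feature extractors with a nearest-neighbor matcher on top. The Olivetti faces dataset and the 1-NN matcher are stand-ins for illustration; they are not the biometric attributes or matcher used in the study.

```python
# A small sketch comparing PCA and ICA feature extraction on biometric-style
# image vectors (flattened face images), with a nearest-neighbor matcher on
# top. Illustrative only; the dataset and matcher here are stand-ins.
import numpy as np
from sklearn.datasets import fetch_olivetti_faces
from sklearn.decomposition import PCA, FastICA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

faces = fetch_olivetti_faces()                     # 400 face images, 40 subjects
X_tr, X_te, y_tr, y_te = train_test_split(
    faces.data, faces.target, test_size=0.25, stratify=faces.target, random_state=0)

extractors = [("PCA", PCA(n_components=60, whiten=True, random_state=0)),
              ("ICA", FastICA(n_components=60, max_iter=1000, random_state=0))]
for name, extractor in extractors:
    model = make_pipeline(extractor, KNeighborsClassifier(n_neighbors=1))
    model.fit(X_tr, y_tr)
    print(f"{name} + 1-NN identification accuracy: {model.score(X_te, y_te):.3f}")
```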
Trefor Williams and Nicole Todd, Rutgers University
Visualizing Railroad Grade Crossing Accident Factors using WEAVE
Several interactive Weave data visualizations have been constructed that highlight the factors involved in certain types of grade crossing accidents. The source of the data is the Federal Railroad Administration grade crossing accident reports. This database has many fields, and the use of advanced data visualization allows for the examination of complex interacting accident factors. Weave’s ability to construct large displays with interactive visualizations is used to create the visualizations. The visualizations constructed highlight the types of grade crossings at which large truck accidents occur, weather and time-of-day effects on accidents, and the high rate of accidents at grade crossings that are interconnected with traffic signals.
We will discuss the visualization results, and relate them to newsworthy events including recent railroad accidents, and the FRA administrator’s recent call to inspect grade crossings with traffic signal interconnections.
Shuxin Yao and Eduard Hovy, Carnegie Mellon University
Mining Event Information from Social Media for Emergency Response Management
Twitter, a representative form of social media, has been recognized as a potentially rich information source for situational awareness. In our work, we aim to use Twitter as an information source to help government personnel deal with emergency situations related to disasters.
Our goal is to generate scripts of typical disasters. A script of a disaster is the stereotypical sequence of events and associated behaviors during it. Usually, the same type of disaster has similar scripts. When a new natural disaster happens, we can use our knowledge about the same type of disaster that happened before, combined with the tweets about the new disaster, to predict what will happen next and its significance. These predictions can help government personnel make decisions.
In this project, we work with a Twitter corpus about Hurricane Sandy (2012). We have obtained 5,721,649 tweets from the affected areas on the relevant days. Later, we will proceed to other hurricanes and other types of disasters, such as earthquakes and floods.
We represent event scripts as vectors that encode patterns and relationships of words, topics, and sentiments, organized in time. We learn these vectors from disaster-related tweets for their main topic(s), time, and relevant entity(ies). We sort tweets in time order and divide them into different groups based on named entities. Then we add labels to the tweets and analyze them based on the labels. The labels include: 1) lexical labels, obtained after fixing word spellings and abbreviations; 2) part-of-speech labels for Twitter, as proposed by Kevin Gimpel et al.; 3) sentiment labels that reflect the state of the author; and 4) topic labels (a topic is some common aspect of a hurricane event, and identifying topics enables us to observe the event from different points of view). Together, these labels and entities help us deal with a major problem of using Twitter as an information source: the noise and confusion in the data.
Our work includes six parts: named entity extraction and timestamping, proposing a topic typology for hurricanes, tweet annotation, classifier building, script vector generation, and evaluation. We have completed the first three parts and are now building classifiers. We extracted 294 important entities and developed a topic typology consisting of 24 topics. We used Mechanical Turk as the annotation platform and obtained an average pairwise annotator agreement of 0.935 and an average Krippendorff alpha of 0.503. We will continue this work and report progress.
Zhigang Zhu, The City College-CUNY, Jie Gong, Rutgers University, Hao Tang, Borough of Manhattan Community College-CUNY, Cecilia Feeley, Rutgers University, Greg Olmschenk, The Graduate Center-CUNY, and Fred Roberts, Rutgers University
GCTC 2016: Towards a Smart Transportation Hub Requiring Minimal New Infrastructure for Services to Persons with Special Needs
There are a number of enormous transportation hubs in NYC, such as New York Penn Station, the Port Authority Bus Terminal, and Grand Central Terminal. Combining emerging mobile computing technologies with computer vision techniques for 3D localization and crowd analysis provides a great opportunity to significantly improve services, as well as to create innovative approaches to assist passengers and customers in navigating these complicated facilities. This is especially true for people in great need, such as individuals with visual impairment or Autism Spectrum Disorder (ASD) and people with challenges in finding places, particularly persons unfamiliar with a metropolitan area like NYC. In the US alone, the visually impaired population is 6.6 million people and is expected to double by 2030 (from 2010 figures). According to the CDC, ASD is the fastest-growing developmental disorder, affecting 1 in every 68 people in the US. One common and recurring obstacle that people from both groups face every day is navigation, particularly as related to mobility; public transportation is the best way for them to travel, yet it also presents significant hurdles given their challenges.
This joint Global City Teams Challenge (GCTC) 2016 Action Cluster aims to systematically investigate a novel cyber-physical infrastructure framework that can effectively and efficiently transform existing transportation hubs into smart facilities capable of providing better location-aware services (e.g., finding terminals, improving the travel experience, obtaining security alerts) to the traveling public, especially underserved populations including those with visual impairment or Autism Spectrum Disorder (ASD), or simply those with navigation challenges. The team is uniquely positioned to collaborate with NJ Transit, with its stations in both NJ and NYC, to create and test a Smart Transportation Hub (Smart T-Hub) with minimal changes to the current cyber-physical infrastructure of their stations. In order to fully study the needs of visually impaired users, we will start our pilot test at Lighthouse Guild in New York City, in collaboration with the NYS Commission for the Blind (NYSCB).
As a result of two NSF grants, the team has developed algorithms and mobile devices for assistive navigation, and, with the support of a number of DHS CCICADA projects, the team has been developing technology for acquiring accurate semantic 3D models of facilities and vision algorithms for crowd analysis with surveillance cameras. The GCTC solution builds on these three solid foundations and tackles three fundamental research challenges in fulfilling the aforementioned vision: (1) Relative localization of mobile devices with Bluetooth connections, which generate a locally connected graph (a mobile graph) in which only the devices’ IDs and relative distances to one another are known. We anticipate having to develop distributed algorithms to infer the devices’ relative locations. (2) Registration of the mobile (dynamic) graph with the 3D model at various levels. We propose to use machine-learning techniques to pre-compute a database of location-aware 3D and 2D features. Registering the mobile graph with the 3D model can further improve the robustness and accuracy of 3D localization of the users. (3) Customized path planning in a crowded and complex environment. We propose to use the semantic 3D model, the crowd analysis results, and bus schedule information to provide the most preferred path for users to their destinations, taking into consideration their personal preferences or needs, dynamic crowd patterns, and adaptive sensing controls.
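As a toy illustration of challenge (1), the sketch below recovers the relative layout of a handful of devices from noisy pairwise distance estimates using classical multidimensional scaling. The simulated distances and the centralized MDS solver are placeholders for exposition, not the distributed algorithm the project proposes to develop.

```python
# A toy sketch of challenge (1): recovering the relative layout of a set of
# mobile devices from noisy pairwise Bluetooth-style distance estimates, using
# classical multidimensional scaling. This illustrates the general idea only;
# it is not the distributed algorithm the project proposes to develop.
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(3)
true_xy = rng.uniform(0, 50, size=(8, 2))                 # 8 devices in a 50 m hall
dist = np.linalg.norm(true_xy[:, None, :] - true_xy[None, :, :], axis=-1)
noisy = np.abs(dist + rng.normal(0, 1.0, dist.shape))     # ~1 m ranging noise
noisy = (noisy + noisy.T) / 2                             # symmetrize
np.fill_diagonal(noisy, 0.0)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
relative_xy = mds.fit_transform(noisy)                    # layout up to rotation/translation
print(np.round(relative_xy, 1))
```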
The Smart T-Hub will be an exemplar case for extending the mobility of BVI (blind and visually impaired) and ASD individuals and giving them more freedom in exploring the riches of their lives. The Smart T-Hub GCTC action cluster will also offer new project ideas and platforms for undergraduate seniors and graduate students at both CUNY and Rutgers, for their senior design projects and graduate research.
POSTERS
Vijay Chaudhary, Howard University
Field Research on Walk-Through Metal Detectors at Stadiums
CCICADA has been involved in researching WTMDs almost since the beginning of their introduction to major league sports venues. This research has many diverse aspects. Although WTMDs are not new, and have been used successfully at airports, prisons, schools, and elsewhere, they are relatively new to large sports venues. These venues present a different sort of challenge for the machines than their previous use cases: a large throughput of people in a very short window (typically 30 minutes), often under harsh outdoor weather and environmental conditions. This research involved formal experiments to understand the functionality of these machines in this setting. It also involved less formal observations of their use in the field; best practices for the use of these machines; guidance given by the manufacturers, both in literature such as manuals and verbally; understanding what the government standards for the machines really mean in this setting and which ones are actually relevant; evaluating which security settings on the WTMDs are relevant to this environment; and collecting data from actual events on the patron screening process. We then used this data in part to validate a simulation tool showing how queue lengths and patron wait times may change under various scenarios.
CCICADA’s field observations took place at approximately ten different types of venues, which host very diverse types of events and are located in different places. During our first visits, we would typically take a security tour of the facility and talk with the security director to understand the venue itself. Afterwards, we would usually return to watch the crowds on an event day, observing the WTMDs in use, how the security staff used them, and the best practices they applied. We also held several interviews with security experts, round-table discussions with groups of experts, and a workshop with approximately 40 people to gain insight into their use of the WTMDs as well as other related stadium security topics. It is important to note that venues often have drastically different layouts, so best practices sometimes vary greatly from venue to venue. In addition, we attended a few training sessions that venues held for their staff, in order to understand what the venues expected the security personnel to actually do with the WTMDs.
While performing field observations of the WTMDs on an event day, we collected data. This data included the security setting of the WTMD, the number of patrons screened per minute (including secondary screening), the rate that the WTMDs would alarm, the object that caused the alarm, and if the screener saw the alarm (and re-screened the patron) or if the screener “missed” the alarm due to crowds, etc. As far as we know, this is the only data that has been collected in this sort of environment. We found some estimates of throughput in our literature search, but these estimates did not include secondary screening. So for a stadium environment, these estimates sometimes varied greatly from our observed data. We then used the data collected to validate a simulation tool showing how queue lengths and patron wait times may change based on various scenarios.
We also researched the WTMD literature and the information provided to the venues. A comparison was performed across various WTMD manufacturers’ manuals to determine the information provided. In addition, we spoke with various manufacturer personnel to gather additional insight on the WTMDs. We even helped a venue security director understand the security settings on the WTMD (prompting him to change the setting for the venue’s events).
Chintan A. Dalal, Rutgers University, Doug Nychka, National Center for Atmospheric Research, and Claudia Tebaldi, Climate Central, NJ
Statistical Structural Analysis of Climate Model Output
Various independent teams worldwide are developing climate models that give us spatio-temporal values of climate state variables. These climate models capture various interacting nonlinear processes through observational data and a physics-based understanding of the earth’s system, and thereby make probabilistic projections of climate change at various timescales (such as multi-seasonal, multi-year, or multi-decadal) and for various scenarios. These scenario-based projections of the earth’s climate can provide policy makers with valuable information for our society’s sustainability.
In this paper, I establish a structure on the space of model outputs by using tools from the discipline of information geometry. Furthermore, I exploit the developed structure to statistically sample new future climate scenarios from the space of model outputs and to define a metric for comparing model performance against observational datasets. I validate the statistical correctness of the method by applying the diagnostic tools of principal component analysis, Kullback-Leibler divergence, and semi-variogram plots.
Analyzing the model output using the framework developed in this work could aid in generating samples that are computationally inexpensive, something that has not been realized by the perturbed-physics ensemble method. Additionally, the metric developed to compare model outputs can be used to systematically weight the multi-model ensemble used in the Coupled Model Intercomparison Project (CMIP), in order to provide a consensus on the future climate state.
Xianyi Gao, Bernhard Firner, Shridatt Sugrim, Victor Kaiser-Pendergrast, Yulong Yang, and Janne Lindqvist, Rutgers University
Elastic Pathing 2.0: Your Speed is Enough to Track You Revisited
In the past, automotive insurance companies did not have a large amount of information about their customers. All customers would pay similar prices, despite potentially large variations in their driving habits. Recently, advances in technology have provided ways to measure how people drive. Customers have the opportunity to opt in to usage-based automotive insurance for reduced premiums by allowing companies to monitor their driving behavior. Several companies claim to measure only speed data to preserve privacy.
Doubting this claim, our elastic pathing project shows that drivers can be tracked by merely collecting their speed data and knowing their home location (both of which insurance companies have), with an accuracy that constitutes a privacy intrusion. The problem is challenging because it is not obvious that the exact driving path can be reproduced from the speed data and the starting location: more than one path may match the speed data well, and within just a few minutes of driving there are many possible paths a driver could take. We demonstrated our original elastic pathing algorithm in a previous publication at UbiComp 2014 [1]. That work tested the algorithm’s real-world applicability: we evaluated its performance with datasets from central New Jersey and Seattle, Washington, representing suburban and urban areas. Our algorithm estimated destinations with error within 250 meters for 14% of traces and within 500 meters for 24% of traces in the New Jersey dataset (254 traces). For the Seattle dataset (691 traces), we similarly estimated destinations with error within 250 and 500 meters for 13% and 26% of the traces, respectively.
We have recently further improved our algorithm, introducing elastic pathing 2.0. In this new version, we have so far optimized the turn determination and its restriction on speed. We also further investigated the effect of applying speed limits (e.g., as provided by OpenStreetMap) and a shortest-path method on our algorithm’s performance. Despite the difficulty of the problem, we managed to increase the accuracy by roughly 20%. With elastic pathing 2.0, we estimated destinations with error within 250 meters for 17% of traces and within 500 meters for 24% of traces in the New Jersey dataset (254 traces). For the Seattle dataset (691 traces), we managed to infer destinations with error within 250 and 500 meters for 15.5% and 27.5% of the traces, respectively.
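The toy sketch below conveys the intuition only: a speed trace fixes how far the vehicle has traveled by the time it slows for each turn, so the spacing between near-stops can be scored against the turn spacing of candidate routes from a known start. The thresholds, the synthetic trace, and the scoring rule are invented for illustration and do not reproduce the published elastic pathing algorithm.

```python
# A toy illustration of the intuition behind elastic pathing: near-stops in a
# speed trace pin down cumulative travel distance, which can be matched
# against the turn spacing of candidate routes from a known start point.
# This is a simplification for exposition, not the published algorithm.
import numpy as np

def turn_distances(speed_mps, dt=1.0, slow=3.0):
    """Cumulative distance (m) at each point where the vehicle slows below `slow` m/s."""
    dist = np.cumsum(speed_mps) * dt
    slow_pts = np.flatnonzero((speed_mps[1:] < slow) & (speed_mps[:-1] >= slow))
    return dist[slow_pts]

def score_route(observed_turns, route_turns):
    """Mean absolute mismatch between observed and route turn distances (m)."""
    n = min(len(observed_turns), len(route_turns))
    if n == 0:
        return float("inf")
    return float(np.mean(np.abs(observed_turns[:n] - np.asarray(route_turns[:n]))))

# Hypothetical speed trace: cruise, slow for a turn roughly 300 m in, cruise again.
speed = np.concatenate([np.full(30, 10.0), np.linspace(10, 2, 5),
                        np.full(5, 2.0), np.linspace(2, 12, 5), np.full(40, 12.0)])
observed = turn_distances(speed)
candidates = {"Route A (turn at 310 m)": [310.0], "Route B (turn at 520 m)": [520.0]}
for name, turns in candidates.items():
    print(name, "mismatch:", round(score_route(observed, turns), 1), "m")
```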
[1] Xianyi Gao, Bernhard Firner, Shridatt Sugrim, Victor Kaiser-Pendergrast, Yulong Yang, and Janne Lindqvist. 2014. Elastic pathing: your speed is enough to track you. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp ’14). ACM, New York, NY, USA, 975-986.
Christie Nelson, Rutgers University
A Case Study on Data Cleaning and Data Quality: The US Coast Guard’s Search and Rescue Data
The US Coast Guard performs thousands of search and rescue missions every year. These missions range from rescuing someone whose boat is sinking to helping someone whose boat ran out of gas. The data are collected continually from stations all over the United States. We examined a portion of the data to understand what errors exist and to help the US Coast Guard understand the overall quality of the data. We developed a methodology for identifying errors in the data as a first step toward cleaning it.
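The sketch below illustrates the kinds of automated checks such a methodology can include (missing values, duplicate identifiers, out-of-range coordinates, inconsistent timestamps). The column names and the tiny example table are hypothetical placeholders, not the actual fields of the USCG search and rescue data.

```python
# A minimal pandas sketch of automated data-quality checks for incident
# records. Column names here are hypothetical placeholders, not the actual
# fields of the USCG search-and-rescue data.
import pandas as pd

def profile_quality(df: pd.DataFrame) -> dict:
    return {
        "missing_values": df.isna().sum().to_dict(),
        "duplicate_case_ids": int(df["case_id"].duplicated().sum()),
        "bad_latitude": int((~df["latitude"].between(-90, 90)).sum()),
        "bad_longitude": int((~df["longitude"].between(-180, 180)).sum()),
        "closed_before_opened": int((pd.to_datetime(df["closed"]) <
                                     pd.to_datetime(df["opened"])).sum()),
    }

df = pd.DataFrame({
    "case_id": [101, 102, 102, 104],
    "latitude": [40.5, 95.0, 40.7, None],
    "longitude": [-74.0, -74.1, -200.0, -73.9],
    "opened": ["2015-06-01 10:00", "2015-06-02 09:00", "2015-06-02 09:00", "2015-06-03 12:00"],
    "closed": ["2015-06-01 12:00", "2015-06-01 08:00", "2015-06-02 10:30", "2015-06-03 13:00"],
})
print(profile_quality(df))
```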
William M. Pottenger, CCICADA Director of Technology Transfer and Edgardo Molina, CUNY Graduate Center
2012 Hurricane Sandy Scenario to Illustrate Use of Social Media for Situational Awareness and Emergency Response Management by US Coast Guard
Event: In October 2012, Hurricane Sandy became the largest Atlantic hurricane on record as measured by diameter. In the US alone, Hurricane Sandy affected 24 states from Florida to Maine, as well as inland to Michigan and Wisconsin. Particularly severe damage occurred in New Jersey and New York, and estimated damage to the US was $65 billion, with 157 deaths. From October 27 to November 1, 19,729 flights were cancelled; the NY Stock Exchange and Nasdaq closed for October 29-30; by October 31, over 6 million people were still without power; and 30,000 New Jersey and New York residents still remained displaced as of December 6, 2012. This scenario focuses on the days leading up to Sandy (preparation phase) and concludes in the days immediately after the hurricane (early response phase), covering October 22, 2012 through November 2, 2012.
Data: The data for the scenario were obtained from the Twitter firehose. The only requirements on the data, other than the date range, were that the Tweets be geotagged and that they were sent from one of the following Sandy-affected US states: Connecticut, Delaware, Massachusetts, Maryland, New Jersey, New York, North Carolina, Ohio, Pennsylvania, Rhode Island, South Carolina, Virginia, West Virginia. The filter was based on a set of bounding boxes that covered this Sandy-affected region and also covered small parts of adjacent states. There are 6,556,328 Tweets from 265,043 unique users, with the most Tweets coming from New York (861,593), Pennsylvania (679,847), New Jersey (562,841), Virginia (503,841), and North Carolina (483,899).
Tools: RPI Event Timeline: The RPI event timeline tracker creates event annotations illustrating event triggers, dates/times, and locations. This tracker will be used throughout the entire event to show the phases of the disaster (preparation, event, and recovery).
CMU Emotional Response Evolution: Using RPI’s event timeline, the CMU emotional response evolution will take the timeline and triggers and use sentiment analysis to find causal links between the event triggers. CMU’s evolution tool will also be used throughout the event to show the various phases of the disaster.
Rutgers SemRel: Rutgers’ SemRel tool takes the event, trigger, and emotion attributes and links the emotion attributes with the emotion holders and emotion targets, showing what the public’s emotion is by location and event. SemRel shows relation topics (linking emotions to people, emotions to locations, etc.). Relation topics are then combined to become input to metatopics (linking emotions across several locations; e.g., instead of NJ, the whole NE region). SemRel also gives additional information, e.g., about categories of people (the “Agent” field seen below), where an Agent could be children, parents, the elderly, etc. The relation topics and metatopics help to link different groups of agent types to different emotions.
CMU Dataless Classification: Using UIUC’s dataless classification combined with Weka’s clustering algorithm (to group Twitter messages by location/cluster), we are able to get an idea of what people are requesting at a given time and in a given area for preparation, for help during the event, and later for aid during recovery.
Approach: For each day of the hurricane timeline, Oct 22 – Nov 2, 2012, the following steps will be performed:
1. Input Twitter Data (see “Data” above) into RPI event timeline tracker, including Twitter message text, timestamp, and geolocation.
2. RPI event timeline tracker will output data of the form (Event Type; Event Trigger; Agent; Origin; Time; Emotion Attribute).
3. The RPI output (Event Type; Event Trigger; Agent; Origin; Time; Emotion Attribute) will be used as input to the CMU emotional response evolution. The CMU tool will show causal links and a timeline between the event triggers.
4. The RPI output (Event Type; Event Trigger; Agent; Origin; Time; Emotion Attribute) will be used as input to the Rutgers SemRel tool, which will show relation topics.
5. CMU takes the raw Twitter data (see “Data” above), and determines requests by geolocation.
6. The CMU output is clustered by geolocation. Clusters of size x or more messages are used (see the clustering sketch after this list).
7. Clusters found on the CMU output with the top y requests are visualized.
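A rough sketch of the geolocation clustering in steps 5-7 might look like the following, using DBSCAN with a haversine metric to group geotagged request tweets and keeping only sufficiently large clusters. The coordinates, radius, and cluster-size threshold are illustrative placeholders rather than the values used in the scenario.

```python
# A rough sketch of the geolocation clustering in steps 5-7: group geotagged
# "request" tweets with DBSCAN (haversine metric) and keep clusters with at
# least min_messages tweets. Thresholds and coordinates are illustrative.
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_KM = 6371.0

def cluster_requests(latlon_deg, min_messages=25, radius_km=2.0):
    """Return a cluster label per tweet; -1 marks noise / too-small clusters."""
    coords = np.radians(np.asarray(latlon_deg))
    return DBSCAN(eps=radius_km / EARTH_RADIUS_KM, min_samples=min_messages,
                  metric="haversine").fit_predict(coords)

# Hypothetical geotagged request tweets (lat, lon) around two towns plus noise
rng = np.random.default_rng(0)
hoboken = rng.normal([40.744, -74.032], 0.005, size=(60, 2))
far_rock = rng.normal([40.600, -73.755], 0.005, size=(40, 2))
scattered = rng.uniform([39.0, -76.0], [42.0, -72.0], size=(15, 2))
labels = cluster_requests(np.vstack([hoboken, far_rock, scattered]))
for lab in sorted(set(labels) - {-1}):
    print(f"cluster {lab}: {np.sum(labels == lab)} request tweets")
```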
Bashan Prah, Richard Damoah and Asamoah Nkwanta, Morgan State University
Analysis of Locally Transmitted Malaria Incidences and Climate Conditions in the United States, 1970-2004
Malaria affects over 100 million people worldwide each year, with an annual cost in human life exceeding one million deaths. In the United States, about 1,500 cases of malaria are diagnosed each year. However, almost all of the cases are imported rather than locally transmitted. Since 1950, 21 outbreaks of locally transmitted cases have been reported in the United States, most of them occurring in California during the summer months. Our analysis seeks to use available monthly reports of these malaria outbreak cases and climate-related conditions to establish the factors favorable for an outbreak to occur, and ultimately to develop a malaria incidence forecasting system based on reported cases and climate conditions.
Greg Olmschenk, David Zeng, and Hao Tang, City University of New York
Verification of Crowd Behavior Simulation by Video Analysis
Crowd simulation is an excellent approach for modeling the behavior of large groups of people in transportation hubs and other critical public facilities. It can improve the management of pedestrian traffic, especially prior to renovation or construction of a facility, and can provide effective emergency evacuation plans. Nevertheless, without validation against real video data, there is still a gap between the simulated model and the real world. In this project, we propose computer vision algorithms to automatically measure crowd behavior and the distribution of people in transportation hubs so as to improve the fidelity of crowd simulation models. The density and trajectories of crowds can be extracted from real video sequences captured by surveillance cameras at different times, including peak and non-peak hours, and from different source locations to destinations in the facilities. Furthermore, these measurements can be compared with crowd behavior generated by the simulation models, so that an enhanced and refined simulation model can be developed that is more suitable for real-world use and has predictive capabilities.
In this presentation, we will present the research findings and methods developed with CCICADA in summer 2015, supported by the DHS Summer Research Team Program for Minority Serving Institutions (MSI). We developed a crowd tracking method for security videos and additionally trained a deep convolutional neural network to estimate crowd density, testing the algorithm on publicly released crowd image sequences. The results show that the deep learning algorithm is effective for crowd density measurement.
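To make the density-map idea concrete, the sketch below shows a tiny fully convolutional network whose output map sums to an estimated crowd count. This toy architecture and the random input frame are stand-ins for illustration, not the deep network trained in the project.

```python
# A minimal sketch of the density-map idea behind CNN crowd counting: a small
# fully convolutional network regresses a per-pixel density map whose sum is
# the crowd count. Toy architecture; not the project's trained network.
import torch
import torch.nn as nn

class TinyDensityNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 1, 1), nn.ReLU(),            # 1-channel density map
        )

    def forward(self, x):
        return self.features(x)

net = TinyDensityNet()
frame = torch.rand(1, 3, 240, 320)                     # a dummy surveillance frame
density = net(frame)                                   # (1, 1, 60, 80) density map
print("estimated count:", float(density.sum()))        # count = integral of density
```

Training such a network typically regresses the predicted map against ground-truth density maps (e.g., with a mean-squared-error loss) built from annotated head positions.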
Yulong Yang, Rutgers University
Free-form Gesture Authentication in the Wild
Free-form gesture passwords have been proposed as an alternative mobile authentication method to tackle issues with existing ones. Text passwords are not very suitable for mobile interaction, and methods such as PINs and grid-based patterns sacrifice security for usability. However, little is known about how free-form gestures perform in the wild. We present the first field study (N=91) of mobile authentication using free-form gestures, with text passwords as a baseline. Our study leveraged Experience Sampling Methodology to obtain high ecological validity while maintaining control of the experiment. We found that, with gesture passwords, participants generated new passwords and authenticated faster with comparable memorability while being more willing to retry after failed logins. Our analysis of the first gesture password dataset from the field showed biases in the user-chosen distribution tending towards common shapes. Our findings provide useful insights towards understanding mobile authentication. This work was published at CHI’16, May 7-12, San Jose, CA. For more information, please go to http://securegestures.org.
Hira Narang, Fan Wu and Abisoye Ogunniyan, Tuskegee University
Numerical Solutions of Heat and Mass Transfer with the First Kind Boundary and Initial Conditions in Hollow Capillary Porous Cylinder Using Programmable Graphics Hardware
Heat and mass transfer simulation plays an important role in various engineering and industrial applications. To analyze the physical behavior of thermal and moisture movement phenomena, we can simulate them with a set of coupled partial differential equations. However, obtaining numerical solutions to the heat and mass transfer equations is a very time-consuming process, especially if the domain under consideration is discretized into a fine grid.
In this research work, therefore, an acceleration technique developed in the graphics community that exploits general-purpose graphics processing units (GPGPU) is applied to the numerical solution of the heat and mass transfer equations. Implementing the simulation on a GPGPU makes GPGPU computing power available for the most time-consuming parts of the simulation and calculation. The nVidia Compute Unified Device Architecture (CUDA) programming model provides a straightforward means of describing inherently parallel computations. This work improves the computational performance of numerically solving the heat and mass transfer equations on a GPGPU. We implement the numerical solutions utilizing the highly parallel computation capability of the GPGPU with nVidia CUDA. We simulate heat and mass transfer with first-kind boundary and initial conditions on a hollow cylindrical geometry using the CUDA platform on an nVidia Quadro FX 4800 and compare its performance with an optimized CPU implementation on a high-end Intel Xeon CPU. It is expected that the GPGPU can perform heat and mass transfer simulation accurately and significantly accelerate the numerical calculation, making the GPGPU implementation a promising approach to accelerating heat and mass transfer simulation. Our future plans include extending our work to second- and third-kind boundary and initial conditions, as well as to other geometries.
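For reference, the sketch below gives a CPU (NumPy) version of the explicit finite-difference update for radial heat conduction in a hollow cylinder with first-kind (Dirichlet) boundary conditions, the kind of per-node update that maps naturally onto one CUDA thread per grid point. The material parameters and grid are illustrative, and the full coupled heat-and-mass model of the paper is not reproduced here.

```python
# Explicit finite-difference update for radial heat conduction in a hollow
# cylinder with first-kind (Dirichlet) boundary conditions, written in NumPy
# as a CPU reference for the per-node update a CUDA kernel would perform.
# Parameters are illustrative only.
import numpy as np

alpha = 1e-4                       # thermal diffusivity (m^2/s), hypothetical
r_in, r_out, n = 0.05, 0.25, 201   # hollow cylinder: inner/outer radius, grid size
r = np.linspace(r_in, r_out, n)
dr = r[1] - r[0]
dt = 0.4 * dr**2 / alpha           # satisfies the explicit stability limit

T = np.full(n, 20.0)               # initial condition: uniform 20 C
T[0], T[-1] = 100.0, 20.0          # first-kind BCs: fixed wall temperatures

for _ in range(20000):             # march in time
    lap = (T[2:] - 2 * T[1:-1] + T[:-2]) / dr**2
    grad = (T[2:] - T[:-2]) / (2 * dr)
    T[1:-1] += alpha * dt * (lap + grad / r[1:-1])
    T[0], T[-1] = 100.0, 20.0      # re-impose Dirichlet boundaries

print("temperature profile (every 25th node):", np.round(T[::25], 1))
```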
Chenyang Zhang, The City College of New York
Subject Adaptive Affection Recognition via Sparse Reconstruction
Multimedia affection recognition from facial expressions and body gestures in RGB-D video sequences is a new research area. However, the large variance among subjects, especially in facial expression, makes the problem more difficult. To address this issue, we propose a novel subject-adaptive multimedia affection recognition framework via a two-layer sparse representation. There are two main contributions in our framework. In the subject adaptation stage, an iterative subject selection algorithm is proposed to select the most subject-related instances instead of using the whole training set. In the inference stage, a joint decision is made using a confident reconstruction prior to combine information from facial expressions and body gestures. We also collect a new RGB-D dataset for affection recognition with large subject variance. Experimental results demonstrate that the proposed framework increases discriminative power, especially for facial expressions. The joint recognition strategy is also shown to utilize complementary information from both modalities to reach a better recognition rate.
Jiawei Zhang, VACCINE-Purdue University
Classification and Visualization of Crime-Related Tweets
Millions of Twitter posts per day can provide insight to law enforcement officials for improved situational awareness. In this paper, we propose a natural language processing (NLP) pipeline for the classification and visualization of crime-related tweets. The work is divided into two parts. First, we collect crime-related tweets via classification. Unlike written text, social media such as Twitter includes substantial non-standard tokens and semantics, so we focus on exploring the underlying semantic features of crime-related tweets, including part-of-speech properties and intention verbs. We then use these features to train a classification model via a Support Vector Machine. The second part utilizes visual analytics approaches on the collected tweets to analyze and explore crime incidents. We integrate the NLP pipeline with the Social Media Analytics Reporting Toolkit (SMART) to improve the accuracy of crime-related tweet identification in SMART.
This work can also be used to improve crime prediction for law enforcement personnel.
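A hedged sketch of the first stage is shown below: a linear SVM trained on tweet text to flag crime-related posts. The TF-IDF features, the tiny inline dataset, and its labels are placeholders for illustration; the paper’s classifier uses part-of-speech properties and intention-verb features rather than plain TF-IDF.

```python
# A sketch of the first stage: a TF-IDF + linear SVM classifier that flags
# crime-related tweets. The tiny inline dataset and labels are placeholders,
# not the paper's training data or feature set.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

tweets = [
    "someone just broke into the car on 5th street",
    "police chasing a guy who robbed the corner store",
    "great burger at the new place downtown",
    "traffic is terrible on I-65 this morning",
]
labels = [1, 1, 0, 0]   # 1 = crime-related, 0 = not (hypothetical annotations)

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1), LinearSVC())
clf.fit(tweets, labels)
print(clf.predict(["they stole my bike outside the library",
                   "beautiful sunset over the river tonight"]))
```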