Amit Sheth

LexisNexis Ohio Eminent Scholar & Exec. Dir., Kno.e.sis

Location
Dayton, Ohio Area
Industry
Higher Education


Amit Sheth's Overview

Current
  • LexisNexis Ohio Eminent Scholar & Executive Director, Kno.e.sis at Wright State University
  • Fellow at IEEE Computer Society
  • Member of the Board of Directors at ezDI, LLC
  • Advisor at Edamam LLC
Past
  • Advisory Committee Member at W3C
  • Professor at University of Georgia
  • Founder/COB/CEO/CTO/Chief Scientist at Taalee, Voquette and Semagix
  • Founder, President, CEO at Infocosm, Inc
  • Member of Technical Staff at Telcordia Technologies
  • Staff Scientist at Unisys
  • Student (EEE) at BITS, Pilani
Education
  • A.G. High School
Connections

500+ connections


Amit Sheth's Summary

Educator, Researcher and Entrepreneur.

Prof. Sheth is working towards a vision of Computing for Human Experience, incorporating semantics-empowered Physical-Cyber-Social computing and Smart Data. His recent work has focused on semantics-empowered Web 3.0, involving enterprise, social, and sensor/IoT data and applications, as well as services and cloud interoperability. In the past he worked extensively on federated databases, semantic interoperability, and workflow management. His extensive collaborations with clinicians and biomedical researchers encompass biomedical knowledge discovery and the novel use of social media and sensor data for patient-centered care and patient empowerment. Sheth’s most prized achievement is the exceptional success of his past advisees.

Earlier, Dr. Sheth was a professor at the Univ of Georgia, and he served in R&D groups at Bellcore, Unisys, and Honeywell. His h-index of 86 and i100 > 70 place him among the top authors in Computer Science (<100), WWW (<5), and Databases (<25) [based on h-index]. His research has led to several commercial products, many deployed applications, two past successful companies, and one recent startup. He serves on several journal editorial boards, is the EIC of the Intl Journal on Semantic Web and Info Systems (IJSWIS), joint EIC of Distributed & Parallel Databases (DAPD), and a co-editor of two Springer book series (Semantic Web and Beyond: Computing for Human Experience, and Advanced Database Systems). http://knoesis.org/amit

Citation info: http://bit.ly/a-cite ; http://j.mp/MAS-a

More on: Education leading to exceptional careers for advisees (http://knoesis.org/amit/students), Research leadership (http://j.mp/www-Mar13), Vision (http://knoesis.org/vision), Health Care & Life Sciences (http://knoesis.org/amit/hcls), Tech Transfer & Entrepreneurship (http://knoesis.org/amit/commercialization), Open Source Tools/Data/Ontologies (http://knoesis.org/opensource). CV: http://j.mp/Sheth0713

Amit Sheth's Experience

LexisNexis Ohio Eminent Scholar

Wright State University

Educational Institution; 1001-5000 employees; Higher Education industry

January 2007 – Present (7 years 9 months) Dayton, Ohio Area

Prof. Sheth holds the endowed faculty position created by the Ohio Board of Regents and funded equally by LexisNexis and the Ohio Board of Regents. He provides research leadership in next-generation Web technologies and applications through Kno.e.sis (http://knoesis.org), which he founded when he moved to Wright State University. He is a professor in Computer Science & Engineering and in the Biomedical Sciences (BMS) PhD program.

Founder & Executive Director

Kno.e.sis: Ohio Center of Excellence in Knowledge-enabled Computing

Educational Institution; 51-200 employees; Computer Software industry

January 2007 – Present (7 years 9 months)

Ohio Centers of Excellence are nationally recognized academic programs that generate world-class research and help draw talent and investment to the state. Kno.e.sis performs research in semantics, social, services, sensor, mobile, and cloud computing and other Web 3.0 topics, with the purpose of accelerating the move from the information age to the meaning age. With 15 faculty members from Computer Science & Engineering, Biomedical Sciences, Health Care, and Cognitive Sciences, it also engages in extensive multidisciplinary and multi-institutional research (http://knoesis.org/projects/multidisciplinary), and achieves regional economic impact through extensive industry interactions, world-class workforce development, and technology commercialization. It has over 100 researchers, including ~50 PhD students. Kno.e.sis ranks among the top organizations in WWW (sharing 2nd position among universities based on 5-year h-index; Microsoft Academic Search, Mar 2013: http://j.mp/www-Mar13). Kno.e.sis was founded by Prof. Sheth in January 2007 and was declared an Ohio Center of Excellence in BioHealth in January 2010. He served as its founding director until July 2013, when he was appointed Executive Director, with further expansion of his duties and oversight.

Fellow

IEEE Computer Society

1985 – Present (29 years)

Member of the Board of Directors

ezDI, LLC

Privately Held; 201-500 employees; Hospital & Health Care industry

2012 – Present (2 years)

About involvement: http://j.mp/SWezCAC 

ezDI team: http://www.ezdi.us/?page_id=256

Advisor

Edamam LLC

Privately Held; 11-50 employees; Internet industry

December 2011 – Present (2 years 10 months) http://www.edamam.com/

Advisor (Technical advice on Semantic Technology/Semantic Web and Business advice incl. business model and development)

Advisory Committee Member

W3C

Nonprofit; 51-200 employees; Internet industry

2001 – December 2013 (12 years)

(a) Proposed (with IBM) WSDL-S, which led to the adoption of SAWSDL as a W3C Recommendation; (b) member submission SA-REST: Semantic Annotation of Web Resources; (c) co-chair, Semantic Web Services XG; (d) co-chair, Semantic Sensor Networking XG; (e) significant early input to the Semantic Web for Health Care & Life Sciences (HCLS) WG.

Professor

University of Georgia

Educational Institution; 10,001+ employees; Higher Education industry

July 1994 – December 2006 (12 years 6 months)

Director of the Large Scale Distributed Information Systems (LSDIS) Lab, with research in Semantic Web, Information Integration and Web Processes. http://lsdis.cs.uga.edu

Founder/COB/CEO/CTO/Chief Scientist

Taalee, Voquette and Semagix

August 1999 – October 2006 (7 years 3 months)

Founded (CEO/COB) the VC-funded Semantic Web company Taalee, Inc. (08/99-06/2001). Merged with Voquette, where he served as CTO & Sr. VP. Acquired by the Protege Group, resulting in Semagix, where he served as CTO until its acquisition by SearchSpace (a Warburg Pincus company) in April 2006, which resulted in Fortent. Served as Chief Scientist until October 2006, when he was getting ready to move to Dayton, OH. Fortent became part of Actimize around 2008.

In 2000, Taalee built MediaAnywhere, the first A/V Semantic Web search and browsing product. It had about 25 ontologies (News/Business, Sports, Entertainment, etc.) and supported ontology-based semantic (faceted) search, semantic browsing, semantic personalization, semantic targeting (advertising), etc., as described in U.S. Patent #6311194, http://j.mp/SW-patent, 30 Oct. 2001 (filed 03/2000; 08/2000).

Taalee merged to become Voquette in 2001 (its product SCORE supported semantic-technology-based media management), then Semagix in 2004 (its product Semagix Freedom was a comprehensive platform for building semantic applications such as CIRAS, focused on risk and compliance applications in Financial Services and Government), and then Fortent in 2006 (products included Know Your Customer).

Founder, President, CEO

Infocosm, Inc

1996 – 1999 (3 years) Athens, Georgia Area

Infocosm, Inc. licensed the METEOR distributed workflow management technology, developed by me and my colleagues/team at the University of Georgia, and commercialized it. The commercial product, called METEOR EApps (Enterprise Application Service), was a CORBA-based version built on METEOR ORBWork. It sold licenses to companies such as Bellcore, MCC, and Boeing. METEOR WebWork, one of the first fully Web-based workflow products, was deployed to support applications such as Neonatal Workflow at the Medical College of Georgia. [Infocosm, Inc. is still active for occasional consulting, now focusing on Semantic Web related issues.]

Member of Technical Staff

Telcordia Technologies

Privately Held; 1001-5000 employees; Telecommunications industry

1989 – 1994 (5 years)

METEOR (first research project on distributed workflow management): started 1991; operational use: 1994. InfoHarness (faceted Web search through metadata extracted from heterogeneous Web documents): started 1993; research paper: 1994; commercial product: 1994. Tried to convince Bellcore to make it an open Web search engine, but the business people could only think of the traditional Baby Bell market.

Staff Scientist

Unisys

Public Company; 10,001+ employees; UIS; Information Technology and Services industry

1987 – 1989 (2 years)

Worked on developing the Mermaid system, one of the earliest federated (distributed, heterogeneous) database management systems (later commercialized by Marjorie Templeton as InterViso).

Student (EEE)

BITS, Pilani

1976 – 1981 (5 years) Pilani, Rajasthan, India

BE (Hons) in Electrical and Electronics Engineering; two internships at ISRO. Extracurricular activities in solar energy; developed an interest in research.

Amit Sheth's Patents

  • System and method for creating a Semantic Web and its applications in Browsing, Searching, Profiling, Personalization and Advertising

    • United States Patent 6311194
    • Issued October 30, 2001
    Inventors: Amit Sheth, Clemens Bertram, David Avant

    A system and method for creating a database of metadata (metabase) for a variety of digital media content, including TV and radio content delivered on the Internet. This semantics-based method captures and enhances domain- or subject-specific metadata of digital media content, including the specific meaning and intended use of the original content. To support semantics, a WorldModel is provided that includes specific domain knowledge and ontologies, as well as a set of rules relevant to the original content. The metabase may also be dynamic in that it may track changes to any variety of accessible content, including live and archival TV and radio programming.

    WorldModel = Ontology

  • Method for enforcing the serialization of global multidatabase transactions through committing only on consistent subtransaction serialization by the local database managers

    • United States Patent 5241675
    • Issued August 31, 1993

    Our invention guarantees global serializability by preventing multidatabase transactions from being serialized in different ways at the participating local database systems (LDBS). In one embodiment, tickets are used to inform the MDBS of the relative serialization order of the subtransactions of each global transaction at each LDBS. A ticket is a (logical) timestamp whose value is stored as a regular data item in each LDBS. Each subtransaction of a global transaction is required to issue the take-a-ticket operations, which consist of reading the value of the ticket (i.e., read ticket) and incrementing it (i.e., write (ticket+1)) through regular data manipulation operations. Only the subtransactions of global transactions take tickets. When different global transactions issue subtransactions at a local database, each subtransaction will include the take-a-ticket operations. Therefore, the ticket values associated with each global subtransaction at the MDBS reflect the local serialization order at each LDBS. The MDBS in accordance with our invention examines the ticket values to determine the local serialization order at the different LDBSs and only authorizes the transactions to commit if the serialization order of the global transactions is the same at each LDBS. In another embodiment, the LDBSs employ rigorous schedulers and the prepared-to-commit messages for each subtransaction are used by the MDBS to ensure global serializability.
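
    For illustration only, here is a minimal Python sketch of the take-a-ticket idea; the data structures, names, and the commit check are simplified assumptions for exposition, not the patented implementation.

        # Hypothetical sketch of the "take-a-ticket" idea: each LDBS keeps a ticket
        # counter; every subtransaction of a global transaction reads and increments
        # it, so ticket values record the local serialization order. The MDBS
        # commits only if that order is identical at every shared LDBS.
        from itertools import combinations

        def take_a_ticket(tickets, ldbs, txn, log):
            # Subtransaction of `txn` at `ldbs`: read(ticket), then write(ticket + 1).
            value = tickets[ldbs]
            tickets[ldbs] = value + 1
            log.setdefault(ldbs, {})[txn] = value

        def globally_serializable(log):
            # Any two global transactions must be ordered the same way at all LDBSs.
            txns = {t for orders in log.values() for t in orders}
            for t1, t2 in combinations(sorted(txns), 2):
                orders = [o[t1] < o[t2] for o in log.values() if t1 in o and t2 in o]
                if orders and not (all(orders) or not any(orders)):
                    return False  # t1 and t2 serialized differently somewhere
            return True

        tickets, log = {"A": 0, "B": 0}, {}
        take_a_ticket(tickets, "A", "G1", log); take_a_ticket(tickets, "B", "G1", log)
        take_a_ticket(tickets, "A", "G2", log); take_a_ticket(tickets, "B", "G2", log)
        print(globally_serializable(log))  # True: G1 precedes G2 at both A and B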

Amit Sheth's Languages

  • Hindi

  • Gujarati

Amit Sheth's Skills & Expertise

  1. Web 3.0
  2. Semantic Web
  3. Social Web
  4. Semantic Sensor Web
  5. Web of Things
  6. Health 2.0
  7. Knowledge Extraction
  8. Information Extraction
  9. Knowledge Discovery
  10. Ontology
  11. Linked Data
  12. Semantic Search
  13. Semantic Analysis
  14. Semantic Interoperability
  15. Semantic Technologies
  16. Continuous Semantics
  17. Relationship Web
  18. Computing for Human Experience
  19. Web 2.0
  20. Computing
  21. Semantics
  22. Distributed Systems
  23. Ontologies
  24. Text Mining
  25. Knowledge Representation
  26. Cloud Computing
  27. Metadata
  28. RDF
  29. OWL
  30. Web Services
  31. Natural Language Processing
  32. Software Project Management
  33. Process Engineering
  34. Machine Learning
  35. Management
  36. Artificial Intelligence
  37. Big Data
  38. Information Retrieval
  39. Computer Science
  40. Algorithms
  41. Ontology Engineering
  42. Data Mining
  43. Software Engineering
  44. Text Analytics
  45. Databases
  46. Social Network Analysis
  47. Knowledge Management
  48. Open Source
  49. Scalability
  50. MapReduce


Amit Sheth's Projects

  • SoCS: Social Media Enhanced Organizational Sensemaking in Emergency Response

    • September 2011 to Present

    Online social networks and always-connected mobile devices have empowered citizens and organizations to communicate and coordinate effectively in the wake of critical events. Specifically, there have been many examples of using Twitter to provide timely and situational information about emergencies to relief organizations, and to conduct ad-hoc coordination. This NSF sponsored multidisciplinary research involving Computer Scientists and Cognitive Scientists at Wright State University and Ohio State University seeks to understand the full ramifications of using social networks for effective organizational sensemaking in such contexts.

    This project is expected to have a significant impact in the specific context of disaster and emergency response. However, elements of this research are expected to have much wider utility, for example in the domains of e-commerce and social reform. From a computational perspective, this project introduces the novel paradigm of spatio-temporal-thematic (STT) and people-content-network analysis (PCNA) of social media and traditional media content, implemented as part of Twitris (http://twitris.knoesis.org). Applications of STT and PCNA extend well beyond organizational sensemaking. For social scientists, it provides a platform that can be used to assess the relative efficacy of various organizational structures and is expected to provide new insights into the types of social network structures (mixes of symmetric and asymmetric) that might be better suited to propagate information in emergent situations. From an educational standpoint, the majority of funds will be used to train the next generation of interdisciplinary researchers drawn from the computational and social sciences.

    Keywords: Social Networking, Emergency Response, People-Content-Network Analysis (PCNA), Spatio-Temporal-Thematic Analysis (STT Analysis), Organizational Sensemaking, Collaborative Decision Making.

    Project Site: http://knoesis.org/research/semsoc/projects/socs
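
    A toy Python sketch of the spatio-temporal-thematic (STT) rollup idea mentioned above; the tweets, place/day fields, and theme list are invented, and real STT analysis in Twitris is far richer.

        # Aggregate tweets carrying (place, day, text) into per-(place, day, theme)
        # counts, the kind of summary an STT analysis builds at much larger scale.
        from collections import Counter

        tweets = [
            {"place": "Moore, OK", "day": "2013-05-20", "text": "tornado shelter needed"},
            {"place": "Moore, OK", "day": "2013-05-20", "text": "tornado damage downtown"},
            {"place": "Dayton, OH", "day": "2013-05-20", "text": "storm watch tonight"},
        ]
        THEMES = {"tornado", "shelter", "storm"}

        stt = Counter(
            (t["place"], t["day"], word)
            for t in tweets
            for word in t["text"].split()
            if word in THEMES
        )
        for (place, day, theme), n in stt.most_common():
            print(place, day, theme, n)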

  • PREDOSE: PREscription Drug abuse Online-Surveillance and Epidemiology project

    • July 2011 to Present

    NIH funded PREDOSE is an inter-disciplinary collaborative project between the Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) and the Center for Interventions, Treatment and Addictions Research (CITAR) at Wright State University. The overall aim of PREDOSE is to develop automated techniques for web forum data analysis related to the illicit use of pharmaceutical opioids. This research complements traditional epidemiological studies involving interview based data gathering. Many Web 2.0 empowered social platforms, including Web forums and Twitter, provide venues for individuals to freely share their experiences, post questions, and offer comments about different drugs. PREDOSE aims to analyze such social data to provide timely and emerging information on the non-medical use of pharmaceutical opioids. Primary goals include:

    To determine user knowledge, attitudes and behavior related to the non-medical use of pharmaceutical opioids (namely buprenorphine) as discussed on social platforms

    To determine spatio-temporal trends and patterns in pharmaceutical opioid abuse as discussed on Web-based forums

    The project has already provided unusual and unexpected insights, such as self-treatment of opioid withdrawal symptoms with Loperamide.

  • Twitris+: 360 degree Social Media Analytics platform

    • November 2008 to Present

    Users are sharing voluminous social data (800M+ active Facebook users, 1B+ tweets/week) through social networking platforms accessible via the Web and, increasingly, via mobile devices. This gives an unprecedented opportunity to decision makers -- from corporate analysts to coordinators during emergencies -- to answer questions or take actions related to a broad variety of activities and situations: whom should they really engage with, how to prioritize posts for action in the voluminous data stream, what are the needs and who are the resource providers in an emergency event, how is a corporate brand performing, and does customer support adequately serve needs while managing corporate reputation, etc. We demonstrate these capabilities using Twitris+ through multi-faceted analysis along the dimensions of Spatio-Temporal-Thematic (STT), People-Content-Network (PCN), and Subjectivity: Emotion-Sentiment-Intent (ESI). Twitris' diversity and depth of analysis is unprecedented. Twitris v1 [2009] focused on STT, Twitris v2 [2011] focused on PCN, and Twitris v3 [2012- ] initiated ESI, extended the other dimensions by extending PCN analysis with expression capability involving the use of background knowledge, and will soon add real-time analytics incorporating Kno.e.sis' Twarql framework.

    Twitris leverages an array of techniques and technologies that traditionally fall under big data (or scalable unstructured data analysis), social media analysis (including user-generated content analysis), and the Semantic Web (including extensive use of RDF), along with algorithms that use statistical, linguistic, machine learning, and complex/semantic query processing techniques.

    Key project alumni: Karthik Gomadam, Meena Nagarajan

    Research System (live): http://twitris.knoesis.org

  • kHealth - Knowledge-enabled Healthcare

    • January 2012 to Present

    kHealth (Knowledge-enabled Healthcare) is a platform which integrates data from passive and active sensing (including both machine and human sensors) with background knowledge from domain ontologies, semantic reasoning, and mobile computing environments to help people make decisions to improve their health, fitness, and wellbeing. kHealth utilizes technology from Semantic Sensor Web, Semantic Perception, and Intelligence at the Interface to enable advanced healthcare applications. Currently we are developing an app to reduce preventable hospital readmissions of patients with Acute Decompensated Heart Failure, and expect to trial it with patients with the help of our clinical collaborators at OSU Wexner Medical Center. More research on applications to asthma, GI disorders, obesity, COPD, and other chronic diseases is ongoing.

  • Continuous Semantics and Real-time Analysis of Social and Sensor Data

    • January 2010 to Present

    We’ve made significant progress in applying semantics and Semantic Web technologies in a range of domains. A relatively well-understood approach to reaping semantics’ benefits begins with formal modeling of a domain’s concepts and relationships, typically as an ontology. Then, we extract relevant facts (in the form of related entities) from the corpus of background knowledge and use them to populate the ontology. Finally, we apply the ontology to extract semantic metadata or to semantically annotate data in unseen or new corpora. Using annotations yields semantics-enhanced experiences for search, browsing, integration, personalization, advertising, analysis, discovery, situational awareness, and so on. This typically works well for domains that involve slowly evolving knowledge concentrated among deeply specialized domain experts and that have definable boundaries. However, this approach has difficulties dealing with dynamic domains involved in social, mobile, and sensor webs. This project looks at how continuous semantics can help us model those domains and analyze the related real-time data typically found on social, mobile, and sensor webs, which exhibit five characteristics. First, they’re spontaneous (arising suddenly). Second, they follow a period of rapid evolution, involving real-time or near real-time data, which requires continuous searching and analysis. Third, they involve many distributed participants with fragmented and opinionated information. Fourth, they accommodate diverse viewpoints involving topical or contentious subjects. Finally, they feature context colored by local knowledge as well as perceptions based on different observations and their sociocultural analysis.
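
    As an illustration of the model-populate-annotate pattern described above, here is a deliberately tiny Python sketch; the two-concept "ontology", the background facts, and the exact-word annotator are invented stand-ins, not the project's actual machinery.

        # Step 1: model the domain's concepts (a deliberately tiny "ontology").
        ontology = {"Team": set(), "City": set()}

        # Step 2: populate it with entities from background knowledge.
        background = [("Team", "Bengals"), ("Team", "Reds"), ("City", "Cincinnati")]
        for concept, entity in background:
            ontology[concept].add(entity)

        # Step 3: annotate new text with the populated entities (semantic metadata).
        def annotate(text):
            return [
                (word, concept)
                for word in text.split()
                for concept, entities in ontology.items()
                if word in entities
            ]

        print(annotate("Bengals win in Cincinnati"))
        # [('Bengals', 'Team'), ('Cincinnati', 'City')]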

  • Obvio

    • June 2010 to Present

    Obvio (Spanish for "obvious") is the name of the project on semantics-based techniques for Literature-Based Discovery (LBD) using biomedical literature. The goal of Obvio is to uncover hidden connections between concepts in text, thereby leading to hypothesis generation from publicly available scientific knowledge sources. It utilizes semantic predications (assertions extracted from biomedical literature) for LBD.

  • Twarql

    • January 2010 to Present

    Twitter has become a prominent medium for sharing opinions, observations, and suggestions in real-time. Insights from these microposts ("Wisdom of the Crowd") have proved invaluable for businesses and researchers around the world. However, the volume of microblog data keeps increasing with the popularity and growth of Twitter, which creates challenges in filtering this data to meet the needs of aggregation and collective analysis for sensemaking. Twarql addresses these challenges by leveraging Semantic Web technologies to enable a flexible query language for filtering microblog posts.
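
    A minimal sketch, assuming the rdflib library and illustrative vocabulary terms (sioc:Post, schema:mentions), of the idea of filtering RDF-annotated microposts with a declarative SPARQL query; it is not Twarql's actual code.

        # Requires: pip install rdflib
        from rdflib import Graph

        tweets = """
        @prefix sioc:   <http://rdfs.org/sioc/ns#> .
        @prefix schema: <http://schema.org/> .
        <urn:tweet:1> a sioc:Post ; sioc:content "Gas is scarce downtown" ;
                      schema:mentions <urn:topic:FuelShortage> .
        <urn:tweet:2> a sioc:Post ; sioc:content "Lovely weather today" .
        """

        g = Graph()
        g.parse(data=tweets, format="turtle")

        # Declarative filter: all posts mentioning the FuelShortage topic.
        query = """
        PREFIX sioc:   <http://rdfs.org/sioc/ns#>
        PREFIX schema: <http://schema.org/>
        SELECT ?post ?text WHERE {
            ?post a sioc:Post ;
                  sioc:content ?text ;
                  schema:mentions <urn:topic:FuelShortage> .
        }
        """
        for post, text in g.query(query):
            print(post, text)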

  • MobiCloud

    • October 2009 to June 2012

    MobiCloud is a domain specific language (DSL) based cloud-mobile hybrid application generation framework. The project won the prestigious Technology Award at 2012 Fukuoka Ruby Award Competition (from among 82 entries from 9 countries).

  • PhylOnt : A Domain-Specific Ontology for Phylogenetic Analysis

    • February 2011 to Present

    PhylOnt is a collaboration project with the University of Georgia. The specific objective of this research was to develop and deploy an ontology for a novel ontology-driven semantic problem-solving approach in phylogenetic analysis and downstream use of phylogenetic trees. This is a foundation for an integrated platform for phylogenetically based comparative analysis and data integration. PhylOnt is an extensible ontology that describes the methods employed to estimate trees given a data matrix, the models and programs used for phylogenetic analysis, and descriptions of phylogenetic trees including branch-length information and support values. It also describes provenance information for phylogenetic analysis data, such as information about publications and studies related to phylogenetic analyses. To illustrate the utility of PhylOnt, I annotated scientific literature and files to support semantic search.

  • IntellegO - Semantic Perception Technology

    • January 2010 to Present

    Currently, there are many sensors collecting information about our environment, leading to an overwhelming number of observations that must be analyzed and explained in order to achieve situation awareness. As perceptual beings, we are also constantly inundated with sensory data; yet we are able to make sense out of our environment with relative ease. Semantic Perception is a computational framework, inspired by cognitive models of human perception, to derive actionable intelligence and situational awareness from low-level sensor data. The formalization of this ability utilizes prior knowledge encoded in domain ontologies, and hybrid abductive/deductive reasoning, to translate low-level observations into high-level abstractions. A declarative specification defined in OWL allows prior knowledge available on the Web, and annotated with Semantic Web languages, to be easily integrated into the framework.

  • Semantic Sensor Web

    • January 2008 to Present

    Millions of sensors around the globe currently collect avalanches of data about our environment. The rapid development and deployment of sensor technology involves many different types of sensors, both remote and in situ, with such diverse capabilities as range, modality, and maneuverability. It is possible today to utilize networks with multiple sensors to detect and identify objects of interest up close or from a great distance. The lack of integration and communication between these networks, however, often leaves this avalanche of data stovepiped and intensifies the existing problem of too much data and not enough knowledge. With a view to alleviating this glut, we propose that sensor data be annotated with semantic metadata to provide contextual information essential for situational awareness. In particular, Semantic Sensor Web is a framework for managing heterogeneity among sensor descriptions and sensor observation data through semantic modeling and annotation to enable advanced Web-based data integration, query, and inference. This project has helped to initiate a W3C Incubator Group, the Semantic Sensor Network XG, and develop a standard ontology and semantic annotation framework. These tools are achieving broad adoption and application within the sensing community for managing sensor data on the Web.

    Selected Publications
    - The SSN Ontology of the W3C Semantic Sensor Network Incubator Group (Journal of Web Semantics, 2012): http://knoesis.wright.edu/library/resource.php?id=1659
    - Semantic Sensor Network XG Final Report (W3C Incubator Group Report, 2011): http://www.knoesis.org/library/resource.php?id=1635
    - SemSOS: Semantic Sensor Observation Service (CTS, 2009): http://knoesis.wright.edu/library/resource.php?id=00596
    - Semantic Sensor Web (IEEE Internet Computing, 2008): http://knoesis.wright.edu/library/resource.php?id=00311
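
    As a sketch of the semantic annotation idea behind this project, the following Python snippet (using rdflib) encodes one sensor observation as RDF; the property names are simplified assumptions for illustration rather than the normative SSN ontology terms.

        # Requires: pip install rdflib
        from rdflib import Graph, Literal, Namespace, RDF, XSD

        SSN = Namespace("http://purl.oclc.org/NET/ssnx/ssn#")
        EX = Namespace("http://example.org/")

        g = Graph()
        obs = EX["obs/42"]
        g.add((obs, RDF.type, SSN.Observation))                   # what kind of thing
        g.add((obs, SSN.observedBy, EX["sensor/thermo-1"]))       # which sensor
        g.add((obs, SSN.observedProperty, EX["AirTemperature"]))  # which phenomenon
        g.add((obs, EX.hasValue, Literal(21.5, datatype=XSD.double)))
        g.add((obs, EX.hasTimestamp,
               Literal("2011-06-01T12:00:00", datatype=XSD.dateTime)))

        print(g.serialize(format="turtle"))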

  • Advanced School on Service Oriented Computing

    • 2006 to Present

    The Advanced School on Service-Oriented Computing (SOC) brings together the best international experts on software and services with PhD students, young researchers and professionals from leading academic, research and industrial organizations across Europe and around the world. Students who attend the prestigious Erasmus Mundus International Master on Service Engineering (IMSE) participate in the Advanced School as part of their study program. Topics span the entire field of SOC from conceptual foundations to industrial applications.
    In addition to high quality training, the Advanced School helps forge a new research and scientific community on Service-Oriented Computing (SOC). The Advanced School fosters the free exchange of ideas and helps the participants to network and start new cooperative research projects. The School Directors are internationally known experts and researchers on SOC. This year the major themes of the Advanced School on SOC are: Conceptual Foundations, Computing in the Clouds, People in SOCs, and Emerging Topics.

  • Semantic Platform for Open Materials Science and Engineering

    • June 2013 to Present

    Innovations in materials play an essential role in our progress towards a better life - from improving laptop battery life to developing protective gear that prevents life-threatening injuries and making aircraft more efficient. However, it often takes 20 years from the time of discovery to when a new material is put into practical applications. The White House's Materials Genome Initiative (MGI; http://www.whitehouse.gov/mgi/) seeks to improve U.S. competitiveness in the 21st century by discovering, manufacturing, and deploying advanced materials twice as fast, at a fraction of the cost. Kno.e.sis' two related projects [1][2] involve collaboration between computer and materials scientists, and will play a central role in developing the Digital Data component of MGI's Materials Innovation Infrastructure.

    [1] Federated Semantic Services Platform for Open Materials Science and Engineering
    [2] Materials Database Knowledge Discovery and Data Mining

  • Location Prediction of Twitter Users

    • January 2014 to Present

    The geographic location of a Twitter user can be used in many applications such as Personalization and Recommendation systems. This work explores the use of an external knowledge-base (Wikipedia) to predict the location of a Twitter user based on the contents of their tweets and compares this approach to the existing statistical approaches. The key contribution of this work is that it does not require a training data set of geo-tagged tweets as used by the state-of-the-art approaches.
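
    A toy Python sketch of the knowledge-base-driven idea described above: candidate cities carry word bags assumed to be harvested from their Wikipedia pages, and a user's tweets are scored against each bag, with no geo-tagged training data. The vocabularies and scoring are invented stand-ins, not the paper's method.

        from collections import Counter

        # Hypothetical Wikipedia-derived vocabularies for two candidate cities.
        CITY_WORDS = {
            "Dayton": {"wright", "patterson", "miami", "valley", "gem"},
            "Seattle": {"space", "needle", "puget", "sound", "rainier"},
        }

        def predict_location(tweets):
            # Score each city by how often its Wikipedia words occur in the tweets.
            words = Counter(w.lower() for t in tweets for w in t.split())
            scores = {city: sum(words[w] for w in vocab)
                      for city, vocab in CITY_WORDS.items()}
            return max(scores, key=scores.get), scores

        city, scores = predict_location(["Biking the Miami Valley trail", "gem city pride"])
        print(city, scores)  # Dayton {'Dayton': 3, 'Seattle': 0}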

Amit Sheth's Publications

  • Contextual Ontology Alignment of LOD with an Upper Ontology: A Case Study with Proton.

    • Springer/LNCS
    • May 29, 2010

    Linked Open Data (LOD) is a major milestone towards realizing the Semantic Web vision, and can enable applications such as robust Question Answering (QA) systems that can answer queries requiring multiple, disparate information sources. However, realizing these applications requires relationships at both the schema and instance level, but currently the LOD only provides relationships for the latter. To address this limitation, we present a solution for automatically finding schema-level links between two LOD ontologies -- in the sense of ontology alignment. Our solution, called BLOOMS+, extends our previous solution (i.e., BLOOMS) in two significant ways. BLOOMS+ 1) uses a more sophisticated metric to determine which classes between two ontologies to align, and 2) considers contextual information to further support (or reject) an alignment. We present a comprehensive evaluation of our solution using schema-level mappings from LOD ontologies to Proton (an upper level ontology) -- created manually by human experts for a real world application called FactForge. We show that our solution performed well on this task. We also show that our solution significantly outperformed existing ontology alignment solutions (including our previously published work on BLOOMS) on this same task.

  • Ontology Alignment for Linked Open Data.

    • 9th International Semantic Web Conference 2010 (ISWC 2010),
    • November 7, 2010

    The Web of Data currently coming into existence through the Linked Open Data (LOD) effort is a major milestone in realizing the Semantic Web vision. However, the development of applications based on LOD faces difficulties due to the fact that the different LOD datasets are rather loosely connected pieces of information. In particular, links between LOD datasets are almost exclusively on the level of instances, and schema-level information is being ignored. In this paper, we therefore present a system for finding schema-level links between LOD datasets in the sense of ontology alignment. Our system, called BLOOMS, is based on the idea of bootstrapping information already present on the LOD cloud. We also present a comprehensive evaluation which shows that BLOOMS outperforms state-of-the-art ontology alignment systems on LOD datasets. At the same time, BLOOMS is also competitive compared with these other systems on the Ontology Evaluation Alignment Initiative Benchmark datasets.
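
    For intuition only, a toy Python sketch of the bootstrapping idea: if two classes from different LOD ontologies map to overlapping sets of Wikipedia categories, that overlap suggests an alignment. The category sets and the Jaccard score are illustrative assumptions, not BLOOMS' actual scoring function.

        def category_overlap(cats_a, cats_b):
            # Jaccard overlap between the Wikipedia category sets of two classes.
            union = cats_a | cats_b
            return len(cats_a & cats_b) / len(union) if union else 0.0

        stadium_lod = {"Sports venues", "Buildings and structures", "Stadiums"}
        arena_proton = {"Sports venues", "Stadiums", "Indoor arenas"}
        print(category_overlap(stadium_lod, arena_proton))  # 0.5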

  • Linked Data Is Merely More Data

    • AAAI Spring Symposium
    • 2010

    In this position paper, we argue that the Linked Open Data (LoD) Cloud, in its current form, is only of limited value for furthering the Semantic Web vision. Being merely a weakly linked 'triple collection', it will only be of very limited benefit for the AI or Semantic Web communities. We describe the corresponding problems with the LoD Cloud and give directions for research to remedy the situation.

  • Flexible Bootstrapping-Based Ontology Alignment

    • The Fifth International Workshop on Ontology Matching collocated with the 9th International Semantic Web Conference ISWC-2010, November 7, 2010
    • November 7, 2010

    BLOOMS (Jain et al., ISWC2010) is an ontology alignment system which, at its core, utilizes the Wikipedia category hierarchy for establishing alignments. In this paper, we present a Plug-and-Play extension to BLOOMS, which allows one to flexibly replace or complement the use of Wikipedia with other online or offline resources, including domain-specific ontologies or taxonomies. By making use of automated translation services and of Wikipedia in languages other than English, it becomes possible to apply BLOOMS to alignment tasks where the input ontologies are written in different languages.

  • SPARQL Query Re-writing for Spatial Datasets Using Partonomy Based Transformation Rules

    • Third International Conference on Geospatial Semantics (GeoS 2009)
    • December 4, 2009

    Often the information present in a spatial knowledge base is represented at a different level of granularity and abstraction than the query constraints. For querying ontologies containing spatial information, the precise relationships between spatial entities have to be specified in the basic graph pattern of a SPARQL query, which can result in long and complex queries. We present a novel approach to help users intuitively write SPARQL queries to query spatial data, rather than relying on knowledge of the ontology structure. Our framework re-writes queries, using transformation rules to exploit part-whole relations between geographical entities, to address the mismatches between query constraints and the knowledge base. Our experiments were performed on completely third-party datasets and queries. Evaluations were performed on the Geonames dataset using questions from the National Geographic Bee serialized into SPARQL, and on the British Administrative Geography Ontology using questions from a popular trivia website. These experiments demonstrate high precision in retrieval of results and ease in writing queries.
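
    A minimal Python sketch, under an assumed rule and vocabulary (:locatedIn, :partOf), of the partonomy-based rewriting idea: a constraint naming a whole region is expanded to also match entities located in its parts via a transitive property path.

        def rewrite_with_partonomy(basic_graph_pattern):
            # Rule: "?x :locatedIn ?region" should also match entities located in
            # any part of ?region, reached through a transitive :partOf path.
            return basic_graph_pattern.replace(
                "?x :locatedIn ?region .",
                "?x :locatedIn ?part . ?part :partOf* ?region .",
            )

        bgp = "?x a :River . ?x :locatedIn ?region ."
        print(rewrite_with_partonomy(bgp))
        # ?x a :River . ?x :locatedIn ?part . ?part :partOf* ?region .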

  • Mediatability: Estimating the Degree of Human Involvement in XML Schema Mediation

    • International Conference on Semantic Computing
    • August 2008

    Mediation and integration of data are significant challenges because the number of services on the Web, and heterogeneities in their data representation, continue to increase rapidly. To address these challenges we introduce a new measure, mediatability, which is a quantifiable and computable metric for the degree of human involvement in XML schema mediation. We present an efficient algorithm to compute mediatability and an experimental study to analyze how semantic annotations affect the ease of mediating between two schemas. We validate our approach by comparing mediatability scores generated by our system with user-perceived difficulty. We also evaluate the scalability of our system on a large number of existing APIs.
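
    A toy illustration (not the paper's actual metric) of a mediatability-style score in Python: the fraction of target-schema elements that find an automatic match in the source schema, with difflib name similarity standing in for real matchers; a lower score suggests more human involvement.

        from difflib import SequenceMatcher

        def similarity(a, b):
            return SequenceMatcher(None, a.lower(), b.lower()).ratio()

        def mediatability(source_elems, target_elems, threshold=0.6):
            # Fraction of target elements with at least one automatic match.
            matched = sum(
                1 for t in target_elems
                if any(similarity(s, t) >= threshold for s in source_elems)
            )
            return matched / len(target_elems)

        src = ["customerName", "street", "zipCode", "phone"]
        tgt = ["custName", "streetAddress", "zip", "faxNumber"]
        print(mediatability(src, tgt))  # 0.75: faxNumber still needs a human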

  • Spatio-Temporal-Thematic Analysis of Citizen Sensor Data: Challenges and Experiences

    • Web Information Systems Engineering
    • October 2009

    We present work in the spatio-temporal-thematic analysis of citizen-sensor observations pertaining to real-world events. Using Twitter as a platform for obtaining crowd-sourced observations, we explore the interplay between these 3 dimensions in extracting insightful summaries of social perceptions behind events. We present our experiences in building a web mashup application, Twitris (http://twitris.knoesis.org) that extracts and facilitates the spatio-temporal-thematic exploration of event descriptor summaries.

  • A Semantic Framework for Identifying Events in a Service Oriented Architecture.

    • International Conference on Web Services
    • July 2007

    We propose a semantic framework for automatically identifying events as a step towards developing an adaptive middleware for Service Oriented Architecture (SOA). Current related research focuses on adapting to events that violate certain non-functional objectives of the service requestor. Given the large number of events that can happen during the execution of a service, identifying events that can impact the non-functional objectives of a service request is a key challenge. To address this problem we propose an approach that allows service requestors to create semantically rich service requirement descriptions, called semantic templates. We propose a formal model for expressing semantic templates and for measuring the relevance of an event to both the action being performed and the non-functional objectives. This model is extended to adjust the relevance of the events based on feedback from the underlying adaptation framework. We present an algorithm that utilizes multiple ontologies for identifying relevant events and present our evaluations that measure the efficiency of both the event identification and the subsequent adaptation scheme.

  • A Faceted Classification Based Approach to Search and Rank Web APIs

    • International Conference on Web Services
    • September 2008

    Web application hybrids, popularly known as mashups, are created by integrating services on the Web using their APIs. Support for finding an API is currently provided by generic search engines or domain specific solutions such as ... Shortcomings of both these solutions in terms of reliance on user tags make the task of identifying an API challenging. Since these APIs are described in HTML documents, it is essential to look beyond the boundaries of current approaches to Web service discovery that rely on formal descriptions. In this work, we present a faceted approach to searching and ranking Web APIs that takes into consideration attributes or facets of the APIs as found in their HTML descriptions. Our method adopts current research in document classification and faceted search and introduces the serviut score to rank APIs based on their utilization and popularity. We evaluate classification, search accuracy and ranking effectiveness using available APIs while contrasting our solution with existing ones.

    Full citation:
    Karthik Gomadam, Ajith Ranabahu, Meenakshi Nagarajan, Amit P. Sheth, Kunal Verma: A Faceted Classification Based Approach to Search and Rank Web APIs. ICWS 2008: 177-184.

  • A Domain Specific Language for Enterprise Grade Cloud-Mobile Hybrid Applications

    • 11th Workshop on Domain-Specific Modeling (DSM)
    • October 23, 2011

    Cloud computing has changed the technology landscape by offering flexible and economical computing resources to the masses. However, vendor lock-in makes the migration of applications and data across clouds an expensive proposition. The lock-in is especially serious when considering the new technology trend of combining cloud with mobile devices. In this paper, we present a domain-specific language (DSL) that is purposely created for generating hybrid applications spanning mobile devices as well as computing clouds. We propose a model-driven development process that makes use of a DSL to provide sufficient programming abstractions over both cloud and mobile features. We describe the underlying domain modeling strategy as well as the details of our language and the tools supporting our approach.

  • Finding Influential Authors in Brand-Page Communities

    • 6th Int'l AAAI Conference on Weblogs and Social Media (ICWSM)
    • June 4, 2012

    Enterprises are increasingly using social media forums to engage with their customers online -- a phenomenon known as Social Customer Relationship Management (Social CRM). In this context, it is important for an enterprise to identify "influential authors" and engage with them on a priority basis. We present a study towards finding influential authors on Twitter forums, where an implicit network based on user interactions is created and analyzed. Furthermore, author profile features and user interaction features are combined in a decision tree classification model for finding influential authors. A novel objective evaluation criterion is used for evaluating various features and modeling techniques. We compare our methods with other approaches that use either only the formal connections or only the author profile features, and show a significant improvement in classification accuracy over these baselines as well as over using the Klout score.

  • Prediction of Topic Volume on Twitter

    • 4th Int'l ACM Conference of Web Science (WebSci)
    • June 2012

    [Extended Abstract] We discuss an approach for predicting microscopic (individual) and macroscopic (collective) user behavioral patterns with respect to specific trending topics on Twitter. Going beyond previous efforts that have analyzed driving factors in whether and when a user will publish topic-relevant tweets, here we seek to predict the strength of content generation, which allows more accurate understanding of Twitter users' behavior and more effective utilization of the online social network for diffusing information. Unlike traditional approaches, we consider multiple dimensions in one regression-based prediction framework covering network structure, user interaction, content characteristics, and past activity. Experimental results on three large Twitter datasets demonstrate the efficacy of our proposed method. We find in particular that combining features from multiple aspects (especially past activity information and network features) yields the best performance. Furthermore, we observe that leveraging more past information leads to better prediction performance, although the marginal benefit is diminishing.
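
    A hedged Python sketch of the multi-dimensional regression framing above, using scikit-learn; the feature columns are invented stand-ins for the paper's network, interaction, content, and past-activity features, and the numbers are toy data.

        # Requires: pip install scikit-learn numpy
        import numpy as np
        from sklearn.linear_model import LinearRegression

        # Rows: users. Columns (invented): followers, past topic tweets,
        # average retweets received, mentions in the topic stream.
        X = np.array([
            [1200, 15, 3.0, 4],
            [300, 2, 0.5, 1],
            [8000, 40, 9.0, 12],
            [50, 0, 0.1, 0],
        ])
        y = np.array([18, 3, 55, 1])  # future topic-relevant tweet counts (toy)

        model = LinearRegression().fit(X, y)
        print(model.predict([[1000, 10, 2.0, 3]]))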

  • Framework for the Analysis of Coordination in Crisis Response

    • Collaboration & Crisis Informatics, CSCW-2012
    • February 2012

    Social Media play a critical role during crisis events, revealing a natural coordination dynamic. We propose a computational framework guided by social science principles to measure, analyze, and understand coordination among the different types of organizations and actors in crisis response. The analysis informs both the scientific account of cooperative behavior and the design of applications and protocols to support crisis management.

  • A Qualitative Examination of Topical Tweet and Retweet Practices

    • 4th Int'l AAAI Conference on Weblogs and Social Media (ICWSM)
    • May 2010

    This work contributes to the study of retweet behavior on Twitter surrounding real-world events. We analyze over a million tweets pertaining to three events, present general tweet properties in such topical datasets and qualitatively analyze the properties of the retweet behavior surrounding the most tweeted/viral content pieces. Findings include a clear relationship between sparse/dense retweet patterns and the content and type of a tweet itself; suggesting the need to study content properties in link-based diffusion models.

  • Provenance Aware Linked Sensor Data

    • 2nd Workshop on Trust and Privacy on the Social and Semantic Web
    • May 30, 2010

    Provenance, from the French word 'provenir', describes the lineage or history of a data entity. Provenance is critical information in the sensors domain to identify a sensor and analyze the observation data over time and geographical space. In this paper, we present a framework to model and query the provenance information associated with the sensor data exposed as part of the Web of Data using the Linked Open Data conventions. This is accomplished by developing an ontology-driven provenance management infrastructure that includes a representation model and query infrastructure. This provenance infrastructure, called Sensor Provenance Management System (PMS), is underpinned by a domain specific provenance ontology called Sensor Provenance (SP) ontology. The SP ontology extends the Provenir upper level provenance ontology to model domain-specific provenance in the sensor domain. In this paper, we describe the implementation of the Sensor PMS for provenance tracking in the Linked Sensor Data.

  • Sensor Discovery on Linked Data

    • Technical Report
    • December 2009

    There has been a drive recently to make sensor data accessible on the Web. However, because of the vast number of sensors collecting data about our environment, finding relevant sensors on the Web is a non-trivial challenge. In this paper, we present an approach to discovering sensors through a standard service interface over Linked Data. This is accomplished with a semantic sensor network middleware that includes a sensor registry on Linked Data and a sensor discovery service that extends the OGC Sensor Web Enablement. With this approach, we are able to access and discover sensors that are positioned near named-locations of interest.

  • Demonstration: Real-Time Semantic Analysis of Sensor Streams

    • Proceedings of the 4th International Workshop on Semantic Sensor Networks
    • October 2011

    The emergence of dynamic information sources – including sensor networks – has led to large streams of real-time data on the Web. Research studies suggest that these dynamic networks have created more data in the last three years than in the entire history of civilization, and this trend will only increase in the coming years. With this coming data explosion, real-time analytics software must either adapt or die. This paper focuses on the task of integrating and analyzing multiple heterogeneous streams of sensor data with the goal of creating meaningful abstractions, or features. These features are then temporally aggregated into feature streams. We will demonstrate an implemented framework, based on Semantic Web technologies, that creates feature streams from sensor streams in real-time and publishes these streams as Linked Data. The generation of feature streams can be accomplished in reasonable time and results in massive data reduction.

  • METEOR-S Web Service Annotation Framework (MWSAF)

    • Proceedings of Thirteenth International World Wide Web Conference
    • May 20, 2004

    Patil. A., Oundhakar S., Sheth A., and Verma K., “METEOR-S Web Service Annotation Framework (MWSAF)”, Proceedings of Thirteenth International World Wide Web Conference, May 2004 (WWW2004), pp. 553-562


  • Transactions in Transactional Workflows

    • Advanced Transaction Models and Architectures, S. Jajodia and L. Kerschberg (Eds.), Kluwer Academic Publishers, 1997, pp. 3-34.
    • 1997

    Workflow management systems (WFMSs) are finding wide applicability in small and large organizational settings. Advanced transaction models (ATMs) focus on maintaining data consistency and have provided solutions to many problems such as correctness, consistency, and reliability in transaction processing and database management environments. While such problems have yet to be solved in the domain of workflow systems, database researchers have proposed to use, or attempted to use, ATMs to model workflows. In this paper we survey the work done in the area of transactional workflow systems. We then argue that workflow requirements in large-scale enterprise-wide applications involving heterogeneous and distributed environments either differ from or exceed the modeling and functionality support provided by ATMs. We propose that an ATM is unlikely to provide the primary basis for modeling of workflow applications, and subsequently workflow management. We discuss a framework for error handling and recovery in the METEOR WFMS that borrows from relevant work in ATMs, distributed systems, software engineering, and organizational sciences. We also present various connotations of transactions in real-world organizational processes today. Finally, we point out the need for looking beyond ATMs and using a multi-disciplinary approach for modeling large-scale workflow applications of the future.

  • Linked Sensor Data

    • Proceedings of 2010 International Symposium on Collaborative Technologies and Systems (CTS 2010)
    • May 14, 2010

    A number of government, corporate, and academic organizations are collecting enormous amounts of data provided by environmental sensors. However, this data is too often locked within organizations and underutilized by the greater community. In this paper, we present a framework to make this sensor data openly accessible by publishing it on the Linked Open Data (LOD) Cloud. This is accomplished by converting raw sensor observations to RDF and linking with other datasets on LOD. With such a framework, organizations can make large amounts of sensor data openly accessible, thus allowing greater opportunity for utilization and analysis.

  • Automatic Composition of Semantic Web Services Using Process Mediation

    • 9th International Conference on Enterprise Information Systems (ICEIS 2007), Funchal, Portugal, June 12–16, 2007, pp. 453–461.
    • June 12, 2007

    Web service composition has quickly become a key area of research in the services oriented architecture community. One of the challenges in composition is the existence of heterogeneities across independently created and autonomously managed Web service requesters and Web service providers. Previous work in this area either involved significant human effort or, in efforts seeking to provide largely automated approaches, overlooked the problem of data heterogeneities, resulting in partial solutions that would not support executable workflows for real-world problems. In this paper, we present a planning-based approach to solve both the process heterogeneity and data heterogeneity problems. Our system successfully outputs a BPEL file which correctly solves a non-trivial real-world problem in the 2006 SWS Challenge.

    Full citation:
    Wu, Zixin, Karthik Gomadam, Ajith Ranabahu, Amit P. Sheth, and John A. Miller, “Automatic Composition of Semantic Web Services using Process Mediation,” in Proceedings of the 9th International Conference on Enterprise Information Systems (ICEIS 2007), Funchal, Portugal, June 12–16, 2007, pp. 453–461.

  • Towards Cloud Mobile Hybrid Application Generation using Semantically Enriched Domain Specific Languages

    • International Workshop on Mobile Computing and Clouds (MobiCloud 2010)
    • October 28, 2010

    The advancements in computing have resulted in a boom of cheap, ubiquitous, connected mobile devices as well as seemingly unlimited, utility-style, pay-as-you-go computing resources, commonly referred to as Cloud computing. Taking advantage of this computing landscape, however, has been hampered by the many heterogeneities that exist in the mobile space as well as the Cloud space. This research attempts to introduce a disciplined methodology to develop Cloud-mobile hybrid applications by using a Domain Specific Language (DSL) centric approach to generate applications. A Cloud-mobile hybrid is an application that is split between a Cloud-based back-end and a mobile-device-based front-end. We present MobiCloud, our prototype system built on a DSL that is capable of developing these hybrid applications. This not only reduces the learning curve but also shields developers from the native complexities of the target platforms. We also present our vision for propelling this research forward by enriching the DSLs with semantics. The high-level vision is outlined in the ambitious Cirrocumulus project, the driving principle being write once - run on any device.

  • The METEOR-S Approach for Configuring and Executing Dynamic Web Processes

    • technical report, LSDIS Technical Report 05-001, 2005.
    • May 6, 2004

    Web processes are the next generation of workflows, created using Web services. This paper addresses research issues in creating a framework for configuring and executing dynamic Web processes. The configuration module uses Semantic Web service discovery, integer linear programming, and logic-based constraint satisfaction to configure the process, based on quantitative and non-quantitative process constraints. Semantic representations of Web services and process constraints are used to achieve dynamic configuration. An execution environment is presented which can handle heterogeneities at the protocol and data level by using proxies with data and protocol mediation capabilities. In cases of Web service failures, we present an approach to reconfigure the process at run-time without violating the process constraints. Empirical testing of the execution environment is performed to compare deployment-time and run-time binding.

  • Composing semantic web services with interaction protocols

    • Technical report, LSDIS Lab, University of Georgia, Athens, Georgia (2006)
    • 2005

    Web service composition has quickly become an important area of research in the services oriented architecture community. One of the challenges in composition is the existence of heterogeneities between independently created and autonomously managed Web service requesters and Web service providers. This paper focuses on the problem of composing Web services in the presence of ordering constraints on their operations imposed by the service providers. We refer to the ordering constraints on a service's operations as an interaction protocol. We present a novel approach to composition involving what we term pseudo operations to expressively capture the service provider's interaction protocol. Pseudo operations are used to resolve heterogeneities by constructing a plan of services in a more intelligent and efficient manner. They accomplish this by utilizing descriptive human knowledge from the service provider and capturing this knowledge as part of a planning problem to create more flexible and expressive Web service operations that may be made available to service requesters. We use a customer-retailer scenario to show that this method alleviates planning complexities and generates more robust Web service compositions. Empirical testing was performed using this scenario and compared to existing methods to show the improvement attributable to our method.

  • Data Driven Knowledge Acquisition Method for Domain Knowledge Enrichment in Healthcare

    • BIBM 2012, Philadelphia
    • October 4, 2012

    Semantic computing technologies have matured to be applicable to many critical domains, such as life sciences and health care. However, the key to their success is the rich domain knowledge which consists of domain concepts and relationships, whose creation and refinement remains a challenge. In this paper, we develop a technique for enriching domain knowledge, focusing on populating the domain relationships. We determine missing relationships between the domain concepts by validating domain knowledge against real world data sources. We evaluate our approach in the healthcare domain using Electronic Medical Record (EMR) data, and demonstrate that semantic techniques can be used to semi-automate labour-intensive tasks without sacrificing the fidelity of domain knowledge.

  • Extracting Diverse Sentiment Expressions with Target-dependent Polarity from Twitter

    • In Proceedings of the 6th International AAAI Conference on Weblogs and Social Media (ICWSM)
    • March 2012

    The problem of automatically extracting sentiment expressions from informal text, such as that found in microblogs like Twitter, is a recent area of investigation. Compared to formal text, such as product reviews or news articles, one of the key challenges lies in the wide diversity and informal nature of sentiment expressions, which cannot be trivially enumerated or captured using predefined lexical patterns. In this work, we present an optimization-based approach to automatically extract sentiment expressions for a given target (e.g., a movie or person) from a corpus of unlabeled tweets. Specifically, we make three contributions: (i) we recognize a diverse and richer set of sentiment-bearing expressions in tweets, including formal and slang words/phrases, not limited to pre-specified syntactic patterns; (ii) instead of associating sentiment with an entire tweet, we assess the target-dependent polarity of each sentiment expression, where the polarity of a sentiment expression is determined by the nature of its target; and (iii) we provide a novel formulation of assigning polarity to a sentiment expression as a constrained optimization problem over the tweet corpus. Experiments conducted on two domains, tweets mentioning movie and person entities, show that our approach improves accuracy in comparison with several baseline methods, and that the improvement becomes more prominent with increasing corpus size.
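
    As a toy illustration of casting polarity assignment as an optimization over a corpus (a loose relaxation of the idea, not the authors' exact constrained formulation), the sketch below fixes a small seed lexicon and iteratively re-estimates each expression's target-dependent polarity from the expressions it co-occurs with in tweets about that target; all tweets and seed words are invented.

    # Toy sketch: target-dependent polarity as iterative optimization over
    # a (hypothetical) tweet corpus about one target, e.g. a movie.
    tweets = [
        ["awesome", "epic"],
        ["epic", "wicked"],      # 'wicked' is slang, positive for movies
        ["boring", "meh"],
        ["meh", "awful"],
    ]
    seeds = {"awesome": 1.0, "awful": -1.0}  # fixed seed polarities
    polarity = {w: seeds.get(w, 0.0) for t in tweets for w in t}

    for _ in range(20):  # coordinate-descent-style updates until stable
        for w in polarity:
            if w in seeds:
                continue  # seeds act as the hard constraints
            neighbors = [v for t in tweets if w in t for v in t if v != w]
            if neighbors:
                polarity[w] = sum(polarity[v] for v in neighbors) / len(neighbors)

    print({w: round(p, 2) for w, p in polarity.items()})
    # 'epic' and 'wicked' converge to positive, 'boring' and 'meh' to negative.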

  • Are Twitter Users Equal in Predicting Elections? A Study of User Groups in Predicting 2012 U.S. Republican Presidential Primaries

    • In Proceedings of the Fourth International Conference on Social Informatics (SocInfo'12)
    • December 5, 2012

    Existing studies on predicting election results operate under the assumption that all users should be treated equally. However, recent work [14] shows that social media users from different groups (e.g., 'silent majority' vs. 'vocal minority') differ significantly in the content they generate and in their tweeting behavior. The effect of these differences on predicting election results has not been exploited yet. In this paper, we study the spectrum of Twitter users who participated in the online discussion of the 2012 U.S. Republican Presidential Primaries, and examine the predictive power of different user groups (e.g., highly engaged users vs. less engaged users, right-leaning users vs. left-leaning users) against the Super Tuesday primaries in 10 states. Specifically, we characterize users across four dimensions: three dimensions of user participation measured by tweet-based properties (engagement degree, tweet mode, and content type) and one dimension of users' political preference. We study different groups of users in each dimension and compare them on the task of electoral prediction. The insights gained in this study can shed light on improving social-media-based prediction from the user-sampling perspective.

    Presentation at: http://www.slideshare.net/knoesis/are-twitter-users-equal-in-predicting-elections-insights-from-republican-primaries-and-2012-general-election

  • Harnessing Twitter ‘Big Data’ for Automatic Emotion Identification

    • 2012 ASE International Conference on Social Computing, SocialCom 2012
    • September 2012

    User-generated content on Twitter (produced at an enormous rate of 340 million tweets per day) provides a rich source for gleaning people's emotions, which is necessary for a deeper understanding of people's behaviors and actions. Extant studies on emotion identification lack comprehensive coverage of 'emotional situations' because they use relatively small training datasets. To overcome this bottleneck, we have automatically created a large emotion-labeled dataset (of about 2.5 million tweets) by harnessing emotion-related hashtags available in the tweets. We have applied two different machine learning algorithms for emotion identification, to study the effectiveness of various feature combinations as well as the effect of the size of the training data on the emotion identification task. Our experiments demonstrate that a combination of unigrams, bigrams, sentiment/emotion-bearing words, and parts-of-speech information is most effective for gleaning emotions. The highest accuracy (65.57%) is achieved with training data containing about 2 million tweets.
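
    A minimal sketch of the hashtag-supervision idea follows, assuming scikit-learn: emotion hashtags serve as noisy labels, are stripped from the text, and a unigram-plus-bigram classifier is trained on what remains (the paper's full feature set also includes sentiment/emotion-bearing words and parts-of-speech information). The four tweets are hypothetical stand-ins for the roughly 2.5 million used in the study.

    # Minimal sketch of hashtag-labeled emotion classification.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    raw = [
        ("just aced my exam #joy", "joy"),
        ("best day ever with friends #joy", "joy"),
        ("my flight got cancelled again #anger", "anger"),
        ("stuck in traffic for two hours #anger", "anger"),
    ]
    # Strip the label hashtag so the classifier cannot simply memorize it.
    texts = [t.replace("#" + label, "").strip() for t, label in raw]
    labels = [label for _, label in raw]

    # Unigrams + bigrams, two of the features the abstract reports as effective.
    clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(texts, labels)
    print(clf.predict(["what a great day"]))  # expected: ['joy']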

  • Semantic Annotation and Search for resources in the next Generation Web with SA-REST

    • W3C Workshop on Data and Services Integration, October 20-21 2011, Bedford, MA, USA.
    • October 20, 2011

    SA-REST, the W3C member submission, can be used to support a wide variety of Plain Old Semantic HTML (POSH) annotation capabilities on any type of Web resource. The Kino framework and tools provide the capabilities needed to realize SA-REST's promised value. These tools include (a) a browser plugin to support annotation of a Web resource (including services) with respect to an ontology, domain model or vocabulary, (b) an annotation-aware indexing engine and (c) faceted search and selection of the Web resources. At one end of the spectrum, we present KinoE (aka Kino for Enterprise), which uses NCBO formal ontologies and associated services for searching ontologies and mappings, and for annotating RESTful services and Web APIs, which are then used to support faceted search. At the other end of the spectrum, we present KinoW (aka Kino for the Web), capable of adding SA-REST or Microdata annotations to Web pages, using Schema.org as a model and Linked Open Data (LOD) as a knowledge base. We also present two use cases based on KinoE and the benefits to data and service integration enabled through this annotation approach.

  • A Web-Based Study of Self-Treatment of Opioid Withdrawal Symptoms with Loperamide

    • The College on Problems of Drug Dependence (CPDD)
    • June 9, 2012

    Aims: Many websites provide a medium for individuals to freely share their experiences and knowledge about different drugs. Such user-generated content can be used as a rich data source to study emerging drug use practices and trends. The study aims to examine web-based reports of loperamide use practices among non-medical opioid users. Loperamide, a piperidine derivative, is an opioid agonist approved for the control of diarrhea symptoms. Because of its general inability to cross the blood-brain barrier, it is considered to have no abuse potential and is available without a prescription.

    Methods: A website that allows free discussion of illicit drugs and is accessible for public viewing was selected for analysis. Web-forum posts were retrieved using Web Crawlers and retained in an Informal Text Database. All unique user names were anonymized. The database was queried to extract posts with a mention of loperamide and relevant brand/slang terms. Over 1,200 posts were identified and entered into NVivo to assist with consistent application of codes related to the reasons, dosage, and effects of loperamide use.

    Results: Since the first post in 2005, there was a substantial rise in discussions related to its use by non-medical opioid users, especially in 2009-2011. Loperamide was primarily discussed as a remedy to alleviate a broad range of opiate withdrawal symptoms, and was sometimes referred to as 'poor man's methadone.' Typical doses frequently ranged from 100 mg to 200 mg per day, much higher than an indicated dose of 16 mg per day.

    Conclusions: This study suggests that loperamide is being used extra-medically by people who are involved with the abuse of opioids to control withdrawal symptoms. There is a growing demand among people who are opioid dependent for drugs to control withdrawal symptoms, and loperamide appears to fit that role. The study also highlights the potential of the Web as a 'leading edge' data source in identifying emerging drug use practices.

  • 'I just wanted to tell you that loperamide WILL WORK': A Web-Based Study of Extra-Medical Use of Loperamide.

    • Journal of Drug and Alcohol Dependence
    • November 2012

    Aims: Many websites provide a means for individuals to share their experiences and knowledge about different drugs. Such User-Generated Content (UGC) can be a rich data source to study emerging drug use practices and trends. This study examined UGC on extra-medical use of loperamide (e.g., Imodium® A-D) among illicit opioid users.

    Methods: A website that allows for the free discussion of illicit drugs and is accessible for public viewing was selected for analysis. Web-forum posts were retrieved using Web Crawlers and retained in a local text database. The database was queried to extract posts with a mention of loperamide and relevant brand/slang terms. Over 1,290 posts were identified. A random sample of 258 posts was coded using NVivo to identify intent, dosage, and side-effects of loperamide use.

    Results: There has been an increase in discussions related to loperamide's use by non-medical opioid users, especially in 2010-2011. Loperamide was primarily discussed as a remedy to alleviate a broad range of opiate withdrawal symptoms, and was sometimes referred to as 'poor man's' methadone. Typical doses ranged from 70 to 100 mg per day, much higher than the indicated daily dose of 16 mg.

    Conclusions: This study suggests that loperamide is being used extra-medically to self-treat opioid withdrawal symptoms. There is a growing demand among people who are opioid dependent for drugs to control withdrawal symptoms, and loperamide appears to fit that role. The study also highlights the potential of the Web as a 'leading edge' data source in identifying emerging drug use practices.

  • METEOR–S WSDI: A Scalable Infrastructure of Registries for Semantic Publication and Discovery of Web Services

    • Journal of Information Technology and Management
    • January 2005

    Kunal Verma, Kaarthik Sivashanmugam, Amit Sheth, Abhijit Patil, Swapna Oundhakar, and John Miller, 'METEOR-S WSDI: A Scalable P2P Infrastructure of Registries for Semantic Publication and Discovery of Web Services,' Information Technology and Management 6 (no. 1), January 2005, pp. 17-39.

    Web services are the new paradigm for distributed computing. They have much to offer towards interoperability of applications and integration of large-scale distributed systems. To make Web services accessible to users, service providers use Web service registries to publish them. The current infrastructure of registries requires replication of all Web service publications in all Universal Business Registries (UBRs), which provide text- and taxonomy-based search capabilities. Large growth in the number of Web services, as well as growth in the number of registries, would make this replication impractical. In addition, the current Web service discovery mechanism is inefficient. Semantic discovery, or matching of services, is a promising approach to address this challenge. In this paper, we present a scalable, high-performance environment for federated Web service publication and discovery among multiple registries. This work uses an ontology-based approach to organize registries, enabling semantic classification of all Web services based on domains. Each of these registries supports semantic publication of Web services, which is used during the discovery process. We have implemented two algorithms for semantic publication and one algorithm for semantic discovery of Web services. We believe that the semantic approach suggested in this paper will significantly improve Web service publication and discovery involving a large number of registries. As part of the METEOR-S project, we have leveraged peer-to-peer networking as a scalable infrastructure for registries that can support automated and semi-automated Web service publication and discovery.

  • Task Scheduling Using Intertask Dependencies in Carnot

    • ACM SIGMOD Intl. Conf. On the Management of Data
    • 1993

    The Carnot Project at MCC is addressing the problem of logically unifying physically-distributed, enterprise-wide, heterogeneous information. Carnot will provide a user with the means to navigate information efficiently and transparently, to update that information consistently, and to write applications easily for large, heterogeneous, distributed information systems. A prototype has been implemented which provides services for (a) enterprise modeling and model integration to create an enterprise-wide view, (b) semantic expansion of queries on the view to queries on individual resources, and (c) inter-resource consistency management. This paper describes the Carnot approach to transaction processing in environments where heterogeneous, distributed, and autonomous systems are required to coordinate the update of the local information under their control. In this approach, subtransactions are represented as a set of tasks and a set of intertask dependencies that capture the semantics of a particular relaxed transaction model. A scheduler has been implemented which schedules the execution of these tasks in the Carnot environment so that all intertask dependencies are satisfied.
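
    As a simplified illustration of executing tasks so that intertask dependencies are satisfied (Carnot's relaxed transaction models capture far richer dependency types than the plain finish-before edges used here), a topological sort yields a legal execution order; the task names are hypothetical.

    # Minimal sketch: order subtransaction tasks so every 'must finish
    # before' dependency is satisfied (Python 3.9+ standard library).
    from graphlib import TopologicalSorter

    dependencies = {  # task -> set of tasks it depends on
        "debit_account":  set(),
        "credit_account": {"debit_account"},
        "write_audit":    {"debit_account", "credit_account"},
        "notify_user":    {"write_audit"},
    }

    for task in TopologicalSorter(dependencies).static_order():
        print("executing", task)  # a legal order, dependencies respected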

  • An Ontological Approach to Focusing Attention and Enhancing Machine Perception on the Web

    • Journal of Applied Ontology
    • December 2011

    Today, many sensor networks and their applications employ a brute-force approach to collecting and analyzing sensor data. Such an approach often wastes valuable energy and computational resources by unnecessarily tasking sensors and generating observations of minimal use. People, on the other hand, have evolved sophisticated mechanisms to efficiently perceive their environment. One such mechanism is the use of background knowledge to determine which aspects of the environment to focus our attention on. In this paper, we develop an ontology of perception, IntellegO, that may be used to more efficiently convert observations into perceptions. IntellegO is derived from cognitive theory, encoded in set theory, and provides a formal semantics of machine perception. We then present an implementation that iteratively and efficiently processes low-level, heterogeneous sensor data into knowledge through use of the perception ontology and domain-specific background knowledge. Finally, we evaluate IntellegO by collecting and analyzing observations of weather conditions on the Web, and show significant resource savings in the generation and storage of perceptual knowledge.

  • Semantic Perception: Converting Sensory Observations to Abstractions

    • IEEE Internet Computing, Special Issue on Context-Aware Computing: Beyond Search and Location-Based Services
    • March 2012

    An abstraction is a representation of an environment derived from sensor observation data. Generating an abstraction requires inferring explanations from an incomplete set of observations (often from the Web) and updating these explanations on the basis of new information. This process must be fast and efficient. The authors' approach overcomes these challenges to systematically derive abstractions from observations. The approach models perception through the integration of an abductive logic framework called Parsimonious Covering Theory with Semantic Web technologies. The authors demonstrate this approach's utility and scalability through use cases in the healthcare and weather domains.

  • An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices

    • 11th International Semantic Web Conference
    • November 11, 2012

    The primary challenge of machine perception is to define efficient computational methods to derive high-level knowledge from low-level sensor observation data. Emerging solutions are using ontologies for expressive representation of concepts in the domain of sensing and perception, which enable advanced integration and interpretation of heterogeneous sensor data. The computational complexity of OWL, however, seriously limits its applicability and use within resource-constrained environments, such as mobile devices. To overcome this issue, we employ OWL to formally define the inference tasks needed for machine perception - explanation and discrimination - and then provide efficient algorithms for these tasks, using bit-vector encodings and operations. The applicability of our approach to machine perception is evaluated on a smart-phone mobile device, demonstrating dramatic improvements in both efficiency and scale.
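
    The bit-vector encoding can be sketched in a few lines: each observable property is a bit, each candidate explanation is a bitmask of the properties it accounts for, and the two inference tasks reduce to bitwise operations. The properties and conditions below are hypothetical, and the sketch omits the OWL layer entirely.

    # Minimal sketch of explanation and discrimination over bit vectors.
    PROPS = {"elevated_temp": 1 << 0, "cough": 1 << 1, "rash": 1 << 2}

    candidates = {  # condition -> bitmask of the properties it explains
        "flu":     PROPS["elevated_temp"] | PROPS["cough"],
        "measles": PROPS["elevated_temp"] | PROPS["cough"] | PROPS["rash"],
        "sunburn": PROPS["rash"],
    }

    observed = PROPS["elevated_temp"] | PROPS["cough"]

    # Explanation: keep candidates whose bitmask covers every observed bit.
    explanations = [c for c, m in candidates.items() if observed & m == observed]
    print(explanations)  # ['flu', 'measles']

    # Discrimination: an unobserved property on which the survivors differ.
    discriminating = [p for p, bit in PROPS.items()
                      if not bit & observed
                      and len({bool(candidates[c] & bit) for c in explanations}) == 2]
    print(discriminating)  # ['rash'] -- observing it would decide between the two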

  • Linked Sensor Data

    • Proceedings of 2010 International Symposium on Collaborative Technologies and Systems (CTS 2010)
    • May 14, 2010

    A number of government, corporate, and academic organizations are collecting enormous amounts of data provided by environmental sensors. However, this data is too often locked within organizations and underutilized by the greater community. In this paper, we present a framework to make this sensor data openly accessible by publishing it on the Linked Open Data (LOD) Cloud. This is accomplished by converting raw sensor observations to RDF and linking with other datasets on LOD. With such a framework, organizations can make large amounts of sensor data openly accessible, thus allowing greater opportunity for utilization and analysis.
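
    A minimal sketch of the conversion step, assuming the rdflib library: one raw observation becomes RDF triples and its weather station is linked to an external LOD resource via owl:sameAs. The namespace and URIs are illustrative, not the vocabulary actually used in the paper.

    # Minimal sketch: publish a sensor observation as RDF, linked to LOD.
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import OWL, RDF, XSD

    SENS = Namespace("http://example.org/sensors#")  # illustrative namespace
    g = Graph()
    g.bind("sens", SENS)

    obs = URIRef("http://example.org/sensors/obs/42")
    station = URIRef("http://example.org/sensors/station/KDAY")
    g.add((obs, RDF.type, SENS.Observation))
    g.add((obs, SENS.observedProperty, SENS.AirTemperature))
    g.add((obs, SENS.hasValue, Literal(21.5, datatype=XSD.float)))
    g.add((obs, SENS.observedBy, station))

    # Link the station to a Linked Open Data resource (e.g., DBpedia).
    g.add((station, OWL.sameAs,
           URIRef("http://dbpedia.org/resource/Dayton_International_Airport")))

    print(g.serialize(format="turtle"))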

  • Physical-Cyber-Social Computing: An Early 21st Century Approach

    • IEEE Intelligent Systems
    • January 2013

    Visionaries and scientists from the early days of computing and electronic communication have discussed the proper role of technology to improve human experience. Technology now plays an increasingly important role in facilitating and improving personal and social activities and engagements, decision making, interaction with physical and social worlds, generating insights, and just about anything that a human, as an intelligent being, seeks to do. This article presents a vision of Physical-Cyber-Social (PCS) computing for a holistic treatment of data, information, and knowledge from physical, cyber, and social worlds to integrate, understand, correlate, and provide contextually relevant abstractions to humans.

    Cite: Amit Sheth, Pramod Anantharam, Cory Henson, "Physical-Cyber-Social Computing: An Early 21st Century Approach," IEEE Intelligent Systems, vol. 28, no. 1, pp. 78-82, Jan.-Feb., 2013

    Access: http://knoesis.wright.edu/library/resource.php?id=1816

    More: http://wiki.knoesis.org/index.php/PCS

  • Semantic Sensor Web

    • IEEE Internet Computing
    • July 2008

    In recent years, sensors have been increasingly adopted by a diverse array of disciplines, such as meteorology for weather forecasting and wildfire detection, civic planning for traffic management, satellite imaging for earth and space observation, medical sciences for patient care using biometric sensors, and homeland security for radiation and biochemical detection at ports. Sensors are thus distributed across the globe, leading to an avalanche of data about our environment. The rapid development and deployment of sensor technology involves many different types of sensors, both remote and in situ, with diverse capabilities such as range, modality, and maneuverability. Today, it's possible to use sensor networks to detect and identify a multitude of observations, from simple phenomena to complex events and situations. The lack of integration and communication between these networks, however, often isolates important data streams and intensifies the existing problem of too much data and not enough knowledge. With a view to addressing this problem, we discuss a semantic sensor Web (SSW) in which sensor data is annotated with semantic metadata to increase interoperability as well as provide contextual information essential for situational knowledge. In particular, this involves annotating sensor data with spatial, temporal, and thematic semantic metadata.

    Cite: Amit Sheth, Cory Henson, and Satya Sahoo, 'Semantic Sensor Web,' IEEE Internet Computing, 12 (4), July/August 2008, pp. 78-83.

    Access: http://knoesis.wright.edu/library/resource.php?id=00311

  • SemSOS: Semantic Sensor Observation Service

    • International Symposium on Collaborative Technologies and Systems
    • May 2009

    Sensor Observation Service (SOS) is a Web service specification defined by the Open Geospatial Consortium (OGC) Sensor Web Enablement (SWE) group in order to standardize the way sensors and sensor data are discovered and accessed on the Web. This standard goes a long way in providing interoperability between repositories of heterogeneous sensor data and applications that use this data. Many of these applications, however, are ill equipped at handling raw sensor data as provided by SOS and require actionable knowledge of the environment in order to be practically useful. There are two approaches to deal with this obstacle, make the applications smarter or make the data smarter. We propose the latter option and accomplish this by leveraging semantic technologies in order to provide and apply more meaningful representation of sensor data. More specifically, we are modeling the domain of sensors and sensor observations in a suite of ontologies, adding semantic annotations to the sensor data, using the ontology models to reason over sensor observations, and extending an open source SOS implementation with our semantic knowledge base. This semantically enabled SOS, or SemSOS, provides the ability to query high-level knowledge of the environment as well as low-level raw sensor data.

  • Comparative Trust Management with Applications: Bayesian Approaches Emphasis

    • Journal of Future Generation Computer Systems (FGCS)
    • 2013

    Trust relationships occur naturally in many diverse contexts such as collaborative systems, e-commerce, interpersonal interactions, social networks, and the semantic sensor web. As agents providing content and services become increasingly removed from the agents that consume them, the issue of robust trust inference and update becomes critical. There is a need to find online substitutes for additional (direct or face-to-face) cues to derive measures of trust, and to create efficient and robust systems for managing trust in order to support decision making. Unfortunately, there is neither a universal notion of trust that is applicable to all domains nor a clear explication of its semantics or computation in many situations. We motivate the trust problem, explain the relevant concepts, summarize research in modeling trust and gleaning trustworthiness, and discuss the challenges confronting us. The goal is to provide a comprehensive overview of the trust landscape, along with the nitty-gritty details of a handful of approaches. We also provide details of the theoretical underpinnings and a comparative analysis of Bayesian approaches to binary and multilevel trust, to automatically determine trustworthiness in a variety of reputation systems, including those used in sensor networks, e-commerce, and collaborative environments.

    Keywords: Trust vs. reputation; Trust ontology; Gleaning trustworthiness; Trust metrics and models (propagation: chaining and aggregation); Social and sensor networks; Collaborative systems; Trust system attacks; Beta-PDF; Dirichlet distribution; Binary and multi-level trust

    Citation: Krishnaprasad Thirunarayan, Pramod Anantharam, Cory Henson, and Amit Sheth, 'Comparative trust management with applications: Bayesian approaches emphasis,' Future Generation Computer Systems, Volume 31, February 2014, Pages 182–199. http://dx.doi.org/10.1016/j.future.2013.05.006

    Access: http://www.knoesis.org/library/resource.php?id=1875

  • Twitris: Socially Influenced Browsing

    • Semantic Web Challenge, International Semantic Web Conference 2009
    • November 2009

    First Author: Ashutosh Jadhav

    In this paper, we present Twitris, a semantic Web application that facilitates browsing for news and information, using social perceptions as the fulcrum. In doing so, we address challenges in large-scale crawling, processing of real-time information, and preserving spatiotemporal-thematic properties central to observations pertaining to real-time events. We extract metadata about events from Twitter and bring related news and Wikipedia articles to the user. In developing Twitris, we have used the DBpedia ontology.

  • Discovering Fine-grained Sentiment in Suicide Notes

    • Journal of Biomedical Informatics Insights
    • January 2012

    This paper presents our solution for the i2b2 sentiment classification challenge. Our hybrid system consists of machine learning and rule-based classifiers. For the machine learning classifier, we investigate a variety of lexical, syntactic and knowledge-based features, and show how much these features contribute to the performance of the classifier through experiments. For the rule-based classifier, we propose an algorithm to automatically extract effective syntactic and lexical patterns from training examples. The experimental results show that the rule-based classifier outperforms the baseline machine learning classifier using unigram features. By combining the machine learning classifier and the rule-based classifier, the hybrid system gains a better trade-off between precision and recall, and yields the highest micro-averaged F-measure (0.5038), which is better than the mean (0.4875) and median (0.5027) micro-average F-measures among all participating teams.

  • An Up-to-date Knowledge-Based Literature Search and Exploration Framework for Focused Bioscience Domains

    • 2nd ACM SIGHIT Intl Health Informatics Symposium, IHI 2012
    • January 2012

    First Author: Ramakanth Kavuluru

    To handle the exponential growth in bioscience literature, several knowledge-based search systems that facilitate domain-specific search have been proposed. In such systems, knowledge of a domain of interest is embedded as a backbone that guides the search process. But the knowledge used in most such systems (1) exists only for a few well-known broad domains; (2) is of a basic nature, either purely hierarchical or involving only a few relationship types; and (3) is not always kept up to date, missing insights from recently published results. In this paper we present a framework and implementation of a focused and up-to-date knowledge-based search system, called Scooner, that utilizes domain-specific knowledge extracted from recent bioscience abstracts. To our knowledge, this is the first attempt in the field to address all three shortcomings mentioned above. Since its recent introduction for operational use at the Applied Biotechnology Branch of AFRL, some biologists have been using Scooner on a regular basis, and it is being made available to many more. Initial evaluations point to the promise of the approach in addressing the challenge we set out to address.

  • Pattern-Based Synonym and Antonym Extraction

    • ACM Southeast Conference 2010, ACMSE2010
    • April 2010

    Many research studies adopt manually selected patterns for semantic relation extraction. However, manually identifying and discovering patterns is time-consuming, and it is difficult to discover all potential candidates. Instead, we propose an automatic pattern-construction approach to extract verb synonyms and antonyms from English newspapers. Rather than relying on a single pattern, we combine the results indicated by multiple patterns to maximize recall.
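
    A minimal sketch of pattern-based extraction with combined votes follows; the two hand-written patterns per relation stand in for the automatically constructed ones, and the sentences are invented.

    # Minimal sketch: multiple extraction patterns vote on word pairs.
    import re
    from collections import Counter

    SYN_PATTERNS = [r"(\w+), also called (\w+)", r"(\w+), i\.e\., (\w+)"]
    ANT_PATTERNS = [r"(\w+) rather than (\w+)", r"neither (\w+) nor (\w+)"]

    corpus = [
        "Prices may rise, i.e., increase, after the announcement.",
        "The committee chose to expand rather than shrink the program.",
        "Stocks tend to rise, also called increase, in such periods.",
    ]

    votes = Counter()
    for sentence in corpus:
        for pat in SYN_PATTERNS:
            for a, b in re.findall(pat, sentence):
                votes[("SYN", a.lower(), b.lower())] += 1
        for pat in ANT_PATTERNS:
            for a, b in re.findall(pat, sentence):
                votes[("ANT", a.lower(), b.lower())] += 1

    # Combining votes from several patterns boosts recall; thresholding on
    # the vote count trades recall back against precision.
    print(votes.most_common())  # ('SYN', 'rise', 'increase') gets two votes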

  • What Goes Around Comes Around – Improving Linked Open Data through On-Demand Model Creation

    • Web Science Conference 2010 - WebSci10
    • March 2010

    First Author: Christopher Thomas

    We present a method for growing the amount of knowledge available on the Web using a hermeneutic method that involves background knowledge, Information Extraction techniques and validation through discourse and use of the extracted information. We exemplify this using Linked Data as background knowledge, automatic Model/Ontology creation for the IE part and a Semantic Browser for evaluation. The hermeneutic approach, however, is open to be used with other IE techniques and other evaluation methods. We will present results from the model creation and anecdotal evidence for the feasibility of 'Validation through Use'.

  • What Kind of #Communication is Twitter? Mining #Psycholinguistic Cues for Emergency Coordination

    • Computers in Human Behavior Journal, Elsevier
    • July 2013

    The information overload created by social media messages in emergency situations challenges response organizations to find targeted content and users. We aim to select useful messages by detecting the presence of conversation as an indicator of coordinated citizen action. Using simple linguistic indicators associated with conversation analysis in social science, we model the presence of conversation in the communication landscape of Twitter in a large corpus of 1.5M tweets for various disaster and non-disaster events spanning different periods, lengths of time and varied social significance. Within Replies, Retweets and tweets that mention other Twitter users, we found that domain-independent, linguistic cues distinguish likely conversation from non-conversation in this online (mediated) communication. We demonstrate that conversation subsets within Replies, Retweets and tweets that mention other Twitter users potentially contain more information than non-conversation subsets. Information density also increases for tweets that are not Replies, Retweets or mentioning other Twitter users, as long as they reflect conversational properties. From a practical perspective, we have developed a model for trimming the candidate tweet corpus to identify a much smaller subset of data for submission to deeper, domain-dependent semantic analyses for the identification of actionable information nuggets for coordinated emergency response.

  • Types of Property Pairs and Alignment on Linked Datasets – A Preliminary Analysis

    • Proceedings of the I-SEMANTICS 2013 Posters & Demonstrations Track co-located with 9th International Conference on Semantic Systems (I-SEMANTICS 2013) Graz, Austria, September 4-6, 2013.
    • August 2013

    Dataset publication on the Web has been greatly influenced by the Linked Open Data (LOD) project. Many interlinked datasets have become freely available on the Web, creating a structured and distributed knowledge representation. Analysis and alignment of concepts and instances in these interconnected datasets have received a lot of attention in the recent past, compared to properties. We identify three different categories of property pairs found in the alignment process and study their relative distribution among well-known LOD datasets. We also provide a comparative analysis of state-of-the-art techniques with regard to the different categories, highlighting their capabilities. This could lead to more realistic and useful alignment of properties in LOD and similar datasets.

  • Semantics Driven Approach for Knowledge Acquisition from EMRs

    • Journal of Biomedical and Health Informatics
    • 2013

    Semantic computing technologies have matured to be applicable to many critical domains such as national security, life sciences, and health care. However, the key to their success is the availability of a rich domain knowledge base. The creation and refinement of domain knowledge bases poses difficult challenges. The existing knowledge bases in the health care domain are rich in taxonomic relationships, but they lack non-taxonomic (domain) relationships. In this paper, we describe a semi-automatic technique for enriching existing domain knowledge bases with causal relationships gleaned from Electronic Medical Records (EMR) data. We determine missing causal relationships between domain concepts by validating domain knowledge against EMR data sources and leveraging semantic-based techniques to derive plausible relationships that can rectify knowledge gaps. Our evaluation demonstrates that semantic techniques can be employed to improve the efficiency of knowledge acquisition.
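
    As a rough sketch of the validation idea (not the paper's actual machinery), one can flag a candidate relationship when two concepts co-occur in patient records noticeably more often than independence would predict; the records, concepts, and lift threshold below are all hypothetical.

    # Toy sketch: propose missing domain relationships from EMR co-occurrence.
    from itertools import combinations

    emr = [  # one set of coded concepts per (hypothetical) patient record
        {"type2_diabetes", "neuropathy"},
        {"type2_diabetes", "neuropathy", "hypertension"},
        {"hypertension"},
        {"type2_diabetes", "retinopathy"},
    ]
    known = {("type2_diabetes", "retinopathy")}  # already in the knowledge base

    n = len(emr)
    def support(*concepts):
        return sum(all(c in rec for c in concepts) for rec in emr) / n

    for a, b in combinations({c for rec in emr for c in rec}, 2):
        lift = support(a, b) / (support(a) * support(b) or 1)
        if lift > 1.2 and (a, b) not in known and (b, a) not in known:
            print(f"candidate relationship: {a} -- {b} (lift {lift:.2f})")
    # Prints only the diabetes/neuropathy pair; the known pair is skipped.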

  • Challenges in Understanding Clinical Notes: Why NLP Engines Fall Short and Where Background Knowledge Can Help

    • International Workshop on Data management & Analytics for healthcaRE co-located with ACM CIKM 2013
    • August 1, 2013

    Understanding of Electronic Medical Records (EMRs) plays a crucial role in improving healthcare outcomes. However, the unstructured nature of EMRs poses several technical challenges for the structured information extraction from clinical notes that automatic analysis requires. Although Natural Language Processing (NLP) techniques developed to process EMRs are effective for a variety of tasks, they often fail to preserve the semantics of the original information expressed in EMRs, particularly in complex scenarios. This paper illustrates the complexity of the problems involved, deals with conflicts created by the shortcomings of NLP techniques, and demonstrates where domain-specific knowledge bases can come to the rescue in resolving conflicts, which can significantly improve semantic annotation and structured information extraction. We discuss various insights gained from our study on a real-world dataset.

  • Semantic Predications for Complex Information Needs in Biomedical Literature

    • International Conference on Bioinformatics and Biomedicine - BIBM 2011
    • November 2011

    Many complex information needs that arise in biomedical disciplines require exploring multiple documents in order to obtain information. While traditional information retrieval techniques that return a single ranked list of documents are quite common for such tasks, they may not always be adequate. The main issue is that ranked lists typically impose a significant burden on users to filter out irrelevant documents. Additionally, users must intuitively reformulate their search query when relevant documents have not been highly ranked. Furthermore, even after interesting documents have been selected, very few mechanisms exist that enable document-to-document transitions. In this paper, we demonstrate the utility of assertions extracted from biomedical text (called semantic predications) to facilitate retrieving relevant documents for complex information needs. Our approach offers an alternative to query reformulation by establishing a framework for transitioning from one document to another. We evaluate this novel knowledge-driven approach using precision and recall metrics on the 2006 TREC Genomics Track.

  • A statistical and schema independent approach to identify equivalent properties on linked data

    • 9th International Conference on Semantic Systems - ISEMANTICS 2013
    • September 2013

    The Linked Open Data (LOD) cloud has gained significant attention in the Semantic Web community recently. It currently consists of approximately 295 interlinked datasets with over 50 billion triples, including 500 million links, and continues to expand in size. This vast source of structured information has the potential to have a significant impact on knowledge-based applications. However, a key impediment to the use of the LOD cloud is limited support for data integration tasks over concepts, instances, and properties. Efforts to address this limitation over properties have focused on matching data-type properties across datasets; matching of object-type properties, however, has not received similar attention. We present an approach that can automatically match object-type properties across linked datasets, primarily exploiting and bootstrapping from entity co-reference links such as owl:sameAs. Our evaluation, using sample instance sets taken from the Freebase, DBpedia, LinkedMDB, and DBLP datasets and covering multiple domains, shows that our approach matches properties with high precision and recall (on average, an F-measure gain of 57%-78%).
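
    The bootstrapping step can be sketched as follows: for every owl:sameAs-linked entity pair, property pairs whose values match across the two datasets (directly, or through value-level co-reference links) accumulate equivalence votes. All data and property names are hypothetical, and the paper's statistical scoring is richer than this raw count.

    # Minimal sketch: vote for equivalent properties across linked datasets.
    from collections import Counter

    same_as = {"db:Berlin": "fb:berlin"}  # entity co-reference links

    dataset_a = {"db:Berlin": {"db:country": "db:Germany",
                               "db:population": "3500000"}}
    dataset_b = {"fb:berlin": {"fb:containedBy": "fb:germany",
                               "fb:population": "3500000"}}
    value_same_as = {"db:Germany": "fb:germany"}  # co-reference among values

    scores = Counter()
    for a_ent, b_ent in same_as.items():
        for p_a, v_a in dataset_a[a_ent].items():
            for p_b, v_b in dataset_b[b_ent].items():
                if v_a == v_b or value_same_as.get(v_a) == v_b:
                    scores[(p_a, p_b)] += 1  # values agree across datasets

    print(scores.most_common())
    # db:country ~ fb:containedBy and db:population ~ fb:population each score 1.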

  • Cursing in English on Twitter

    • ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW'14)
    • October 15, 2013

    We examine the characteristics of cursing activity on Twitter through an analysis of about 51 million tweets and about 14 million users. In particular, we explore a set of questions that prior studies have recognized as crucial for understanding cursing in offline communication, including the ubiquity, utility, and contextual dependencies of cursing.

  • Automatic Domain Identification for Linked Open Data

    • IEEE/WIC/ACM International Conference on Web Intelligence
    • November 2013

    Linked Open Data (LOD) has emerged as one of the largest collections of interlinked structured datasets on the Web. Although the adoption of such datasets for applications is increasing, identifying relevant datasets for a specific task or topic is still challenging. As an initial step to make such identification easier, we provide an approach to automatically identify the topic domains of given datasets. Our method utilizes existing knowledge sources, more specifically Freebase, and we present an evaluation which validates the topic domains we can identify with our system. Furthermore, we evaluate the effectiveness of identified topic domains for the purpose of finding relevant datasets, thus showing that our approach improves reusability of LOD datasets.

  • A Semantic Problem Solving Environment for Integrative Parasite Research: Identification of Intervention Targets for Trypanosoma cruzi

    • PLoS Neglected Tropical Diseases
    • 2012

    Research on the biology of parasites requires a sophisticated and integrated computational platform to query and analyze large volumes of data, representing both unpublished (internal) and public (external) data sources. Effective analysis of an integrated data resource using knowledge discovery tools would significantly aid biologists in conducting their research, for example, through identifying various intervention targets in parasites and in deciding the future direction of ongoing as well as planned projects. A key challenge in achieving this objective is the heterogeneity between the internal lab data, usually stored as flat files, Excel spreadsheets or custom-built databases, and the external databases. Reconciling the different forms of heterogeneity and effectively integrating data from disparate sources is a nontrivial task for biologists and requires a dedicated informatics infrastructure. Thus, we developed an integrated environment using Semantic Web technologies that may provide biologists the tools for managing and analyzing their data, without the need for acquiring in-depth computer science knowledge.

    Citation: Priti P. Parikh, Todd A. Minning, Vinh Nguyen, Sarasi Lalithsena, Amir H. Asiaee, Satya S. Sahoo, Prashant Doshi, Rick Tarleton, and Amit P. Sheth. 'A Semantic Problem Solving Environment for Integrative Parasite Research: Identification of Intervention Targets for Trypanosoma cruzi.' PLoS Negl Trop Dis 6(1): e1458. doi:10.1371/journal.pntd.0001458, 2012. PMID: 22272365

  • GLYDE-an expressive XML standard for the representation of glycan structure.

    • Journal of Carbohydrate Research
    • 2005

    The amount of glycomics data being generated is rapidly increasing as a result of improvements in analytical and computational methods. Correlation and analysis of this large, distributed data set requires an extensible and flexible representational standard that is also 'understood' by a wide range of software applications. An XML-based data representation standard that faithfully captures essential structural details of a glycan moiety, along with additional information (such as data provenance) to aid the interpretation and usage of glycan data, will facilitate the exchange of glycomics data across the scientific community. To meet this need, we introduce the GLYcan Data Exchange (GLYDE) standard as an XML-based representation format to enable interoperability and exchange of glycomics data. An online tool for the conversion of other representations to GLYDE format has been developed.

    Citation: Satya S. Sahoo, Christopher Thomas, Amit P. Sheth, Cory Henson, and William S. York, 'GLYDE-An expressive XML standard for the representation of glycan structure,' Carbohydr Res, 340 (no. 18), December 30, 2005, pp. 2802-2807. Epub 2005 Oct 20. PMID: 16242678.

    Access: http://www.knoesis.org/library/resource.php?id=00018

  • From "glycosyltransferase" to "congenital muscular dystrophy": integrating knowledge from NCBI Entrez Gene and the Gene Ontology.

    • MedInfo (Stud Health Technol Inform.)
    • 2007

    Entrez Gene (EG), Online Mendelian Inheritance in Man (OMIM) and the Gene Ontology (GO) are three complementary knowledge resources that can be used to correlate genomic data with disease information. However, bridging between genotype and phenotype through these resources currently requires manual effort or the development of customized software. In this paper, we argue that integrating EG and GO provides a robust and flexible solution to this problem. We demonstrate how the Resource Description Framework (RDF) developed for the Semantic Web can be used to represent and integrate these resources and enable seamless access to them as a unified resource. We illustrate the effectiveness of our approach by answering a real-world biomedical query linking a specific molecular function, glycosyltransferase, to the disorder congenital muscular dystrophy.

    Citation: Satya S. Sahoo, Kelly Zeng, Olivier Bodenreider, and Amit P. Sheth, 'From 'Glycosyltransferase' to 'Congenital Muscular Dystrophy': Integrating Knowledge from NCBI Entrez Gene and the Gene Ontology,' in MEDINFO 2007: Proceedings of the 12th World Congress on Health (Medical) Informatics, K.A. Kuhn, J.R. Warren, T.-Y. Leong (Eds.), Studies in Health Technology and Informatics, Vol. 129, Amsterdam: IOS, August 2007, pp. 1260-4. PMID: 17911917

    Access: http://www.knoesis.org/library/resource.php?id=00014

  • An ontology-driven semantic mashup of gene and biological pathway information: application to the domain of nicotine dependence.

    • Journal of Biomedical Informatics
    • 2008

    OBJECTIVES:

    This paper illustrates how Semantic Web technologies (especially RDF, OWL, and SPARQL) can support information integration and make it easy to create semantic mashups (semantically integrated resources). In the context of understanding the genetic basis of nicotine dependence, we integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base.

    METHODS:

    We use an ontology-driven approach to integrate two gene resources (Entrez Gene and HomoloGene) and three pathway resources (KEGG, Reactome and BioCyc), for five organisms, including humans. We created the Entrez Knowledge Model (EKoM), an information model in OWL for the gene resources, and integrated it with the extant BioPAX ontology designed for pathway resources. The integrated schema is populated with data from the pathway resources, publicly available in BioPAX-compatible format, and gene resources for which a population procedure was created. The SPARQL query language is used to formulate queries over the integrated knowledge base to answer the three biological queries.

    RESULTS:

    Simple SPARQL queries could easily identify hub genes, i.e., those genes whose gene products participate in many pathways or interact with many other gene products. The identification of the genes expressed in the brain turned out to be more difficult, due to the lack of a common identification scheme for proteins.

    CONCLUSION:

    Semantic Web technologies provide a valid framework for information integration in the life sciences. Ontology-driven integration represents a flexible, sustainable and extensible solution to the integration of large volumes of information. Additional resources, which enable the creation of mappings between information sources, are required to compensate for heterogeneity across namespaces.

    PMID: 18395495

    Access: http://www.knoesis.org/library/resource.php?id=00221
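
    The RESULTS above note that simple SPARQL queries suffice to identify hub genes; the sketch below, run with rdflib, shows the shape of such a query. The prefix and predicate are illustrative, not the actual EKoM/BioPAX terms.

    # Hedged sketch of a hub-gene query: count pathways per gene in SPARQL.
    from rdflib import Graph

    g = Graph()
    g.parse(data="""
        @prefix ex: <http://example.org/bio#> .
        ex:geneA ex:participatesIn ex:pathway1, ex:pathway2, ex:pathway3 .
        ex:geneB ex:participatesIn ex:pathway1 .
    """, format="turtle")

    HUB_QUERY = """
        PREFIX ex: <http://example.org/bio#>
        SELECT ?gene (COUNT(?pathway) AS ?n)
        WHERE { ?gene ex:participatesIn ?pathway }
        GROUP BY ?gene
        HAVING (COUNT(?pathway) >= 3)
    """

    for gene, n in g.query(HUB_QUERY):
        print(gene, n)  # only ex:geneA qualifies as a hub here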

  • A unified framework for managing provenance information in translational research.

    • BMC Bioinformatics
    • November 2011

    BACKGROUND:

    A critical aspect of the NIH Translational Research roadmap, which seeks to accelerate the delivery of "bench-side" discoveries to patient's "bedside," is the management of the provenance metadata that keeps track of the origin and history of data resources as they traverse the path from the bench to the bedside and back. A comprehensive provenance framework is essential for researchers to verify the quality of data, reproduce scientific results published in peer-reviewed literature, validate scientific process, and associate trust value with data and results. Traditional approaches to provenance management have focused on only partial sections of the translational research life cycle and they do not incorporate "domain semantics", which is essential to support domain-specific querying and analysis by scientists.

    RESULTS:
    We identify a common set of challenges in managing provenance information across the pre-publication and post-publication phases of data in the translational research lifecycle. We define the semantic provenance framework (SPF), underpinned by the Provenir upper-level provenance ontology, to address these challenges across the four stages of provenance metadata: (a) provenance collection, during data generation; (b) provenance representation, to support interoperability and reasoning and to incorporate domain semantics; (c) provenance storage and propagation, to allow efficient storage and seamless propagation of provenance as the data is transferred across applications; and (d) provenance query, to support queries of increasing complexity over large data sizes and also to support knowledge discovery applications. We apply the SPF to two exemplar translational research projects, namely the Semantic Problem Solving Environment for Trypanosoma cruzi (T.cruzi SPSE) and the Biomedical Knowledge Repository (BKR) project, to demonstrate its effectiveness.

    Access: http://www.knoesis.org/library/resource.php?id=1632

    PMID: 22126369 [Highly Accessed]

  • Provenance Context Entity (PaCE): Scalable Provenance Tracking for Scientific RDF Data

    • Proceedings of the 22nd International Scientific and Statistical Database Management Conference
    • July 2010

    The Resource Description Framework (RDF) format is being used by a large number of scientific applications to store and disseminate their datasets. Provenance information, describing the source or lineage of the datasets, plays an increasingly significant role in ensuring data quality, computing the trust value of datasets, and ranking query results. Current provenance tracking approaches using the RDF reification vocabulary suffer from a number of known issues, including lack of formal semantics, use of blank nodes, and application-dependent interpretation of reified RDF triples. In this paper, we introduce a new approach called Provenance Context Entity (PaCE) that uses the notion of provenance context to create provenance-aware RDF triples. We also define the formal semantics of PaCE through a simple extension of the existing RDF(S) semantics that ensures compatibility of PaCE with existing Semantic Web tools and implementations. We have implemented the PaCE approach in the Biomedical Knowledge Repository (BKR) project at the US National Library of Medicine. The evaluations demonstrate a minimum 49% reduction in the total number of provenance-specific RDF triples generated using the PaCE approach as compared to RDF reification. In addition, performance for complex queries improves by three orders of magnitude, while remaining comparable to the RDF reification approach for simpler provenance queries.

    Access paper at: http://knoesis.org/library/resource.php?id=797
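
    To make the contrast concrete, here is a minimal sketch (not code from the paper) using the rdflib Python library and hypothetical URIs: standard reification spends four triples restating a statement before any provenance can be attached, while a PaCE-style provenance context entity carries its source in the URI of the entity itself.

        # Minimal sketch of reification vs. a PaCE-style provenance context.
        # All URIs are hypothetical; the paper's BKR modeling is richer.
        from rdflib import Graph, Namespace, URIRef
        from rdflib.namespace import RDF

        EX = Namespace("http://example.org/bkr/")

        # Standard reification: four triples restate the statement, then a
        # fifth attaches the provenance assertion.
        g_reif = Graph()
        stmt = EX.stmt1
        g_reif.add((stmt, RDF.type, RDF.Statement))
        g_reif.add((stmt, RDF.subject, EX.aspirin))
        g_reif.add((stmt, RDF.predicate, EX.treats))
        g_reif.add((stmt, RDF.object, EX.headache))
        g_reif.add((stmt, EX.derivedFrom, EX.pubmed_123))

        # PaCE-style: mint a provenance-context-specific URI for the subject,
        # so the triple itself is provenance-aware; one extra triple links
        # the context entity to its source.
        g_pace = Graph()
        aspirin_ctx = URIRef(EX + "aspirin/pubmed_123")
        g_pace.add((aspirin_ctx, EX.treats, EX.headache))
        g_pace.add((aspirin_ctx, EX.derivedFrom, EX.pubmed_123))

        print(len(g_reif), len(g_pace))  # 5 vs. 2 triples for one assertion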

  • Semantic Provenance for eScience: Managing the Deluge of Scientific Data

    • IEEE Internet Computing
    • July 2008

    Provenance information in eScience is metadata that's critical to effectively manage the exponentially increasing volumes of scientific data from industrial-scale experiment protocols. Semantic provenance, based on domain-specific provenance ontologies, lets software applications unambiguously interpret data in the correct context. The semantic provenance framework for eScience data comprises expressive provenance information and domain-specific provenance ontologies and applies this information to data management. The authors' "two degrees of separation" approach advocates the creation of high-quality provenance information using specialized services. In contrast to workflow engines generating provenance information as a core functionality, the specialized provenance services are integrated into a scientific workflow on demand. This article describes an implementation of the semantic provenance framework for glycoproteomics.

    Satya S. Sahoo, Amit Sheth, and Cory Henson, 'Semantic Provenance for eScience: Managing the Deluge of Scientific Data', IEEE Internet Computing, vol. 12, no. 4, 2008, pp. 46-54.

    Access: http://www.knoesis.org/library/resource.php?id=00310

  • Knowledge modeling and its application in life sciences: a tale of two ontologies

    • Proceedings of the 15th international conference on World Wide Web
    • May 2006

    High throughput glycoproteomics, similar to genomics and proteomics, involves extremely large volumes of distributed, heterogeneous data as a basis for identification and quantification of a structurally diverse collection of biomolecules. The ability to share, compare, query for, and, most critically, correlate datasets using native biological relationships is among the challenges faced by glycobiology researchers. To address these challenges, we are building a semantic structure, using a suite of ontologies, which supports management of data and information at each step of the experimental lifecycle. This framework will enable researchers to leverage the large scale of glycoproteomics data to their benefit. In this paper, we focus on the design of these biological ontology schemas with an emphasis on relationships between biological concepts, on the use of novel approaches to populate these complex ontologies, including integrating extremely large datasets (>500 MB) as part of the instance base, and on the evaluation of ontologies using OntoQA metrics. The application of these ontologies in providing informatics solutions for the high throughput glycoproteomics experimental domain is also discussed. We present our experience as a use case of developing two ontologies in one domain, to be part of a set of use cases used in the development of an emergent framework for building and deploying biological ontologies.

    Cite: Satya S. Sahoo, Christopher Thomas, Amit P. Sheth, William York, and Samir Tartir, 'Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies,' 15th International World Wide Web Conference (WWW2006), Edinburgh, Scotland, May 23-26, 2006.

    Access: http://www.knoesis.org/library/resource.php?id=00020

  • GLYDE-an expressive XML standard for the representation of glycan structure.

    • Carbohydrate Research
    • 2005

    The amount of glycomics data being generated is rapidly increasing as a result of improvements in analytical and computational methods. Correlation and analysis of this large, distributed data set requires an extensible and flexible representational standard that is also 'understood' by a wide range of software applications. An XML-based data representation standard that faithfully captures essential structural details of a glycan moiety, along with additional information (such as data provenance) to aid the interpretation and usage of glycan data, will facilitate the exchange of glycomics data across the scientific community. To meet this need, we introduce the GLYcan Data Exchange (GLYDE) standard as an XML-based representation format to enable interoperability and exchange of glycomics data. An online tool for the conversion of other representations to GLYDE format has been developed.

    Keywords: GLYcan Data Exchange (GLYDE), Glycan data interoperability, XML-based glycan representation, Glycoinformatics

    Cite: Satya S. Sahoo, Christopher Thomas, Amit P. Sheth, Cory Henson, and William S. York, 'GLYDE-An expressive XML standard for the representation of glycan structure,' Carbohydr Res, 340 (no. 18), December 30, 2005, pp. 2802-2807. Epub 2005 Oct 20. PMID: 16242678.

    Access: http://www.knoesis.org/library/resource.php?id=00018

  • A Taxonomy-based Model for Expertise Extrapolation

    • 4th International Conference on Semantic Computing
    • September 2010

    While many ExpertFinder applications succeed in finding experts, their techniques are not always designed to capture the various levels at which expertise can be expressed. Indeed, expertise can be inferred from relationships between topics and subtopics in a taxonomy: expertise in subtopics is also indicative of expertise in higher-level topics. The enrichment of expertise profiles for finding experts can therefore be facilitated by taking domain hierarchies into account. We present a novel semantics-based model for finding experts, expertise levels, and collaboration levels in a peer review context, such as composing a Program Committee (PC) for a conference. The implicit coauthorship network encompassed by bibliographic data enables the discovery of unknown experts within various degrees of separation in the coauthorship graph. Our results show an average of 85% recall in finding experts, when evaluated against three WWW Conference PCs, and close to 80 additional comparable experts outside the immediate collaboration network of the PC Chairs.

  • Alignment Based Querying of Linked Open Data

    • Springer
    • September 10, 2012

    The Linked Open Data (LOD) cloud is rapidly becoming the largest interconnected source of structured data on diverse domains. The potential of the LOD cloud is enormous, ranging from solving challenging AI issues such as open domain question answering to automated knowledge discovery. However, due to the inherently distributed nature of LOD and a growing number of ontologies and vocabularies used in LOD datasets, querying over multiple datasets and retrieving LOD data remains a challenging task. In this paper, we propose a novel approach to querying linked data by using alignments for processing queries whose constituent data come from heterogeneous sources. We also report on our Alignment based Linked Open Data Querying System (ALOQUS) and present the architecture and associated methods. Using the state-of-the-art alignment system BLOOMS, ALOQUS automatically maps concepts in users' SPARQL queries, written in terms of a conceptual upper ontology or domain-specific ontology, to different LOD concepts and datasets. It then creates a query plan, sends sub-queries to the different endpoints, crawls for co-referent URIs, merges the results, and presents them to the user. We also compare existing querying systems and demonstrate the added capabilities that the alignment-based approach can provide for querying Linked Data. A simplified sketch of the rewriting idea follows the citation below.

    Citation: On the Move to Meaningful Internet Systems: OTM 2012, Lecture Notes in Computer Science, vol. 7566, 2012, pp. 807-824.
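
    The sketch below assumes a precomputed alignment table (standing in for BLOOMS output) and uses the SPARQLWrapper Python library; the concept URIs and alignment entries are illustrative, and the real system adds query planning, co-reference crawling, and result merging.

        # Minimal sketch: rewrite an upper-ontology concept per endpoint and
        # collect bindings. Alignment entries are assumed, not BLOOMS output.
        from SPARQLWrapper import SPARQLWrapper, JSON

        alignment = {
            "http://example.org/upper#City": {
                "http://dbpedia.org/sparql": "http://dbpedia.org/ontology/City",
            },
        }

        def query_aligned(upper_concept):
            """Send one rewritten sub-query per aligned endpoint."""
            results = []
            for endpoint, concept in alignment[upper_concept].items():
                sparql = SPARQLWrapper(endpoint)
                sparql.setQuery(f"SELECT ?x WHERE {{ ?x a <{concept}> }} LIMIT 5")
                sparql.setReturnFormat(JSON)
                bindings = sparql.query().convert()["results"]["bindings"]
                results += [b["x"]["value"] for b in bindings]
            return results

        print(query_aligned("http://example.org/upper#City"))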

  • Don't like RDF Reification? Making Statements about Statements using Singleton Property

    • Proceedings of International Conference on World Wide Web (WWW2014)
    • 2014

    Statements about RDF statements, or meta triples, provide additional information about individual triples, such as the source, the occurring time or place, or the certainty. Integrating such meta triples into semantic knowledge bases would enable the querying and reasoning mechanisms to be aware of provenance, time, location, or certainty of triples. However, an efficient RDF representation for such meta knowledge of triples remains challenging. The existing standard reification approach allows such meta knowledge of RDF triples to be expressed using RDF in two steps. The first step is representing the triple by a Statement instance which has its subject, predicate, and object indicated separately in three different triples. The second step is creating assertions about that instance as if it were a statement. While reification is simple and intuitive, this approach does not have formal semantics and is not commonly used in practice, as described in the RDF Primer. In this paper, we propose a novel approach called Singleton Property for representing statements about statements and provide a formal semantics for it. We explain how this singleton property approach fits well with the existing syntax and formal semantics of RDF, and with the syntax of the SPARQL query language. We also demonstrate the use of the singleton property in the representation and querying of meta knowledge in two examples of Semantic Web knowledge bases: YAGO2 and BKR. Our experiments on the BKR show that the singleton property approach gives decent performance in terms of number of triples, query length, and query execution time compared to existing approaches. This approach, which is also simple and intuitive, can be easily adopted for representing and querying statements about statements in other knowledge bases.
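
    The pattern is easy to see in a small rdflib sketch. The paper proposes a singletonPropertyOf relation; the namespace below is illustrative, as are all other URIs.

        # Minimal sketch of the singleton property pattern.
        from rdflib import Graph, Namespace, Literal

        EX = Namespace("http://example.org/")
        g = Graph()

        # Instead of asserting (BobDylan, marriedTo, SaraLownds) directly,
        # mint a property unique to this one statement...
        sp1 = EX["marriedTo#1"]
        g.add((EX.BobDylan, sp1, EX.SaraLownds))
        # ...tie it back to the generic property...
        g.add((sp1, EX.singletonPropertyOf, EX.marriedTo))
        # ...and attach meta knowledge (source, time) to the singleton property.
        g.add((sp1, EX.extractedFrom, EX.wikipedia))
        g.add((sp1, EX.validFrom, Literal(1965)))

        print(g.serialize(format="turtle"))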

  • Crisis Mapping, Citizen Sensing and Social Media Analytics: Leveraging Citizen Roles for Crisis Response

    • AAAI, ICWSM-13 Tutorials
    • July 11, 2013

    This tutorial weaves together three themes and corresponding topics: (a) citizen sensing and crisis mapping; (b) technical challenges and recent research on leveraging citizen sensing to improve crisis response coordination; and (c) experiences in building robust and scalable platforms/systems. It couples technical insights with computational techniques and algorithms, along with real-world examples.

  • Twitris v3: From Citizen Sensing to Analysis, Coordination and Action

    • AAAI, ICWSM-13
    • July 8, 2013

    A system that leverages social media analytics beyond computing the obvious, focusing instead on targeted, action-oriented computing to assist the macro-level phenomena of coordination and decision making.

    [System Demonstration Paper]

  • What Kind of #Communication is Twitter? Mining #Psycholinguistic Cues for Emergency Coordination

    • Computers in Human Behavior Journal, Elsevier
    • July 2013

    An information filtering model to reduce Twitter traffic for disaster coordination, where coordination is modeled via psycholinguistic theories of conversation. Also provides evidence that behavior known from face-to-face communication carries over into online (mediated) communication.

  • Understanding User-Community Engagement by Multi-faceted Features: A Case Study on Twitter

    • SoME-2011 Workshop on Social Media Engagement, in conjunction with WWW-2011
    • March 30, 2011

    The widespread use of social networking websites in recent years has suggested a need for effective methods to understand the new forms of user engagement, the factors impacting them, and the fundamental reasons for such engagements. We perform exploratory analysis on Twitter to understand the dynamics of user engagement by studying what attracts a user to participate in discussions on a topic. We identify various factors which might affect user engagement, ranging from content properties and network topology to user characteristics on the social network, and use them to predict user joining behavior. As opposed to traditional ways of studying them separately, these factors are organized in our framework, People-Content-Network Analysis (PCNA), designed to enable understanding of human social dynamics on the Web. We perform experiments on various Twitter user communities formed around topics from diverse domains, with varied social significance, duration, and spread. Our findings suggest that the capabilities of content, user, and network features vary greatly, motivating the incorporation of all these factors in user engagement analysis and underscoring the need to study the dynamics of user engagement using the PCNA framework. Our study also reveals correlations between the type of event behind a discussion topic and the impact of user engagement factors.
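
    As an illustration of the framework's premise, the sketch below combines one people, one content, and one network feature in a single model to predict joining behavior; the features and data are synthetic, not the paper's.

        # Minimal sketch: predict user joining behavior from combined
        # people/content/network features (synthetic toy data).
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        # columns: followers (people), topical overlap with the discussion
        # (content), friends already in the discussion (network)
        X = np.array([[120, 0.8, 5],
                      [4000, 0.1, 0],
                      [50, 0.6, 3],
                      [900, 0.05, 1]])
        y = np.array([1, 0, 1, 0])  # 1 = user joined the discussion

        model = LogisticRegression().fit(X, y)
        print(model.predict([[300, 0.7, 4]]))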

  • Twitris- a System for Collective Social Intelligence

    • Encyclopedia of Social Network Analysis and Mining (ESNAM), Springer
    • 2014

    Twitris is a Semantic Web application that facilitates understanding of social perceptions through semantics-based processing of massive amounts of event-centric data. Twitris addresses challenges in large-scale processing of social data, preserving spatio-temporal-thematic properties and focusing on multi-dimensional analysis of spatio-temporal-thematic, people-content-network, and sentiment-emotion-subjectivity facets. Twitris also covers context-based semantic integration of multiple Web resources and exposes semantically enriched social data to the public domain. Semantic Web technologies enable the system's integration and analysis abilities. It has applications for studying and analyzing social sensing and perception of a broad variety of events: politics and elections, social movements and uprisings, crises and disasters, entertainment, environment, decision making and coordination, brand management, campaign effectiveness, etc.

  • Emergency-relief coordination on social media

    • First Monday
    • January 2014


    Disaster affected communities are increasingly turning to social media for communication and coordination. This includes reports on needs (demands) and offers (supplies) of resources required during emergency situations. Identifying and matching such requests with potential responders can substantially accelerate emergency relief efforts. Current work of disaster management agencies is labor intensive, and there is substantial interest in automated tools.

    We present machine–learning methods to automatically identify and match needs and offers communicated via social media for items and services such as shelter, money, clothing, etc. For instance, a message such as “we are coordinating a clothing/food drive for families affected by Hurricane Sandy. If you would like to donate, DM us” can be matched with a message such as “I got a bunch of clothes I’d like to donate to hurricane sandy victims. Anyone know where/how I can do that?” Compared to traditional search, our results can significantly improve the matchmaking efforts of disaster response agencies.
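
    A minimal sketch of the matching step, using TF-IDF cosine similarity from scikit-learn over the two example messages above; the published system trains classifiers to identify needs and offers before matching.

        # Minimal sketch: surface candidate need/offer matches by text similarity.
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.metrics.pairwise import cosine_similarity

        requests = ["we are coordinating a clothing/food drive for families "
                    "affected by Hurricane Sandy. If you would like to donate, DM us"]
        offers = ["I got a bunch of clothes I'd like to donate to hurricane "
                  "sandy victims. Anyone know where/how I can do that?"]

        vec = TfidfVectorizer(stop_words="english")
        m = vec.fit_transform(requests + offers)
        sim = cosine_similarity(m[:len(requests)], m[len(requests):])
        print(sim)  # high similarity flags a candidate match for responders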

  • Characterizing concepts of interest leveraging Linked Data and the Social Web

    • Web Intelligence 2013
    • November 2013

    Extracting and representing user interests on the Social Web is becoming an essential part of the Web for personalisation and recommendations. Such personalisation is required in order to provide an adaptive Web to users, where content fits their preferences, background and current interests, making the Web more social and relevant. Current techniques analyse user activities on social media systems and collect structured or unstructured sets of entities representing users' interests. These sets of entities, or user profiles of interest, are often missing the semantics of the entities in terms of: (i) popularity and temporal dynamics of the interests on the Social Web and (ii) abstractness of the entities in the real world. State of the art techniques to compute these values are using specific knowledge bases or taxonomies and need to analyse the dynamics of the entities over a period of time. Hence, we propose a real-time, computationally inexpensive, domain independent model for concepts of interest composed of: popularity, temporal dynamics and specificity. We describe and evaluate a novel algorithm for computing specificity leveraging the semantics of Linked Data and evaluate the impact of our model on user profiles of interests.

    Citation:
    Fabrizio Orlandi, Pavan Kapanipathi, Amit Sheth, Alexandre Passant, "Characterising Concepts of Interest Leveraging Linked Data and the Social Web," IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2013 pp. 519-526.

  • Advancing data reuse in phyloinformatics using an ontology-driven Semantic Web approach

    • BMC Medical Genomics
    • November 11, 2013

    Phylogenetic analyses can resolve historical relationships among genes, organisms or higher taxa. Understanding such relationships can elucidate a wide range of biological phenomena, including, for example, the importance of gene and genome duplications in the evolution of gene function, the role of adaptation as a driver of diversification, or the evolutionary consequences of biogeographic shifts. Phyloinformaticists are developing data standards, databases and communication protocols (e.g. Application Programming Interfaces, APIs) to extend the accessibility of gene trees, species trees, and the metadata necessary to interpret these trees, thus enabling researchers across the life sciences to reuse phylogenetic knowledge. Specifically, Semantic Web technologies are being developed to make phylogenetic knowledge interpretable by web agents, thereby enabling intelligently automated, high-throughput reuse of results generated by phylogenetic research. This manuscript describes an ontology-driven, semantic problem-solving environment for phylogenetic analyses and introduces artefacts that can advance phyloinformatic efforts to make trees and their underlying metadata accessible. PhylOnt is an extensible ontology with concepts describing tree types and tree-building methodologies, including estimation methods, models, and programs. In addition, we present the PhylAnt platform for annotating scientific articles and NeXML files with PhylOnt concepts. The novelty of this work is the annotation of NeXML files and phylogeny-related documents with the PhylOnt ontology. This approach advances data reuse in phyloinformatics.

  • PhylOnt: A Domain-Specific Ontology for Phylogeny Analysis

    • IEEE International Conference on Bioinformatics and Biomedicine
    • October 2012

    Phylogenetic analyses can resolve historical relationships among genes, organisms or higher taxa. Understanding such relationships can elucidate a wide range of biological phenomena including the role of adaptation as a driver of diversification, the importance of gene and genome duplications in the evolution of gene function, or the evolutionary consequences of biogeographic shifts. The variety of methods of analysis and data types typically employed in phylogenetic analyses can pose challenges for semantic reasoning due to significant representational and computational complexity. These challenges could be ameliorated with the development of an ontology designed to capture and organize the variety of concepts used to describe phylogenetic data, methods of analysis and the results of phylogenetic analyses. In this paper, we discuss the development of PhylOnt - an ontology for phylogenetic analyses, which establishes a foundation for semantics-based workflows including meta-analyses of phylogenetic data and trees. PhylOnt is an extensible ontology, which describes the methods employed to estimate trees given a data matrix, models and programs used for phylogenetic analysis and descriptions of phylogenetic trees including branch-length information and support values. The relational vocabulary included in PhylOnt will facilitate the integration of heterogeneous data types derived from both structured and unstructured sources. To illustrate the utility of PhylOnt, we annotated scientific literature to support semantic search. The semantic annotations can subsequently support workflows that require the exchange and integration of heterogeneous phylogenetic information.

    Panahiazar, M.; Ranabahu, A.; Taslimi, V.; Yalamanchili, H.; Stoltzfus, A.; Leebens-Mack, J.; Sheth, A.P., "PhylOnt: A domain-specific ontology for phylogeny analysis," 2012 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 4-7 Oct. 2012, pp.1-6.
    doi: 10.1109/BIBM.2012.6392677
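
    The annotation step can be sketched simply: match ontology labels in text and emit the corresponding concepts. The labels and concept names below are illustrative; the actual annotation pipeline is more sophisticated.

        # Minimal sketch: annotate text with ontology concepts by label matching.
        phylont_labels = {
            "maximum likelihood": "PhylOnt:MaximumLikelihood",
            "bayesian inference": "PhylOnt:BayesianInference",
            "branch length": "PhylOnt:BranchLength",
        }

        def annotate(text):
            t = text.lower()
            return [(label, concept) for label, concept in phylont_labels.items()
                    if label in t]

        doc = "Trees were estimated under maximum likelihood with branch length support."
        print(annotate(doc))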

  • User Interests Identification on Twitter Using a Hierarchical Knowledge Base

    • Extended Semantic Web Conference 2014 (To Appear)
    • February 2014

    Industry and researchers have identified numerous ways to monetize microblogs for personalization and recommendation. A common challenge across these different works is identification of user interests. Although techniques have been developed to address this challenge, a flexible approach that spans multiple levels of granularity in user interests has not been forthcoming.

    In this work, we focus on exploiting the hierarchical semantics of concepts to infer richer user interests, expressed as a Hierarchical Interest Graph. To create such graphs, we utilize a user's Twitter data to first ground potential user interests in structured background knowledge such as the Wikipedia Category Graph. We then use an adaptation of spreading activation theory to assign an interest score (or weight) to each category in the hierarchy. The Hierarchical Interest Graph comprises not only a user's explicitly mentioned interests determined from Twitter, but also the implicit interest categories inferred from the background knowledge source. We demonstrate the effectiveness of our approach through a user study which shows that, on average, approximately eight of the top ten weighted categories in the graph are relevant to a given user's interests.
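
    A minimal sketch of the spreading-activation step, assuming a tiny stand-in for the Wikipedia Category Graph and a simple per-level decay; the paper's weighting scheme differs.

        # Minimal sketch: propagate interest scores up a category hierarchy.
        from collections import defaultdict

        # child category -> parent categories (toy stand-in for Wikipedia)
        parents = {
            "Android_(operating_system)": ["Mobile_operating_systems"],
            "Mobile_operating_systems": ["Operating_systems"],
            "Operating_systems": ["Software"],
        }

        def spread(seed_scores, decay=0.5):
            """Attenuate scores by `decay` at each level up the hierarchy."""
            scores = defaultdict(float, seed_scores)
            frontier = list(seed_scores.items())
            while frontier:
                node, score = frontier.pop()
                for parent in parents.get(node, []):
                    passed = score * decay
                    scores[parent] += passed
                    frontier.append((parent, passed))
            return dict(scores)

        print(spread({"Android_(operating_system)": 1.0}))
        # implicit interests: Mobile_operating_systems 0.5, Operating_systems 0.25, ...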

  • User Interests Identification on Twitter Using a Hierarchical Knowledge Base

    • Extended Semantic Web Conference 2014 (To Appear)
    • February 2014

    Twitter, owing to its massive growth as a social networking platform, has been a focus for analyzing user-generated content for personalization and recommendation tasks. A common challenge across these tasks is identifying user interests from tweets. Lately, semantic enrichment of Twitter posts to determine (entity-based) user interests has been an active area of research. The advantages of these approaches include interoperability, information reuse, and the availability of knowledge bases to be exploited. However, exploiting these knowledge bases for identifying user interests still remains a challenge. In this work, we focus on exploiting the hierarchical relationships present in knowledge bases to infer richer user interests, expressed as a Hierarchical Interest Graph. We argue that the hierarchical semantics of concepts can enhance existing systems to personalize or recommend items based on varied levels of conceptual abstractness. We demonstrate the effectiveness of our approach through a user study which shows that, on average, approximately eight of the top ten weighted hierarchical interests in the graph are relevant to a given user's interests.

  • Alignment and dataset identification of linked data in Semantic Web

    • Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
    • March 2014

    The Linked Open Data (LOD) cloud has gained significant attention in the Semantic Web community over the past few years. With rapid expansion in size and diversity, it consists of over 800 interlinked datasets with over 60 billion triples. These datasets encapsulate structured data and knowledge spanning varied domains such as entertainment, life sciences, publications, geography, and government. Applications can take advantage of this by using the knowledge distributed over the interconnected datasets, knowledge that is unrealistic to find in any single place elsewhere. However, two of the key obstacles in using the LOD cloud are the limited support for data integration tasks over concepts, instances, and properties, and relevant data source selection for querying over multiple datasets. We review, in brief, some of the important and interesting technical approaches in the literature that address these two issues. We observe that general-purpose alignment techniques developed outside the LOD context fall short in meeting the heterogeneous data representation of LOD. Therefore, an LOD-specific review of these techniques (especially for alignment) is important to the community. The topics covered and discussed in this article fall under two broad categories, namely alignment techniques for LOD datasets and relevant data source selection in the context of query processing over LOD datasets.

  • On Understanding Divergence of Online Social Group Discussion

    • AAAI, ICWSM-14
    • 2014

    We study online social group dynamics based on how group members diverge in their online discussions. Previous studies mostly focused on link structures to characterize social group dynamics, whereas the group behavior of content generation in discussions is not well understood. In particular, we use Jensen-Shannon (JS) divergence to measure the divergence of topics in user-generated content, and how it progresses over time. We study Twitter messages (tweets) in multiple real-world events (natural disasters and social activism) with different times and demographics. We also model structural and user features with guidance from two socio-psychological theories, social cohesion and social identity, to learn their implications for group discussion divergence. Those features show significant correlation with group discussion divergence. By leveraging them we are able to construct a classifier to predict the future increase or decrease in group discussion divergence, which achieves an area under the curve (AUC) of 0.84 and an F-1 score (harmonic mean of precision and recall) of 0.8. Our approach allows us to systematically study collective diverging group behavior independent of group formation design. It can help prioritize whom to engage with in communities for specific topics of need during disaster response coordination, and for specific concerns and advocacy in brand management.

    Citation: Hemant Purohit, Yiye Ruan, Dave Fuhry, Srinivasan Parthasarathy, Amit Sheth. On Understanding Divergence of Online Social Group Discussion. In 8th Int'l AAAI Conference on Weblogs and Social Media (ICWSM 2014), June 2014
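
    For reference, the divergence measure can be computed directly; the sketch below uses toy term counts and base-2 logs, which bound the value to [0, 1].

        # Minimal sketch: Jensen-Shannon divergence between the term
        # distributions of two discussion time windows.
        import numpy as np

        def js_divergence(p, q):
            p, q = np.asarray(p, float), np.asarray(q, float)
            p, q = p / p.sum(), q / q.sum()
            m = 0.5 * (p + q)
            def kl(a, b):
                mask = a > 0
                return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))
            return 0.5 * kl(p, m) + 0.5 * kl(q, m)

        window_t = [10, 5, 1, 0]   # term counts in window t
        window_t1 = [2, 4, 9, 7]   # term counts in window t+1
        print(js_divergence(window_t, window_t1))  # 0 = same topics, 1 = disjoint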

  • With Whom to Coordinate, Why and How in Ad-hoc Social Media Communities during Crisis Response

    • ISCRAM-14
    • 2014

    During crises, affected people, well-wishers, and observers join social media communities to discuss the event while sharing useful information relevant to response coordination, for example, specific resource needs. But it is difficult to identify and engage with such users; our framework enables such coordination-assistive engagement.

  • YouRank: Let User Engagement Rank Microblog Search Results

    • The Eighth International AAAI Conference on Weblogs and Social Media (ICWSM 2014)
    • March 2014

    We propose an approach for ranking microblog search results. The basic idea is to leverage user engagement for the purpose of ranking: if a microblog post received many retweets/replies, this means users find it important and it should be ranked higher. However, simply applying the raw count of engagement may bias the ranking by favoring posts from celebrity users whose posts generally receive a disproportionate amount of engagement regardless of the contents of posts. To reduce this bias, we propose a variety of time window-based outlier features that transfer the raw engagement count into an importance score, on a per user basis. The evaluation on five real-world datasets confirms that the proposed approach can be used to improve microblog search.
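
    A minimal sketch of the per-user normalization idea with toy numbers: score a post's engagement as a z-score against the same user's recent window, so celebrity baselines do not dominate. The paper defines a family of such window-based features.

        # Minimal sketch: per-user, window-based engagement outlier score.
        import statistics

        def outlier_score(count, user_window_counts):
            """Z-score of a post's engagement vs. the user's recent posts."""
            mu = statistics.mean(user_window_counts)
            sigma = statistics.pstdev(user_window_counts) or 1.0
            return (count - mu) / sigma

        celebrity_recent = [900, 1100, 1000, 950]  # routinely high engagement
        ordinary_recent = [1, 0, 2, 1]             # routinely low engagement
        print(outlier_score(1050, celebrity_recent))  # ~0.8: unremarkable
        print(outlier_score(40, ordinary_recent))     # ~55: strong signal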

  • An Information Filtering and Management Model for Twitter Traffic to Assist Crisis Response Coordination

    • Journal of CSCW, Springer
    • 2014

    A model for filtering information using psycholinguistic theories to identify tacit cooperation in declarations of resource needs and availability during disaster response on social media. Also, a domain ontology to create an annotated information repository, supporting both the presentation of organized, actionable information nuggets about resource needs and availability at varying levels of abstraction in visual interfaces, and complex who-what-where querying for coordination.

  • Active Learning with Efficient Feature Weighting Methods for Improving Data Quality and Classification Accuracy

    • In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL'14)
    • March 6, 2014

    Many machine learning datasets are noisy, with a substantial number of mislabeled instances. This noise yields sub-optimal classification performance. In this paper we study a large, low-quality annotated dataset, created quickly and cheaply using Amazon Mechanical Turk to crowdsource annotations. We describe computationally cheap feature weighting techniques and a novel non-linear distribution spreading algorithm that can be used to iteratively and interactively correct mislabeled instances, significantly improving annotation quality at low cost. Eight different emotion extraction experiments on Twitter data demonstrate that our approach is just as effective as more computationally expensive techniques, while saving a considerable amount of time.
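
    A minimal sketch of the flagging idea on toy data, using simple leave-one-out log-odds feature weights in place of the paper's distribution-spreading algorithm: instances whose label disagrees with their weighted score are queued for re-annotation.

        # Minimal sketch: flag likely mislabeled instances via cheap
        # leave-one-out feature weights (add-one smoothing).
        import math
        from collections import Counter

        docs = [("happy glad wonderful", "joy"),
                ("happy joyful glad", "joy"),
                ("wonderful happy joyful", "joy"),
                ("angry furious mad", "anger"),
                ("furious mad awful", "anger"),
                ("happy wonderful glad", "anger")]  # likely mislabeled

        def counts(label, exclude):
            c = Counter()
            for i, (text, y) in enumerate(docs):
                if y == label and i != exclude:
                    c.update(text.split())
            return c

        for i, (text, y) in enumerate(docs):
            joy, anger = counts("joy", i), counts("anger", i)
            s = sum(math.log((joy[w] + 1) / (anger[w] + 1)) for w in text.split())
            if ("joy" if s > 0 else "anger") != y:
                print(f"flag for re-annotation: {text!r} labeled {y!r}")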

  • Service level agreement in cloud computing

    • OOPSLA
    • 2009

    Cloud computing, which provides cheap, pay-as-you-go computing resources, is rapidly gaining momentum as an alternative to traditional IT infrastructure. As more and more consumers delegate their tasks to cloud providers, Service Level Agreements (SLAs) between consumers and providers emerge as a key aspect. Due to the dynamic nature of the cloud, continuous monitoring of Quality of Service (QoS) attributes is necessary to enforce SLAs. Numerous other factors, such as trust in the cloud provider, also come into consideration, particularly for enterprise customers that may outsource their critical data. This complex nature of the cloud landscape warrants a sophisticated means of managing SLAs. This paper proposes a mechanism for managing SLAs in a cloud computing environment using the Web Service Level Agreement (WSLA) framework, developed for SLA monitoring and enforcement in a Service-Oriented Architecture (SOA). We use the third-party support feature of WSLA to delegate monitoring and enforcement tasks to other entities in order to resolve trust issues. We also present a real-world use case to validate our proposal.
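
    A minimal sketch of third-party monitoring in this spirit, with hypothetical SLA terms and a stand-in probe; WSLA itself expresses such terms in XML and delegates measurement to independent services.

        # Minimal sketch: a monitor outside both parties checks QoS against SLA terms.
        sla = {"max_response_ms": 200, "min_availability": 0.99}

        def probe_service():
            """Stand-in for a real QoS probe against the provider's endpoint."""
            return {"response_ms": 250, "availability": 0.995}

        def check_once():
            qos = probe_service()
            if qos["response_ms"] > sla["max_response_ms"]:
                print("SLA violation: response time", qos["response_ms"])
            if qos["availability"] < sla["min_availability"]:
                print("SLA violation: availability", qos["availability"])

        check_once()  # would run periodically and notify both parties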

  • What Information about Cardiovascular Diseases do People Search Online?

    • 25th European Medical Informatics Conference (MIE 2014), Istanbul, Turkey
    • 2014

    In this work, we categorized cardiovascular disease (CVD) related search queries into "consumer-oriented" health categories to study which health topics users search for in relation to CVD. This study provides useful insights into online health information seeking and information needs in chronic diseases, particularly CVD.

  • Analysis of Online Information Searching for Cardiovascular Diseases on a Consumer Health Information Portal

    • AMIA Annual Symposium
    • 2014

    Since the early 2000s, Internet usage for health information searching has increased significantly. Studying search queries can help us understand users' "information need" and how they formulate search queries ("expression of information need"). Although cardiovascular diseases (CVD) affect a large percentage of the population, few studies have investigated how and what users search regarding CVD. We address this knowledge gap in the community by analyzing a large corpus of 10 million CVD-related search queries from MayoClinic.com. Using UMLS MetaMap and UMLS semantic types/concepts, we developed a rule-based approach to categorize the queries into 14 health categories. We analyzed the structural properties, types (keyword-based/Wh-questions/Yes-No questions), and linguistic structure of the queries. Our results show that the most searched health categories are 'Diseases/Conditions', 'Vital-Signs', 'Symptoms' and 'Living-with'. CVD queries are longer and are predominantly keyword-based. This study extends our knowledge about online health information searching and provides useful insights for Web search engines and health websites.
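
    A minimal sketch of the rule-based categorization step, with keyword cues standing in for UMLS MetaMap output; the category names come from the abstract, while the cues are assumptions.

        # Minimal sketch: map CVD search queries to consumer health categories.
        rules = {
            "Diseases/Conditions": ["heart attack", "hypertension", "arrhythmia"],
            "Vital-Signs": ["blood pressure", "heart rate", "pulse"],
            "Symptoms": ["chest pain", "shortness of breath", "palpitations"],
            "Living-with": ["diet", "exercise", "lifestyle"],
        }

        def categorize(query):
            q = query.lower()
            return [cat for cat, cues in rules.items()
                    if any(cue in q for cue in cues)] or ["Other"]

        print(categorize("normal blood pressure range"))  # ['Vital-Signs']
        print(categorize("exercise after heart attack"))
        # ['Diseases/Conditions', 'Living-with']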

  • Comparative Analysis of Online Health Queries Originating from Personal Computers and Smart Devices on a Consumer Health Information Portal

    • Journal of Medical Internet Research (Impact factor 3.8)
    • May 31, 2014

    The number of people using the Internet and smart devices for health information seeking is increasing rapidly. In this study, we analyzed how device choice (desktops/laptops vs smartphones/tablets) impacts online health information seeking.

Amit Sheth's Education

The Ohio State University

PhD, Computer & Information Science

1983–1985

The Ohio State University

MS, Computer & Information Science

1981–1983

Birla Institute of Technology and Science

BE (Hons), Electrical and Electronics Engineering

1976–1981

A.G. High School

SSC, High School

1968–1975

Amit Sheth's Additional Information

Honors and Awards:

LexisNexis Ohio Eminent Scholar. IEEE Fellow. Wright State University Trustee Award (highest for the univ). IBM Faculty Award. UGA Career Center Recognition/Award for "greatly contributing to the career development of students."
