Data mining and information retrieval
Washington D.C. Metro Area
• Industrial experience in software development for text mining and information retrieval.
• Ph.D. in Computer Science (algorithm development for distributed data mining)
• 5+ years of academic research experience in data mining and knowledge management
• Startup experience
• Strong mathematical and analytical skills with business applications in online search, advertising, user personalization, and automated document processing.
• MS in Computer Science (privacy preserving data mining)
• Industry experience at IBM (both research lab and software development)
• 1 patent (filed)
• 12+ accredited publications, including a first-author IEEE Transactions paper and a paper that won an IEEE ‘Best Research Paper’ award.
Text mining, data mining, algorithm development, knowledge management, search and information retrieval, natural language processing, machine learning, distributed computing, web mining, peer-to-peer networks, privacy preserving data mining.
(Information Technology and Services industry)
December 2008 — Present (1 year)
* Software design and development in intelligent information extraction and knowledge management.
- Application of pattern recognition and natural language processing techniques to capture information from unstructured data and derive meaning from it.
* Algorithm development for automated and intelligent document classification based on the content.
- Development of a machine learning algorithm for self-training document classification software.
* Software product development for faceted search and information retrieval.
- Application development for presenting categorical search results and dynamic clustering of search results.
* Research and development in document clustering.
(Information Technology and Services industry)
July 2007 — November 2008 (1 year 5 months)
* Software product management in web content discovery and search enhancement.
* Software product design and development of an adaptive query parser and spell-checker.
* Algorithm design for search relevance ranking in local-search.
* Software design and development for search log analysis.
* Research and prototype development in data mining, text mining, and knowledge management for local search and enterprise search.
(Educational Institution; Higher Education industry)
September 2004 — July 2007 (2 years 11 months)
* Development of a random-walk-based algorithm for uniform sampling of data distributed in a peer-to-peer (P2P) network, modeling the walk as a Markov chain. Optimized the random-walk length to minimize communication cost. Applied uniform sampling to mine frequent itemsets in a P2P network.
* Clustering in peer-to-peer networks. Development of the first-ever K-means clustering algorithm for horizontally partitioned data in a peer-to-peer network that achieves global clustering without exchanging any data or requiring network-wide synchronization, and that works in a dynamic network with changing data and topology.
(Public Company; 10,001 or more employees; IBM; Information Technology and Services industry)
May 2005 — August 2005 (4 months)
* Research on preserving individual customer privacy in customer service e-mail data.
* Proposed detection and replacement of privacy-sensitive information in unstructured text, maintaining text readability, to provide an extra level of privacy.
* Research solution filed for patent.
(Educational Institution; Higher Education industry)
August 2002 — August 2004 (2 years 1 month)
* Evaluation of random additive perturbation in privacy preserving data mining.
* Award-winning research evaluating the additive random perturbation technique as a privacy preserving data mining tool. Developed a spectral filtering technique to estimate original data from perturbed data with high accuracy. First-ever research to prove the vulnerability of the additive random perturbation technique in privacy preserving data mining.
(Public Company; 10,001 or more employees; Computer Software industry)
April 2002 — July 2002 (4 months)
ERP/CRM software solution development. Software tools used: SAP and ABAP.
PhD, Computer Science, 2004 — 2008
* Dissertation topic: Approximate Distributed Algorithms for Mining Data in Peer-to-Peer Networks
* Recipient of PhD Fellowship, University of Maryland, Baltimore County, 2007.
M.S., Computer Science, 2002 — 2004
* Dissertation topic: On Random Additive Perturbation for Privacy Preserving Data Mining
* GPA: 3.7/4.00
* MS research won the Best Research Paper award at the 2003 IEEE International Conference on Data Mining, Melbourne, Florida.
Bachelor's, Electrical Engineering, 1997 — 2001
* GPA: 3.85/4.00
* Jadavpur University Alumni Association Annual Award holder for 2001.
* Ranked within top 10% of a class of 100.
industrial research, collaboration.
IEEE
Recipient of the Best Research Paper Award as lead student author at the IEEE International Conference on Data Mining (Nov. 2003).