Data mining and information retrieval
Washington D.C. Metro Area
• Industrial experience in software development for text mining and information retrieval.
• Ph.D. in Computer Science (algorithm development for distributed data mining)
• 5+ years of academic research experience in data mining and knowledge management
• MS in Computer Science (privacy-preserving data mining)
• Industry experience at IBM (both research lab and software development)
• 1 patent (filed)
• 10+ accredited publications, including a first-author IEEE Transactions paper and the winner of an IEEE ‘Best Research Paper’ award.
Text mining, data mining, algorithm development, knowledge management, search and information retrieval, distributed computing, web mining, peer-to-peer networks, privacy-preserving data mining.
(Information Technology and Services industry)
December 2008 — Present (8 months)
Software design and development for intelligent and automated document classification. Knowledge management and information processing.
(Information Technology and Services industry)
July 2007 — November 2008 (1 year 5 months)
* Software product management in web content discovery and search enhancement.
* Software product design and development of an adaptive query parser and spell-checker.
* Algorithm design for search relevance ranking in local search.
* Software design and development for search log analysis.
* Research and prototype development in data mining, text mining, and knowledge management for local search and enterprise search.
(Educational Institution; 1001-5000 employees; Higher Education industry)
September 2004 — July 2007 (2 years 11 months)
Full-time PhD student and graduate research assistant in the Distributed Data Mining and Knowledge Discovery (DIADIC) laboratory.
Research Highlights:
* Development of a random-walk-based algorithm for uniform sampling of data distributed in a peer-to-peer (P2P) network, modeling the walk as a Markov chain. Finding the optimal random-walk length to minimize communication cost. Application of uniform sampling to mining frequent itemsets in a P2P network.
* Clustering in peer-to-peer networks. Development of the first K-means clustering algorithm for horizontally partitioned data in a peer-to-peer network that achieves global clustering without exchanging any data or requiring network-wide synchronization, and that works in a dynamic network with changing data and topology.
(Public Company; 10,001 or more employees; IBM; Information Technology and Services industry)
May 2005 — August 2005 (4 months)
* Research on preserving individual customer privacy in customer service e-mail data.
* Proposed detection and replacement of privacy-sensitive information in unstructured text while maintaining text readability, to provide an extra level of privacy.
* Research solution filed for a patent.
(Information Technology and Services industry)
2005 — 2005 (less than a year)
(Educational Institution; 1001-5000 employees; Higher Education industry)
August 2002 — August 2004 (2 years 1 month)
Full-time graduate student in Computer Science and graduate research assistant at the DIADIC laboratory. Research in privacy-preserving data mining algorithms and in data privacy and security.
Research Highlights:
* Evaluation of random additive perturbation in privacy-preserving data mining.
Award-winning research evaluating the additive random perturbation technique as a privacy-preserving data mining tool. Developed a spectral filtering technique to estimate the original data from perturbed data with high accuracy. First research to demonstrate the vulnerability of the additive random perturbation technique in privacy-preserving data mining.
* Data filtering in the spectral domain.
Exploiting properties of the eigenvalues of random noise to extract numerical and transaction data from perturbed data in the spectral domain.
(Public Company; 10,001 or more employees; Computer Software industry)
April 2002 — July 2002 (4 months)
ERP/CRM software solution development. Software tools used: SAP and ABAP.
(Government Agency; 1001-5000 employees; Research industry)
September 2001 — March 2002 (7 months)
Simulation of a virtual laboratory for remote technical education. Software tool used: LabVIEW.
PhD, Computer Science, 2004 — 2008
* Dissertation topic: Approximate Distributed Algorithms for Mining Data in Peer-to-Peer Networks
* Recipient of PhD Fellowship, University of Maryland, Baltimore County, 2007.
M.S., Computer Science, 2002 — 2004
Dissertation topic: On Random Additive Perturbation for Privacy Preserving Data Mining
* GPA: 3.7/4.00
* MS research won the Best Research Paper award at the 2003 IEEE International Conference on Data Mining, Melbourne, Florida.
Bachelors, Electrical Engineering, 1997 — 2001
* GPA: 3.85/4.00
* Jadavpur University Alumni Association Annual Award holder for 2001.
* Ranked within top 10% of a class of 100.
Industrial research, collaboration.
Recipient of the Best Research Paper Award as lead student author at the IEEE International Conference on Data Mining (Nov. 2003).