Nova Southeastern University, Office of Academic Affairs
President's Faculty R & D Grant 
With a focus on learning, we employ a range of strategies to support innovation, collaboration across centers, and university-wide discussion and decision-making


Eighth Annual Grant Winners 2007-2008

Ahmed Albatineh, Ph.D. - FCAS

Donald Rosenblum, Dean - FCAS

Title: Correction of Jaccard Similarity Index for Chance Agreement in Cluster Analysis



Cluster analysis is the art of uncovering structure in data sets using clustering algorithms. We are often interested in measuring the similarity between two groupings of the same data set, and similarity indices serve this purpose. Such indices are widely used in many disciplines, including gene expression and microarray analysis, marketing behavioral research, ecology, and botany, to name a few. The problem with such indices is that they do not account for agreement due to chance between the two groupings of the same data set. In this study, I will derive a new mathematical procedure to correct the Jaccard similarity index for chance agreement, which will substantially improve the performance of this index in cluster structure recovery and validation studies. The Jaccard index was introduced in 1908 to measure the degree of relatedness between two biological communities with respect to their species composition, and it is widely used in ecology and botany. I expect the results of this study to be of great importance to colleagues working in ecology, botany, biology, gene expression and microarray data analysis, and any other field where cluster analysis and similarity measurement are of interest.
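To make the idea concrete, the sketch below computes a pair-counting form of the Jaccard index between two groupings of the same items and then applies the generic chance-correction template (index - expected) / (maximum - expected). It is an illustration only and not the correction this study proposes to derive: the function names are hypothetical, the pair-counting formulation is an assumed reading of the index for clusterings, and the expected value is estimated by randomly permuting labels, which is an assumed model of chance rather than an analytical result.

```python
from itertools import combinations
import random


def pair_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard index between two groupings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both groupings must label the same items")
    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1        # pair grouped together in both groupings
        elif same_a:
            only_a += 1      # together in the first grouping only
        elif same_b:
            only_b += 1      # together in the second grouping only
    denom = both + only_a + only_b
    # If no pair is ever grouped together, treat the groupings as identical.
    return both / denom if denom else 1.0


def chance_baseline(labels_a, labels_b, trials=1000, seed=0):
    """Monte Carlo estimate of the index under random relabeling.

    Assumption: chance is modeled by permuting the second grouping's
    labels while keeping its cluster sizes fixed.
    """
    rng = random.Random(seed)
    shuffled = list(labels_b)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(shuffled)
        total += pair_jaccard(labels_a, shuffled)
    return total / trials


def chance_corrected(index, expected, maximum=1.0):
    """Generic correction: 0 at chance level, 1 at perfect agreement."""
    return (index - expected) / (maximum - expected)


a = [0, 0, 0, 1, 1, 1]
b = [0, 0, 1, 1, 2, 2]
raw = pair_jaccard(a, b)
expected = chance_baseline(a, b)
print(raw, expected, chance_corrected(raw, expected))
```

With this template, a corrected value near zero means the two groupings agree no more than random relabeling would, which is the behavior the uncorrected index lacks.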