Deduplication on Large Family Datasets