SINGAPORE GENOME VARIATION PROJECT

ABOUT OUR PROJECT

This study aims to characterize the extent of common variation in the human genome across at least 1 million single nucleotide polymorphisms (SNPs) in DNA samples from each of the three ethnic groups in Singapore: Chinese, Malays and Indians. The data generated will supplement the public database of genetic variation provided by the International HapMap Project, which surveyed individuals from four populations across Africa, Europe and East Asia. The data will be used to assess differences in the extent of linkage disequilibrium among the ethnic groups, evaluate genetic heterogeneity in sample collections, document broad-scale recombination hotspots, and map the extent of copy number variation in each ethnic group. The results of the analysis will be used to guide and optimize the design of large-scale genetic association studies and to investigate gene-environment interactions. Knowledge of the degree of genetic commonality across ethnic groups will also provide a preliminary indication of whether genes involved in drug and enzyme metabolism are common across the ethnic groups.

Genotyping platforms used are:

  1. The Affymetrix Genome-Wide Human SNP Array 6.0, which assays approximately 900,000 SNPs and more than 946,000 probes for detecting copy number variation.
  2. The Illumina Human1M single BeadChip, which assays about 1 million SNPs and copy number polymorphisms.

Data from both platforms have been merged for this release.

A. SGVP samples

The SGVP samples comprise 292 samples: 99 Chinese, 98 Malays and 95 Indians. The inclusion criterion required that both parents and all four grandparents of each participant belong to the same ethnic group.

Population label   Population   Number of samples
CHS                Chinese      99
MAS                Malays       98
INS                Indians      95

B. SNP Genotype data

a. Genotype calling

Illumina genotypes were assigned by GenCall (GC), Illumina's proprietary calling algorithm in BeadStudio 3.0, using the cluster files provided by Illumina. A threshold of 0.15 was applied to the GC score to determine confidence in each assigned genotype: genotypes with a GC score ≥ 0.15 were accepted, and all others were set to NULL (missing).
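The GC-score rule amounts to a per-call filter. The sketch below assumes each Illumina call is available as a (genotype, GC score) pair; the helper names `apply_gc_threshold` and `filter_calls` are hypothetical and do not correspond to any BeadStudio interface.

```python
from typing import List, Optional, Tuple

GC_THRESHOLD = 0.15  # GenCall score cutoff stated in the text


def apply_gc_threshold(genotype: str, gc_score: float,
                       threshold: float = GC_THRESHOLD) -> Optional[str]:
    """Keep a call whose GC score is >= threshold; otherwise return None (a NULL call)."""
    return genotype if gc_score >= threshold else None


def filter_calls(calls: List[Tuple[str, float]]) -> List[Optional[str]]:
    """Apply the GC-score threshold to a list of (genotype, GC score) pairs."""
    return [apply_gc_threshold(g, s) for g, s in calls]


calls = [("AA", 0.82), ("AG", 0.15), ("GG", 0.09)]
print(filter_calls(calls))  # ['AA', 'AG', None]
```

Note that a score exactly at 0.15 is accepted, matching the "≥ 0.15" rule.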

Affymetrix genotypes were called with the Birdseed algorithm from the Broad Institute, as implemented in Affymetrix Power Tools apt-1.8.6 (released March 4, 2008). Model files were based on the na24 release.

C. Quality control

Quality control was performed separately on the two platforms.

Samples were identified for removal on the basis of:

  1. High missingness rate (> 2%)
  2. Excessive heterozygosity
  3. Cryptic relatedness, indicated by excessive identity-by-state (IBS) sharing
  4. Admixture or discordant ethnic membership, identified with principal components analysis
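Two of the sample-level criteria (missingness and heterozygosity), and the identity-by-state measure behind the relatedness check, can be computed directly from a genotype matrix. This is a minimal sketch, assuming genotypes are coded as alternate-allele counts (0, 1, 2) with `None` for missing calls. The 2% missingness cutoff comes from the text; `het_cutoff` is a hypothetical parameter because no heterozygosity threshold is stated, and the principal components analysis is not shown.

```python
from typing import Dict, List, Optional

# Assumed coding: alternate-allele count 0/1/2, None for a missing call.
Genotype = Optional[int]

MAX_MISSING = 0.02  # samples with > 2% missing calls are flagged (from the text)


def sample_missingness(genos: List[Genotype]) -> float:
    """Fraction of SNPs with no genotype call for one sample."""
    return sum(g is None for g in genos) / len(genos)


def sample_heterozygosity(genos: List[Genotype]) -> float:
    """Fraction of called SNPs at which the sample is heterozygous."""
    called = [g for g in genos if g is not None]
    return sum(g == 1 for g in called) / len(called) if called else 0.0


def pairwise_ibs(a: List[Genotype], b: List[Genotype]) -> float:
    """Mean proportion of alleles shared identical-by-state over SNPs called in both samples."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    # At each SNP the two samples share 2 - |x - y| alleles (0, 1 or 2).
    return sum(2 - abs(x - y) for x, y in pairs) / (2 * len(pairs))


def flag_samples(samples: Dict[str, List[Genotype]],
                 het_cutoff: float) -> Dict[str, List[str]]:
    """Map each sample ID to the reasons it would be flagged (an empty list means keep)."""
    flags: Dict[str, List[str]] = {}
    for sid, genos in samples.items():
        reasons = []
        if sample_missingness(genos) > MAX_MISSING:
            reasons.append("missingness")
        if sample_heterozygosity(genos) > het_cutoff:
            reasons.append("heterozygosity")
        flags[sid] = reasons
    return flags


samples = {
    "s1": [1] * 30 + [0] * 70,               # 0% missing, 30% heterozygous
    "s2": [None] * 3 + [1] * 30 + [2] * 67,  # 3% missing
    "s3": [1] * 60 + [0] * 40,               # 60% heterozygous
}
print(flag_samples(samples, het_cutoff=0.5))
# {'s1': [], 's2': ['missingness'], 's3': ['heterozygosity']}
print(pairwise_ibs(samples["s1"], samples["s3"]))  # 0.85
```

Pairs whose IBS proportion is unusually high relative to the rest of the cohort would be the candidates for cryptic relatedness; the cutoff for "excessive" is left to the analyst here.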

The following criteria were used to retain SNPs in the QC+ datasets:

  1. SNP missingness ≤ 5% (per population)
  2. Hardy-Weinberg equilibrium p-value ≥ 0.001 (per population); SNPs with p < 0.001 were removed
  3. Duplicate discordance ≤ 1 (out of a maximum of 3 duplicate discordances)
  4. Polymorphic in at least one population
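The four SNP criteria can be combined into one per-SNP decision that also separates polymorphic SNPs from those monomorphic in every population. This sketch assumes genotypes coded as alternate-allele counts (0, 1, 2) with `None` for missing, and treats SNPs with a Hardy-Weinberg p-value below 0.001 as failing. The text does not say which Hardy-Weinberg test was used; a 1-df chi-square goodness-of-fit test is substituted here for illustration (an exact test is a common alternative). `classify_snp`, `hwe_chisq_p` and the `dup_discordances` input are hypothetical.

```python
import math
from typing import Dict, List, Optional

Genotype = Optional[int]  # assumed coding: alt-allele count 0/1/2, None = missing


def hwe_chisq_p(n0: int, n1: int, n2: int) -> float:
    """Hardy-Weinberg p-value from a 1-df chi-square test on genotype counts."""
    n = n0 + n1 + n2
    if n == 0:
        return 1.0
    p = (2 * n0 + n1) / (2 * n)  # frequency of the allele counted by genotype 0
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic: no deviation to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n0, n1, n2), expected))
    # Survival function of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def classify_snp(pop_genos: Dict[str, List[Genotype]], dup_discordances: int,
                 max_missing: float = 0.05, min_hwe_p: float = 0.001,
                 max_dup_discordance: int = 1) -> str:
    """Return 'fail', 'polymorphic' (QC+) or 'monomorphic' (passes QC but monomorphic)."""
    if dup_discordances > max_dup_discordance:
        return "fail"
    polymorphic = False
    for genos in pop_genos.values():  # missingness and HWE are checked per population
        called = [g for g in genos if g is not None]
        if (len(genos) - len(called)) / len(genos) > max_missing:
            return "fail"
        counts = (called.count(0), called.count(1), called.count(2))
        if hwe_chisq_p(*counts) < min_hwe_p:
            return "fail"
        alt = sum(called)
        if 0 < alt < 2 * len(called):  # both alleles observed in this population
            polymorphic = True
    return "polymorphic" if polymorphic else "monomorphic"


in_hwe = [0] * 25 + [1] * 50 + [2] * 25  # genotype counts exactly at HWE
print(classify_snp({"CHS": in_hwe, "MAS": [0] * 100, "INS": [0] * 100}, 0))  # polymorphic
print(classify_snp({"CHS": [0] * 100, "MAS": [0] * 100, "INS": [0] * 100}, 0))  # monomorphic
print(classify_snp({"CHS": [0] * 50 + [2] * 50, "MAS": in_hwe, "INS": in_hwe}, 0))  # fail
```

Polymorphism is judged from allele counts rather than distinct genotypes, so a population made up entirely of heterozygotes still counts as polymorphic.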

In all genotype files, alleles are expressed on the forward strand of NCBI build 36. QC+ datasets contain SNPs that passed the above quality criteria and are polymorphic in at least one ethnic group, while QC+mono datasets additionally include SNPs that are monomorphic across all three ethnic groups.

D. Merged data

Data from the two platforms were merged by rsID and cross-checked against chromosomal positions. Only individuals with genotype data on both platforms were kept. For SNPs assayed on both platforms, those with < 95% concordance between the platforms were removed; for the remaining SNPs (≥ 95% concordance), the genotype calls from the platform with the higher call rate were kept. Where both platforms had the same call rate, the Illumina genotypes were retained. Genomic positions were further checked to confirm that each SNP in the dataset is unique.
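The merge rule for SNPs typed on both platforms can be sketched as follows. This is an illustration under several assumptions not stated in the text: genotypes are coded as forward-strand alternate-allele counts (0, 1, 2) with `None` for missing, individuals appear in the same order on both platforms, concordance is computed only over individuals called on both platforms, and SNPs present on only one platform are carried through unchanged. The chromosomal-position checks are not shown, and all function names are hypothetical.

```python
from typing import Dict, List, Optional

Genotype = Optional[int]  # assumed coding: alt-allele count 0/1/2, None = missing


def call_rate(genos: List[Genotype]) -> float:
    """Fraction of individuals with a genotype call."""
    return sum(g is not None for g in genos) / len(genos)


def concordance(a: List[Genotype], b: List[Genotype]) -> Optional[float]:
    """Fraction of identical calls among individuals called on both platforms."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return None
    return sum(x == y for x, y in pairs) / len(pairs)


def merge_snp(illumina: List[Genotype], affy: List[Genotype],
              min_concordance: float = 0.95) -> Optional[List[Genotype]]:
    """Pick one platform's calls for a SNP typed on both, or None if the SNP is dropped."""
    conc = concordance(illumina, affy)
    if conc is None or conc < min_concordance:
        return None  # discordant (or no overlapping calls): remove the SNP
    # Keep the platform with the higher call rate; ties go to Illumina.
    return affy if call_rate(affy) > call_rate(illumina) else illumina


def merge_platforms(illumina: Dict[str, List[Genotype]],
                    affy: Dict[str, List[Genotype]]) -> Dict[str, List[Genotype]]:
    """Merge two {rsID: genotypes} maps into a single dataset."""
    merged: Dict[str, List[Genotype]] = {}
    for rsid in sorted(illumina.keys() | affy.keys()):
        if rsid in illumina and rsid in affy:
            calls = merge_snp(illumina[rsid], affy[rsid])
            if calls is not None:
                merged[rsid] = calls
        else:
            merged[rsid] = illumina[rsid] if rsid in illumina else affy[rsid]
    return merged


ill = {"rs1": [0] * 20, "rs2": [0] * 19 + [None], "rs3": [0] * 20, "rs4": [1] * 20}
affy = {"rs1": [0] * 19 + [1], "rs2": [0] * 20, "rs3": [2] * 20, "rs5": [2] * 20}
merged = merge_platforms(ill, affy)
print(sorted(merged))  # ['rs1', 'rs2', 'rs4', 'rs5']
```

Here rs1 is exactly 95% concordant with equal call rates, so the Illumina calls are kept; rs2 takes the Affymetrix calls because of the higher call rate; rs3 is dropped as discordant.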

Population   Final number of samples   QC+ (SNPs)   QC+Mono (SNPs)
CHS          96                        1,405,417    1,584,040
MAS          89                        1,402,256    1,580,905
INS          83                        1,404,699    1,583,454


E. Data Release Policy

If you use these data in a publication, please cite:
Teo YY, Sim X, Ong RTH, Tan AKS, Chen JM, Tantoso E, Small KS, Ku CS, Lee EJD, Seielstad M and Chia KS. Singapore Genome Variation Project: A Haplotype map of three South-East Asian populations. Genome Research (In press).



F. Funding agencies/Acknowledgements

Yong Loo Lin School of Medicine, National University of Singapore (NUS)

NUS Life Science Institute

Department of Community, Occupational and Family Health (COFM), NUS

Genome Institute of Singapore (GIS)

© Copyright 2001-08 National University of Singapore. All Rights Reserved.