Eukaryotic genomes are replete with repetitive sequences that make genome assembly from sequence reads difficult. For example, sequences such as CTCTCTCTCT (tandem repeats of the dinucleotide CT) are found at many chromosomal locations, with a variable number (n) of CT repeating units at each location. Scientists can assemble genomes despite these difficulties by using the paired-end sequencing strategy diagrammed in Fig. 9.9: they make libraries with genomic inserts of defined size and then sequence both ends of individual clones.
Following are 12 DNA sequence reads from six cloned fragments analyzed in a genome project. 1A and 1B represent the two end reads from clone 1, 2A and 2B the two end reads from clone 2, etc. Clones 1–4 were obtained from a library in which the genomic inserts are about 2 kb long, while the inserts in clones 5 and 6 are about 4 kb long. All of these sequences have their 5′ ends at the left and their 3′ ends at the right. To simplify your analysis, assume that these sequences together represent two genomic locations (loci; singular locus), each of which contains a (CT)n repeat, and that each of the 12 sequences overlaps with one and only one other sequence.
1A: CCGGGAACTCCTAGTGCCTGTGGCACGATCCTATCAAC
1B: AGGACTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCT
2A: GTTTTTGAGAGAGAGAGAGAGAGAGAGAGACCTGGGGG
2B: ACGTAGCTAGCTAACCGGTTAAGCGCGCATTACTTCAA
3A: CTCTCTCTCTCTCTCTCTCTCAAAAACTATGGAAATTT
3B: TAGTGATAGGTAACCCAGGTACTGCACCACCAGAAGTC
4A: GGCCGGCCGTTGTTGACGCAATCATGAATTTAATGCCG
4B: TCATGGGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA
5A: TAGTGCCTGTGGCACGATCCTATCAACTAACGACTGCT
5B: AAGGAAAGGCCGGCCGTTGTTGACGCAATCATGAATTT
6A: CAGCAGCTAGTGATAGGTAACCCAGGTACTGCACCACC
6B: GGACTATACGTAGCTAGCTAACCGGTTAAGCGCGCATT
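Finding which reads overlap is the core of part a. The Python sketch below is one minimal way to automate that search, under assumptions that are ours rather than the textbook's: overlaps must be exact and at least 20 nt long (the `MIN_OVERLAP` threshold is arbitrary), the reverse complement of each read is also compared because an overlapping read may come from the other strand, and an overlap made only of C/T or only of A/G is reported separately as "repeat-only", because it lies entirely inside a (CT)n tract and cannot anchor a unique alignment. All function names are hypothetical.

```python
"""Sketch: find end overlaps among the 12 paired-end reads.

Assumptions (not from the textbook): overlaps are exact and at least
MIN_OVERLAP nt long; an overlapping read may lie on the opposite strand,
so reverse complements are compared as well.
"""

READS = {
    "1A": "CCGGGAACTCCTAGTGCCTGTGGCACGATCCTATCAAC",
    "1B": "AGGACTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCT",
    "2A": "GTTTTTGAGAGAGAGAGAGAGAGAGAGAGACCTGGGGG",
    "2B": "ACGTAGCTAGCTAACCGGTTAAGCGCGCATTACTTCAA",
    "3A": "CTCTCTCTCTCTCTCTCTCTCAAAAACTATGGAAATTT",
    "3B": "TAGTGATAGGTAACCCAGGTACTGCACCACCAGAAGTC",
    "4A": "GGCCGGCCGTTGTTGACGCAATCATGAATTTAATGCCG",
    "4B": "TCATGGGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA",
    "5A": "TAGTGCCTGTGGCACGATCCTATCAACTAACGACTGCT",
    "5B": "AAGGAAAGGCCGGCCGTTGTTGACGCAATCATGAATTT",
    "6A": "CAGCAGCTAGTGATAGGTAACCCAGGTACTGCACCACC",
    "6B": "GGACTATACGTAGCTAGCTAACCGGTTAAGCGCGCATT",
}

MIN_OVERLAP = 20  # arbitrary threshold chosen for this sketch
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def suffix_prefix(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for length in range(min(len(a), len(b)), max(min_len, 1) - 1, -1):
        if a[-length:] == b[:length]:
            return length
    return 0


def best_overlap(x: str, y: str, min_len: int) -> str:
    """Longest shared end sequence of reads x and y, trying both strands of y."""
    best = ""
    ry = revcomp(y)
    for a, b in ((x, y), (y, x), (x, ry), (ry, x)):
        n = suffix_prefix(a, b, min_len)
        if n > len(best):
            best = b[:n]
    return best


def is_repeat_only(seq: str) -> bool:
    """True if the overlap is only C/T or only A/G, i.e. it sits inside a (CT)n tract."""
    return set(seq) <= set("CT") or set(seq) <= set("AG")


def find_overlaps(reads: dict[str, str], min_len: int = MIN_OVERLAP):
    """Return two dicts {frozenset({name1, name2}): overlap length}: anchored, repeat-only."""
    anchored, repeat_only = {}, {}
    names = sorted(reads)
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            shared = best_overlap(reads[p], reads[q], min_len)
            if shared:
                bucket = repeat_only if is_repeat_only(shared) else anchored
                bucket[frozenset((p, q))] = len(shared)
    return anchored, repeat_only


if __name__ == "__main__":
    anchored, repeat_only = find_overlaps(READS)
    for label, hits in (("anchored", anchored), ("repeat-only", repeat_only)):
        for pair, length in sorted(hits.items(), key=lambda kv: sorted(kv[0])):
            print(f"{label:12} {'-'.join(sorted(pair))}  {length} nt")
```

Repeat-only hits can be spurious, since any two reads that end in long (CT)n tracts will match each other; the clone insert sizes (about 2 kb versus about 4 kb) are what let you decide among them.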
a. Diagram the two loci, showing the locations of the repetitive DNA and the relative positions and orientations of the 12 DNA sequence reads.
b. If possible, indicate how many copies of the CT repeating unit reside at either locus.
c. Are the data compatible with the alternative hypothesis that these clones actually represent two alleles of a single locus that differ in the number of CT repeating units?
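For part b, the key point is that a read gives an exact copy number only if it runs through the whole repeat into unique DNA on both sides; otherwise the count is only a lower bound. The Python sketch below (the function name `tandem_run` is hypothetical) finds the longest tandem run of a given dinucleotide in a read and reports whether unique bases flank it on both sides. A (CT)n tract appears as (GA)n or (AG)n in a read from the opposite strand, so the unit is passed as it appears in that read.

```python
"""Sketch: count tandem copies of a dinucleotide in a sequencing read.

An exact count is possible only when other DNA flanks the run on both
sides; a run that reaches either end of the read gives only a lower bound.
Function and variable names are hypothetical.
"""
import re

# The four reads from the problem that contain the repeat.
REPEAT_READS = {
    "1B": "AGGACTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCT",
    "2A": "GTTTTTGAGAGAGAGAGAGAGAGAGAGAGACCTGGGGG",
    "3A": "CTCTCTCTCTCTCTCTCTCTCAAAAACTATGGAAATTT",
    "4B": "TCATGGGAGAGAGAGAGAGAGAGAGAGAGAGAGAGAGA",
}


def tandem_run(read: str, unit: str) -> tuple[int, int, bool]:
    """Return (start, copies, spanned) for the longest run of `unit` in `read`.

    Only whole copies of `unit` are counted, so a trailing partial unit
    (e.g. the final C in CTCTC) is ignored. `spanned` is True when other
    bases flank the run on both sides, i.e. the read covers the whole tract.
    """
    best = (0, 0, False)
    for match in re.finditer(f"(?:{re.escape(unit)})+", read):
        copies = len(match.group()) // len(unit)
        if copies > best[1]:
            spanned = match.start() > 0 and match.end() < len(read)
            best = (match.start(), copies, spanned)
    return best


if __name__ == "__main__":
    # A (CT)n tract reads as (GA)n / (AG)n on the opposite strand.
    for name, unit in (("1B", "CT"), ("2A", "GA"), ("3A", "CT"), ("4B", "GA")):
        start, copies, spanned = tandem_run(REPEAT_READS[name], unit)
        status = "run spanned by read" if spanned else "lower bound only"
        print(f"{name}: {copies} x {unit} starting at {start} ({status})")
```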
Chapter 9 Solutions
Genetics: From Genes to Genomes
- The human RefSeq of the entire first exon of a gene involved in Brugada syndrome (a cardiac disorder characterized by an abnormal electrocardiogram and an increased risk of sudden heart failure) is: 5′ CAACGCTTAGGATGTGCGGAGCCT 3′. The genomic DNA of four people (1–4), three of whom have the disorder, was subjected to single-molecule sequencing. The following sequences represent all those obtained from each person. Nucleotides different from the RefSeq are underlined. Individual 1: 5′ CAACGCTTAGGATGTGCGGAGCCT 3′ and 5′ CAACGCTTAGGATGTGCGGAGACT 3′. Individual 2: 5′ CAACGCTTAGGATGTGAGGAGCCT 3′. Individual 3: 5′ CAACGCTTAGGATGTGCGGAGCCT 3′ and 5′ CAACGCTTAGGATGGCGGAGCCT 3′. Individual 4: 5′ CAACGCTTAGGATGTGCGGAGCCT 3′ and 5′ CAACGCTTAGGATGTGTGGAGCCT 3′. a. The first exon of the RefSeq copy of this gene includes the start codon. Write as much of the amino acid sequence of the encoded protein as possible, indicating the N-to-C polarity. b. Are any of these individuals homozygotes? If so, which person and what allele? c. Is…
- You were going to sequence a rice DNA fragment whose sequence was known only at one end, as shown below.
  5’ AAACGATCGAGTCGCATCCAAAATCGATACCC ... unknown region
  3’ TTTGCTAGCTCTGCGTAGGTTTTAGCTATGGG ... unknown region
  After several tries, you obtained a beautiful sequencing image as shown here: [sequencing image]. This worked out well partly because you had designed a primer for sequencing the unknown region according to the following guidelines: Tm is 55–60 °C, which ensures the primer has an appropriate melting temperature for PCR and sequencing. The GC content of the primer should match that of the genome/template (rice = 60%, human/Drosophila = 45–50%). The same nucleotide should not occur more than twice in a row (e.g., CCC, GGGGG, AAA). The primer should have no, or only weak, secondary structure. No primer dimers (the primer must not anneal to itself). The 3′ end is the most important: it should not end in A and preferably ends in GG, GC, CG or CC. This website can help you design the primer: http://www.oligoevaluator.com/OligoCalcServlet…
- Assume 2 × 10^8 reads, each 75 bp long, are obtained from a next-generation sequencing experiment to sequence a human genome. Suppose the length of the human genome is 3 × 10^9 bp. What is the depth (i.e., coverage) of the sequencing?
- The human genome contains approximately 10^6 copies of an Alu sequence, one of the best-studied classes of short interspersed elements (SINEs), per haploid genome. Individual Alu units share a 282-nucleotide consensus sequence followed by a 3′ adenine-rich tail region [Schmid (1998)]. Given that there are approximately 3 × 10^9 base pairs per human haploid genome, about how many base pairs are spaced between each Alu sequence?
- Arabidopsis thaliana has among the smallest genomes in higher plants, with a haploid genome size of about 100 Mb. If this genome is digested with BbvCI, a restriction enzyme that cuts at the sequence 5′-CCTCAGC-3′ (3′-GGAGTCG-5′ on the complementary strand), approximately how many DNA fragments would be produced? Assume the DNA has a random sequence with equal amounts of each base.
- Consider a genome whose length is 1000 bp. "Shotgun" sequencing techniques are applied to the genome, resulting in 20 reads with an average length of 50 bp. A very important point is that, even though 20 × 50 = 1000, there is no guarantee that ALL 1000 bp of the genome are represented in the fragments. Calculate the coverage. What does this value mean? Why would it be a good idea to have a coverage greater than 1?
- In order to target a specific region of genomic DNA with CRISPR, researchers must include a guide RNA containing a 20-base-pair-long spacer sequence that matches the DNA sequence at the target site. (i) How many possible guide RNA spacer sequences are there? (ii) One of the possible risks of genetic engineering methods is “off-target” editing, where a modification of the genome occurs in a part of the genome other than the target site. Imagine you design a 20-base-pair guide RNA spacer sequence to target a specific portion of the zebrafish genome, which is 1.7 billion nucleotides long. Assuming all nucleotides are equally common, estimate the probability that your spacer sequence occurs in at least one other position in the zebrafish genome.
- The genome of Drosophila melanogaster, a fruit fly, was sequenced in 2000. However, this “completed” sequence did not include most heterochromatin regions. The heterochromatin was not sequenced until 2007. Most completed genome sequences do not include heterochromatin. Why is heterochromatin usually not sequenced in genome-sequencing projects?
- The restriction endonuclease NciI recognizes and cuts the five-base-pair sequence 5’-CC(G/C)GG-3’ [where (G/C) means either G or C will work at that position]. (1) How often, on average, would this sequence occur in random DNA? Assume the DNA contains 25% each of A, G, T and C. (2) After digestion, NciI leaves a one-base 5’ overhang. Write/draw the cut site and the digested products.
- When the cDNA was sequenced by the Sanger method using ddCTP, the following products were obtained: tetranucleotide, hexanucleotide, nonanucleotide, decanucleotide, dodecanucleotide, octadecanucleotide, nonadecanucleotide, 21-nucleotide. 6c. What is the sequence of the bases in the mRNA coding for the peptide above?
- Recombination signal sequences are conserved heptamer and nonamer sequences that flank the V, J, and D gene segments, which undergo recombination to generate the final V-region coding exon. Some of these have 12-nucleotide spacers between the heptamer and nonamer, and others have 23-nucleotide spacers. The reason recombination signal sequences come in these two forms is:
  - To ensure the correct assembly of gene segments so that a VH recombines to a DH and not to another VH, for instance
  - To ensure that the heptamer and nonamer are found on the same face of the DNA double helix
  - To ensure that alpha, lambda, and heavy chains recombine within a locus and not between loci
  - To ensure that alpha, lambda, and heavy chain gene segments do not undergo recombination with non-immunoglobulin genes
  - To ensure that the RAG recombinase cuts the DNA between the last nucleotide of the heptamer and the coding sequence
- About 60% of the base pairs in the human genome are AT. If the human genome has 3.2 billion base pairs of DNA, about how many times will the following restriction sites be present? a. BamHI (recognition sequence is 5′–GGATCC–3′) b. EcoRI (recognition sequence is 5′–GAATTC–3′) c. HaeIII (recognition sequence is 5′–GGCC–3′)
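Several of the related questions above rest on two standard estimates: average sequencing depth (total bases sequenced divided by genome size) and the expected number of occurrences of a fixed site in random DNA (probability of the site at one position times genome length). The Python sketch below encodes both, assuming reads are placed uniformly at random and that the DNA is random with equal A and T frequencies and equal G and C frequencies; degenerate sites such as NciI's CC(G/C)GG are not handled. The function names are illustrative only.

```python
"""Sketch: two back-of-the-envelope estimates used in genome-sequencing problems.

Assumptions: reads are placed uniformly at random, and the DNA is random
with freq(A) = freq(T) and freq(G) = freq(C). Degenerate sites such as
CC(G/C)GG are not handled. Function names are illustrative.
"""


def coverage(n_reads: float, read_length: float, genome_size: float) -> float:
    """Average sequencing depth: total bases sequenced / genome size."""
    return n_reads * read_length / genome_size


def site_probability(site: str, at_fraction: float = 0.5) -> float:
    """Probability that a given position starts an exact match to `site`."""
    freq = {
        "A": at_fraction / 2,
        "T": at_fraction / 2,
        "G": (1 - at_fraction) / 2,
        "C": (1 - at_fraction) / 2,
    }
    p = 1.0
    for base in site.upper():
        p *= freq[base]  # bases are assumed to be independent
    return p


def expected_sites(site: str, genome_size: float, at_fraction: float = 0.5) -> float:
    """Expected matches to `site` on ONE strand of a random genome."""
    return site_probability(site, at_fraction) * genome_size


if __name__ == "__main__":
    print(coverage(2e8, 75, 3e9))  # prints 5.0
    print(expected_sites("GGATCC", 3.2e9, at_fraction=0.6))  # BamHI, 60% AT
```

`expected_sites` counts matches on one strand only. For a palindromic site such as BamHI's GGATCC that equals the number of sites, whereas a non-palindromic site such as BbvCI's can also occur on the complementary strand, which roughly doubles the expected count.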