Question

Implement a program that identifies a person based on their DNA.

$ python3 dna.py databases/large.csv sequences/5.txt
Lavender


[https://drive.google.com/drive/folders/1uEjrE0VNWG5HZYcmsMD8ZJUWgeW2whK6?usp=sharing]
Download the file that you're going to use for this problem. Extract this file and you will see a directory of sample databases and a directory of sample sequences.

Specification

In a file called dna.py, implement a program that identifies to whom a sequence of DNA belongs.

  • The program should require as its first command-line argument the name of a CSV file containing the STR counts for a list of individuals and should require as its second command-line argument the name of a text file containing the DNA sequence to identify.
    • If your program is executed with the incorrect number of command-line arguments, your program should print an error message of your choice.
    • If the correct number of arguments is provided, you may assume that the first argument is indeed the filename of a valid CSV file, and that the second argument is the filename of a valid text file.
  • Your program should open the CSV file and read its contents into memory.
    • You may assume that the first row of the CSV file will be the column names.
    • The first column will be the word name and the remaining columns will be the STR sequences themselves.
  • Your program should open the DNA sequence and read its contents into memory.
  • For each of the STRs (from the first line of the CSV file), your program should compute the longest run of consecutive repeats of the STR in the DNA sequence to identify.
  • If the STR counts match exactly with any of the individuals in the CSV file, your program should print out the name of the matching individual.
    • You may assume that the STR counts will not match more than one individual.
    • If the STR counts do not match exactly with any of the individuals in the CSV file, your program should print "No match".
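
The "longest run of consecutive repeats" step above can be sketched as follows. This is a minimal illustration, not the assignment's official solution: the function name `longest_run` and its parameter names are assumptions made for this example. It tries every starting position in the sequence and counts how many back-to-back copies of the STR begin there.

```python
def longest_run(sequence: str, subsequence: str) -> int:
    """Return the longest run of back-to-back copies of subsequence in sequence."""
    if not subsequence:
        return 0  # an empty STR has no meaningful run length

    step = len(subsequence)
    best = 0
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Keep stepping forward one STR length at a time while the copies continue.
        while sequence[pos:pos + step] == subsequence:
            count += 1
            pos += step
        best = max(best, count)
    return best


# Three consecutive AGATs, then a break, then one more AGAT.
print(longest_run("AGATAGATAGATTTAGAT", "AGAT"))
```

Checking every start position is simple rather than fast; for the sample files in this problem that should be acceptable, but a longer sequence might call for skipping ahead after each run.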

Sample Runs

$ python3 dna.py databases/large.csv sequences/5.txt
Lavender
$ python3 dna.py
Usage: python3 dna.py data.csv sequence.txt
$ python3 dna.py data.csv
Usage: python3 dna.py data.csv sequence.txt
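
The argument check behind these sample runs might look like the sketch below. The helper name `check_args` is hypothetical; the usage text is copied from the sample runs, although the specification allows any error message.

```python
USAGE = "Usage: python3 dna.py data.csv sequence.txt"


def check_args(argv):
    """Return (csv_path, sequence_path), or None if the argument count is wrong.

    argv is expected to look like sys.argv: the script name followed by
    the two filenames.
    """
    if len(argv) != 3:
        return None
    return argv[1], argv[2]


# Mirrors the second and third sample runs: too few arguments.
for argv in (["dna.py"], ["dna.py", "data.csv"]):
    if check_args(argv) is None:
        print(USAGE)
```

In dna.py itself, this would presumably be called with `sys.argv`, and the program would exit after printing the usage message.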

Testing

You can use the following test cases to test the correctness of your program.

$ python3 dna.py databases/small.csv sequences/1.txt
Bob

$ python3 dna.py databases/small.csv sequences/2.txt
No match

$ python3 dna.py databases/small.csv sequences/3.txt
No match

$ python3 dna.py databases/small.csv sequences/4.txt
Alice

$ python3 dna.py databases/large.csv sequences/5.txt
Lavender

In a file called dna.py, implement a program that identifies to whom a sequence of DNA belongs.
The program should require as its first command-line argument the name of a CSV file
containing the STR counts for a list of individuals and should require as its second
command-line argument the name of a text file containing the DNA sequence to identify.
If the program is executed with the incorrect number of command-line arguments, the
program should print an error message of your choice (with print). If the correct number
of arguments is provided, you may assume that the first argument is indeed the filename
of a valid CSV file, and that the second argument is the filename of a valid text file.
The program should open the CSV file and read its contents into memory.
You may assume that the first row of the CSV file will be the column names. The first
column will be the word name and the remaining columns will be the STR sequences
themselves.
The program should open the DNA sequence and read its contents into memory.
For each of the STRs (from the first line of the CSV file), the program should compute the
longest run of consecutive repeats of the STR in the DNA sequence to identify.
If the STR counts match exactly with any of the individuals in the CSV file, the program
should print out the name of the matching individual.
You may assume that the STR counts will not match more than one individual.
If the STR counts do not match exactly with any of the individuals in the CSV file, the
program should print "No match".
name,AGAT,AATG,TATC
Alice,28,42,14
Bob,17,22,19
Charlie,36,18,25
The data in the above file would suggest that Alice has the sequence AGAT repeated 28
times consecutively somewhere in her DNA, the sequence AATG repeated 42 times, and
TATC repeated 14 times. Bob, meanwhile, has those same three STRs repeated 17 times, 22
times, and 19 times, respectively. And Charlie has those same three STRs repeated 36, 18,
and 25 times, respectively.
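
As a rough sketch of how a file like this could be read into memory, the example below uses Python's standard `csv.DictReader`. The function name `read_database` and the in-memory `SAMPLE_CSV` string are assumptions made for illustration; a real dna.py would pass an opened file instead of `io.StringIO`.

```python
import csv
import io

# The sample database from the text, held in a string so the example is self-contained.
SAMPLE_CSV = """name,AGAT,AATG,TATC
Alice,28,42,14
Bob,17,22,19
Charlie,36,18,25
"""


def read_database(csv_file):
    """Return (STR names, list of (person name, {STR: count})) from an open CSV file."""
    # skipinitialspace tolerates a space after each comma, if the file has any.
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    strs = reader.fieldnames[1:]  # every column after "name" is an STR
    people = []
    for row in reader:
        counts = {s: int(row[s]) for s in strs}  # CSV values arrive as text
        people.append((row["name"], counts))
    return strs, people


strs, people = read_database(io.StringIO(SAMPLE_CSV))
print(strs)
print(people[1])
```

Converting the counts to `int` up front means they can later be compared directly with the run lengths computed from the DNA sequence.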
So given a sequence of DNA, how might you identify to whom it belongs? Well, imagine
that you looked through the DNA sequence for the longest consecutive sequence of
repeated AGATs and found that the longest sequence was 17 repeats long. If you then
found that the longest sequence of AATG is 22 repeats long, and the longest sequence of
TATC is 19 repeats long, that would provide pretty good evidence that the DNA was Bob's.
Of course, it's also possible that once you take the counts for each of the STRs, they don't
match anyone in your DNA database, in which case you have no match.
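
The comparison described here can be sketched as an exact match over all STR counts. The name `find_match` and the `PEOPLE` list are hypothetical; the numbers are the Alice, Bob, and Charlie values from the sample database above.

```python
# Sample database from the text: (name, {STR: longest-run count}).
PEOPLE = [
    ("Alice", {"AGAT": 28, "AATG": 42, "TATC": 14}),
    ("Bob", {"AGAT": 17, "AATG": 22, "TATC": 19}),
    ("Charlie", {"AGAT": 36, "AATG": 18, "TATC": 25}),
]


def find_match(people, counts):
    """Return the name whose STR counts all equal counts, else "No match"."""
    for name, expected in people:
        if expected == counts:  # every STR must agree exactly
            return name
    return "No match"


# Counts of 17, 22, and 19 are the ones the text attributes to Bob.
print(find_match(PEOPLE, {"AGAT": 17, "AATG": 22, "TATC": 19}))
# A single differing count is enough to rule everyone out.
print(find_match(PEOPLE, {"AGAT": 17, "AATG": 22, "TATC": 18}))
```

Returning on the first hit relies on the stated assumption that the counts will not match more than one individual.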
In practice, since analysts know on which chromosome and at which location in the DNA
an STR will be found, they can localize their search to just a narrow section of DNA. But
we'll ignore that detail.
The task is to write a program that will take a sequence of DNA and a CSV file containing
STR counts for a list of individuals and then output to whom the DNA (most likely)
belongs.