Rosalind programming problem: Calculating Protein Mass.
Calculating Protein Mass
Problem
In a weighted alphabet, every symbol is assigned a positive real number called a weight. A string formed from a weighted alphabet is called a weighted string, and its weight is equal to the sum of the weights of its symbols.
The standard weight assigned to each member of the 20-symbol amino acid alphabet is the monoisotopic mass of the corresponding amino acid.
Given: A protein string P of length at most 1000 aa.
Sample input:
Return: The total weight of P. Consult the monoisotopic mass table.
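The task is to output the mass of a protein given its amino acid sequence. The approach is straightforward:
1. Read the amino acid sequence.
2. Look up the mass of each amino acid and add the masses up.
The implementation follows: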
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
public class Calculating_Protein_Mass {
public static void main(String[] args) {
// 1. Read the amino acid sequence
String pro = readFileContent("C:/Users/Administrator/Desktop/rosalind_prtm.txt");
// 2. Look up the mass of each amino acid and sum the masses
System.out.println(SumMass(pro));
}
First, we need to read the amino acid sequence.
// 1. Read the amino acid sequence from the file as text
public static String readFileContent(String fileName) {
File file = new File(fileName);
BufferedReader reader = null;
StringBuffer sbf = new StringBuffer();
try {
reader = new BufferedReader(new FileReader(file));
String tempStr;
while ((tempStr = reader.readLine()) != null) {
sbf.append(tempStr);
}
reader.close();
return sbf.toString();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (reader != null) {
try {
reader.close();
} catch (IOException e1) {
e1.printStackTrace();
}
}
}
return sbf.toString();
}
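As an aside, the same read step can be written more compactly with try-with-resources, which closes the reader automatically. The sketch below is only an alternative under some assumptions of mine, not the code used in this post: the class name SequenceReader and the helper roundTrip are hypothetical, each line is trimmed before appending, and an I/O failure is rethrown as UncheckedIOException instead of being printed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical class name; a sketch of the read step using try-with-resources.
final class SequenceReader {

    // Reads every line of the file and joins them into one string.
    // Assumption: surrounding whitespace on each line is not part of the sequence.
    static String readFileContent(Path path) {
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line.trim());
            }
            return sb.toString();
        } catch (IOException e) {
            // Design choice of this sketch: fail loudly rather than return partial text.
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical demo helper: writes text to a temp file and reads it back.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("prtm", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readFileContent(tmp);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A sequence split over two lines comes back as one string.
        System.out.println(roundTrip("SKAD\nYEK\n")); // prints SKADYEK
    }
}
```

Because the reader is declared in the try header, no explicit close() call or finally block is needed.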
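Next, define a second method that walks through the amino acids and accumulates their masses.
For the reference mass values, see the monoisotopic mass table on the official Rosalind site.
// 2. Look up the mass of each amino acid and sum the masses; the result has a fractional part, so the variable is declared as double
public static double SumMass(String pro) {
double sum = 0;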
for (int i = 0; i < pro.length(); i++) {
String AA = pro.substring(i, i + 1); // take a single amino acid
switch (AA) {
case "A":
sum += 71.03711;
break;
case "C":
sum += 103.00919;
break;
case "D":
sum += 115.02694;
break;
case "E":
sum += 129.04259;
break;
case "F":
sum += 147.06841;
break;
case "G":
sum += 57.02146;
break;
case "H":
sum += 137.05891;
break;
case "I":
sum += 113.08406;
break;
case "K":
sum += 128.09496;
break;
case "L":
sum += 113.08406;
break;
case "M":
sum += 131.04049;
break;
case "N":
sum += 114.04293;
break;
case "P":
sum += 97.05276;
break;
case "Q":
sum += 128.05858;
break;
case "R":
sum += 156.10111;
break;
case "S":
sum += 87.03203;
break;
case "T":
sum += 101.04768;
break;
case "V":
sum += 99.06841;
break;
case "W":
sum += 186.07931;
break;
case "Y":
sum += 163.06333;
break;
default:
break;
}
}
return sum;
}
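The long switch statement above works, but the same lookup can also be expressed as a table. Here is a minimal sketch of that idea, assuming a HashMap keyed by the one-letter code. The class name MassTable and the method name sumMass are made up for illustration, the mass values are copied from the switch above, and unlike the switch (which silently skips unknown letters) this version throws on an unknown letter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class name; a table-driven variant of the summing step.
final class MassTable {

    // Monoisotopic residue masses, copied from the switch statement in the post.
    private static final Map<Character, Double> MASS = new HashMap<>();
    static {
        MASS.put('A', 71.03711);  MASS.put('C', 103.00919);
        MASS.put('D', 115.02694); MASS.put('E', 129.04259);
        MASS.put('F', 147.06841); MASS.put('G', 57.02146);
        MASS.put('H', 137.05891); MASS.put('I', 113.08406);
        MASS.put('K', 128.09496); MASS.put('L', 113.08406);
        MASS.put('M', 131.04049); MASS.put('N', 114.04293);
        MASS.put('P', 97.05276);  MASS.put('Q', 128.05858);
        MASS.put('R', 156.10111); MASS.put('S', 87.03203);
        MASS.put('T', 101.04768); MASS.put('V', 99.06841);
        MASS.put('W', 186.07931); MASS.put('Y', 163.06333);
    }

    // Sums the masses of all residues in the protein string.
    static double sumMass(String protein) {
        double sum = 0.0;
        for (int i = 0; i < protein.length(); i++) {
            char residue = protein.charAt(i);
            Double mass = MASS.get(residue);
            if (mass == null) {
                // Assumption of this sketch: an unknown letter is an input error.
                throw new IllegalArgumentException("Unknown amino acid: " + residue);
            }
            sum += mass;
        }
        return sum;
    }

    public static void main(String[] args) {
        // SKADYEK is just a short illustrative string.
        System.out.println(sumMass("SKADYEK"));
    }
}
```

Using charAt instead of substring also avoids creating a new String for every residue.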
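Joining the three code segments above (main, readFileContent, and SumMass) gives a program that computes the protein mass. The result is as follows:
Before submitting the answer to the site, add one more step that rounds the value to three decimal places to get the accepted answer.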
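The rounding step is not shown in the listing, so here is one possible way to do it, using String.format with the "%.3f" pattern. The class name MassFormat and the method name toThreeDecimals are hypothetical, and 821.39192 is only an illustrative value. Locale.ROOT is passed so that the decimal separator stays a dot regardless of the machine's locale.

```java
import java.util.Locale;

// Hypothetical class name; a sketch of rounding a mass to three decimals for output.
final class MassFormat {

    // Formats the mass with exactly three digits after the decimal point.
    static String toThreeDecimals(double mass) {
        return String.format(Locale.ROOT, "%.3f", mass);
    }

    public static void main(String[] args) {
        System.out.println(toThreeDecimals(821.39192)); // prints 821.392
    }
}
```

In the program above this would mean printing toThreeDecimals(SumMass(pro)) instead of the raw sum.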
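The float and double types
In Java, the type of a variable is fixed when the variable is declared, which is why Java is described as a strongly typed language. Different data types are allocated different amounts of memory, so the range of values they can represent also differs. Java's primitive types are: integers (byte, short, int, long), floating-point numbers (float, double), characters (char), and booleans (boolean). An integer literal defaults to int, and a literal with a decimal point defaults to double. Each type has its own range, and of the two floating-point types, float covers a smaller range than double. So what changes if the method that sums the amino acid masses returns float instead?
So I tried declaring method 2 with the float type (the code below merges the three segments above and modifies the SumMass method):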
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
public class Calculating_Protein_Mass {
public static void main(String[] args) {
// 1. Read the amino acid sequence
String pro = readFileContent("C:/Users/Administrator/Desktop/rosalind_prtm.txt");
// 2. Look up the mass of each amino acid and sum the masses
System.out.println(SumMass(pro));
}
// 1. Read the amino acid sequence from the file as text
public static String readFileContent(String fileName) {
File file = new File(fileName);
BufferedReader reader = null;
StringBuffer sbf = new StringBuffer();
try {
reader = new BufferedReader(new FileReader(file));
String tempStr;
while ((tempStr = reader.readLine()) != null) {
sbf.append(tempStr);
}
reader.close();
return sbf.toString();
} catch (IOException e) {
e.printStackTrace();
} finally {
if (reader != null) {
try {
reader.close();
} catch (IOException e1) {
e1.printStackTrace();
}
}
}
return sbf.toString();
}
// 2. Look up the mass of each amino acid and sum the masses; this version declares the sum as float instead of double
public static float SumMass(String pro) {
float sum = 0;
for (int i = 0; i < pro.length(); i++) {
String AA = pro.substring(i, i + 1); // take a single amino acid
switch (AA) {
case "A":
sum += 71.03711;
break;
case "C":
sum += 103.00919;
break;
case "D":
sum += 115.02694;
break;
case "E":
sum += 129.04259;
break;
case "F":
sum += 147.06841;
break;
case "G":
sum += 57.02146;
break;
case "H":
sum += 137.05891;
break;
case "I":
sum += 113.08406;
break;
case "K":
sum += 128.09496;
break;
case "L":
sum += 113.08406;
break;
case "M":
sum += 131.04049;
break;
case "N":
sum += 114.04293;
break;
case "P":
sum += 97.05276;
break;
case "Q":
sum += 128.05858;
break;
case "R":
sum += 156.10111;
break;
case "S":
sum += 87.03203;
break;
case "T":
sum += 101.04768;
break;
case "V":
sum += 99.06841;
break;
case "W":
sum += 186.07931;
break;
case "Y":
sum += 163.06333;
break;
default:
break;
}
}
return sum;
}
}
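The output became:
A float variable keeps fewer significant digits than a double. This comes down to memory: a float is a 32-bit value that holds roughly 7 significant decimal digits, while a double is a 64-bit value that holds roughly 15 to 16, and double also covers a wider range. For a protein near the 1000 aa limit the mass is on the order of 100,000, where neighbouring float values are about 0.008 apart, so three decimal places can no longer be represented reliably. A double therefore preserves more precision. A CSDN author has also written about how to print a fixed number of decimal places; see this post for details.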
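To see the difference without the input file, the two accumulations can be compared side by side. The sketch below is only an illustration with names I made up (FloatVsDouble, sumAsDouble, sumAsFloat). It sums the masses of the short example string SKADYEK, taken from the switch statement above, once in a double and once in a float, and uses Math.ulp to print the gap between neighbouring float values at two magnitudes.

```java
// Hypothetical class name; compares float and double accumulation of residue masses.
final class FloatVsDouble {

    // Masses of S, K, A, D, Y, E, K, copied from the switch statement in the post.
    static final double[] MASSES = {
        87.03203, 128.09496, 71.03711, 115.02694, 163.06333, 129.04259, 128.09496
    };

    // Accumulates in a 64-bit double.
    static double sumAsDouble() {
        double sum = 0;
        for (double m : MASSES) {
            sum += m;
        }
        return sum;
    }

    // Accumulates in a 32-bit float; "+=" narrows each intermediate result
    // back to float, just as in the float version of SumMass.
    static float sumAsFloat() {
        float sum = 0;
        for (double m : MASSES) {
            sum += m;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("double sum: " + sumAsDouble());
        System.out.println("float sum:  " + sumAsFloat());
        // Distance between adjacent float values at the size of this sum
        // and at the size of a roughly 1000-residue protein.
        System.out.println("float spacing near 821:    " + Math.ulp(821.0f));
        System.out.println("float spacing near 110000: " + Math.ulp(110000.0f));
    }
}
```

For a seven-residue string the two sums agree to three decimals, but the spacing printed for the larger magnitude shows why the float version can miss the required precision on full-size inputs.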