#2
pangding 2011-07-05 18:48

@HWI-BRUNOP16X_0001:4:1:1386:1030#0/1
GACCAATAAGTGATGATTGAATCGCGAGTGCTCGGCAGATTGCGATAAAC
+HWI-BRUNOP16X_0001:4:1:1386:1030#0/1
TNTTJTTTETceJSP__VRJea`_NfcefbWe[eagggggfgdggBBBBB
@HWI-BRUNOP16X_0001:4:1:1836:1032#0/1
CAGCAGACGCTTTGATTGCTCGATCTCTTGGTAAATACGGCATCATCTGC
+HWI-BRUNOP16X_0001:4:1:1836:1032#0/1
TJTTTJFFFFa`TWNMPJGTbZSTPZJHHGT^I^H^SKZeeeeeeb``RT
I use the following command to read and process a 14 GB file:

[home@cpp]time grep -A1 '@HWI' input_file.txt | sed -e 's/--//g' -e 's/@HWI/>HWI/g' | sed '/^$/d' > output_file.txt
real 7m13.413s
user 5m25.382s
sys 1m37.668s
[1]+ Done time grep -A1 '@HWI' input_file.txt | sed -e 's/--//g' -e 's/@HWI/>HWI/g' | sed '/^$/d' > output_file.txt
[home@cpp]cat output_file.txt
>HWI-BRUNOP16X_0001:4:1:1386:1030#0/1
GACCAATAAGTGATGATTGAATCGCGAGTGCTCGGCAGATTGCGATAAAC
>HWI-BRUNOP16X_0001:4:1:1836:1032#0/1
CAGCAGACGCTTTGATTGCTCGATCTCTTGGTAAATACGGCATCATCTGC
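On the command line, the ideal invocation format would be:

<C++ program name> <input file name> <output file name>
For example:
cplusplus_program input_file.txt output_file.txt
The logic the C++ program should apply to the input file:
1. When the program finds a line in the input file that begins with "@HWI", it takes that line and the line after it and stores both in the new file;
2. Every leading "@HWI" is changed to ">HWI";
3. The ideal content of the result file is: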
>HWI-BRUNOP16X_0001:4:1:1386:1030#0/1
GACCAATAAGTGATGATTGAATCGCGAGTGCTCGGCAGATTGCGATAAAC
Thanks in advance to all the C++ experts for your advice...
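
For reference, the three steps above could be sketched roughly as follows. This is only a minimal sketch, not a tuned solution for a 14 GB file: the names `fastq_to_fasta` and `run` are made up for illustration, and it assumes the input is well formed, so that the line right after each "@HWI" header is always the sequence line.

```cpp
#include <cstddef>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies every line that starts with "@HWI" together
// with the line that follows it, rewriting the leading '@' to '>'.
// Returns the number of header/sequence pairs written.
std::size_t fastq_to_fasta(std::istream& in, std::ostream& out) {
    const std::string prefix = "@HWI";
    std::string header;
    std::string sequence;
    std::size_t records = 0;

    while (std::getline(in, header)) {
        // Skip anything that is not a header line ('+' lines, quality lines).
        if (header.compare(0, prefix.size(), prefix) != 0) {
            continue;
        }
        // Assumption: the next line is the sequence; stop if the file ends here.
        if (!std::getline(in, sequence)) {
            break;
        }
        header[0] = '>';  // "@HWI..." becomes ">HWI..."
        out << header << '\n' << sequence << '\n';
        ++records;
    }
    return records;
}

// Hypothetical entry point; forward main's arguments to it, e.g.
//   int main(int argc, char* argv[]) { return run(argc, argv); }
int run(int argc, char* argv[]) {
    if (argc != 3) {
        std::cerr << "usage: " << argv[0] << " input_file output_file\n";
        return 1;
    }
    std::ifstream in(argv[1]);
    if (!in) {
        std::cerr << "cannot open input file: " << argv[1] << '\n';
        return 1;
    }
    std::ofstream out(argv[2]);
    if (!out) {
        std::cerr << "cannot open output file: " << argv[2] << '\n';
        return 1;
    }
    fastq_to_fasta(in, out);
    return 0;
}
```

Like the grep command, this matches on the "@HWI" prefix only. A quality line may itself begin with '@', so a stricter variant would read the file in fixed groups of four lines and keep the first two of each group.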