Python: Put specific lines of a file into a list

Greetings, I ran into the following problem. Given a file with the following structure:

'>some cookies
chocolatejelly
peanutbuttermacadamia
doublecoconutapple
'>some icecream
cherryvanillaamaretto
peanuthaselnuttiramisu
bananacoffee
'>some other stuff
letsseewhatfancythings
wegotinhere

Goal: for every line containing '>', put all the entries that follow it into a list as a single string. Code:

def parseSequenceIntoDictionary(filename):
    lis=[]
    seq=''
    with open(filename, 'r') as fp:
        for line in fp:
            if('>' not in line):
                seq+=line.rstrip()
            elif('>' in line):
                lis.append(seq)
                seq=''
        lis.remove('')
        return lis

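So this function goes through every line of the file. If the line contains no '>', it strips the trailing '\n' and concatenates the line onto the string 'seq'. If a '>' occurs, it appends the concatenated string to the list and clears 'seq' so the next sequence can be collected.

Problem: with the example input file, it only puts the contents of "some cookies" and "some icecream" into the list, but not the contents of "some other stuff". So we get: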

[chocolatejelly 
peanutbuttermacadamia 
doublecoconutapple, cherryvanillaamaretto 
peanuthaselnuttiramisu 
bananacoffee] but not  

[chocolatejelly 
peanutbuttermacadamia 
doublecoconutapple, cherryvanillaamaretto 
peanuthaselnuttiramisu 
bananacoffee, letsseewhatfancythings 
wegotinhere]

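What is wrong with my thinking here? I have probably overlooked a logic error in my iteration, but I cannot see where. Thanks in advance for any hints!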

5 replies

献导外拘

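The problem is that you only store the current section seq when you hit a line with '>'. When the file ends, you still have a section open, but you never save it. The simplest way to fix your program is: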

def parseSequenceIntoDictionary(filename):
    lis=[]
    seq=''
    with open(filename, 'r') as fp:
        for line in fp:
            if('>' not in line):
                seq+=line.rstrip()
            elif('>' in line):
                lis.append(seq)
                seq=''
        # the file ended
        lis.append(seq) # store the last section
        lis.remove('')
        return lis

By the way, you should use ѭ6 to prevent possible errors.

室邢

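You only append seq to the result list when a new line with > is found. So at the end you have a filled seq (containing the missing data), but you never add it to the result list. After the loop, just append seq if it contains any data, and you should be fine.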
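A minimal sketch of that idea, not a definitive implementation. The name parse_sections is hypothetical, and taking an iterable of lines instead of a filename is my own assumption so the example runs without a file on disk. It keeps the question's '>' in line test for headers.

```python
def parse_sections(lines):
    """Return one concatenated string per section of an iterable of lines.

    Assumption: any line containing '>' is a section header,
    which is the same test the question uses.
    """
    sections = []
    seq = ''
    for line in lines:
        if '>' in line:
            # A header closes the section collected so far, if there is one.
            if seq:
                sections.append(seq)
            seq = ''
        else:
            seq += line.rstrip()
    # The last section has no header after it, so store it after the loop.
    if seq:
        sections.append(seq)
    return sections


sample = [
    "'>some cookies\n",
    "chocolatejelly\n",
    "peanutbuttermacadamia\n",
    "doublecoconutapple\n",
    "'>some icecream\n",
    "cherryvanillaamaretto\n",
    "peanuthaselnuttiramisu\n",
    "bananacoffee\n",
    "'>some other stuff\n",
    "letsseewhatfancythings\n",
    "wegotinhere\n",
]
print(parse_sections(sample))
```

Because empty sections are never appended, the lis.remove('') step is not needed here. An open file object is also an iterable of lines, so it can be passed in directly inside a with open(filename) as fp: block.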

扫窟

my_list = []
with open('file_in.txt') as f:
    for line in f:
        if line.startswith("'>"):
            # keep only the text after the "'>" marker
            my_list.append(line.strip().split("'>")[1])

print(my_list)  # ['some cookies', 'some icecream', 'some other stuff']

埃庐

Well, you could simply split on '> (if I understood you correctly):

>>> s="""
... '>some cookies
... chocolatejelly
... peanutbuttermacadamia
... doublecoconutapple
... '>some icecream
... cherryvanillaamaretto
... peanuthaselnuttiramisu
... bananacoffee
... '>some other stuff
... letsseewhatfancythings
... wegotinhere  """
>>> s.split("'>")
['\n', 'some cookies  \nchocolatejelly  \npeanutbuttermacadamia  \ndoublecoconutapple  \n', 'some icecream  \ncherryvanillaamaretto  \npeanuthaselnuttiramisu  \nbananacoffee  \n', 'some other stuff  \nletsseewhatfancythings  \nwegotinhere  ']
>>>
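The split result above still contains the header text and the newlines. As a rough sketch of how it could be turned into the list the question asks for: the function name sections_by_split and the marker parameter are hypothetical, and the sketch assumes the marker only ever appears at the start of a header line and that nothing but whitespace comes before the first header.

```python
def sections_by_split(text, marker="'>"):
    """Split text on the header marker and join each section's lines.

    Assumptions: the marker appears only at the start of header lines,
    and no content precedes the first header.
    """
    result = []
    for chunk in text.split(marker):
        lines = chunk.splitlines()
        # lines[0] is the rest of the header line; the body starts after it.
        body = ''.join(line.strip() for line in lines[1:])
        if body:
            result.append(body)
    return result


text = """
'>some cookies
chocolatejelly
peanutbuttermacadamia
doublecoconutapple
'>some icecream
cherryvanillaamaretto
peanuthaselnuttiramisu
bananacoffee
'>some other stuff
letsseewhatfancythings
wegotinhere
"""
print(sections_by_split(text))
```

This reads the whole text into memory first, which is fine for small files; for large files the line-by-line loop is probably the safer choice.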

久坡

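import re

# regx matches a whole line that contains '>' (re.M makes ^ and $ work per line)
def parseSequenceIntoDictionary(filename,regx = re.compile(r'^.*>.*$',re.M)):
    with open(filename) as f:
        # splitting on the header lines leaves the text between them
        for el in regx.split(f.read()):
            if el:
                yield el.replace('\n','')

print(list(parseSequenceIntoDictionary('aav.txt')))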
