Possible to increase the amount of RAM a Python process is using?

I'm running a classification/feature extraction task on a Windows server with 64 GB of RAM, and somehow Python thinks I'm running out of memory:

misiti@fff /cygdrive/c/NaiveBayes
$ python run_classify_comments.py > tenfoldcrossvalidation.txt
Traceback (most recent call last):
  File "run_classify_comments.py", line 70, in <module>
    run_classify_comments()
  File "run_classify_comments.py", line 51, in run_classify_comments
    NWORDS = get_all_words("./data/HUGETEXTFILE.txt")
  File "run_classify_comments.py", line 16, in get_all_words
    def get_all_words(path): return words(file(path).read())
  File "run_classify_comments.py", line 15, in words
    def words(text): return re.findall('[a-z]+', text.lower())
  File "C:\Program Files (x86)\Python26\lib\re.py", line 175, in findall
    return _compile(pattern, flags).findall(string)
MemoryError

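So the re module runs out of memory on a machine with 64 GB of RAM... I don't think so. Why is this happening, and how can I configure Python to use all of the RAM available on the machine?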
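One thing worth ruling out first, as an assumption drawn from the C:\Program Files (x86)\Python26 path in the traceback: that directory is where 32-bit programs are normally installed on 64-bit Windows, and a 32-bit process has at most 4 GB of address space (2 GB by default on Windows) no matter how much physical RAM the server has. A minimal sketch for checking which build is running (the function name interpreter_bits is just for illustration):

```python
import struct
import sys


def interpreter_bits():
    # struct.calcsize("P") is the size of a C pointer in bytes:
    # 4 on a 32-bit build, 8 on a 64-bit build.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    # sys.maxsize gives the same answer: 2**31 - 1 on 32-bit, 2**63 - 1 on 64-bit.
    print("%d-bit Python %s" % (interpreter_bits(), sys.version.split()[0]))
```

If this reports a 32-bit build, a 64-bit Python is needed before the process can use more than a few gigabytes; reading the file incrementally reduces the memory needed either way.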


2 answers

闲窍

Just rewrite the program to read the huge text file one line at a time. To do that, change get_all_words to:

def get_all_words(path):
    # [] is sum()'s start value, so the per-line word lists are concatenated
    return sum((words(line) for line in open(path)), [])

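Note the generator expression inside the parentheses: it is lazy, so each line is read and tokenized only when sum() asks for the next item, instead of the whole file being read into memory at once.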
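A variant of the same line-by-line idea, sketched in modern Python 3 under the assumption that words() is the tokenizer from the question. It swaps sum(..., []) for itertools.chain.from_iterable, because adding lists with sum copies the accumulated list on every step, which is quadratic in the number of words:

```python
import itertools
import re


def words(text):
    # Same tokenizer as in the question: runs of lowercase ASCII letters.
    return re.findall("[a-z]+", text.lower())


def get_all_words(path):
    # Iterate over the file lazily, one line at a time, and flatten the
    # per-line word lists without repeated list concatenation.
    with open(path) as f:
        return list(itertools.chain.from_iterable(words(line) for line in f))
```

Two caveats: the returned list still holds every word in the file, and reading by lines only helps if the file actually contains line breaks.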

诉嘎归亮

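It looks to me like the problem is using re.findall() to read the entire text into memory as a list of words. Are you reading more than 64 GB of text this way? Depending on how your NaiveBayes algorithm is implemented, you may be better off building the frequency dictionary incrementally, so that only the dictionary is kept in memory (not the entire text). More information about your implementation would help answer your question more directly.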
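A minimal sketch of the incremental approach, assuming the classifier only needs word counts (as NWORDS in the question suggests, though that is a guess). count_words is a hypothetical name; collections.Counter is the standard-library frequency dictionary:

```python
import collections
import re

# Same pattern as the question's words() function, compiled once.
WORD_RE = re.compile("[a-z]+")


def count_words(path):
    # Only the Counter (one entry per distinct word) stays in memory;
    # neither the full text nor the full list of words is ever built.
    counts = collections.Counter()
    with open(path) as f:
        for line in f:
            counts.update(WORD_RE.findall(line.lower()))
    return counts
```

Memory then grows with the vocabulary size rather than with the size of the file.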

Tags: python, regex, nltk