How do I count the occurrences of a word across all files in a directory?

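I'm trying to count the occurrences of a particular word across an entire directory. Is that possible? Say, for example, there is a directory with 100 files, any of which may contain the word "aaa". How would I count the number of "aaa" occurrences across all files under that directory? I tried something like: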

 zegrep "xception" `find . -name '*auth*application*'` | wc -l

But this doesn't work.

8 Answers

缔恃钨

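grep -roh aaa . | wc -w

grep searches all files and directories under the current directory recursively (-r) for aaa and prints only the matches (-o), not the whole line, without file name prefixes (-h). Then wc simply counts how many words were printed.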
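The one-liner above can be wrapped in a small function. This is only a sketch: the name count_occurrences and its argument order are made up for illustration, and -F (treat the word as a literal string) is an addition that the answer does not use.

```shell
# Hypothetical helper: count every occurrence of a literal string under a directory.
# $1 = word to search for, $2 = directory (defaults to the current one)
count_occurrences() {
    # -r recurse, -o print each match on its own line, -h omit file names,
    # -F match the word literally; "--" protects words that start with a dash.
    # tr strips the padding some wc implementations put in front of the number.
    grep -rohF -- "$1" "${2:-.}" | wc -l | tr -d ' '
}
```

Usage would look like count_occurrences aaa /some/dir. Because every match is printed on its own line, wc -l and wc -w give the same result for a single-word pattern.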

疏腔傻小雹

Another solution based on find and grep.

find . -type f -exec grep -o aaa {} \; | wc -l

This should correctly handle file names that contain spaces.
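A possible variation, sketched under the assumption that starting one grep per file is too slow for a large tree: ending -exec with + instead of \; passes many files to each grep invocation. The function name count_in_tree is hypothetical, and -h is added because grep prefixes matches with file names once it receives more than one file.

```shell
# Hypothetical helper: same idea as the find/grep answer, but batching files per grep call.
# $1 = word to search for, $2 = directory (defaults to the current one)
count_in_tree() {
    # "{} +" hands find's results to grep in batches; file names with spaces stay intact
    # because find passes them as separate arguments and no shell word splitting occurs.
    find "${2:-.}" -type f -exec grep -oh -- "$1" {} + | wc -l | tr -d ' '
}
```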

抵浮细

Let's use AWK!

$ function wordfrequency() { awk 'BEGIN { FS="[^a-zA-Z]+" } { for (i=1; i<=NF; i++) { word = tolower($i); words[word]++ } } END { for (w in words) printf("%3d %s\n", words[w], w) } ' | sort -rn; }
$ cat your_file.txt | wordfrequency

This lists how often each word occurs in the given file. If you want to see the count for one particular word, you can do this:

$ cat your_file.txt | wordfrequency | grep yourword

To find the occurrences of a word in all files of a directory (non-recursively), you can do this:

$ cat * | wordfrequency | grep yourword

To find the occurrences of a word in all files of a directory (and its subdirectories), you can do this:

$ find . -type f | xargs cat | wordfrequency | grep yourword

Source: AWK-ward Ruby
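Two caveats with the pipelines above: find | xargs cat splits file names on whitespace, and grep yourword also matches longer words that merely contain yourword. The following is a rough sketch of one way around both, not the answerer's method. The name count_word_awk is hypothetical, and it assumes the word consists of ASCII letters only, like the field separator in the original function.

```shell
# Hypothetical helper: case-insensitive count of one exact word under a directory.
# $1 = word (ASCII letters only), $2 = directory (defaults to the current one)
count_word_awk() {
    # "-exec cat {} +" keeps file names with spaces intact, unlike "find | xargs cat".
    find "${2:-.}" -type f -exec cat {} + |
        awk -v target="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" '
            BEGIN { FS = "[^a-zA-Z]+" }      # same tokenization as wordfrequency
            { for (i = 1; i <= NF; i++) if (tolower($i) == target) n++ }
            END { print n + 0 }              # "+ 0" prints 0 when nothing matched
        '
}
```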

哭木算

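Use grep in its simplest form. Try grep --help for more info. Note that -c counts matching lines rather than individual occurrences, so a line that contains the word several times is counted once. To get the count for a word in a particular file: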

grep -c <word> <file_name>

Example:

grep -c 'aaa' abc_report.csv

Output:

445

To get the counts for an entire directory:

grep -c -R <word>

Example:

grep -c -R 'aaa'

Output:

abc_report.csv:445
lmn_report.csv:129
pqr_report.csv:445
my_folder/xyz_report.csv:408
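The per-file numbers above can be added up into one total. A minimal sketch, assuming the path:count output format shown above; the function name total_matching_lines is made up for illustration. Keep in mind that the result is the number of matching lines, which may be lower than the number of occurrences.

```shell
# Hypothetical helper: sum the per-file line counts printed by "grep -c -R".
# $1 = word to search for, $2 = directory (defaults to the current one)
total_matching_lines() {
    # Each output line looks like "path:count". Taking the last ":"-separated
    # field ($NF) still works when the path itself contains a colon.
    grep -c -R -- "$1" "${2:-.}" | awk -F: '{ total += $NF } END { print total + 0 }'
}
```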

马口

find . -type f|xargs perl -p -e 's/ /\n/g'|grep aaa|wc -l

翁茄口霉氖

cat the files together and grep the output:

cat $(find /usr/share/doc/ -name '*.txt') | zegrep -ic '\<exception\>'

If you want longer words that contain "exception" to match as well, don't put '\<' and '\>' around the word.

驮帽俺篮号

How about starting with:

cat * | sed 's/ /\n/g' | grep '^aaa$' | wc -l

as in the following transcript:

pax$ cat file1
this is a file number 1

pax$ cat file2
And this file is file number 2,
a slightly larger file

pax$ cat file[12] | sed 's/ /\n/g' | grep 'file$' | wc -l
4

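The sed converts spaces to newlines (you may want to include other whitespace characters as well, such as tabs). The grep just picks out the lines that hold the desired word, then wc counts those lines for you.

There are edge cases where this script will not work, but it should be fine for the vast majority of situations.

If you want a whole tree (not just a single directory level), you can use something like:

( find . -name '*.txt' -exec cat {} ';' ) | sed 's/ /\n/g' | grep '^aaa$' | wc -l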
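Some of those edge cases are tabs and punctuation next to the word. One possible way to cover them, swapping sed for tr, is sketched below. This is not the answerer's script: the name count_exact_word is hypothetical, and it walks every regular file rather than only *.txt.

```shell
# Hypothetical helper: count exact-word matches, splitting on anything non-alphanumeric.
# $1 = word to search for, $2 = directory (defaults to the current one)
count_exact_word() {
    # tr -cs: turn every run of characters that are NOT alphanumeric
    # (spaces, tabs, punctuation, newlines) into a single newline.
    # grep -c counts lines, -x requires the whole line to match, -F matches literally.
    find "${2:-.}" -type f -exec cat {} + |
        tr -cs '[:alnum:]' '\n' |
        grep -cxF -- "$1"
}
```

Since each word ends up alone on its own line, counting lines with grep -c is the same as counting occurrences here.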

凸晴

There's also a grep regex syntax for matching whole words only:

# based on Carlos Campderrós solution posted in this thread
man grep | less -p '\<'
grep -roh '\<aaa\>' . | wc -l

For other word-matching regex syntax, see:

man re_format | less -p '\[\[:<:\]\]'
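Since the word-boundary syntax differs between regex flavors, the -w option, which both GNU and BSD grep support, may be a simpler route. A small sketch under that assumption; the function name count_whole_word is made up for illustration.

```shell
# Hypothetical helper: count whole-word occurrences without flavor-specific boundary syntax.
# $1 = word to search for, $2 = directory (defaults to the current one)
count_whole_word() {
    # -w only accepts matches that are not directly preceded or followed by
    # a letter, digit, or underscore; -r, -o, -h work as in the grep -roh answer.
    grep -rohw -- "$1" "${2:-.}" | wc -l | tr -d ' '
}
```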
