Removing stop words in JavaScript

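Hi, I'm looking for a library that removes stop words from text in JavaScript. My end goal is to compute tf-idf and then convert a given document into a vector space representation, all in JavaScript. Can anyone point me to a library that would help me do this? Even a library that only removes stop words would be fine.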
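I don't think such a library exists. You would need to download the word list from http://www.ranks.nl/resources/stopwords.html and then replace each word in the text, along these lines: text = text.replace(stopword, ""). Note that replace with a string pattern only replaces the first occurrence, so a global regular expression is needed to remove every one.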
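The replace-based idea above could be sketched roughly as follows. This is only an illustration: `sampleStopwords` is a short, hypothetical list standing in for the downloaded one, and `escapeRegExp`, `buildStopwordPattern`, and `stripStopwords` are names made up for this sketch. It builds a single case-insensitive regular expression with word boundaries, so that whole words are removed rather than substrings inside longer words.

```javascript
// Hypothetical, abbreviated stop word list, for illustration only.
const sampleStopwords = ["a", "an", "the", "is", "of", "to"];

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one global, case-insensitive pattern that matches any listed word
// as a whole word (\b marks a word boundary).
function buildStopwordPattern(words) {
  const alternatives = words.map(escapeRegExp).join("|");
  return new RegExp("\\b(?:" + alternatives + ")\\b", "gi");
}

// Replace every stop word with a space, then collapse repeated whitespace.
function stripStopwords(text, words) {
  return text
    .replace(buildStopwordPattern(words), " ")
    .replace(/\s+/g, " ")
    .trim();
}

console.log(stripStopwords("The cat is on the roof of a house", sampleStopwords));
// prints "cat on roof house"
```

One caveat with this approach: `\b` treats an apostrophe as a word boundary, so contractions such as "don't" may need separate handling depending on the list used.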
Use the stop words provided by the NLTK library:
const stopwords = ['i','me','my','myself','we','our','ours','ourselves','you','your','yours','yourself','yourselves','he','him','his','himself','she','her','hers','herself','it','its','itself','they','them','their','theirs','themselves','what','which','who','whom','this','that','these','those','am','is','are','was','were','be','been','being','have','has','had','having','do','does','did','doing','a','an','the','and','but','if','or','because','as','until','while','of','at','by','for','with','about','against','between','into','through','during','before','after','above','below','to','from','up','down','in','out','on','off','over','under','again','further','then','once','here','there','when','where','why','how','all','any','both','each','few','more','most','other','some','such','no','nor','not','only','own','same','so','than','too','very','s','t','can','will','just','don','should','now']
Then simply pass your string to the following function:
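function remove_stopwords(str) {
    // Collect every word that is not in the stopwords array.
    const res = []
    const words = str.split(' ')
    for (let i = 0; i < words.length; i++) {
        // Exact, case-sensitive match: "I" and "me." are not treated as stop words.
        if (!stopwords.includes(words[i])) {
            res.push(words[i])
        }
    }
    return res.join(' ')
}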
Example:
remove_stopwords("I will go to the place where there are things for me.")
Result:
I go place things me.
Just add any words that are not covered yet to the NLTK array.
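The function above compares words exactly, which is why "I" and "me." survive in the result. A possible variant, sketched here under the assumption that lowercasing and dropping punctuation are acceptable for your text, normalizes the input first and returns an array of tokens (convenient for a later tf-idf step). The names `stopwordSet` and `removeStopwordsNormalized` are hypothetical, and the set is abbreviated; in practice it could be built from the full array with `new Set(stopwords)`.

```javascript
// Abbreviated stop word set, for illustration only.
const stopwordSet = new Set(["i", "me", "will", "to", "the", "where", "there", "are", "for"]);

// Lowercase the text, split on anything that is not a letter or an
// apostrophe, and keep only the non-empty tokens that are not stop words.
function removeStopwordsNormalized(text, stopSet) {
  return text
    .toLowerCase()
    .split(/[^a-z']+/)
    .filter((word) => word.length > 0 && !stopSet.has(word));
}

console.log(
  removeStopwordsNormalized("I will go to the place where there are things for me.", stopwordSet)
);
// prints [ 'go', 'place', 'things' ]
```

A `Set` is used instead of an array so that each lookup does not scan the whole list. The split pattern only keeps ASCII letters, so it would need adjusting for accented or non-English text.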
Here is a JavaScript library for removing stop words: http://geeklad.com/remove-stop-words-in-javascript
Here is an array of English stop words; I hope it helps. It comes from http://www.ranks.nl/stopwords (mentioned in an earlier answer). These may also be useful resources: https://github.com/shiffman/A2Z-F16/tree/gh-pages/week5-analysis and http://shiffman.net/a2z/text-analysis/
var stopwords = ["a","about","above","after","again","against","all","am","an","and","any","are","aren't","as","at","be","because","been","before","being","below","between","both","but","by","can't","cannot","could","couldn't","did","didn't","do","does","doesn't","doing","don't","down","during","each","few","for","from","further","had","hadn't","has","hasn't","have","haven't","having","he","he'd","he'll","he's","her","here","here's","hers","herself","him","himself","his","how","how's","i","i'd","i'll","i'm","i've","if","in","into","is","isn't","it","it's","its","itself","let's","me","more","most","mustn't","my","myself","no","nor","not","of","off","on","once","only","or","other","ought","our","ours","ourselves","out","over","own","same","shan't","she","she'd","she'll","she's","should","shouldn't","so","some","such","than","that","that's","the","their","theirs","them","themselves","then","there","there's","these","they","they'd","they'll","they're","they've","this","those","through","to","too","under","until","up","very","was","wasn't","we","we'd","we'll","we're","we've","were","weren't","what","what's","when","when's","where","where's","which","while","who","who's","whom","why","why's","with","won't","would","wouldn't","you","you'd","you'll","you're","you've","your","yours","yourself","yourselves"];
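For the tf-idf goal mentioned in the question, a minimal sketch of the step that follows stop word removal might look like this. It is not taken from any of the answers: `tfidfVectors` is a hypothetical helper, the input is assumed to be an array of token arrays with stop words already removed, and it uses one common unsmoothed weighting (term frequency divided by document length, times the natural log of N over document frequency). Other tf-idf variants exist and may suit your data better.

```javascript
// docs: array of documents, each an array of tokens (stop words removed).
// Returns the vocabulary and one tf-idf vector per document, where
// position k of every vector corresponds to vocab[k].
function tfidfVectors(docs) {
  // Vocabulary in first-seen order.
  const vocab = [...new Set(docs.flat())];

  // Document frequency: in how many documents each term appears.
  const df = new Map();
  for (const tokens of docs) {
    for (const term of new Set(tokens)) {
      df.set(term, (df.get(term) || 0) + 1);
    }
  }

  const vectors = docs.map((tokens) => {
    // Raw term counts for this document.
    const counts = new Map();
    for (const term of tokens) {
      counts.set(term, (counts.get(term) || 0) + 1);
    }
    return vocab.map((term) => {
      const tf = tokens.length ? (counts.get(term) || 0) / tokens.length : 0;
      const idf = Math.log(docs.length / df.get(term)); // unsmoothed idf
      return tf * idf;
    });
  });

  return { vocab, vectors };
}

// Two tiny example documents, already tokenized.
const example = tfidfVectors([["cat", "sat"], ["cat", "ran"]]);
console.log(example.vocab);   // [ 'cat', 'sat', 'ran' ]
console.log(example.vectors); // "cat" scores 0 in both, since it appears in every document
```

With this weighting, a term that occurs in every document gets a weight of zero, which is the same effect stop word removal aims for by other means.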
