Java: getting the 500 most frequent words in a text via HashMap

I store the word counts in the value field of a HashMap. How do I then get the 500 most frequent words in the text?

    public ArrayList<String> topWords(int numberOfWordsToFind, ArrayList<String> theText) {

        //ArrayList<String> frequentWords = new ArrayList<String>();

        ArrayList<String> topWordsArray = new ArrayList<String>();
        HashMap<String, Integer> frequentWords = new HashMap<String, Integer>();
        int wordCounter = 0;

        for (int i = 0; i < theText.size(); i++) {
            if (frequentWords.containsKey(theText.get(i))) {
                // find value and increment
                wordCounter = frequentWords.get(theText.get(i));
                wordCounter++;
                frequentWords.put(theText.get(i), wordCounter);
            } else {
                // new word
                frequentWords.put(theText.get(i), 1);
            }
        }

        for (int i = 0; i < theText.size(); i++) {
            if (frequentWords.containsKey(theText.get(i))) {
                // what to write here?
                frequentWords.get(theText.get(i));
            }
        }
        return topWordsArray;
    }

3 replies

际恃啸称桅

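Another approach is to step back and ask whether a Map is really the right conceptual object here. This looks like a good use for a data structure that is much neglected in Java: the bag. A bag is like a set, but it allows an item to be in the collection multiple times, which greatly simplifies "adding a found word". Google's Guava library provides a bag structure, although there it is called a Multiset. With a Multiset you can simply call .add() once for each word, even if the word is already in there. Even simpler, you can throw the loop away: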

Multiset<String> words = HashMultiset.create(theText);

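Now that you have a Multiset, what do you do with it? You can call entrySet(), which gives you a collection of Multiset.Entry objects. You can then put them in a List (they come in a Set) and sort them with a Comparator. The full code might look like this (using a few other fancy Guava features to show them off):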

import com.google.common.collect.HashMultiset;
import com.google.common.collect.Iterables;
import com.google.common.collect.Lists;
import com.google.common.collect.Multiset;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

Multiset<String> words = HashMultiset.create(theText);

List<Multiset.Entry<String>> wordCounts = Lists.newArrayList(words.entrySet());
Collections.sort(wordCounts, new Comparator<Multiset.Entry<String>>() {
    public int compare(Multiset.Entry<String> left, Multiset.Entry<String> right) {
        // Note reversal of 'right' and 'left' to get descending order.
        // getCount() returns a primitive int, so compare with Integer.compare.
        return Integer.compare(right.getCount(), left.getCount());
    }
});
// wordCounts now contains all the words, sorted by count descending

// Take the first numberOfWordsToFind entries (alternative: use a loop; this is
// simple because it copes easily with fewer elements than requested)
Iterable<Multiset.Entry<String>> topEntries = Iterables.limit(wordCounts, numberOfWordsToFind);

// Guava-ey alternative: use a Function and Iterables.transform, but in this case
// the 'manual' way is probably simpler:
for (Multiset.Entry<String> entry : topEntries) {
    topWordsArray.add(entry.getElement());
}

And you're done!
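For anyone who would rather not pull in Guava, here is a minimal JDK-only sketch of the same idea. It is not the approach from the answer above: it counts with Map.merge and, instead of sorting every entry, keeps a bounded min-heap of the n most frequent words. The class name TopWordsHeap and the method signature are made up for illustration, and the order of words with equal counts is unspecified.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper class, named for illustration only.
class TopWordsHeap {
    // Returns up to n words from text, most frequent first.
    static List<String> topWords(int n, List<String> text) {
        // Count occurrences; merge() stores 1 for a new word or adds 1 to its count.
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text) {
            counts.merge(word, 1, Integer::sum);
        }

        // Min-heap ordered by count, so the least frequent kept word sits on top.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // evict the current least frequent word
            }
        }

        // Draining the heap yields ascending counts, so reverse at the end.
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }
}
```

Calling TopWordsHeap.topWords(500, theText) would then stand in for the body of the original method. The heap never holds more than n + 1 entries, which helps when the vocabulary is much larger than 500.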

荆怖赡

There are guides on how to sort a HashMap by value. Once the entries are sorted, you can iterate over the first 500 of them.
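One way the sort-by-value step might look, as a sketch assuming Java 8 or later and a count map like the frequentWords map from the question. The class name TopWordsSorted is hypothetical; Map.Entry.comparingByValue and the stream methods are standard JDK calls.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper class, named for illustration only.
class TopWordsSorted {
    // Returns up to n keys of counts, ordered by value from highest to lowest.
    static List<String> topWords(int n, Map<String, Integer> counts) {
        return counts.entrySet().stream()
                // Sort entries by count, descending.
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                // Keep only the first n entries (fewer if the map is smaller).
                .limit(n)
                // Drop the counts and keep the words.
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

In the question's method this would be called as TopWordsSorted.topWords(numberOfWordsToFind, frequentWords). Words with equal counts come out in no particular order, since HashMap iteration order is unspecified.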

咳累录酬

Have a look at the TreeBidiMap provided by the Apache Commons Collections package: http://commons.apache.org/collections/api-release/org/apache/commons/collections/bidimap/TreeBidiMap.html It lets you sort the map by either its keys or its values. Hope this helps. 忠县
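One caveat when applying this to word counts: a bidirectional map keeps values unique as well as keys, so two words with the same count cannot both be stored with the count as the value. A JDK-only sketch of a similar value-ordered view is shown below. It is a substitute technique, not TreeBidiMap itself: it groups words by count in a TreeMap with reversed ordering. The class name TopWordsByCount is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, named for illustration only.
class TopWordsByCount {
    // Returns up to n words from counts, highest count first.
    static List<String> topWords(int n, Map<String, Integer> counts) {
        // Inverted view: count -> all words with that count, highest count first.
        TreeMap<Integer, List<String>> byCount = new TreeMap<>(Collections.reverseOrder());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }

        // Walk the groups in descending count order until n words are collected.
        List<String> result = new ArrayList<>();
        for (List<String> words : byCount.values()) {
            for (String word : words) {
                if (result.size() == n) {
                    return result;
                }
                result.add(word);
            }
        }
        return result;
    }
}
```

Grouping by count keeps every word, at the price of an unspecified order among words that share a count.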
