Merge multiple lines into a single line based on a column's value

I have a large tab-delimited text file. In one of its columns, many rows share the same value, and I want to merge those rows into a single line. For example:
a foo
a bar
a foo2
b bar
c bar2
After running the script, it should become:
a foo;bar;foo2
b bar
c bar2
How can I do this in a shell script or in Python? Thanks.
With awk, you can try:
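# Append ";" and the 2nd field to the string kept for the 1st field.
{   a[$1] = a[$1] ";" $2 }
# Print each key and its accumulated values (key order is unspecified).
END { for (item in a ) print item, a[item] }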
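So, if you save this awk script in a file named awkf.awk and your input file is ifile.txt, run:
awk -f awkf.awk ifile.txt | sed 's/ ;/ /'
The sed command removes the leading ";" that every merged value starts with. Hope this helps.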
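The awk script starts each key's string out empty, which is why every result begins with ";" and needs the sed fix-up. Below is a minimal Python sketch of the same associative-array idea that only inserts the separator once a key already has a value, so no post-processing is needed. The function name merge_concat is hypothetical, and, like awk's default field splitting, it assumes fields are separated by whitespace and uses only the first two.

```python
def merge_concat(lines):
    """Group the second field by the first, like the awk script's a[$1] array.

    Fields are split on runs of whitespace (tabs or spaces), as awk does by
    default; only the first two fields of each line are used.
    """
    merged = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        key, value = fields[0], fields[1]
        if key in merged:
            merged[key] += ";" + value  # separator only between values
        else:
            merged[key] = value         # first value: no leading ";"
    return merged


if __name__ == "__main__":
    sample = ["a foo", "a bar", "a foo2", "b bar", "c bar2"]
    for key, joined in merge_concat(sample).items():
        print(key, joined)
```

Unlike awk's for (item in a), a Python 3.7+ dict iterates in insertion order, so the keys come out in the order they first appear.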
        
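from collections import defaultdict

# Collect all values for each key (column 1) in a list.
items = defaultdict(list)
with open('sourcefile') as source:
    for line in source:
        if not line.strip():
            continue  # skip blank lines
        # Strip the newline first, otherwise it ends up inside the value.
        key, val = line.rstrip('\n').split('\t', 1)
        items[key].append(val)

# Write one line per key, keys in sorted order, values joined with ";".
with open('result', 'w') as result:
    for k in sorted(items):
        result.write('%s\t%s\n' % (k, ';'.join(items[k])))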
Untested.
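One detail worth checking in a defaultdict approach like this: every line read from a file keeps its trailing newline, so a plain split('\t') leaves "\n" attached to the value, and joining those values scatters line breaks through the output. The short illustration below shows the difference. merge_sorted is a hypothetical helper that strips the newline before splitting, and it splits at most once so that a value containing further tabs stays intact.

```python
from collections import defaultdict

line = "a\tfoo\n"
print(line.split("\t"))                  # ['a', 'foo\n']  <- newline kept
print(line.rstrip("\n").split("\t", 1))  # ['a', 'foo']


def merge_sorted(lines):
    """Return output lines 'key<TAB>v1;v2;...' with keys in sorted order."""
    items = defaultdict(list)
    for raw in lines:
        key, val = raw.rstrip("\n").split("\t", 1)
        items[key].append(val)
    return ["%s\t%s" % (k, ";".join(items[k])) for k in sorted(items)]


print(merge_sorted(["a\tfoo\n", "a\tbar\n", "b\tbar\n"]))
# ['a\tfoo;bar', 'b\tbar']
```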
Tested with Python 2.7:
import csv

data = {}

# Read the two tab-separated columns as 'key' and 'value'.
reader = csv.DictReader(open('infile', 'r'), fieldnames=['key', 'value'], delimiter='\t')
for row in reader:
    # Append to the key's list, creating the list the first time the key is seen.
    if row['key'] in data:
        data[row['key']].append(row['value'])
    else:
        data[row['key']] = [row['value']]

# One output line per key: key, tab, values joined with ";".
writer = open('outfile', 'w')
for key in data:
    writer.write(key + '\t' + ';'.join(data[key]) + '\n')
writer.close()
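Under Python 3, the csv module documentation recommends opening files with newline='', and a csv.writer with delimiter='\t' can produce the output as well. Below is a sketch assuming Python 3 and a two-column file. The function name merge_tsv is hypothetical, and the demo uses in-memory files so that it runs on its own.

```python
import csv
import io


def merge_tsv(infile, outfile):
    """Read two-column TSV rows from infile and write merged rows to outfile.

    Keys are written in the order they first appear (dicts preserve
    insertion order in Python 3.7+).
    """
    data = {}
    for row in csv.reader(infile, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or short rows
        key, value = row[0], row[1]
        data.setdefault(key, []).append(value)

    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    for key, values in data.items():
        writer.writerow([key, ";".join(values)])


# Demo with in-memory files; for real files use e.g.
#   with open("infile", newline="") as f_in, open("outfile", "w", newline="") as f_out:
#       merge_tsv(f_in, f_out)
src = io.StringIO("a\tfoo\na\tbar\na\tfoo2\nb\tbar\nc\tbar2\n")
dst = io.StringIO()
merge_tsv(src, dst)
print(dst.getvalue(), end="")
```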
    
A Perl approach:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

open my $fh, '<', 'path/to/file' or die "unable to open file: $!";
my %res;
while (<$fh>) {
    # split with no arguments splits $_ on whitespace
    my ($k, $v) = split;
    # collect the values for each key in an array reference
    push @{$res{$k}}, $v;
}
print Dumper \%res;
Output:
$VAR1 = {
      'c' => [
               'bar2'
             ],
      'a' => [
               'foo',
               'bar',
               'foo2'
             ],
      'b' => [
               'bar'
             ]
    };
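This answer stops at the grouped structure (a hash of array references) and does not print the "key foo;bar;foo2" format the question asks for. Turning such a structure into that format takes one join per key. Below is a Python sketch that assumes the grouped data is already a dict of lists shaped like the dump above. The name format_groups is hypothetical. Keys are sorted because hash order, as the dump shows, does not follow the input order.

```python
def format_groups(groups, sep=";"):
    """Render {key: [v1, v2, ...]} as 'key<TAB>v1;v2;...' lines, sorted by key."""
    return "".join("%s\t%s\n" % (key, sep.join(values))
                   for key, values in sorted(groups.items()))


grouped = {"c": ["bar2"], "a": ["foo", "bar", "foo2"], "b": ["bar"]}
print(format_groups(grouped), end="")
```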
    
        
#! /usr/bin/env perl

use strict;
use warnings;

# for demo only
*ARGV = *DATA;

my %record;
my @order;
while (<>) {
  chomp;
  my($key,$combine) = split;

  push @order, $key unless exists $record{$key};
  push @{ $record{$key} }, $combine;
}

print $_, "\t", join(";", @{ $record{$_} }), "\n" for @order;

__DATA__
a foo
a bar
a foo2
b bar
c bar2
Output (tabs converted to spaces, because Stack Overflow mangles them):
a foo;bar;foo2
b bar
c bar2
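The Perl script keeps a separate @order array so that keys are printed in the order they first appear. In Python 3.7+ a plain dict already preserves insertion order, so no extra list is needed. Below is a minimal sketch of that idea. merge_in_order is a hypothetical name, and it assumes whitespace-separated "key value" lines as in the question's example.

```python
def merge_in_order(lines):
    """Group the second field by the first, keeping keys in first-seen order.

    A plain dict keeps insertion order (guaranteed since Python 3.7), so it
    does the job of the separate @order array in the Perl version.
    """
    record = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        record.setdefault(fields[0], []).append(fields[1])
    return ["%s\t%s" % (key, ";".join(values)) for key, values in record.items()]


for out_line in merge_in_order(["b bar", "a foo", "b baz", "a foo2"]):
    print(out_line)
```

Here "b" is printed first because it appears first in the input. A version that sorts its keys would put "a" first instead.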
        
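def compress(infilepath, outfilepath):
    # Assumes lines with the same key are adjacent (e.g. sorted on column 1).
    with open(infilepath) as infile, open(outfilepath, 'w') as outfile:
        prev_index = None
        for line in infile:
            index, val = line.rstrip('\n').split('\t', 1)
            if index == prev_index:
                # Same key as the previous line: extend the current output line.
                outfile.write(';%s' % val)
            else:
                # New key: end the previous output line (if any), start a new one.
                if prev_index is not None:
                    outfile.write('\n')
                outfile.write('%s\t%s' % (index, val))
                prev_index = index
        if prev_index is not None:
            outfile.write('\n')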
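Untested, but it should work as long as lines with the same key are adjacent (for example, if the file is sorted on the first column). Leave a comment if you have any questions.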
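Because this approach only compares each line with the previous one, it uses constant memory but merges only runs of adjacent equal keys. itertools.groupby expresses the same streaming idea directly. Below is a sketch under the same adjacency assumption; if the file is not already grouped, sort it on the first column first (for example with sort -k1,1). compress_grouped is a hypothetical name, and it assumes every non-blank line contains a tab.

```python
import io
from itertools import groupby


def compress_grouped(infile, outfile):
    """Stream 'key<TAB>value' lines, merging runs of equal consecutive keys.

    Only consecutive lines with the same key are merged, as in the
    prev_index approach; non-adjacent duplicates produce separate lines.
    Every non-blank line is assumed to contain at least one tab.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in infile if line.strip())
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        values = [pair[1] for pair in group]
        outfile.write("%s\t%s\n" % (key, ";".join(values)))


src = io.StringIO("a\tfoo\na\tbar\na\tfoo2\nb\tbar\nc\tbar2\n")
dst = io.StringIO()
compress_grouped(src, dst)
print(dst.getvalue(), end="")
```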
