R: Replacing double-escaped text

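I'm using the Amazon Elastic MapReduce command line tools to glue together a number of system calls. These commands return JSON text that has already been (partially?) escaped. Then, when the system call turns it into an R text object (intern=T), it appears to get escaped again. I need to clean it up so that it will parse with the rjson package. I make the system call like this: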
system("~/EMR/elastic-mapreduce --describe --jobflow j-2H9P770Z4B8GG", intern=T)
which returns:
 [1] "{"
 [2] "  \"JobFlows\": ["
 [3] "    {"
 [4] "      \"LogUri\": \"s3n:\\/\\/emrlogs\\/\","
 [5] "      \"Name\": \"emrFromR\","
 [6] "      \"BootstrapActions\": ["
...
But the same command run from the command line returns:
{
  "JobFlows": [
    {
      "LogUri": "s3n://emrlogs/",
      "Name": "emrFromR",
      "BootstrapActions": [
        {
          "BootstrapActionConfig": {
...
If I try to run the result of the system call through rjson, I get this error:
Error: '\/' is an unrecognized escape in character string starting "s3n:\/"
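I believe this is because of the double escaping in the s3n line. I'm struggling to massage this text into something that will parse. It might be as simple as replacing "\\" with "\", but since I struggle a bit with regular expressions and escaping, I can't get it to work properly. So how do I take a vector of strings and replace every occurrence of "\\" with "\"? (Even to type this question I had to use three backslashes to represent two.) Any other tips for this specific use case? Here is my code in more detail: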
> library(rjson)
> emrJson <- paste(system("~/EMR/elastic-mapreduce --describe --jobflow j-2H9P770Z4B8GG", intern=T))
> 
>     parser <- newJSONParser()
>     for (i in 1:length(emrJson)){
+       parser$addData(emrJson[i])
+     }
> 
> parser$getObject()
Error: '\/' is an unrecognized escape in character string starting "s3n:\/"
If you want to recreate the emrJson object, here is the dput() output:
> dput(emrJson)
c("{", "  \"JobFlows\": [", "    {", "      \"LogUri\": \"s3n:\\/\\/emrlogs\\/\",", 
"      \"Name\": \"emrFromR\",", "      \"BootstrapActions\": [", 
"        {", "          \"BootstrapActionConfig\": {", "            \"Name\": \"Bootstrap 0\",", 
"            \"ScriptBootstrapAction\": {", "              \"Path\": \"s3:\\/\\/rtmpfwblrx\\/bootstrap.sh\",", 
"              \"Args\": []", "            }", "          }", 
"        }", "      ],", "      \"ExecutionStatusDetail\": {", 
"        \"EndDateTime\": 1278124414.0,", "        \"CreationDateTime\": 1278123795.0,", 
"        \"LastStateChangeReason\": \"Steps completed\",", "        \"State\": \"COMPLETED\",", 
"        \"StartDateTime\": 1278124000.0,", "        \"ReadyDateTime\": 1278124237.0", 
"      },", "      \"Steps\": [", "        {", "          \"StepConfig\": {", 
"            \"ActionOnFailure\": \"CANCEL_AND_WAIT\",", "            \"Name\": \"Example Streaming Step\",", 
"            \"HadoopJarStep\": {", "              \"MainClass\": null,", 
"              \"Jar\": \"\\/home\\/hadoop\\/contrib\\/streaming\\/hadoop-0.18-streaming.jar\",", 
"              \"Args\": [", "                \"-input\",", "                \"s3n:\\/\\/rtmpfwblrx\\/stream.txt\",", 
"                \"-output\",", "                \"s3n:\\/\\/rtmpfwblrxout\\/\",", 
"                \"-mapper\",", "                \"s3n:\\/\\/rtmpfwblrx\\/mapper.R\",", 
"                \"-reducer\",", "                \"cat\",", 
"                \"-cacheFile\",", "                \"s3n:\\/\\/rtmpfwblrx\\/emrData.RData#emrData.RData\"", 
"              ],", "              \"Properties\": []", "            }", 
"          },", "          \"ExecutionStatusDetail\": {", "            \"EndDateTime\": 1278124322.0,", 
"            \"CreationDateTime\": 1278123795.0,", "            \"LastStateChangeReason\": null,", 
"            \"State\": \"COMPLETED\",", "            \"StartDateTime\": 1278124232.0", 
"          }", "        }", "      ],", "      \"JobFlowId\": \"j-2H9P770Z4B8GG\",", 
"      \"Instances\": {", "        \"Ec2KeyName\": \"JL 09282009\",", 
"        \"InstanceCount\": 2,", "        \"Placement\": {", 
"          \"AvailabilityZone\": \"us-east-1d\"", "        },", 
"        \"KeepJobFlowAliveWhenNoSteps\": false,", "        \"SlaveInstanceType\": \"m1.small\",", 
"        \"MasterInstanceType\": \"m1.small\",", "        \"MasterPublicDnsName\": \"ec2-174-129-70-89.compute-1.amazonaws.com\",", 
"        \"MasterInstanceId\": \"i-2147b84b\",", "        \"InstanceGroups\": null,", 
"        \"HadoopVersion\": \"0.18\"", "      }", "    }", "  ]", 
"}")
    
The general rule seems to be to use twice as many backslashes as you think you need (I can't find the source right now).
library(rjson)
# The pattern "\\\\" is the regular expression for one literal backslash.
emrJson <- gsub("\\\\", "\\", emrJson)
parser <- newJSONParser()
for (i in seq_along(emrJson)) {
    parser$addData(emrJson[i])
}
parser$getObject()
This works here using your dput output.
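As a possible variation on the substitution above, a literal match with fixed = TRUE avoids regular-expression escaping altogether, so only R's own string escaping is left to think about. The sketch below uses a small stand-in vector, emr_sample (a hypothetical name, not the real emrJson), and assumes that the only escape causing trouble is the backslash in front of a forward slash.

```r
# Hypothetical three-line stand-in for emrJson.
# In R source, "\\/" is the two characters backslash and slash.
emr_sample <- c("{", "  \"LogUri\": \"s3n:\\/\\/emrlogs\\/\"", "}")

# fixed = TRUE treats the pattern as a literal string, not a regular expression,
# so "\\/" matches exactly one backslash followed by one slash.
cleaned <- gsub("\\/", "/", emr_sample, fixed = TRUE)

# cat() writes the actual characters, one element per line.
cat(cleaned, sep = "\n")
```

Replacing only the backslash-slash pair leaves other JSON escapes, such as an escaped double quote inside a value, untouched, which stripping every backslash would not.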
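I'm not sure that it is double-escaped. Remember that you need to use cat() to see what the string actually is, rather than the representation of the string.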
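A minimal illustration of that difference, using a made-up one-element string x rather than the real command output: print() shows an escaped representation, while cat() and nchar() work on the characters the string really holds.

```r
# In R source, "\\/" denotes two characters: one backslash and one slash.
x <- "s3n:\\/\\/emrlogs\\/"

print(x)      # [1] "s3n:\\/\\/emrlogs\\/"   (escaped representation)
cat(x, "\n")  # s3n:\/\/emrlogs\/            (the actual characters)
nchar(x)      # 17: each backslash is counted once
```

So a backslash that appears doubled in printed output or in dput() output is a single character in the string itself.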
