Performance considerations of Haskell FFI / C?

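If Haskell is used as a library that is called from my C program, what is the performance impact of making calls into it? For example, suppose I have a problem world data set of, say, 20kB of data, and I want to run something like: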

// Go through my 1000 actors and have them make a decision based on
// HaskellCode() function, which is compiled Haskell I'm accessing through
// the FFI.  As an argument, send in the SAME 20kB of data to EACH of these
// function calls, and some actor specific data
// The 20kB constant data defines the environment and the actor specific
// data could be their personality or state
for(i = 0; i < 1000; i++)
   actor[i].decision = HaskellCode(20kB of data here, actor[i].personality);

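What will happen here: is it possible for me to keep that 20kB of data as a global immutable reference somewhere that is accessed by the Haskell code, or must I create a copy of that data each time through? The concern is that this data could be larger, much larger. I also hope to write algorithms that act on much larger data sets, using the same pattern of immutable data being shared by several calls into the Haskell code.

Also, I would like to parallelize this, like a dispatch_apply() in GCD or a Parallel.ForEach(..) in C#. My rationale for parallelizing outside of Haskell is that I know I will always be operating on many separate function calls (i.e. 1000 actors), so using fine-grained parallelization inside the Haskell function is no better than managing it at the C level.

Is running FFI Haskell instances "thread safe", and how do I achieve this? Do I need to initialize a Haskell instance every time I kick off a parallel run? (That seems slow if I must.) How do I achieve this with good performance?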

4 answers

剃摧庭峨僳

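"what kind of performance impact will making calls into it incur"

Assuming you start the Haskell runtime only once, on my machine a function call from C into Haskell that passes an Int back and forth across the boundary takes about 80,000 cycles (31,000 ns on my Core 2), determined experimentally via the rdtsc instruction.

"is it possible for me to keep that 20kB of data as a global immutable reference somewhere that is accessed by the Haskell code"

Yes, that is certainly possible. If the data really is immutable, you get the same result whether you thread the data back and forth across the language boundary by marshalling, pass a reference to the data back and forth, or cache it on the Haskell side.

Which strategy is best? It depends on the data type. The most idiomatic way would be to pass a reference to the C data back and forth, treating it as a ByteString or Vector on the Haskell side.

"I would like to parallelize this"

I'd strongly recommend inverting the control then, and doing the parallelization from the Haskell runtime. That path has been heavily tested, so it will be much more robust.

Regarding thread safety, it is apparently safe to make parallel calls to foreign exported functions running in the same runtime, though I am fairly sure no one has tried this in order to gain parallelism. Each call in acquires a capability, which is essentially a lock, so multiple calls may block, reducing your chances for parallelism. In the multicore case, where multiple capabilities are available, your results may differ. Even so, this is almost certainly a bad way to improve performance.

Again, making many parallel function calls from Haskell via forkIO is a better documented, better tested path, with less overhead than doing the work on the C side, and probably less code in the end.

Just make a call into your Haskell function, which in turn does the parallelism via many Haskell threads. Easy!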
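To make the forkIO suggestion concrete, here is a minimal sketch of doing the fan-out on the Haskell side. The World, Personality and Decision types and the decide rule are hypothetical stand-ins for the question's data, not anything from the original code; only forkIO and the MVar functions are real library calls (both from base).

```haskell
module ParallelDecisions where

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

-- Hypothetical stand-ins for the question's types.
type World = [Int]
type Personality = Int
type Decision = Int

-- Hypothetical decision rule: any pure function of the shared world
-- and one actor's personality would fit here.
decide :: World -> Personality -> Decision
decide world p = (sum world + p) `mod` 4

-- Fork one lightweight Haskell thread per actor. Every thread shares
-- the same immutable world value, so nothing is copied per actor.
-- Results are collected in the same order as the personalities.
decideAll :: World -> [Personality] -> IO [Decision]
decideAll world ps = do
  boxes <- mapM spawn ps
  mapM takeMVar boxes
  where
    spawn p = do
      box <- newEmptyMVar
      -- ($!) forces the decision inside the forked thread, so the work
      -- happens there rather than in the thread that reads the MVar.
      _ <- forkIO (putMVar box $! decide world p)
      return box
```

C would then make a single call into a foreign exported wrapper around decideAll instead of 1000 separate calls. As far as I know, the threads only run on several cores if the program is linked with GHC's -threaded runtime and started with more than one capability (for example +RTS -N). The sketch also ignores exceptions: if decide throws, the matching takeMVar never receives a value.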

撵穆

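I mix C and Haskell threads in an application and haven't noticed much of a performance hit when switching between the two. So I crafted a simple benchmark, which turns out quite a bit faster/cheaper than Don's. This is measuring 10 million iterations on a 2.66GHz i7: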

$ ./foo
IO  : 2381952795 nanoseconds total, 238.195279 nanoseconds per, 160000000 value
Pure: 2188546976 nanoseconds total, 218.854698 nanoseconds per, 160000000 value

Compiled with GHC 7.0.3 / x86_64 and gcc-4.2.1 on OSX 10.6:

ghc -no-hs-main -lstdc++ -O2 -optc-O2 -o foo ForeignExportCost.hs Driver.cpp

Haskell:

{-# LANGUAGE ForeignFunctionInterface #-}

module ForeignExportCost where

import Foreign.C.Types

foreign export ccall simpleFunction :: CInt -> CInt
simpleFunction i = i * i

foreign export ccall simpleFunctionIO :: CInt -> IO CInt
simpleFunctionIO i = return (i * i)

And an OSX C++ app to drive it, which should be simple to adapt to Windows or Linux:

#include <stdio.h>
#include <mach/mach_time.h>
#include <mach/kern_return.h>
#include <HsFFI.h>
#include "ForeignExportCost_stub.h"

static const int s_loop = 10000000;

int main(int argc, char** argv) {
    hs_init(&argc, &argv);

    struct mach_timebase_info timebase_info = { };
    kern_return_t err;
    err = mach_timebase_info(&timebase_info);
    if (err != KERN_SUCCESS) {
        fprintf(stderr, "error: %x\n", err);
        return err;
    }

    // timing a function in IO
    uint64_t start = mach_absolute_time();
    HsInt32 val = 0;
    for (int i = 0; i < s_loop; ++i) {
        val += simpleFunctionIO(4);
    }

    // in nanoseconds per http://developer.apple.com/library/mac/#qa/qa1398/_index.html
    uint64_t duration = (mach_absolute_time() - start) * timebase_info.numer / timebase_info.denom;
    double duration_per = static_cast<double>(duration) / s_loop;
    printf("IO  : %lld nanoseconds total, %f nanoseconds per, %d value\n", duration, duration_per, val);

    // run the loop again with a pure function
    start = mach_absolute_time();
    val = 0;
    for (int i = 0; i < s_loop; ++i) {
        val += simpleFunction(4);
    }

    duration = (mach_absolute_time() - start) * timebase_info.numer / timebase_info.denom;
    duration_per = static_cast<double>(duration) / s_loop;
    printf("Pure: %lld nanoseconds total, %f nanoseconds per, %d value\n", duration, duration_per, val);

    hs_exit();
}

古擅坛犯

If you pass a pointer, Haskell can peek at that 20k blob.
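A minimal sketch of what that could look like, assuming the blob is handed over as a byte pointer plus a length. The exported name haskellCode mirrors the question's placeholder and the decide rule is made up for illustration. unsafePackCStringLen is a real function from the bytestring package; it wraps the buffer without copying it.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module PeekWorld where

import qualified Data.ByteString as B
import Data.ByteString.Unsafe (unsafePackCStringLen)
import Foreign.C.Types (CChar (..), CInt (..))
import Foreign.Ptr (Ptr)

-- Hypothetical decision rule: sum the world bytes, add the actor's
-- personality, and reduce modulo 4 to pick one of four actions.
decide :: B.ByteString -> CInt -> CInt
decide world personality = (fromIntegral total + personality) `mod` 4
  where
    total = B.foldl' (\acc w -> acc + fromIntegral w) (0 :: Int) world

-- C passes a pointer to the shared data, its length in bytes, and the
-- per-actor value. unsafePackCStringLen views the buffer in O(1) with
-- no copy, so C must keep it alive and unmodified during the call.
-- ($!) forces the result before returning, so the buffer is not read
-- after the call has finished.
haskellCode :: Ptr CChar -> CInt -> CInt -> IO CInt
haskellCode buf len personality = do
  world <- unsafePackCStringLen (buf, fromIntegral len)
  return $! decide world personality

foreign export ccall haskellCode :: Ptr CChar -> CInt -> CInt -> IO CInt
```

On the C side this would be called through the generated stub header, in the same way the benchmark above calls simpleFunctionIO, passing the same buffer pointer for every actor.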

死搭胯

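Disclaimer: I have no experience with the FFI. But it seems to me that if you want to reuse the 20 kB of data so that you are not passing it every time, you could simply have a function that takes a list of "personalities" and returns a list of "decisions". So if you have a function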

f :: LotsaData -> Personality -> Decision
f world p = ...  -- the argument cannot be called "data", which is a reserved word in Haskell

then why not make a helper function

helper :: LotsaData -> [Personality] -> [Decision]
helper world ps = map (f world) ps

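and invoke that? With this approach, though, if you want to parallelize you would need to do it on the Haskell side with parallel lists and a parallel map. I defer to the experts to explain whether C arrays can easily be marshalled into Haskell lists (or a similar structure).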
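On the marshalling question, one possible sketch uses peekArray and pokeArray from Foreign.Marshal.Array in base, which copy between a C array and a Haskell list. Everything else here is an assumption made for illustration: the world and the personalities are both modelled as arrays of C ints, the rule inside f is made up, and decideBatch is a hypothetical export name.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module BatchDecisions where

import Foreign.C.Types (CInt (..))
import Foreign.Marshal.Array (peekArray, pokeArray)
import Foreign.Ptr (Ptr)

-- Hypothetical decision rule over the shared world and one personality.
f :: [CInt] -> CInt -> CInt
f world p = (sum world + p) `mod` 4

-- The helper from the answer: one world, many personalities.
helper :: [CInt] -> [CInt] -> [CInt]
helper world ps = map (f world) ps

-- C passes the world array, the personality array, and an output buffer
-- with room for nActors results. peekArray copies each C array into a
-- Haskell list; pokeArray writes the decisions back into outPtr.
decideBatch :: Ptr CInt -> CInt -> Ptr CInt -> CInt -> Ptr CInt -> IO ()
decideBatch worldPtr nWorld persPtr nActors outPtr = do
  world <- peekArray (fromIntegral nWorld) worldPtr
  ps <- peekArray (fromIntegral nActors) persPtr
  pokeArray outPtr (helper world ps)

foreign export ccall decideBatch
  :: Ptr CInt -> CInt -> Ptr CInt -> CInt -> Ptr CInt -> IO ()
```

This crosses the language boundary once per batch rather than once per actor. The copy into a list happens once per batch as well; for large worlds, a ByteString or Vector view over the C memory would presumably avoid even that.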

