Welcome to Doom9's Forum, THE in-place to be for everyone interested in DVD conversion.

Before you start posting please read the forum rules. By posting to this forum you agree to abide by the rules.

 

Go Back   Doom9's Forum > Programming and Hacking > Development

Reply
 
Thread Tools Search this Thread Display Modes
Old 27th December 2019, 21:43   #1  |  Link
stax76
Registered User
 
stax76's Avatar
 
Join Date: Jun 2002
Location: On thin ice
Posts: 6,837
C++ performance problem

Why is func1 100 times faster than the func2?

Code:
extern "C" {
	__declspec(dllexport) LPVOID __stdcall
	func1()
	{
		CmyInterface* pTest = new CmyInterface();
		LPVOID pMyInterface;
		pTest->QueryInterface(__uuidof(ImyInterface), &pMyInterface);
		return pMyInterface;
	}

        __declspec(dllexport) ImyInterface* __stdcall
        func2()
        {
            return (ImyInterface*) new CmyInterface();
        }
}
It has nothing to do with the server implementation because I tested two independent implementations. One found here:

https://www.codeproject.com/Articles...out-ATL-or-MFC

and other is that:

Code:
#include "pch.h"

#include <ComDef.h>

#include <atomic>


// {864C3D72-B6FD-4014-8AB4-55B664FAEF2C}
static const GUID CLSID_BasicServer =
    { 0x864c3d72, 0xb6fd, 0x4014, { 0x8a, 0xb4, 0x55, 0xb6, 0x64, 0xfa, 0xef, 0x2c } };

// {177D3798-3D6D-4217-B355-D2355884CFDF}
static const GUID IID_IBasicServer =
    { 0x177d3798, 0x3d6d, 0x4217, { 0xb3, 0x55, 0xd2, 0x35, 0x58, 0x84, 0xcf, 0xdf } };


struct IBasicServer : IUnknown
{
    virtual int __stdcall GetFive() = 0;
};


class BasicServer : IBasicServer
{

private:

    std::atomic<int> m_refs = 0;

public:

    // IUnknown

    HRESULT __stdcall QueryInterface(const IID& iid, void** ppv);
    ULONG   __stdcall AddRef();
    ULONG   __stdcall Release();

    // IBasicServer

    int     __stdcall GetFive();
};


// IUnknown

STDMETHODIMP BasicServer::QueryInterface(const IID& iid, void** ppv)
{
    if (!ppv)
        return E_POINTER;

    if (iid == IID_IUnknown)
    {
        *ppv = (IUnknown*)this;
    }
    else if (iid == IID_IBasicServer)
    {
        *ppv = (IBasicServer*)this;
    }
    else
    {
        *ppv = nullptr;
        return E_NOINTERFACE;
    }

    AddRef();
    return S_OK;
}


STDMETHODIMP_(ULONG) BasicServer::AddRef()
{
    return ++m_refs;
}


STDMETHODIMP_(ULONG) BasicServer::Release() {
    int refs = --m_refs;

    if (!refs)
        delete this;

    return refs;
}


// IBasicServer

int __stdcall BasicServer::GetFive()
{
    return 5;
}


// Extern

extern "C" {
    __declspec(dllexport) LPVOID __stdcall
    CreateBasicServer()
    {
        BasicServer* pServer = new BasicServer();
        LPVOID pIBasicServer;
        pServer->QueryInterface(IID_IBasicServer, &pIBasicServer);
        return pIBasicServer;
    }
}
Might there be an easier way to write a simple server, with fewer lines code or is the code fine? It's derived from code found in VapourSynth.

Last edited by stax76; 27th December 2019 at 21:47.
stax76 is offline   Reply With Quote
Old 1st January 2020, 23:07   #2  |  Link
LoRd_MuldeR
Software Developer
 
LoRd_MuldeR's Avatar
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by stax76 View Post
Why is func1 100 times faster than the func2?

Code:
extern "C" {
	__declspec(dllexport) LPVOID __stdcall
	func1()
	{
		CmyInterface* pTest = new CmyInterface();
		LPVOID pMyInterface;
		pTest->QueryInterface(__uuidof(ImyInterface), &pMyInterface);
		return pMyInterface;
	}

        __declspec(dllexport) ImyInterface* __stdcall
        func2()
        {
            return (ImyInterface*) new CmyInterface();
        }
}
I see absolutely no reason why "func1" should be faster than "func2". Actually I would expect the opposite to be the case!

That is because both functions first create an instance of "CmyInterface", on the heap (via "new" operator). However, while "func2" simply returns the pointer to the created "CmyInterface" object, "func1" additionally calls QueryInterface() on the created "CmyInterface" object and then returns the pointer that was returned by QueryInterface(). So, clearly, "func1" does everything that "func2" does plus some additional work.

Needless to say that these function also do quite different things: "func2" simply casts the CmyInterface* pointer to an ImyInterface* pointer – which may be legit or not. Keep in mind that C/C++ happily casts a pointer to an incompatible type, if you ask it to do so, even though this would lead to undefined behavior at runtime! Conversely, "func1" returns whatever pointer is returned by CmyInterface::QueryInterface() – which totally depends on the implementation of the CmyInterface::QueryInterface() function. It is quite possible that CmyInterface::QueryInterface() simply returns "(ImyInterface*)this", but it could return a completely different pointer (to some other object) just as well!

BTW: I think you are missing error checking for the QueryInterface() call. Should this call fail, the value of pMyInterface (and thus the return value of "func1") will be undefined.

BTW²: I think "func1" may have a memory leak! Since "func2" simply returns the pointer to the created CmyInterface instance, it is clear that the caller will be responsible for destroying that object when no longer needed. But "func1" never returns the pointer to the CmyInterface instance – it only returns whatever pointer the call to QueryInterface() returned, which could be anything, and does not destroy the created CmyInterface object. So we probably have a leak here.
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 1st January 2020 at 23:41.
LoRd_MuldeR is offline   Reply With Quote
Old 1st January 2020, 23:53   #3  |  Link
stax76
Registered User
 
stax76's Avatar
 
Join Date: Jun 2002
Location: On thin ice
Posts: 6,837
I noticed this issue by accident in a loop testing for possible memory/resource issues. Maybe it happens only with a managed client, I might try it with a native client as well because I work on a native test prototype. I can try to disassemble it but to reason about it will be difficult for me as I don't understand assembly.

Quote:
However, while "func2" simply returns the pointer to the created "CmyInterface" object
That sounded weird as I thought that after casting to different interfaces it will be a different address for each interface, just tried it in the debugger and it is as you say always the same address.

It's a mystery, I probably should examine this further.
stax76 is offline   Reply With Quote
Old 2nd January 2020, 00:12   #4  |  Link
LoRd_MuldeR
Software Developer
 
LoRd_MuldeR's Avatar
 
Join Date: Jun 2005
Location: Last House on Slunk Street
Posts: 13,248
Quote:
Originally Posted by stax76 View Post
That sounded weird as I thought that after casting to different interfaces it will be a different address for each interface, just tried it in the debugger and it is as you say always the same address.
Keep in mind that in C/C++ pointers are just numbers – the memory address where a certain object is located.

Consequently, doing a simple type-cast on a pointer type does not change the value of the pointer (memory address) at all! All a type-cast from type X* to type Y* really does is, it tells the compiler to assume that the object located at that memory address is of type Y rather than of type X – which can make sense or not. The C/C++ compiler does what you asked for, even if it results in undefined behavior.

On the other hand, which pointer (memory address) is returned by the QueryInterface() function totally depends on the implementation of that function! It is possible that QueryInterface(__uuidof(Y), ...) is implemented simply as "(Y*)this", in which case it would be equivalent to a simple type-cast – but you really can (should) not rely on that. After all, it is just as well possible that p->QueryInterface() returns a pointer (memory address) different from p.
__________________
Go to https://standforukraine.com/ to find legitimate Ukrainian Charities 🇺🇦✊

Last edited by LoRd_MuldeR; 2nd January 2020 at 00:36.
LoRd_MuldeR is offline   Reply With Quote
Old 2nd January 2020, 00:18   #5  |  Link
stax76
Registered User
 
stax76's Avatar
 
Join Date: Jun 2002
Location: On thin ice
Posts: 6,837
Yes, I've learned something here, thank you. The reason for the slowdown is probably the dotnet marshaller then, it's still weird because I had tried variants with exact identical function signatures. I let you know if I find out more.

Last edited by stax76; 2nd January 2020 at 00:26.
stax76 is offline   Reply With Quote
Old 2nd January 2020, 11:29   #6  |  Link
nevcairiel
Registered Developer
 
Join Date: Mar 2010
Location: Hamburg/Germany
Posts: 10,344
Quote:
Originally Posted by LoRd_MuldeR View Post
The C/C++ compiler does what you asked for, even if it results in undefined behavior.
To complement this, if you are unsure if your object actually implements an interface, for some reason, you can use a C++ cast instead, ie. dynamic_cast<Y*>(X) instead of (Y*)X, which does type-checking and returns NULL otherwise - as a safety net.
C-style casts in C++ can be dangerous, so always be sure it'll succeed when you use them.

Quote:
Originally Posted by LoRd_MuldeR View Post
On the other hand, which pointer (memory address) is returned by the QueryInterface() function totally depends on the implementation of that function! It is possible that QueryInterface(__uuidof(Y), ...) is implemented simply as "(Y*)this", in which case it would be equivalent to a simple type-cast – but you really can (should) not rely on that. After all, it is just as well possible that p->QueryInterface() returns a pointer (memory address) different from p.
Actually, it should be noted that IUnknown::QueryInterface has strict rules that it must return the same object (https://docs.microsoft.com/en-us/win...queryinterface - Objects must have Identity). If this was violated, then the contract of IUnknown is violated, and your implementation would just be wrong. How badly that may affect any callers in the real world is another question, but you can theoretically rely on this behavior.
__________________
LAV Filters - open source ffmpeg based media splitter and decoders

Last edited by nevcairiel; 2nd January 2020 at 11:40.
nevcairiel is offline   Reply With Quote
Old 2nd January 2020, 14:19   #7  |  Link
stax76
Registered User
 
stax76's Avatar
 
Join Date: Jun 2002
Location: On thin ice
Posts: 6,837
In a C++ test app it performs as expected, code:

Code:
#include <Windows.h>
#include <ComDef.h>

#include <iostream>
#include <string>
#include <atomic>
#include <chrono>

using namespace std;


// {864C3D72-B6FD-4014-8AB4-55B664FAEF2C}
static const GUID CLSID_BasicServer =
{ 0x864c3d72, 0xb6fd, 0x4014, { 0x8a, 0xb4, 0x55, 0xb6, 0x64, 0xfa, 0xef, 0x2c } };

// {177D3798-3D6D-4217-B355-D2355884CFDF}
static const GUID IID_IBasicServer =
{ 0x177d3798, 0x3d6d, 0x4217, { 0xb3, 0x55, 0xd2, 0x35, 0x58, 0x84, 0xcf, 0xdf } };


struct IBasicServer : IUnknown
{
    virtual int __stdcall GetFive() = 0;
};


class BasicServer : IBasicServer
{

private:

    std::atomic<int> m_refs = 0;

public:

    // IUnknown

    HRESULT __stdcall QueryInterface(const IID& iid, void** ppv);
    ULONG   __stdcall AddRef();
    ULONG   __stdcall Release();

    // IBasicServer

    int     __stdcall GetFive();
};


// IUnknown

STDMETHODIMP BasicServer::QueryInterface(const IID& iid, void** ppv)
{
    if (!ppv)
        return E_POINTER;

    if (iid == IID_IUnknown)
    {
        *ppv = (IUnknown*)this;
    }
    else if (iid == IID_IBasicServer)
    {
        *ppv = (IBasicServer*)this;
    }
    else
    {
        *ppv = nullptr;
        return E_NOINTERFACE;
    }

    AddRef();
    return S_OK;
}


STDMETHODIMP_(ULONG) BasicServer::AddRef()
{
    return ++m_refs;
}


STDMETHODIMP_(ULONG) BasicServer::Release() {
    int refs = --m_refs;

    if (!refs)
        delete this;

    return refs;
}


// IBasicServer

int __stdcall BasicServer::GetFive()
{
    return 5;
}


// Extern

extern "C" {
    __declspec(dllexport) LPVOID __stdcall
    CreateBasicServer()
    {
        BasicServer* pServer = new BasicServer();
        LPVOID pIBasicServer;

        if (FAILED(pServer->QueryInterface(IID_IBasicServer, &pIBasicServer)))
            return nullptr;

        return pIBasicServer;
    }

    __declspec(dllexport) LPVOID __stdcall
    CreateBasicServerSlow()
    {
        return (IBasicServer*) new BasicServer();
    }
}

int main()
{
    // measure fast CreateBasicServer()
    auto start = std::chrono::high_resolution_clock::now();

    for (size_t i = 0; i < 1'000'000; i++)
        ((IUnknown*)CreateBasicServer())->Release();

    auto end = std::chrono::high_resolution_clock::now();

    cout << "fast: " << (end - start).count() << endl;


    // measure slow CreateBasicServerSlow()
    auto start2 = std::chrono::high_resolution_clock::now();

    for (size_t i = 0; i < 1'000'000; i++)
        ((IUnknown*)CreateBasicServerSlow())->Release();

    auto end2 = std::chrono::high_resolution_clock::now();

    cout << "slow: " << (end2 - start2).count() << endl;

    cout << '\n' << "Press key to exit" << endl;
    cin.get();
}
If I create a C++ DLL and test it in a C# app it's 350 times slower!

Code:
#include "pch.h"

#include <ComDef.h>

#include <atomic>


// {864C3D72-B6FD-4014-8AB4-55B664FAEF2C}
static const GUID CLSID_BasicServer =
    { 0x864c3d72, 0xb6fd, 0x4014, { 0x8a, 0xb4, 0x55, 0xb6, 0x64, 0xfa, 0xef, 0x2c } };

// {177D3798-3D6D-4217-B355-D2355884CFDF}
static const GUID IID_IBasicServer =
    { 0x177d3798, 0x3d6d, 0x4217, { 0xb3, 0x55, 0xd2, 0x35, 0x58, 0x84, 0xcf, 0xdf } };


struct IBasicServer : IUnknown
{
    virtual int __stdcall GetFive() = 0;
};


class BasicServer : IBasicServer
{

private:

    std::atomic<int> m_refs = 0;

public:

    // IUnknown

    HRESULT __stdcall QueryInterface(const IID& iid, void** ppv);
    ULONG   __stdcall AddRef();
    ULONG   __stdcall Release();

    // IBasicServer

    int     __stdcall GetFive();
};


// IUnknown

STDMETHODIMP BasicServer::QueryInterface(const IID& iid, void** ppv)
{
    if (!ppv)
        return E_POINTER;

    if (iid == IID_IUnknown)
    {
        *ppv = (IUnknown*)this;
    }
    else if (iid == IID_IBasicServer)
    {
        *ppv = (IBasicServer*)this;
    }
    else
    {
        *ppv = nullptr;
        return E_NOINTERFACE;
    }

    AddRef();
    return S_OK;
}


STDMETHODIMP_(ULONG) BasicServer::AddRef()
{
    return ++m_refs;
}


STDMETHODIMP_(ULONG) BasicServer::Release() {
    int refs = --m_refs;

    if (!refs)
        delete this;

    return refs;
}


// IBasicServer

int __stdcall BasicServer::GetFive()
{
    return 5;
}


// Extern

extern "C" {
    __declspec(dllexport) LPVOID __stdcall
    CreateBasicServer()
    {
        BasicServer* pServer = new BasicServer();
        LPVOID pIBasicServer;
        pServer->QueryInterface(IID_IBasicServer, &pIBasicServer);
        return pIBasicServer;
    }

    __declspec(dllexport) LPVOID __stdcall
    CreateBasicServerSlow()
    {
        return (IBasicServer*) new BasicServer();
    }
}
Code:
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

class Program
{
    static void Main(string[] args)
    {
        // test fast
        Stopwatch sw = new Stopwatch();
        sw.Start();

        for (int i = 0; i < 1000; i++)
            Marshal.ReleaseComObject(CreateBasicServer());

        sw.Stop();
        Console.WriteLine("fast: " + sw.ElapsedMilliseconds);


        // test slow
        sw.Reset();
        sw.Start();

        for (int i = 0; i < 1000; i++)
            Marshal.ReleaseComObject(CreateBasicServerSlow());

        sw.Stop();
        Console.WriteLine("slow: " + sw.ElapsedMilliseconds);

        Console.ReadKey();
    }

    [DllImport(@"D:\Projekte\CPP\FrameServer\x64\Debug\FrameServer.dll")]
    static extern IBasicServer CreateBasicServer();

    [DllImport(@"D:\Projekte\CPP\FrameServer\x64\Debug\FrameServer.dll")]
    static extern IBasicServer CreateBasicServerSlow();
}

[Guid("177D3798-3D6D-4217-B355-D2355884CFDF")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
public interface IBasicServer
{
    [PreserveSig]
    int GetFive();
}
result:

fast: 8
slow: 2815

Last edited by stax76; 2nd January 2020 at 14:22.
stax76 is offline   Reply With Quote
Old 2nd January 2020, 14:38   #8  |  Link
stax76
Registered User
 
stax76's Avatar
 
Join Date: Jun 2002
Location: On thin ice
Posts: 6,837
Instead of the .NET Framework CLR I tried the much more advanced .NET Core CLR and that doesn't run the slow function but throws:

System.ExecutionEngineException: 'Exception of type 'System.ExecutionEngineException' was thrown.'


edit:

with native code debugging enabled:

Exception thrown at 0x00007FF80C2C2EE8 (coreclr.dll) in cs console.exe: 0xC0000005: Access violation reading location 0xFFFFFFFFFFFFFFFF.


edit2:

difference is the fast variant calls AddRef in the QueryInterface method.


edit3:

the solution:

Code:
    __declspec(dllexport) LPVOID __stdcall
    CreateBasicServerSlow()
    {
        auto value = new BasicServer();
        value->AddRef();
        return value;
    }

Last edited by stax76; 2nd January 2020 at 15:02.
stax76 is offline   Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +1. The time now is 21:05.


Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2024, vBulletin Solutions Inc.