Files
@ r13257:4c5b8120be59
Branch filter:
Location: cpp/openttd-patchpack/source/src/core/alloc_func.hpp
r13257:4c5b8120be59
4.0 KiB
text/x-c++hdr
(svn r17776) -Codechange: [SDL] make "update the video card"-process asynchronious. Profiling with gprof etc. hasn't shown us that DrawSurfaceToScreen takes a significant amount of CPU; only using TIC/TOC it became apparant that it was a heavy CPU-cycle user or that it was waiting for something.
The benefit of making this function asynchronious ranges from 2%-25% (real time) during fast forward on dual core/hyperthreading-enabled CPUs; 8bpp improvements are, in my test cases, significantly smaller than 32bpp improvements.
On single core non-hyperthreading-enabled CPUs the extra locking/scheduling costs up to 1% extra realtime in fast forward. You can use -v sdl:no_threads to disable threading and undo this loss.
During normal non-fast-forwarded games the benefit/costs are negligable except when the gameloop takes more than about 90% of the time of a tick.
Note that allegro's performance does not improve with this system, likely due to their way of getting data to the video card. It is not implemented for the OS X/Windows video backends, unless (ofcourse) SDL is used there.
Funny is that the performance of the 32bpp(-anim) blitter is, at least in some test cases, significantly faster (more than 10%) than the 8bpp(-optimized) blitter when looking at real time in fast forward on a dual core CPU; it was slower.
The idea comes from a paper/report by Idar Borlaug and Knut Imar Hagen.
The benefit of making this function asynchronious ranges from 2%-25% (real time) during fast forward on dual core/hyperthreading-enabled CPUs; 8bpp improvements are, in my test cases, significantly smaller than 32bpp improvements.
On single core non-hyperthreading-enabled CPUs the extra locking/scheduling costs up to 1% extra realtime in fast forward. You can use -v sdl:no_threads to disable threading and undo this loss.
During normal non-fast-forwarded games the benefit/costs are negligable except when the gameloop takes more than about 90% of the time of a tick.
Note that allegro's performance does not improve with this system, likely due to their way of getting data to the video card. It is not implemented for the OS X/Windows video backends, unless (ofcourse) SDL is used there.
Funny is that the performance of the 32bpp(-anim) blitter is, at least in some test cases, significantly faster (more than 10%) than the 8bpp(-optimized) blitter when looking at real time in fast forward on a dual core CPU; it was slower.
The idea comes from a paper/report by Idar Borlaug and Knut Imar Hagen.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 | /* $Id$ */
/*
* This file is part of OpenTTD.
* OpenTTD is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 2.
* OpenTTD is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
* See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with OpenTTD. If not, see <http://www.gnu.org/licenses/>.
*/
/** @file alloc_func.hpp Functions related to the allocation of memory */
#ifndef ALLOC_FUNC_HPP
#define ALLOC_FUNC_HPP
/**
* Functions to exit badly with an error message.
* It has to be linked so the error messages are not
* duplicated in each object file making the final
* binary needlessly large.
*/
void NORETURN MallocError(size_t size);
void NORETURN ReallocError(size_t size);
/**
* Simplified allocation function that allocates the specified number of
* elements of the given type. It also explicitly casts it to the requested
* type.
* @note throws an error when there is no memory anymore.
* @note the memory contains garbage data (i.e. possibly non-zero values).
* @tparam T the type of the variable(s) to allocation.
* @param num_elements the number of elements to allocate of the given type.
* @return NULL when num_elements == 0, non-NULL otherwise.
*/
template <typename T>
static FORCEINLINE T *MallocT(size_t num_elements)
{
/*
* MorphOS cannot handle 0 elements allocations, or rather that always
* returns NULL. So we do that for *all* allocations, thus causing it
* to behave the same on all OSes.
*/
if (num_elements == 0) return NULL;
T *t_ptr = (T*)malloc(num_elements * sizeof(T));
if (t_ptr == NULL) MallocError(num_elements * sizeof(T));
return t_ptr;
}
/**
* Simplified allocation function that allocates the specified number of
* elements of the given type. It also explicitly casts it to the requested
* type.
* @note throws an error when there is no memory anymore.
* @note the memory contains all zero values.
* @tparam T the type of the variable(s) to allocation.
* @param num_elements the number of elements to allocate of the given type.
* @return NULL when num_elements == 0, non-NULL otherwise.
*/
template <typename T>
static FORCEINLINE T *CallocT(size_t num_elements)
{
/*
* MorphOS cannot handle 0 elements allocations, or rather that always
* returns NULL. So we do that for *all* allocations, thus causing it
* to behave the same on all OSes.
*/
if (num_elements == 0) return NULL;
T *t_ptr = (T*)calloc(num_elements, sizeof(T));
if (t_ptr == NULL) MallocError(num_elements * sizeof(T));
return t_ptr;
}
/**
* Simplified reallocation function that allocates the specified number of
* elements of the given type. It also explicitly casts it to the requested
* type. It extends/shrinks the memory allocation given in t_ptr.
* @note throws an error when there is no memory anymore.
* @note the pointer to the data may change, but the data will remain valid.
* @tparam T the type of the variable(s) to allocation.
* @param t_ptr the previous allocation to extend/shrink.
* @param num_elements the number of elements to allocate of the given type.
* @return NULL when num_elements == 0, non-NULL otherwise.
*/
template <typename T>
static FORCEINLINE T *ReallocT(T *t_ptr, size_t num_elements)
{
/*
* MorphOS cannot handle 0 elements allocations, or rather that always
* returns NULL. So we do that for *all* allocations, thus causing it
* to behave the same on all OSes.
*/
if (num_elements == 0) {
free(t_ptr);
return NULL;
}
t_ptr = (T*)realloc(t_ptr, num_elements * sizeof(T));
if (t_ptr == NULL) ReallocError(num_elements * sizeof(T));
return t_ptr;
}
/** alloca() has to be called in the parent function, so define AllocaM() as a macro */
#define AllocaM(T, num_elements) ((T*)alloca((num_elements) * sizeof(T)))
#endif /* ALLOC_FUNC_HPP */
|