Link speed in C ++

I was working on a project and was trying to find the source of the big runtime slowdown and narrowed it down to one method that I was able to optimize from logic. The problem is that my solution involves using a link which makes another section of the code run quite slowly ... The question I would like to answer is why the inner loop takes so much longer to evaluate when map is a reference and not a local variable?

Here's the old way before optimization:

// old method: create an empty map, populate it
// and then assign it back to the path object later
map<int,float> screenline_usage; 

for (int i=0; i<numCandidates; ++i)
{
  // timing starts here. 
  map<int, float>& my_screenline_usage =
    path->get_combined_screenline_usage(legnr, stop_id);
  map<int, float>::iterator it = my_screenline_usage.begin(); 
  for (; it != my_screenline_usage.end(); ++it) 
    screenline_usage[it->first] += usage * it->second; 
  // timing ends here, this block evaluated 4 million times for overall execution time of ~12 seconds
}

// This function call is evaluated 400k times for an overall execution time of ~126 seconds
path->set_zone_screenline_usage(access_mode, zone_id, screenline_usage); 

// TOTAL EXECUTION TIME: 138 seconds. 

      

New way after optimization:

// new method: get a reference to internal path mapping and populate it
map<int, float>& screenline_usage =
  path->get_zone_screenline_usage(access_mode, zone_id);
screenline_usage.clear();

for (int i=0; i<numCandidates; ++i)
{
  // timing starts here
  map<int, float>& my_screenline_usage =
    path->get_combined_screenline_usage(legnr, stop_id);
  map<int, float>::iterator it = my_screenline_usage.begin(); 
  for (; it != my_screenline_usage.end(); ++it) 
    screenline_usage[it->first] += usage * it->second; 
  // timing ends here, this block evaluated 4 million times for overall execution time of ~76 seconds
}

// New method... no need to assign back to path object (0 seconds execution :)
// TOTAL EXECUTION TIME: 76 seconds (62 second time saving) ... but should be able to do it in just 12 seconds if the use of reference didn't add so much time :(

      

Here are the relevant subroutines called from this code:

// This is the really slow routine, due to the copy assignment used. 
void set_zone_screenline_usage(int access_mode, int zone_id,
  map<int,float>& screenline_usage)
{
  m_container[access_mode][zone_id] = screenline_usage; 
}

map<int,float>& get_zone_screenline_usage(int access_mode, int zone_id)
{
  return m_container[access_mode][zone_id]; 
}

      

NOTES: Synchronization information is for one run, which evaluates the above code approximately 400k times. The timing is done using some of the classes I created to access the RDTSC timestamp counter (yes, I know TSC stands for timestamp counter), the average numCandidates is 10, and the average number of items pushed onto the screenline_usage map is 25.


UPDATE: First, thanks to everyone who contributed here. I think that in the end this had no C ++ references at all and was more about cache consistency. I replaced the optimized code above with the vector & and hash function implemented as a member map variable

// newest method: get a reference to internal path mapping (as vector) and populate it 
// map<int,int> m_linkNum_to_SlNum declared in header and populated in constructor. 
vector<float>& screenline_usage = path->get_zone_screenline_usage(access_mode, zone_id);

for (int i=0; i<numCandidates; ++i)
{
  // timing starts here
  map<int, float>& my_screenline_usage =
    path->get_combined_screenline_usage(legnr, stop_id);
  map<int, float>::iterator it = my_screenline_usage.begin(); 
  for (; it != my_screenline_usage.end(); ++it) 
    screenline_usage[m_linkNum_to_SlNum[it->first]] += usage * it->second; 
  // timing ends here, this block evaluated 4 million times for overall execution time of ~9 seconds
}

// Newest method... again no need to assign back to path object (0 seconds execution :)
// TOTAL EXECUTION TIME: just 9 seconds (129 second time saving) ... this is even better than using a locally constructed map which took 12 seconds in the inner loop :)

      

It seems to me that assuming the vector is not local but a contiguous block of memory and that the hash function (m_linkNum_to_SlNum) is a local member variable, this approach results in code / data that can be inserted into the cache without having to go out to main memory for data, resulting in significant speed. Other findings from these findings were praised.

+1


a source to share


6 answers


As with my updates, I think this is most likely a cache consistency issue, not a C ++ reference issue.



0


a source


Perhaps you, the C ++ compiler, can inline some code for a local map, but not when the map is a reference.



+4


a source


Your question is incomprehensible. Are you wondering why passing a map by reference is faster than passing by value? If so, the answer is simple: Returning a map by value means copying the entire map, and copying a large map is very expensive.

On the other hand, if your question is why it is faster to get a link to an existing card and fill it in than to create a new card, then one hypothesis is that it is related to

 screenline_usage[it->first] += usage * it->second; 

      

Since the [it-> first] key already exists on the map inside path-> get_zone_screenline_usage, this is just an update operation and does not require memory allocation. However, if screenline_usage is an empty map, each access to a new [it-> first] means that you have to allocate a new map node from the heap first and this is expensive.

+3


a source


If I understand your question correctly, you are implying that inserting into a local stack-allocated map <> is much faster than inserting into an existing heap-allocated map <> that you retrieve by reference.

There are several possibilities that can affect performance. I doubt it has anything to do with the C ++ benchmark performance.

The first possibility is that by changing screenline_usage

to a reference, you "essentially" retrieve a pointer to an existing object. The actual instance of the object may not be map<int,float>

, it can be anything that inherits from the map. For example, it could be a card with a specific comparator function defined for it. Or a subclass with thread synchronization logic. Since you don't know what type you are getting from the call to m_container[access_mode][zone_id]

, you may well end up with a subclass that does not perform well on insertion. (You can see what type you return in the debugger, by the way.) Whereas by creating empty map<int,float>

, you have a guarantee that the type is the actual map <> and not a subclass.

Suppose you are returning an actual map <> instance back.

The second possibility is that in the particular STL style you are using, the function map::clear()

does not effectively clean up the internal data structures used to maintain associative indexes. As far as I remember, stl :: map <> uses some complex internal hashing and bucket structures to provide efficient insertion and retrieval capabilities. However, it is unclear what happens to them when you call clear (). Thus, it is possible that screenline_usage.clear()

results in a less efficient card to insert against than if you started with a blank card.

Finally, let's say that I am wrong and this is neither of these possibilities. There is an easy way to determine if the difference is the result of using a reference variable. You can try to allocate a new map on the heap and assign it to the link in your code, for example:

map<int, float>& screenline_usage = new map<int,float>();

      

This will help you determine if there is a difference in adding to an existing map or a new map, or if it is indeed the fact that screenline_usage is a link that affects performance. BTW, Don't forget to free this heap allocated card or you will lose your memory leak.

+2


a source


Links are most often behind the scenes, implemented as pointers. An exception to this would be if they are assigned a temporary, then the lifetime of the value increases to the lifetime of the reference; in fact, a link is an object. So it depends if:

get_combined_screenline_usage

returns by reference or by value. If it's a link, the reference way might be faster. If it's a value, then the old way and the new way do pretty much the same thing, assuming the compiler is optimizing the return value.

In any case, other optimizations that the compiler makes (such as inlining) may mask the effects of the two options; effectively, not knowing exactly which line your slow part is on, this makes this a bit of a guessing game.

I suggest trying to get finer grain information and narrow it down to the point where it takes so long that it makes optimization easier.

(as a side note:

map<int, float>::iterator it = my_screenline_usage.begin(); 
for (; it != my_screenline_usage.end(); ++it) 

      

might be more efficient if written like

for (map<int, float>::const_iterator it(my_screenline_usage.begin()), end(my_screenline_usage.begin()); it != end; ++it)

      

)

+1


a source


The problem I'm trying to figure out is why the inner loop takes so much longer to evaluate when the map is a link?

I guess for a long time is a call get_zone_screenline_usage

: and the reason it takes a long time is because the particular element doesn't exist in yet m_container

, so it needs to be created and inserted before it can be returned.

0


a source







All Articles