Opened 2 years ago

Closed 23 months ago

#2386 closed defect (done)

Poor hash function affects performance in the Parameter-less Population Pyramid

Reported by: bburlacu Owned by: mkommend
Priority: medium Milestone: HeuristicLab 3.3.12
Component: Algorithms.ParameterlessPopulationPyramid Version: 3.3.11
Keywords: Cc: bgoldman

Description

The C# implementation of PPP uses a HashSet<BinaryVector> to keep track of the set of solutions in the pyramid, which depends on the EnumerableBoolEqualityComparer that implements Equals and GetHashCode.

However, the GetHashCode implementation in EnumerableBoolEqualityComparer produces collisions, leading to uneven buckets in the hashset and to more calls to the Equals method which is obviously much slower than a hash check. The collisions unit test shows a collision rate of aprox. 3.5% for the original GetHashCode implementation.

Replacing the code with the following simple implementation reduces the collision rate to zero in the unit test and greatly improves runtime performance (6 seconds versus 16 seconds for the OneMax problem).

public int GetHashCode(IEnumerable<bool> obj) {
  unchecked {
    int hash = 17;
    foreach (var bit in obj) {
      hash = hash * 29 + (bit ? 1231 : 1237);
    }
    return hash;
  }
}

Change History (4)

comment:1 Changed 2 years ago by bburlacu

  • Status changed from new to accepted

comment:2 Changed 2 years ago by bburlacu

  • Owner changed from bburlacu to mkommend
  • Status changed from accepted to reviewing

r12392: Updated GetHashCode method in the EnumerableBoolEqualityComparer.

comment:3 Changed 2 years ago by mkommend

  • Status changed from reviewing to readytorelease

comment:4 Changed 23 months ago by mkommend

  • Resolution set to done
  • Status changed from readytorelease to closed

r12704: Merged r12392 into stable.

Note: See TracTickets for help on using tickets.