Opened 23 months ago

Last modified 21 months ago

#2382 assigned task

Improve regex in String2XMLSerializer

Reported by: ascheibe
Owned by: epitzer
Priority: medium
Milestone: HeuristicLab 3.3.x Backlog
Component: General
Version: 3.3.11
Keywords:
Cc:

Description

I ran into the limitation that arrays are not allowed to be larger than 2GB in total size. I propose to enable support for larger arrays in .NET. This is a new option that came with .NET 4.5 and can be enabled by putting the following into the app.config:

<runtime>
  <gcAllowVeryLargeObjects enabled="true" />
</runtime>

See https://msdn.microsoft.com/en-us/library/hh285054%28v=vs.110%29.aspx for more details.

Change History (4)

comment:1 Changed 23 months ago by ascheibe

  • Owner changed from ascheibe to architects
  • Status changed from new to assigned

comment:2 Changed 23 months ago by ascheibe

Example of such an exception:

HeuristicLab.Clients.Hive.SlaveCore.TaskFailedException: Task failed with reason: HeuristicLab.Persistence.Core.PersistenceException: Unexpected exception while trying to parse object of type "System.String, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089". ---> System.OutOfMemoryException: Array dimensions exceeded supported range.
   at System.Text.RegularExpressions.RegexRunner.DoubleTrack()
   at System.Text.RegularExpressions.RegexInterpreter.Goto(Int32 newpos)
   at System.Text.RegularExpressions.RegexInterpreter.Go()
   at System.Text.RegularExpressions.RegexRunner.Scan(Regex regex, String text, Int32 textbeg, Int32 textend, Int32 textstart, Int32 prevlen, Boolean quick, TimeSpan timeout)
   at System.Text.RegularExpressions.Regex.Run(Boolean quick, Int32 prevlen, String input, Int32 beginning, Int32 length, Int32 startat)
   at System.Text.RegularExpressions.MatchCollection.GetMatch(Int32 i)
   at System.Text.RegularExpressions.MatchEnumerator.MoveNext()
   at HeuristicLab.Persistence.Default.Xml.Primitive.String2XmlSerializer.Parse(XmlString x)
   at HeuristicLab.Persistence.Interfaces.PrimitiveSerializerBase`2.HeuristicLab.Persistence.Interfaces.IPrimitiveSerializer.Parse(ISerialData data)
   at HeuristicLab.Persistence.Core.Deserializer.PrimitiveHandler(PrimitiveToken token)
   --- End of inner exception stack trace ---
   at HeuristicLab.Persistence.Core.Deserializer.PrimitiveHandler(PrimitiveToken token)
   at HeuristicLab.Persistence.Core.Deserializer.Deserialize(IEnumerable`1 tokens)
   at HeuristicLab.Persistence.Default.Xml.XmlParser.Deserialize(Stream stream)
   at HeuristicLab.Persistence.Default.Xml.XmlParser.Deserialize[T](Stream stream)
   at HeuristicLab.Clients.Hive.PersistenceUtil.Deserialize[T](Byte[] sjob)
   at HeuristicLab.Clients.Hive.SlaveCore.Executor.Start(Byte[] serializedJob)

comment:3 Changed 23 months ago by ascheibe

  • Owner changed from architects to epitzer
  • Summary changed from Add support for large arrays to Improve regex in String2XMLSerializer

There is a regex in String2XmlSerializer that could perhaps be improved to reduce memory consumption:

private static readonly Regex re = new Regex(@"<!\[CDATA\[((?:[^]]|\](?!\]>))*)\]\]>|<Base64>([^<]*)</Base64>", RegexOptions.Singleline);

Could you have a look at it? Could backtracking be the problem?
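
For comparison, here is a minimal sketch of how the same segments could be extracted with a plain forward scan instead of a regex, so that no backtracking stack is built up at all (the trace above ends in RegexRunner.DoubleTrack, which suggests that this stack is what grows). The names SegmentScanner, Segment and Scan are hypothetical and not part of HeuristicLab. The sketch only returns the raw payloads and does not decode them, and it treats malformed input differently from the regex (it stops at the first unterminated segment and does not reject '<' inside a Base64 block).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of HeuristicLab.
public sealed class Segment {
  public bool IsBase64;   // true for <Base64>...</Base64>, false for CDATA
  public string Payload;  // raw text between the delimiters, not decoded
}

public static class SegmentScanner {
  private const string CDataOpen = "<![CDATA[";
  private const string CDataClose = "]]>";
  private const string Base64Open = "<Base64>";
  private const string Base64Close = "</Base64>";

  // Yields all CDATA and Base64 segments in document order.
  // Only moves forward through the input, so there is no backtracking state.
  public static IEnumerable<Segment> Scan(string data) {
    int pos = 0;
    while (true) {
      // Both opening delimiters start with '<', so jump to the next one.
      int lt = data.IndexOf('<', pos);
      if (lt < 0) yield break;

      bool isCData = string.CompareOrdinal(data, lt, CDataOpen, 0, CDataOpen.Length) == 0;
      bool isBase64 = !isCData &&
        string.CompareOrdinal(data, lt, Base64Open, 0, Base64Open.Length) == 0;
      if (!isCData && !isBase64) { pos = lt + 1; continue; }

      string open = isCData ? CDataOpen : Base64Open;
      string close = isCData ? CDataClose : Base64Close;
      int start = lt + open.Length;

      // The payload ends at the first closing delimiter.
      int end = data.IndexOf(close, start, StringComparison.Ordinal);
      if (end < 0) yield break; // unterminated segment: stop (assumption)

      yield return new Segment {
        IsBase64 = isBase64,
        Payload = data.Substring(start, end - start)
      };
      pos = end + close.Length;
    }
  }
}
```

Each payload is still copied once via Substring, so this would not remove the memory needed for the string itself; it would only avoid the additional backtracking array of the regex engine.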

comment:4 Changed 21 months ago by ascheibe

  • Milestone changed from HeuristicLab 3.3.12 to HeuristicLab 3.3.x Backlog

I discussed this with epitzer. The regex seems to be OK, so the cause of this problem is the large Hive job. As the gcAllowVeryLargeObjects flag is not an option, there is no quick way to fix this at the moment.
