matchRegexAll stops parsing when encountering a UTF character that is 0 modulo 256. #181

Open
@hugodro

When parsing a (Unicode) string that contains '一' or '开', matchRegexAll stops looking for the pattern and terminates.
With minimal testing, I found that in my samples every character with a code point of the form 0xNN00 caused the problem, hence the conclusion that any character whose code point is 0 modulo 256 triggers the issue.
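The 0xNN00 observation can be checked with a small sketch like the one below. The helper names `lowByteIsZero` and `truncateToCChar` are made up for illustration. The second helper only shows what happens to such a character if it is narrowed to a single C `char`; whether the regex backend actually marshals strings this way is an assumption, not something confirmed in this report.

```haskell
import Data.Char (ord)
import Foreign.C.String (castCharToCChar)
import Foreign.C.Types (CChar)

-- | True when the code point's low byte is zero (the 0xNN00 pattern
--   described in the report). Hypothetical helper name.
lowByteIsZero :: Char -> Bool
lowByteIsZero c = ord c `mod` 256 == 0

-- | Narrow a Char to one C char, keeping only its low 8 bits.
--   ASSUMPTION: this is only a guess at how the string might reach the
--   C regex engine; the report does not confirm it.
truncateToCChar :: Char -> CChar
truncateToCChar = castCharToCChar

-- '\x4E00' is '一' and '\x5F00' is '开', the two characters from the report.
reportedChars :: [Char]
reportedChars = ['\x4E00', '\x5F00']
```

If the string is indeed narrowed like this before matching, both reported characters would turn into a NUL byte, which a C string treats as its terminator. That would be consistent with the search stopping at that position.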

Example:

-- Rgx is presumably Text.Regex (from regex-compat)
let
  targetPattern = "\\[needle:([^]]*)\\]"
  aString = initString
in
  case Rgx.matchRegexAll (Rgx.mkRegex targetPattern) aString of
    Nothing -> []
    Just (before, needle, after, values) -> etc...

This does not work if initString = "一杯奢华威士忌。[needle:some text]" or "开拓的精神进。[needle:more text]", but it does when aString is instead defined in the preamble as follows (DC is Data.Char):

aString = map (\c -> if mod (DC.ord c) 256 == 0 then ' ' else c) initString
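For reuse, the workaround above could be pulled out into a standalone function, roughly as sketched here. The name `blankLowByteZero` is hypothetical, and the sketch keeps the same behaviour as the one-liner: it replaces every character whose code point is 0 modulo 256 with a space.

```haskell
import qualified Data.Char as DC

-- | Replace every character whose code point is 0 modulo 256 with a space,
--   mirroring the one-line workaround from the report.
--   Hypothetical helper name; the string length is preserved.
blankLowByteZero :: String -> String
blankLowByteZero = map blank
  where
    blank c
      | DC.ord c `mod` 256 == 0 = ' '
      | otherwise               = c
```

The result would then be passed to Rgx.matchRegexAll in place of initString. This is a lossy workaround rather than a fix: the affected characters are gone from the searched string, so they will also show up as spaces in any text taken from it.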

Encountered with regex-base 0.94.0.1, regex-compat 0.95.2.1, and regex-posix 0.96.0.0 installed, on GHC 8.6.5.
