Assert.AreEqual fails even though expected and actual are the same
I have the following test. The expected and actual strings look identical, but Assert.AreEqual fails.
[TestMethod]
public void Decompressed_test_should_equal_to_text_before_compression()
{
    TextCompressor compressor = new TextCompressor();
    Random r = new Random((int)DateTime.Now.Ticks);
    for (int i = 500; i < 1500; i++)
    {
        char[] testArray = new char[i];
        for (int j = 0; j < i; j++)
        {
            char randomChar = (char)(r.Next(256, 65536));
            testArray[j] = randomChar;
        }
        string testString = new String(testArray);
        string compressed = compressor.Compress(testString);
        string decompressed = compressor.Decompress(compressed);
        Assert.AreEqual(testString.Length, decompressed.Length);
        Assert.AreEqual(testString, decompressed, false, CultureInfo.InvariantCulture);
    }
}
compressor.Compress and compressor.Decompress do the compression and decompression using GZipStream.
The test passes if I use (65, 90) instead of (256, 65536), so I guess it has something to do with Unicode. I also tried CurrentCulture and no culture at all instead of InvariantCulture, and it still fails. But the resulting strings appear to be the same:
Assert.AreEqual failed.
Expected:
<☔ ฺ 疉 鎷 얚 跨 꿌 沩 얫 嘹 ֨ ز 항 們 嵜 浮 䑹 شم 靄 斳 薃 픢 萁 ⯬ 쫎 ʛ⫕ 蝺 ꄗ 穌 넢 뇌 䶆 멊 큀퉆 䐫 ̥ 괊 ⑆ 놸 僥 ̅ᵀ 㣚 ꢅ 뺓 䇚 녚 伀 讍 홬 䈕 캾 撏 Ჴ 孢 黮 摠 뮡 䌦 윃 ᬳ 狚 䆙 툾훶 䏤 ꛈṻ⟧㉖ 鮸 蒵 萗 냤 퇅서 㪨 瀲 鰪 残 䓴 ۇ 넃 櫜 㑦 䢻 쮓죣 䕱 䶘 㴝 姳뿝 嘼 ᷨ 㗬 꺬 櫣 涷 ꠶ 浒 껅 က 㷕 䩉 毎 覛 ⧹ 䮯 嬇 힚 艐 Ὑ 쇕횻 鸙 蹻 硐 䈆 쓖 ⸛ 錼 鰙 ኰ 乒 ⺴ 썓힠 䵓 ꅄ ⵈ 桃 怅 㾈 枟 ⏠ه 폫 ا 琖 ퟰ 乼 쩐 鑈 푷 ᫇ 蕱 늛 쭡 䙠 ⲓ ᒇꪮ 툅 ⃑ꦴ 돻 ♹ᢋ 麝 熪 뚭 Ћ 䌚 娯 钮 ⡃ 㪿 ㅞ ⤩ 㥍 車 䎘 磛 蚾 ㅸ 擫 떦 蝳 分 鰽 䠺 ꭍ 튘폻 ⥽ⳉ历 驿 똮 ⯴⋟Ḋ 룴 墭 䐣 앾 郢 ᵸᮄ 杗 奪 騑 硼 佑 烑 鄗 䳘 핬 溴 墽 炁 ࣘ ヲ 栥 풼 ಃ 斗 狹 就 쵎 嬒 瀃 碂 밎 崹 䎐 貇 汫 踖 뢸 숥퍞 르뗿 䭯 䖝 䱅 䵱 꽔븽 䢴 ꁅ⟼ 蒠 癸 ꩽ 靔 临 䚝!⩏ 鍁 Ꮨ䷇ 쁐쨒 ʊ 쪦 鄭 借 滋 铆 ᮉ 嚃 ᩨ ိ 펇 ꮼ 뇄 』ᰉ 㕾 枒 鯅 蛺 䠿 櫄 築 픆 车 똅 ㈆ّ Ἃ 荞 괋 랆 偦 뤰 䝷 핸 ⹝ 屑 素 蝨怀 猔 勛 碉 퀪 睹 Ⓥ 䍙 ಗ 䤮 뾿 谢 ꁼ 戻 ڳ ᆯ콧 偪 ز 븭 碇 쮢 籍 ⁜ 왋 壝 駡 暷 샖 ࣵ 艫 䃴 厫 ᢉ 慨 䁆 ꂴ 溘 欋 옭 螶䦗 跠 﨔 膉 痹 邘 ⋫ 吪 멚 埣 ꯕ 扌 옘 广 犵 肖 街 㶕 畅 몡 ↇ꠫ 襤 픧 ၥ 帻 놤 ਰ 惘 똞 颤 糴 쫼 鿋 䬝 穫 ⺁ 峁 踷 锝 副 鰀 嗊 ⹀ 鰀 嗊 ⹀ 遲 䩢 푑팾 糔 뭯 ࣷ䷴ 䬾 갭 ⶵ 틩 魨 㵻 恬 ҅ པ ᣄⲪ 豩 뛌 ꛵ 㥨 몙 〼 △ ⏮ 큤 亃 ꢡ 웼 ఐ 칇 뻻펂 㢓 吋 䂃 䨠 䕱>.
Actual:
<☔ ฺ 疉 鎷 얚 跨 꿌 沩 얫 嘹 ֨ ز 항 們 嵜 浮 䑹 شم 靄 斳 薃 픢 萁 ⯬ 쫎 ʛ⫕ 蝺 ꄗ 穌 넢 뇌 䶆 멊 큀퉆 䐫 ̥ 괊 ⑆ 놸 僥 ̅ᵀ 㣚 ꢅ 뺓 䇚 녚 伀 讍 홬 䈕 캾 撏 Ჴ 孢 黮 摠 뮡 䌦 윃 ᬳ 狚 䆙 툾훶 䏤 ꛈṻ⟧㉖ 鮸 蒵 萗 냤 퇅서 㪨 瀲 鰪 残 䓴 ۇ 넃 櫜 㑦 䢻 쮓죣 䕱 䶘 㴝 姳뿝 嘼 ᷨ 㗬 꺬 櫣 涷 ꠶ 浒 껅 က 㷕 䩉 毎 覛 ⧹ 䮯 嬇 힚 艐 Ὑ 쇕횻 鸙 蹻 硐 䈆 쓖 ⸛ 錼 鰙 ኰ 乒 ⺴ 썓힠 䵓 ꅄ ⵈ 桃 怅 㾈 枟 ⏠ه 폫 ا 琖 ퟰ 乼 쩐 鑈 푷 ᫇ 蕱 늛 쭡 䙠 ⲓ ᒇꪮ 툅 ⃑ꦴ 돻 ♹ᢋ 麝 熪 뚭 Ћ 䌚 娯 钮 ⡃ 㪿 ㅞ ⤩ 㥍 車 䎘 磛 蚾 ㅸ 擫 떦 蝳 分 鰽 䠺 ꭍ 튘폻 ⥽ⳉ历 驿 똮 ⯴⋟Ḋ 룴 墭 䐣 앾 郢 ᵸᮄ 杗 奪 騑 硼 佑 烑 鄗 䳘 핬 溴 墽 炁 ࣘ ヲ 栥 풼 ಃ 斗 狹 就 쵎 嬒 瀃 碂 밎 崹 䎐 貇 汫 踖 뢸 숥퍞 르뗿 䭯 䖝 䱅 䵱 꽔븽 䢴 ꁅ⟼ 蒠 癸 ꩽ 靔 临 䚝!⩏ 鍁 Ꮨ䷇ 쁐쨒 ʊ 쪦 鄭 借 滋 铆 ᮉ 嚃 ᩨ ိ 펇 ꮼ 뇄 』ᰉ 㕾 枒 鯅 蛺 䠿 櫄 築 픆 车 똅 ㈆ّ Ἃ 荞 괋 랆 偦 뤰 䝷 핸 ⹝ 屑 素 蝨怀 猔 勛 碉 퀪 睹 Ⓥ 䍙 ಗ 䤮 뾿 谢 ꁼ 戻 ڳ ᆯ콧 偪 ز 븭 碇 쮢 籍 ⁜ 왋 壝 駡 暷 샖 ࣵ 艫 䃴 厫 ᢉ 慨 䁆 ꂴ 溘 欋 옭 螶䦗 跠 﨔 膉 痹 邘 ⋫ 吪 멚 埣 ꯕ 扌 옘 广 犵 肖 街 㶕 畅 몡 ↇ꠫ 襤 픧 ၥ 帻 놤 ਰ 惘 똞 颤 糴 쫼 鿋 䬝 穫 ⺁ 峁 踷 锝 副 鰀 嗊 ⹀ 鰀 嗊 ⹀ 遲 䩢 푑팾 糔 뭯 ࣷ䷴ 䬾 갭 ⶵ 틩 魨 㵻 恬 ҅ པ ᣄⲪ 豩 뛌 ꛵ 㥨 몙 〼 △ ⏮ 큤 亃 ꢡ 웼 ఐ 칇 뻻펂 㢓 吋 䂃 䨠 䕱>.
What am I missing?
The expression (char)(r.Next(256, 65536)) can produce invalid character combinations, resulting in invalid text, so you cannot use it to create test content. This can happen even though each cast is valid and yields a valid char. One example is a surrogate in the range U+D800 to U+DFFF, but there are probably others.
If you want to generate sample text from all ranges of Unicode, you have to generate it in a Unicode-aware way, not just by casting arbitrary numbers to char. (I think you already ran into this when you noted in the question that it worked with a narrower range for the random number.)
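To illustrate the surrogate case, here is a minimal sketch. It assumes the string is converted to bytes with Encoding.UTF8 somewhere along the way; the question does not show how TextCompressor does this, so the class and method names below are hypothetical stand-ins, not the asker's code.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for any string -> bytes -> string pipeline.
public static class SurrogateRoundTripDemo
{
    // Encodes the text as UTF-8 and decodes it again.
    // With the default Encoding.UTF8, a surrogate that is not part of a
    // valid high/low pair is replaced by U+FFFD during encoding.
    public static string RoundTripUtf8(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return Encoding.UTF8.GetString(bytes);
    }

    // Ordinal comparison: true only if every UTF-16 code unit is unchanged.
    public static bool SurvivesRoundTrip(string text)
    {
        return string.Equals(text, RoundTripUtf8(text), StringComparison.Ordinal);
    }
}
```

For "a\uD800b" the result has the same length as the input, but the lone surrogate has become U+FFFD. Under the stated assumption, that would match the symptom in the question: the length assertion passes and the string assertion fails.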
Use byte, not char.
Your Compress/Decompress methods should accept a byte[] array, and any callers should read their Unicode data and convert it to bytes before calling them.
Did you know that .NET 2.0 onwards includes the GZipStream class?
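A byte-based pair of methods could look roughly like the sketch below. The class name ByteCompressor is hypothetical, and Stream.CopyTo assumes .NET 4 or later; this is one possible shape, not the asker's TextCompressor.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical byte[]-based compressor built on GZipStream.
public static class ByteCompressor
{
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            } // disposing the GZipStream writes the final compressed block

            // ToArray still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] compressed)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output); // requires .NET 4 or later
            return output.ToArray();
        }
    }
}
```

The caller then chooses the text encoding, for example Encoding.UTF8.GetBytes(text) before compressing and Encoding.UTF8.GetString(bytes) after decompressing. That conversion still requires valid text, so it does not by itself make randomly generated char values safe.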
I did some experiments:
string testString = new String(testArray);
string anotherString = new String(testArray);
Assert.AreEqual(testString.Length, anotherString.Length);
Assert.AreEqual(testString, anotherString, false, CultureInfo.InvariantCulture);
That is without compression, and it works fine.
I suggest you change your test to this:
for (int i = 256; i < 65536; i++)
{
    string testString = new String((char)(i), 2);
    string compressed = compressor.Compress(testString);
    string decompressed = compressor.Decompress(compressed);
    Assert.AreEqual(testString.Length, decompressed.Length);
    Assert.AreEqual(testString, decompressed, false, CultureInfo.InvariantCulture);
}
This checks exactly one character value at a time, involves no random values (so no "sometimes it works" problem), and shows you whether there are any characters that don't work.
I have the same test for encryption / decryption.
Using bisection, I found that any string containing "surrogate code points", i.e. characters in the range U+D800 to U+DFFF (55296 to 57343 in decimal), will fail with Assert.AreEqual.
Therefore, the widest ranges you can use are as follows (note that the upper bound of Random.Next is exclusive):
char randomChar = (char)(r.Next(0, 55296));
and
char randomChar = (char)(r.Next(57344, 65536));
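The two ranges above can be combined into a single draw. The helper below is a sketch with hypothetical names (RandomText, NextNonSurrogateChar, NextString); it only excludes surrogates, and as another answer notes, there may be other values that cause trouble.

```csharp
using System;

// Hypothetical helper for generating test strings without surrogates.
public static class RandomText
{
    private const int SurrogateStart = 0xD800; // 55296
    private const int SurrogateCount = 0x0800; // covers U+D800..U+DFFF

    // Returns a random char from U+0000..U+FFFF, excluding surrogates.
    public static char NextNonSurrogateChar(Random random)
    {
        // Draw from a range shortened by the size of the surrogate block,
        // then shift values at or above the block past it.
        int value = random.Next(0, 0x10000 - SurrogateCount);
        if (value >= SurrogateStart)
        {
            value += SurrogateCount;
        }
        return (char)value;
    }

    // Builds a string of the given length from non-surrogate chars.
    public static string NextString(Random random, int length)
    {
        var chars = new char[length];
        for (int i = 0; i < length; i++)
        {
            chars[i] = NextNonSurrogateChar(random);
        }
        return new string(chars);
    }
}
```

Passing a fixed seed to Random, rather than DateTime.Now.Ticks, makes a failing run reproducible.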