Problem parsing string from excel file
I have Ruby code that parses data from an Excel file using the Parseexcel gem. I need to store two columns of this file in a hash. Here is my code:
```ruby
worksheet.each { |row|
  if row != nil
    key = row.at(1).to_s.strip
    value = row.at(0).to_s.strip
    if !parts.has_key?(key) and key.length > 0
      parts[key] = value
    end
  end
}
```
However, the hash ends up with duplicate keys: "020098-10" appears twice. I checked the Excel file at that row and found the difference: " 020098-10" versus "020098-10". The first has a leading space and the second does not. Isn't .strip supposed to remove all leading and trailing whitespace already?
Also, when I printed key.length, I got these odd numbers:

020098-10 length 18
020098-10 length 17

when both should be 9.
If you inspect the strings you get, you will probably see something like:
" \x000\x002\x000\x000\x009\x008\x00-\x001\x000\x00"
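This is due to the encoding of the strings. Excel stores this text as Unicode (UTF-16LE, two bytes per character), while Ruby treats the cell value as a plain single-byte string (ISO-8859-1 by default); the default encoding also differs between platforms. That is why "020098-10" reports a length of 17 or 18 instead of 9, and why .strip, which works byte by byte on such a string, does not cleanly remove the encoded leading space.
You need to convert the Excel data to a printable encoding. However, you should not convert strings that were created in Ruby itself, or you will end up with garbage.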
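A minimal sketch of the byte arithmetic, assuming the parser hands back a text cell as the raw UTF-16LE bytes of its text tagged as binary. The helpers raw_cell and decode are hypothetical, for illustration only; the sketch needs Ruby 1.9 or later for String#encode and force_encoding.

```ruby
# Hypothetical helper: simulate a text cell as raw UTF-16LE bytes (binary).
def raw_cell(text)
  text.encode("UTF-16LE").force_encoding("BINARY")
end

# Hypothetical helper: reinterpret the bytes as UTF-16LE, convert to UTF-8,
# and only then strip, so whitespace is removed as characters, not bytes.
def decode(raw)
  raw.dup.force_encoding("UTF-16LE").encode("UTF-8").strip
end

with_space    = raw_cell(" 020098-10")
without_space = raw_cell("020098-10")

# Two bytes per character, so the byte lengths are roughly double.
puts with_space.length     # => 20
puts without_space.length  # => 18

# After decoding, both keys are the same 9-character string.
puts decode(with_space) == decode(without_space)  # => true
puts decode(without_space).length                 # => 9

# Decoding a string that is already UTF-8 as if it were UTF-16LE
# pairs up unrelated bytes and yields two CJK characters (garbage).
puts "abcd".dup.force_encoding("UTF-16LE").encode("UTF-8")
```

The order matters: decode first, then strip. Stripping the raw bytes can only remove individual ASCII bytes, not whole UTF-16 characters.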
Consider this code:
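```ruby
# Excel text cells are UTF-16LE; convert them to UTF-8 (Ruby 1.9+).
@enc = Encoding::Converter.new("UTF-16LE", "UTF-8")

# Return a cell's content as a stripped string.
def convert(cell)
  return "" if cell.nil?            # empty cells come back as nil
  if cell.numeric
    cell.value.to_s                 # numbers need no re-encoding
  else
    @enc.convert(cell.value).strip  # decode first, then strip
  end
end

parts = {}
worksheet.each do |row|
  next unless row                   # Ruby uses `next`, not `continue`
  key   = convert row.at(1)
  value = convert row.at(0)
  parts[key] = value unless parts.has_key?(key) || key.empty?
end
```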
You can substitute other source or target encodings if your data needs them.
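To try the loop without an .xls file, here is a self-contained sketch that replaces the worksheet with plain arrays. Cell, text_cell and build_parts are hypothetical stand-ins: Cell only mimics the numeric and value attributes the code above relies on and is not the Parseexcel API. The converter is kept in a constant rather than an instance variable.

```ruby
# Hypothetical stand-in for a parsed cell; not taken from any gem.
Cell = Struct.new(:numeric, :value)

ENC = Encoding::Converter.new("UTF-16LE", "UTF-8")

# Hypothetical helper: a text cell whose value is raw UTF-16LE bytes.
def text_cell(text)
  Cell.new(false, text.encode("UTF-16LE").force_encoding("BINARY"))
end

# Same logic as convert above: nil -> "", numbers as-is, text decoded then stripped.
def convert(cell)
  return "" if cell.nil?
  cell.numeric ? cell.value.to_s : ENC.convert(cell.value).strip
end

# Keep the first value seen for each non-empty key.
def build_parts(worksheet)
  parts = {}
  worksheet.each do |row|
    next unless row
    key   = convert(row.at(1))
    value = convert(row.at(0))
    parts[key] = value unless parts.key?(key) || key.empty?
  end
  parts
end

worksheet = [
  [text_cell("part A"), text_cell(" 020098-10")],  # leading space
  nil,                                             # missing row
  [text_cell("part B"), text_cell("020098-10")],   # same key, no space
  [Cell.new(true, 42), nil]                        # empty key cell
]

# A single entry: "020098-10" maps to "part A".
p build_parts(worksheet)
```

Because the key is decoded before it is stripped, the spaced and unspaced versions collapse into one key, and the second row is skipped as a duplicate.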
The newer Spreadsheet gem handles character set conversion automatically (I believe UTF-8 is the default, but it can be changed), so I would recommend using it instead.