Problem parsing strings from an Excel file

I have Ruby code that parses data in an Excel file using the Parseexcel gem. I need to store two columns of this file in a hash. Here is my code:

worksheet.each { |row|
  if row != nil
    key = row.at(1).to_s.strip
    value = row.at(0).to_s.strip

    if !parts.has_key?(key) and key.length > 0
      parts[key] = value
    end
  end
}

However, it stores what looks like the same key twice in the hash: "020098-10". I checked the Excel file at the rows in question and found the difference: " 020098-10" and "020098-10". The first has a leading space and the second does not. I don't understand this: isn't .strip supposed to remove all leading and trailing spaces already?

Also, when I tried to print key.length, it gave me these weird numbers:

020098-10 length 18
020098-10 length 17

Both should be 9.



2 answers


If you inspect the strings you get back, you will probably see something like:

" \x000\x002\x000\x000\x009\x008\x00-\x001\x000\x00"

      

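This is caused by the string encoding. Excel stores text as Unicode (the interleaved \x00 bytes above are typical of UTF-16LE), while Ruby is treating those bytes as a single-byte encoding such as ISO-8859-1. That is why the length is roughly twice the number of visible characters. Default encodings differ between platforms.

You need to convert the Excel data to a printable encoding. However, you should not convert strings that were generated in Ruby itself, as you would end up with garbage.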
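To make the doubled lengths concrete, here is a minimal sketch that simulates such a cell in plain Ruby. The helper name utf16le_to_utf8 is hypothetical, and the sample bytes are built by hand on the assumption that the cell really is UTF-16LE; it uses String#encode rather than the converter object shown in the answer's code.

```ruby
# Hypothetical helper: re-tag raw bytes as UTF-16LE and transcode them to UTF-8.
def utf16le_to_utf8(raw)
  raw.dup.force_encoding("UTF-16LE").encode("UTF-8")
end

# Simulated cell contents: the key with a leading space, as raw UTF-16LE bytes.
raw = " 020098-10".encode("UTF-16LE").force_encoding("BINARY")

puts raw.bytesize                        # two bytes per character
puts utf16le_to_utf8(raw).strip.length   # length after decoding and stripping
```

Once the bytes are decoded, strip sees an ordinary leading space and both variants of the key collapse to the same nine-character string.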



Consider this code:

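# Converts the UTF-16LE bytes stored by Excel into UTF-8.
@enc = Encoding::Converter.new("UTF-16LE", "UTF-8")

def convert(cell)
  return "" if cell.nil?        # missing cells become empty strings
  if cell.numeric
    cell.value                  # numbers need no conversion
  else
    @enc.convert(cell.value).strip
  end
end

parts = {}
worksheet.each do |row|
  next unless row               # skip empty rows

  key = convert row.at(1)
  value = convert row.at(0)

  # to_s guards against numeric keys, which do not respond to empty?
  parts[key] = value unless parts.has_key?(key) || key.to_s.empty?
end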

      

You can substitute other source and target encodings as needed.
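As a rough illustration of swapping encodings, the sketch below wraps the conversion in a small factory. The name make_converter is hypothetical, and it uses String#encode instead of Encoding::Converter; the invalid: and undef: options replace bytes that cannot be converted rather than raising.

```ruby
# Hypothetical helper: returns a lambda that decodes bytes from `from`
# and transcodes them to `to`, replacing anything that cannot be converted.
def make_converter(from, to = "UTF-8")
  lambda do |bytes|
    bytes.dup.force_encoding(from).encode(to, invalid: :replace, undef: :replace)
  end
end

from_latin1 = make_converter("ISO-8859-1")
from_utf16  = make_converter("UTF-16LE")

puts from_latin1.call("caf\xE9".b)                  # Latin-1 bytes
puts from_utf16.call("0\x002\x000\x00".b)           # UTF-16LE bytes
```

Which source encoding is correct depends on the file, so it is worth inspecting a few raw cell values before picking one.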



The newer Spreadsheet gem handles character set conversion automatically. I think UTF-8 is the default, but you can change it, so I would recommend using that gem instead.
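A possible shape for this with the Spreadsheet gem is sketched below. The helper build_parts is hypothetical and works on any collection of rows that respond to []; the file name in the commented usage is a placeholder, and the gem calls are shown only as comments so the block runs without the gem installed.

```ruby
# Hypothetical helper: build the parts hash from rows that respond to [].
# Column 1 is the key, column 0 the value; the first occurrence of a key wins.
def build_parts(rows)
  parts = {}
  rows.each do |row|
    next unless row                      # skip missing rows
    key = row[1].to_s.strip
    value = row[0].to_s.strip
    parts[key] = value unless key.empty? || parts.key?(key)
  end
  parts
end

# With the Spreadsheet gem installed (file name is a placeholder):
#   require "spreadsheet"
#   Spreadsheet.client_encoding = "UTF-8"
#   sheet = Spreadsheet.open("parts.xls").worksheet(0)
#   parts = build_parts(sheet)

p build_parts([["Bolt", " 020098-10"], ["Nut", "020098-10"]])
```

Because the gem hands back strings already in the client encoding, strip behaves as expected and no manual conversion step is needed.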


