I'm working on a project where I need to convert text from an encoding (for example Windows-1256 Arabic) to UTF-8.
How do I do this in Go?
You can use the encoding package, which includes support for Windows-1256 via the package golang.org/x/text/encoding/charmap (in the example below, import this package and use charmap.Windows1256 instead of japanese.ShiftJIS).
Here's a short example which encodes a Japanese UTF-8 string to ShiftJIS and then decodes the ShiftJIS string back to UTF-8. Unfortunately, it doesn't work on the playground since the playground doesn't have the "x" packages.
package main

import (
	"bytes"
	"fmt"
	"io"
	"log"
	"strings"

	"golang.org/x/text/encoding/japanese"
	"golang.org/x/text/transform"
)

func main() {
	// the string we want to transform
	s := "今日は"
	fmt.Println(s)

	// --- Encoding: convert s from UTF-8 to ShiftJIS
	// declare a bytes.Buffer b and an encoder which will write into this buffer
	var b bytes.Buffer
	wInUTF8 := transform.NewWriter(&b, japanese.ShiftJIS.NewEncoder())
	// encode our string
	if _, err := wInUTF8.Write([]byte(s)); err != nil {
		log.Fatal(err)
	}
	// Close flushes any pending output into b
	if err := wInUTF8.Close(); err != nil {
		log.Fatal(err)
	}
	// print the encoded bytes
	fmt.Printf("%#v\n", b.Bytes())
	encS := b.String()
	fmt.Println(encS)

	// --- Decoding: convert encS from ShiftJIS to UTF-8
	// declare a decoder which reads from the string we have just encoded
	rInUTF8 := transform.NewReader(strings.NewReader(encS), japanese.ShiftJIS.NewDecoder())
	// decode our string (io.ReadAll replaces the deprecated ioutil.ReadAll since Go 1.16)
	decBytes, err := io.ReadAll(rInUTF8)
	if err != nil {
		log.Fatal(err)
	}
	decS := string(decBytes)
	fmt.Println(decS)
}
There's a more complete example on the Japanese StackOverflow site. The text is Japanese, but the code should be self-explanatory: https://ja.stackoverflow.com/questions/6120
I can't find a working example of converting from one encoding to another. Doing this in .NET was easy, but I'm new to Go.
Great live example. So here we are converting from UTF-8 to Japanese ShiftJIS; is it possible to do it vice versa?
To decode ShiftJIS, use the second part, starting with "declare a decoder...". There, encS is the string which you wish to decode, and string(decBytes) is the decoded string. Maybe two functions would have been better, but I wanted to keep the example as short as possible...