
R and the CSV with dodgy characters

posted 10 May 2012, 04:05 by David Sherlock   [ updated 7 Jun 2012, 06:24 ]
I've recently been text mining the comments on a blog. One of the problems I ran into was character encoding. tm_map threw this error at me:

invalid multibyte string 1

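The solution is to run every document through iconv with sub = "byte", which swaps any bytes that can't be converted for their hex codes instead of failing: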

mydata.corpus <- tm_map(mydata.corpus, function(x) iconv(enc2utf8(x), sub = "byte"))
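To see what sub = "byte" does on its own, here is a minimal sketch in base R, outside of tm. The sample string and the variable names bad and clean are made up for illustration, and unlike the one-liner above it names the source and target encodings explicitly (an assumption on my part, so the result doesn't depend on your locale).

```r
# Hypothetical sample: 0xE9 is a Latin-1 "e acute", which is not valid UTF-8 by itself
bad <- "caf\xe9 comment"

# validUTF8() (base R, 3.3.0 or later) confirms the string is malformed
print(validUTF8(bad))

# sub = "byte" replaces each non-convertible byte with its hex code in angle brackets
# rather than returning NA. from/to are set explicitly here (an assumption for this demo).
clean <- iconv(bad, from = "UTF-8", to = "UTF-8", sub = "byte")

print(clean)
print(validUTF8(clean))
```

The dodgy characters aren't recovered, only made harmless, so if you know the real source encoding (Latin-1, say) it is probably better to convert from that instead.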

