Our character detection algorithm can potentially incorrectly detect utf-8 as iso-8859-x
if there is a truncated character at the end of the partially read file.
This PR changes the detection algorithm to truncated utf8 characters at the end of the
buffer.
Fix#19743
Signed-off-by: Andrew Thornton <art27@cantab.net>
The io/ioutil package has been deprecated as of Go 1.16, see
https://golang.org/doc/go1.16#ioutil. This commit replaces the existing
io/ioutil functions with their new definitions in io and os packages.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
Co-authored-by: techknowlogick <techknowlogick@gitea.io>
* Fix chardet test and add ordering option
Signed-off-by: Andrew Thornton <art27@cantab.net>
* minor fixes
Signed-off-by: Andrew Thornton <art27@cantab.net>
* remove log
Signed-off-by: Andrew Thornton <art27@cantab.net>
* remove log2
Signed-off-by: Andrew Thornton <art27@cantab.net>
* only iterate through top results
Signed-off-by: Andrew Thornton <art27@cantab.net>
* Update docs/content/doc/advanced/config-cheat-sheet.en-us.md
* slight restructure of for loop
Signed-off-by: Andrew Thornton <art27@cantab.net>
Co-authored-by: techknowlogick <techknowlogick@gitea.io>
* Convert files to utf-8 for indexing
* Move utf8 functions to modules/base
* Bump repoIndexerLatestVersion to 3
* Add tests for base/encoding.go
* Changes to pass gosimple
* Move UTF8 funcs into new modules/charset package