Golang byte and bytes package
What are Bytes?
In computing, a byte is a unit of digital information that consists of 8 bits and can store values ranging from 0 to 255, which is enough to hold a single ASCII character. Bytes are the building blocks of data storage in computers and are essential for representing text, numbers, and other types of information.
Introduction to the Byte in Golang
In the Go programming language (Golang), the byte datatype is an alias for uint8, an unsigned 8-bit integer. This means that:
- Size and Range of byte: A byte in Golang occupies 8 bits of memory and can represent values ranging from 0 to 255.
- Usage: The byte type is extensively used in Golang for handling raw binary data, reading from and writing to files, network communication, and encoding/decoding operations.
Example Usage in Golang:
package main
import "fmt"
func main() {
	var b byte = 65 // ASCII value for 'A'
	fmt.Printf("Value of b: %d\n", b)
	fmt.Printf("Character represented by b: %c\n", b)
}
// output
// Value of b: 65
// Character represented by b: A
In this example, the byte b is assigned the ASCII value 65, which corresponds to the character ‘A’. Go’s ability to seamlessly handle bytes allows developers to perform such operations efficiently.
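Because byte is only another name for uint8, the two can be mixed without a conversion, and ordinary unsigned arithmetic applies. The following is a minimal sketch of that idea; the helper toUpperASCII is a hypothetical name introduced here for illustration and is not part of the standard library.

```go
package main

import "fmt"

// toUpperASCII converts a lowercase ASCII letter to uppercase and
// returns every other byte unchanged. (Hypothetical helper.)
func toUpperASCII(b byte) byte {
	if b >= 'a' && b <= 'z' {
		return b - ('a' - 'A') // the two cases differ by 32 in ASCII
	}
	return b
}

func main() {
	var u uint8 = 'g'
	var b byte = u // no conversion needed: byte and uint8 are the same type

	fmt.Printf("%c -> %c\n", b, toUpperASCII(b)) // g -> G

	var top byte = 255
	top++            // unsigned 8-bit arithmetic wraps around
	fmt.Println(top) // 0
}
```

The wraparound at 255 is worth keeping in mind whenever byte values are used in arithmetic.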
Strings and byte slice
Let us briefly look at how strings work in Golang.
In Golang, strings are stored as a sequence of bytes that represent UTF-8 encoded characters. Each character can span multiple bytes in memory, depending on its UTF-8 encoding. Understanding how strings are stored internally in Golang provides insight into their characteristics and behaviour.
- UTF-8 Encoding:
- In Golang, strings use UTF-8 encoding, so each character in a string is encoded using one or more bytes following the UTF-8 standard.
- UTF-8 is a flexible encoding method where characters within the ASCII range (0-127) are encoded using one byte, but characters outside this range may be encoded using multiple bytes (up to 4 bytes per character).
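The variable width of UTF-8 can be observed directly. The sketch below uses the standard unicode/utf8 package; byteWidths is a hypothetical helper written only for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteWidths returns how many bytes each character of s occupies in UTF-8.
// (Hypothetical helper.)
func byteWidths(s string) []int {
	widths := make([]int, 0, utf8.RuneCountInString(s))
	for _, r := range s {
		widths = append(widths, utf8.RuneLen(r))
	}
	return widths
}

func main() {
	s := "aé€🙏"
	fmt.Println(byteWidths(s))             // [1 2 3 4]
	fmt.Println(len(s))                    // 10 (bytes)
	fmt.Println(utf8.RuneCountInString(s)) // 4 (characters)
}
```

Note that len reports bytes, not characters, which is why the two counts differ for any string containing non-ASCII text.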
To grasp strings and byte slices better, consider the example provided below.
Conversion Between Strings and Byte Slice
func main() {
	// Define a string containing ASCII and Unicode characters
	characterString := "Hello 🙏"
	// Convert the string to a byte slice
	byteSlice := []byte(characterString)
	// Print the original string
	fmt.Println("Original String:", characterString)
	// Print the byte slice
	fmt.Println("Byte Slice:", byteSlice)
	// Convert the byte slice back to a string and print it
	convertedString := string(byteSlice)
	fmt.Println("Converted String from Byte Slice:", convertedString)
}
Output:
Original String: Hello 🙏
Byte Slice: [72 101 108 108 111 32 240 159 153 143]
Converted String from Byte Slice: Hello 🙏
This example illustrates how to work with strings and byte slices in Golang. By converting a string to a byte slice, you can see its raw byte representation: the six ASCII characters take one byte each, while the 🙏 emoji takes four bytes (240, 159, 153, 143). Conversely, converting a byte slice back to a string reconstructs the original string, demonstrating the relationship between these two data types in Go. This is particularly useful when handling text data that includes both ASCII and Unicode characters.
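One consequence of these conversions is that they produce copies: changing the byte slice never changes the string it came from. The following sketch shows this under that assumption; withFirstByte is a hypothetical helper name.

```go
package main

import "fmt"

// withFirstByte returns a copy of s whose first byte is replaced by b.
// The input string itself is never modified. (Hypothetical helper.)
func withFirstByte(s string, b byte) string {
	if len(s) == 0 {
		return s
	}
	buf := []byte(s) // copies the string's bytes into a new slice
	buf[0] = b
	return string(buf) // copies the bytes into a new string
}

func main() {
	original := "Hello"
	changed := withFirstByte(original, 'J')
	fmt.Println(original, changed) // Hello Jello
}
```

This is the usual way to "edit" a string in Go: convert to a slice, modify the slice, and convert back.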
Encoding in bytes and strings
Bytes:
- No Explicit Character Encoding: Bytes in Go do not carry any encoding information. They are raw data, which means their meaning is defined by the context in which they are used. This makes bytes versatile for low-level data manipulation, such as file I/O or network communication.
- Example:
The byte slice []byte{0x48, 0x65, 0x6C, 0x6C, 0x6F} represents the ASCII characters for “Hello”. Each byte corresponds to an ASCII value: 0x48 (H), 0x65 (e), 0x6C (l), 0x6C (l), 0x6F (o).
Strings:
- UTF-8 Encoding: Go strings are inherently UTF-8 encoded, which means they can represent characters from virtually any language. UTF-8 encoding is efficient and backward-compatible with ASCII, making it suitable for a wide range of applications.
- Example:
The string unicodeString := "こんにちは" contains five Japanese characters, which are encoded in UTF-8. When converted to a byte slice, each of these characters is represented by a sequence of three bytes, giving 15 bytes in total.
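The difference between "raw bytes" and "UTF-8 text" can be checked at run time. The sketch below relies on the standard unicode/utf8 package; describe is a hypothetical helper introduced for this illustration.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe reports the byte length of b, its rune count, and whether
// b is well-formed UTF-8. (Hypothetical helper.)
func describe(b []byte) (nBytes, nRunes int, valid bool) {
	return len(b), utf8.RuneCount(b), utf8.Valid(b)
}

func main() {
	greeting := []byte("こんにちは")
	fmt.Println(describe(greeting))   // 15 5 true
	fmt.Printf("% x\n", greeting[:3]) // e3 81 93 (the bytes of "こ")

	raw := []byte{0x48, 0xff, 0x69} // 0xff never appears in valid UTF-8
	fmt.Println(describe(raw))      // 3 3 false
}
```

A byte slice is free to hold data such as raw that is not valid text at all, which is why code that receives bytes from a file or the network may want to validate them before treating them as a string.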
Rune Data Type
In Golang, the rune data type is used to represent Unicode characters. A rune is an alias for int32, and it represents a single Unicode code point. This allows Go to support a wide range of languages and characters in strings.
String Example:
str := "Hello, 🙏"
fmt.Println(str)
Byte Slice Example:
byteSlice := []byte{72, 101, 108, 108, 111}
fmt.Println(byteSlice)
byteSlice[0] = 79 // Modify the first byte from 'H' to 'O' (ASCII code)
fmt.Println(byteSlice)
Rune Slice Example:
func main() {
	str := "Hello, 🙏"
	runeSlice := []rune(str)
	runeSlice[0] = 'O' // Modify the first rune from 'H' to 'O'
	fmt.Println(string(runeSlice))
}
In the examples above:
- The String Example demonstrates printing a string that includes the 🙏 emoji directly.
- The Byte Slice Example initializes a byte slice with ASCII values corresponding to “Hello” and modifies the first byte.
- The Rune Slice Example initializes a rune slice from a string containing the 🙏 emoji and modifies the first rune.
Difference between string, byte slice and rune slice
| Aspect | String | Byte Slice ([]byte) | Rune Slice ([]rune) |
| --- | --- | --- | --- |
| Typical variable names | str, text | bytes, byteArr | runes |
| Representation | Characters encoded in UTF-8 | Direct array of bytes | Unicode code points (int32) |
| Length | Number of UTF-8 encoded bytes | Number of bytes | Number of runes |
| Access | Accessible by index (yields a byte, not a character) | Accessible by index | Accessible by index |
| Mutable | No | Yes | Yes |
| Loop syntax | for index, characterValue := range str { /* code logic */ } | for index, byteValue := range byteSlice { /* code logic */ } | for index, runeValue := range runeSlice { /* code logic */ } |
| Loop index | Inconsistent: index is the byte offset of characterValue, so it jumps after a multi-byte character | Consistent: index advances by one per byteValue | Consistent: index advances by one per runeValue |
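The difference in loop indices is easiest to see with a string that contains a multi-byte character. The following is a small sketch; runeOffsets is a hypothetical helper used only for illustration.

```go
package main

import "fmt"

// runeOffsets returns the byte index at which each character of s starts,
// exactly as reported by a range loop over the string. (Hypothetical helper.)
func runeOffsets(s string) []int {
	var offsets []int
	for i := range s {
		offsets = append(offsets, i)
	}
	return offsets
}

func main() {
	s := "a🙏b"
	fmt.Println(runeOffsets(s)) // [0 1 5]: the emoji occupies bytes 1 through 4

	// Ranging over the rune slice gives consecutive indices instead.
	for i, r := range []rune(s) {
		fmt.Printf("%d:%c ", i, r)
	}
	fmt.Println() // 0:a 1:🙏 2:b

	fmt.Println(s[1]) // 240: indexing a string yields a single byte
}
```

When a position within the text matters (for example, "the third character"), converting to a rune slice first avoids the gaps in the string's byte offsets.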
For a detailed understanding of strings in Golang, please visit our blog post titled Golang string.
Golang bytes package
In Golang, the bytes package provides functions and types for working with byte slices ([]byte). It offers efficient operations for manipulating byte data, buffering I/O operations, and working with text as byte slices. The most frequently used functions and types are described below:
1. Contains(b, subslice []byte) bool
- Function: Reports whether subslice is within b.
- Parameters:
- b: The byte slice to search within.
- subslice: The byte slice to search for within b.
- Returns: true if subslice is found within b, otherwise false.
Example:
func main() {
	// Define the byte slice and the sub-slice to find
	text := []byte("Example") // Using a string literal to create the byte slice
	subText := []byte("ample") // Using a string literal to create the sub-slice
	// Check if the byte slice contains the sub-slice
	contains := bytes.Contains(text, subText)
	if contains {
		fmt.Println("The byte slice contains the sub-slice.")
	} else {
		fmt.Println("The byte slice does not contain the sub-slice.")
	}
}
For the complete program, please visit our GitHub repository.
In the above example, bytes.Contains checks whether the byte slice text contains the sub-slice subText. Since “ample” is indeed a part of “Example”, it returns true.
2. Count(s, sep []byte) int
- Function: Counts the number of non-overlapping instances of sep in s.
- Parameters:
- s: The byte slice to search within.
- sep: The byte slice to count occurrences of within s.
- Returns: The number of times sep occurs in s.
Example:
func main() {
	data := []byte("example")
	fmt.Println(bytes.Count(data, []byte("e"))) // Output: 2
}
For the complete program, please visit our GitHub repository.
In the above example, bytes.Count counts the occurrences of the byte e in the byte slice data. Since “e” appears twice in “example”, it returns 2.
3. Equal(a, b []byte) bool
- Function: Reports whether a and b are the same length and contain the same bytes.
- Parameters:
- a, b: The byte slices to compare.
- Returns: true if a and b are identical in content and length, otherwise false.
Example:
func main() {
	byteSlice1 := []byte("Golang")
	byteSlice2 := []byte("Golang")
	fmt.Println(bytes.Equal(byteSlice1, byteSlice2)) // Output: true
}
For the complete program, please visit our GitHub repository.
bytes.Equal compares the byte slices byteSlice1 and byteSlice2. Since both contain the same bytes (“Golang” and “Golang”) in the same order and have the same length, it returns true.
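Slices in Go cannot be compared with ==, which is why a function such as bytes.Equal is needed in the first place. The sketch below adds a few negative and edge cases; sameContent is a hypothetical wrapper that exists only to give the comparison a name.

```go
package main

import (
	"bytes"
	"fmt"
)

// sameContent is a thin wrapper over bytes.Equal. (Hypothetical helper.)
func sameContent(a, b []byte) bool {
	return bytes.Equal(a, b)
}

func main() {
	fmt.Println(sameContent([]byte("Golang"), []byte("golang"))) // false: case differs
	fmt.Println(sameContent([]byte("Go"), []byte("Golang")))     // false: length differs
	fmt.Println(sameContent(nil, []byte{}))                      // true: nil equals empty
}
```

The last line shows that bytes.Equal treats a nil slice the same as an empty one.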
4. Index(s, sep []byte) int
- Function: Returns the index of the first instance of sep in s.
- Parameters:
- s: The byte slice to search within.
- sep: The byte slice to search for within s.
- Returns: The index of the first occurrence of sep in s, or -1 if sep is not found.
Example:
func main() {
	data := []byte("example")
	fmt.Println(bytes.Index(data, []byte("amp"))) // Output: 2
}
For the complete program, please visit our GitHub repository.
The above program uses bytes.Index to find the index of “amp” in the byte slice data, returning 2 since “amp” starts at that index in “example”. If “amp” is not found, it returns -1.
5. Split(s, sep []byte) [][]byte
- Function: Splits s into all subslices separated by sep and returns a slice of the subslices between those separators.
- Parameters:
- s: The byte slice to split.
- sep: The byte slice that specifies the delimiter.
- Returns: A slice of byte slices ([][]byte) split from s based on occurrences of sep.
Example:
func main() {
	data := []byte("apple,banana,orange")
	parts := bytes.Split(data, []byte(","))
	fmt.Println(parts)
	// Output: [[97 112 112 108 101] [98 97 110 97 110 97] [111 114 97 110 103 101]]
}
For the complete program, please visit our GitHub repository.
bytes.Split divides the byte slice data into smaller slices based on the delimiter “,”. It returns a slice (parts) containing three byte slices: [“apple” “banana” “orange”]
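Printing the result with fmt.Println shows numeric byte values, which is hard to read. Two possible ways to see the text instead are sketched below: the %q verb, or converting each part to a string. splitFields is a hypothetical helper introduced for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// splitFields splits data on sep and converts each piece to a string.
// (Hypothetical helper.)
func splitFields(data, sep []byte) []string {
	parts := bytes.Split(data, sep)
	fields := make([]string, len(parts))
	for i, p := range parts {
		fields[i] = string(p)
	}
	return fields
}

func main() {
	data := []byte("apple,banana,orange")
	fmt.Printf("%q\n", bytes.Split(data, []byte(","))) // ["apple" "banana" "orange"]
	fmt.Println(splitFields(data, []byte(",")))        // [apple banana orange]
}
```

Note that two adjacent separators produce an empty part between them rather than being merged.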
6. Type Buffer (bytes.Buffer)
bytes.Buffer is a dynamic buffer that allows manipulation of byte data in memory. It implements the io.Reader, io.Writer, io.ReaderFrom, io.WriterTo, io.ByteScanner, and io.ByteWriter interfaces, making it highly versatile for handling byte data.
package main
import (
	"bytes"
	"fmt"
)
func main() {
	// Creating a new buffer
	var buf bytes.Buffer
	// Writing data to the buffer
	buf.WriteString("Hello, ")
	buf.WriteByte('W')
	buf.Write([]byte("orld!"))
	// Reading data from the buffer using io.Reader interface
	// (Read consumes the bytes it returns, leaving the buffer empty)
	data := make([]byte, buf.Len())
	_, err := buf.Read(data)
	if err != nil {
		fmt.Println("Error reading data from buffer:", err)
		return
	}
	fmt.Println("Data read from buffer:", string(data))
	// Using io.Writer interface to write data to the buffer
	_, err = fmt.Fprintf(&buf, " How are you?")
	if err != nil {
		fmt.Println("Error writing data to buffer:", err)
		return
	}
	// Outputting the unread content of the buffer: " How are you?"
	fmt.Println("Buffer contents:", buf.String())
}
For the complete program, please visit our GitHub repository.
Explanation:
- Import Statements:
- The code imports the necessary packages, bytes and fmt.
- Creating a Buffer:
- var buf bytes.Buffer creates a new bytes.Buffer named buf. The zero value is an empty buffer that is ready to use and will store the byte data.
- Writing Data to the Buffer:
- buf.WriteString("Hello, ") appends the string "Hello, " to the buffer.
- buf.WriteByte('W') appends the byte 'W' to the buffer.
- buf.Write([]byte("orld!")) appends the byte slice []byte("orld!") to the buffer.
- After these operations, buf contains “Hello, World!”.
- Reading Data from the Buffer:
- data := make([]byte, buf.Len()) creates a byte slice data with a length equal to the current length of buf.
- buf.Read(data) reads the entire content of buf into the data slice. After this read operation, buf is empty because Read consumes the bytes it returns.
- Outputting Read Data:
- fmt.Println("Data read from buffer:", string(data)) prints the data read from the buffer, which is “Hello, World!”.
- Writing Additional Data to the Buffer:
- fmt.Fprintf(&buf, " How are you?") appends the string " How are you?" to the buffer using fmt.Fprintf. This demonstrates using bytes.Buffer with the io.Writer interface.
- Outputting the Remaining Buffer Content:
- fmt.Println("Buffer contents:", buf.String()) prints only the unread portion of the buffer. Because “Hello, World!” was already consumed by Read, the output is “ How are you?” rather than “Hello, World! How are you?”.
The bytes.Buffer type in Golang provides versatile methods for working with byte data in memory. It supports efficient appending, reading, and manipulation of byte sequences, making it well suited to scenarios where byte data is constructed or modified dynamically, such as building strings efficiently, reading and writing files, serialization and deserialization, buffering output, and inter-process communication.
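A detail that is easy to miss is that reading from a bytes.Buffer consumes the data: buf.Len() and buf.String() only ever describe the unread portion. The following minimal sketch isolates that behaviour; readThenAppend is a hypothetical helper written for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// readThenAppend writes first into a buffer, reads all of it back out,
// appends second, and returns what was read plus what remains unread.
// (Hypothetical helper.)
func readThenAppend(first, second string) (read, remaining string) {
	var buf bytes.Buffer
	buf.WriteString(first)

	out := make([]byte, buf.Len())
	n, _ := buf.Read(out) // consumes the bytes; buf.Len() is now 0

	buf.WriteString(second)
	return string(out[:n]), buf.String() // String returns only unread bytes
}

func main() {
	read, remaining := readThenAppend("Hello, World!", " How are you?")
	fmt.Printf("%q\n", read)      // "Hello, World!"
	fmt.Printf("%q\n", remaining) // " How are you?"
}
```

If the full content is needed after writing, calling buf.String() or buf.Bytes() before any Read avoids consuming it.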
7. Type Reader (bytes.NewReader)
The bytes.NewReader function in Go creates and returns a new Reader type initialized with a given byte slice b. This Reader implements the io.Reader, io.Seeker, and io.ReaderAt interfaces, allowing efficient reading and seeking operations on the provided byte slice.
package main
import (
	"bytes"
	"fmt"
	"io"
)
func main() {
	// Example data
	data := []byte("Hello, World!")
	// Create a bytes.Reader with initial data
	reader := bytes.NewReader(data)
	// Reading data from the reader using io.Reader interface
	buffer := make([]byte, len(data))
	_, err := reader.Read(buffer)
	if err != nil && err != io.EOF {
		fmt.Println("Error reading data:", err)
		return
	}
	fmt.Println("Data read from bytes.Reader:", string(buffer))
	// Seek back to the beginning of the reader
	reader.Seek(0, io.SeekStart)
	// Reading a single byte
	b, err := reader.ReadByte()
	if err != nil {
		fmt.Println("Error reading byte:", err)
		return
	}
	fmt.Println("Single byte read from bytes.Reader:", string(b))
	// Reading a single rune (continues after the byte read above)
	r, size, err := reader.ReadRune()
	if err != nil {
		fmt.Println("Error reading rune:", err)
		return
	}
	fmt.Printf("Single rune read from bytes.Reader: %c (size: %d)\n", r, size)
	// Reset the reader to start again
	reader.Reset(data)
	// Copying data from reader to writer
	var writer bytes.Buffer
	n, err := io.Copy(&writer, reader)
	if err != nil {
		fmt.Println("Error copying data:", err)
		return
	}
	fmt.Printf("Copied %d bytes from bytes.Reader to buffer: %s\n", n, writer.String())
}
For the complete program, please visit our GitHub repository.
Output:
Data read from bytes.Reader: Hello, World!
Single byte read from bytes.Reader: H
Single rune read from bytes.Reader: e (size: 1)
Copied 13 bytes from bytes.Reader to buffer: Hello, World!
Explanation:
- Initialization:
- data := []byte("Hello, World!"): Defines a byte slice containing the string “Hello, World!”.
- reader := bytes.NewReader(data): Creates a bytes.Reader initialized with data.
- Reading Data:
- buffer := make([]byte, len(data)): Creates a byte slice buffer with the same length as data.
- reader.Read(buffer): Reads the entire content of data into buffer.
- Example: buffer becomes ['H', 'e', 'l', 'l', 'o', ',', ' ', 'W', 'o', 'r', 'l', 'd', '!'].
- Seeking and Single Reads:
- reader.Seek(0, io.SeekStart): Sets the reader’s position back to the start of data.
- reader.ReadByte(): Reads and returns the byte at the current position, here 'H', and advances the position by one byte.
- reader.ReadRune(): Reads and returns the next UTF-8 encoded Unicode character (rune) together with its size in bytes. Because ReadByte has already consumed 'H', this returns 'e' with size 1.
- Resetting:
- reader.Reset(data): Resets the reader to read from the beginning of data.
- Copying Data:
- var writer bytes.Buffer: Initializes a bytes.Buffer to store copied data.
- io.Copy(&writer, reader): Copies all remaining data from reader to writer.
- Example: writer.String() would output “Hello, World!”.
Use Cases:
- Efficient Reading: Ideal for sequentially processing byte data from an in-memory source.
- Byte and Rune Handling: Provides methods to read individual bytes and UTF-8 runes.
- Resetting and Seeking: Supports resetting and seeking within the data for reprocessing.
- Copying Data: Facilitates efficient copying of byte data between readers and writers.
The above example showcases how bytes.NewReader in Golang can efficiently handle and manipulate byte data, making it valuable for tasks such as parsing, processing byte-based protocols, and implementing efficient data readers in Go programs.
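With ASCII-only data, ReadByte and ReadRune both advance by one byte, so the difference between them is not visible. The sketch below uses data that starts with a multi-byte character; firstRune is a hypothetical helper introduced for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
)

// firstRune reads one rune from data through a bytes.Reader and reports
// the rune, its encoded size, and how many bytes are still unread.
// (Hypothetical helper.)
func firstRune(data []byte) (r rune, size, unread int, err error) {
	reader := bytes.NewReader(data)
	r, size, err = reader.ReadRune()
	return r, size, reader.Len(), err
}

func main() {
	r, size, unread, err := firstRune([]byte("🙏 Go"))
	if err != nil {
		fmt.Println("Error reading rune:", err)
		return
	}
	fmt.Printf("%c size=%d unread=%d\n", r, size, unread) // 🙏 size=4 unread=3
}
```

reader.Len() reports the number of unread bytes, which makes it a convenient way to check how far a read has advanced.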
8. Performance Comparison of bytes.NewReader and strings.NewReader in Reading 1 GB Data
The provided program compares the performance of reading 1 GB of data using two different readers in Go: bytes.NewReader and strings.NewReader. This comparison helps understand the efficiency and behavior of each reader when handling large amounts of data.
package main
import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"time"
)
func main() {
	const GB = 1024 * 1024 * 1024
	data := make([]byte, GB)
	for i := 0; i < GB; i++ {
		data[i] = byte(i % 256) // Filling data with bytes from 0 to 255
	}
	// Measure time to read from bytes.NewReader
	startTimeBytes := time.Now()
	byteReader := bytes.NewReader(data)
	byteBuffer := make([]byte, 1024)
	totalBytes := 0
	for {
		n, err := byteReader.Read(byteBuffer)
		totalBytes += n
		if err == io.EOF {
			break
		} else if err != nil {
			fmt.Println("Error reading with bytes.NewReader:", err)
			return
		}
	}
	elapsedBytes := time.Since(startTimeBytes)
	fmt.Printf("Read %d bytes with bytes.NewReader in %v\n", totalBytes, elapsedBytes)
	// Measure time to read from strings.NewReader
	// (the timer also covers the string(data) conversion below)
	startTimeString := time.Now()
	stringReader := strings.NewReader(string(data))
	stringBuffer := make([]byte, 1024)
	totalBytes = 0
	for {
		n, err := stringReader.Read(stringBuffer)
		totalBytes += n
		if err == io.EOF {
			break
		} else if err != nil {
			fmt.Println("Error reading with strings.NewReader:", err)
			return
		}
	}
	elapsedString := time.Since(startTimeString)
	fmt.Printf("Read %d bytes with strings.NewReader in %v\n", totalBytes, elapsedString)
}
For the complete program, please visit our GitHub repository.
Output:
Read 1073741824 bytes with bytes.NewReader in 77.334384ms
Read 1073741824 bytes with strings.NewReader in 768.866437ms
// Note: output varies with machine configuration
Constants and Data Initialization
- First, we define a constant GB representing 1 gigabyte in bytes. We then create a byte slice data of size 1 GB and fill it with values from 0 to 255 in a repeating pattern. This step sets up a large block of data for our read performance comparison.
Reading with bytes.NewReader
- Next, we measure how long it takes to read the entire byte slice using bytes.NewReader. We capture the start time, create a bytes.Reader from our data, and read the data in 1024-byte chunks. We track the total bytes read and handle any read errors. Finally, we calculate the elapsed time and print the results.
Reading with strings.NewReader
- We then perform a similar measurement using strings.NewReader. After starting the timer, we convert the byte slice to a string, which allocates and copies another 1 GB, so the cost of this conversion is included in the measurement. Using a strings.Reader, we read the string data in 1024-byte chunks, tracking the total bytes read and handling errors. Again, we calculate the elapsed time and print the results.
This program compares the performance of reading 1 GB of data using bytes.NewReader and strings.NewReader. The key differences are:
- bytes.NewReader directly reads from a byte slice.
- strings.NewReader reads from a string, which here involves first converting the byte slice to a string (an operation that allocates and copies the full 1 GB).
The program measures and prints the time taken by each approach to read the entire data. Because the timer for strings.NewReader starts before the conversion, the gap in the sample output likely reflects the cost of that copy to a large extent, rather than a difference in how the two readers themselves read data.
bytes.NewReader is ideal for working with binary data. If you’re dealing with raw bytes, such as image files, network packets, or any non-textual data, bytes.NewReader provides a direct way to read and manipulate this data. If your data is textual, using strings.NewReader might be more appropriate. It provides an easy way to read and process strings, such as reading lines of text or working with string-based data formats like JSON or XML.
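Since the strings.NewReader timing in the program above also covers the string(data) conversion, one way to see where the time goes is to time the conversion separately from the two read loops. The following is only a sketch under stated assumptions: the 16 MiB size is an arbitrary choice to keep memory use small, and drain, drainBytes, and drainString are hypothetical helpers. No timings are shown because they depend on the machine.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
	"time"
)

// drain reads r to the end in chunks of len(buf) and returns the byte count.
// buf must not be empty. (Hypothetical helper.)
func drain(r io.Reader, buf []byte) (int, error) {
	total := 0
	for {
		n, err := r.Read(buf)
		total += n
		if err == io.EOF {
			return total, nil
		}
		if err != nil {
			return total, err
		}
	}
}

// drainBytes and drainString wrap drain for the two reader types.
func drainBytes(data, buf []byte) (int, error) {
	return drain(bytes.NewReader(data), buf)
}

func drainString(s string, buf []byte) (int, error) {
	return drain(strings.NewReader(s), buf)
}

func main() {
	const size = 16 << 20 // 16 MiB, an arbitrary size chosen for this sketch
	data := make([]byte, size)
	buf := make([]byte, 1024)

	start := time.Now()
	text := string(data) // copies every byte into a new string
	fmt.Println("conversion:    ", time.Since(start))

	start = time.Now()
	n, _ := drainBytes(data, buf)
	fmt.Println("bytes.Reader:  ", n, time.Since(start))

	start = time.Now()
	n, _ = drainString(text, buf)
	fmt.Println("strings.Reader:", n, time.Since(start))
}
```

For more reliable numbers than a single time.Since measurement, Go's testing package benchmarks would be the usual tool.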
Reference link
To learn more about the bytes package in Golang, visit the official Golang documentation golang bytes package