How to Read a File in Chunks in Golang

To read a file in chunks in Golang, you can use the “bufio.NewReader()” function, the “file.Read(buffer)” method, or multiple goroutines to read the file concurrently.

Method 1: Using the “bufio.NewReader()” function

The bufio.NewReader() function in Go is used to create a new bufio.Reader object. A bufio.Reader object is a buffered reader that can be used to read data from an underlying io.Reader object.

Let’s read a file in chunks using the “bufio.NewReader()” function.

Example

package main

import (
  "bufio"
  "fmt"
  "io"
  "log"
  "os"
)

func main() {
  // Open the file for reading
  file, err := os.Open("data.txt")
  if err != nil {
    log.Fatal(err)
  }
  defer file.Close()

  // Create a buffered reader
  reader := bufio.NewReader(file)

  // Read the file in 4-byte chunks
  chunkSize := 4
  chunk := make([]byte, chunkSize)

  for {
    // Read the next chunk
    n, err := reader.Read(chunk)

    // Print whatever was read before inspecting the error,
    // because Read can return data together with an error
    if n > 0 {
      fmt.Printf("%s", chunk[:n])
    }

    if err != nil {
      if err == io.EOF {
        break
      }
      log.Fatal(err)
    }
  }
}

Output

This is a text file
We will read an entire file
Hello World

In this example, we opened the file for reading using the “os.Open()” function and deferred its closure. Then, we created a buffered reader using “bufio.NewReader()” and specified the size of the chunk we wanted to read.

Next, we entered a loop where we read the file in chunks using the “reader.Read()” function.

The “Read()” function returns the number of bytes read and an error, and it can return data together with an error, so we print “chunk[:n]” (only the bytes actually read) on every iteration that reads something using the “fmt.Printf()” function.

If the error equals “io.EOF”, we have reached the end of the file and break out of the loop; any other non-nil error is passed to “log.Fatal()”, which prints it and exits the program. Note that “Read()” may return fewer than chunkSize bytes even before the end of the file, so a short read by itself is not a reliable end-of-file signal.

Method 2: Using the “file.Read(buffer)” function

We will use a buffer size of 100 bytes to read a file in chunks using the “file.Read()” function.

Example

package main

import (
  "fmt"
  "io"
  "os"
)

func main() {
  const BufferSize = 100
  file, err := os.Open("data.txt")
  if err != nil {
    fmt.Println(err)
    return
  }
  defer file.Close()

  buffer := make([]byte, BufferSize)

  for {
    bytesread, err := file.Read(buffer)

    if err != nil {
      if err != io.EOF {
        fmt.Println(err)
      }
      break
    }

    fmt.Println("bytes read: ", bytesread)
    fmt.Println("bytestream to string: ", string(buffer[:bytesread]))
  }
}

Output

bytes read: 100
bytestream to string: unblockia.com, KPAJITWYS9, direct
google.com, pub-2128757167812663, reseller, f08c47fec0942fa0
rubic
bytes read: 100
bytestream to string: onproject.com, 17210, reseller, 0bfd66d529a55807
appnexus.com, 10264, direct
indexexchange.com, 1924

Here’s a brief explanation of what each part does:

  1. os.Open(“data.txt”): This opens the file named “data.txt” for reading. If the file doesn’t exist or there’s an error opening the file, the os.Open() function returns an error.
  2. defer file.Close(): This ensures the file will be closed once the function that opened it returns, regardless of where the return statement is called in the function.
  3. make([]byte, BufferSize): This creates a slice of bytes with a length and capacity equal to BufferSize (100 bytes). This slice is used as the buffer to read data from the file.
  4. file.Read(buffer): This reads up to BufferSize bytes from the file into the buffer. It returns the number of bytes read and an error value. If the error is io.EOF, it means we’ve reached the end of the file, and there’s nothing more to read.
  5. string(buffer[:bytesread]): This converts the bytes read from the file into a string so they can be printed out.

Method 3: Use a “goroutine” to read a file concurrently

We will read a file in chunks in parallel using multiple goroutines in Go.

Example

package main

import (
  "fmt"
  "io"
  "os"
  "sync"
)

const BufferSize = 100

type chunk struct {
  bufsize int
  offset  int64
}

func main() {
  file, err := os.Open("data.txt")
  if err != nil {
    fmt.Println(err)
    return
  }
  defer file.Close()

  fileinfo, err := file.Stat()
  if err != nil {
    fmt.Println(err)
    return
  }

  filesize := int(fileinfo.Size())
  concurrency := filesize / BufferSize
  if remainder := filesize % BufferSize; remainder != 0 {
    concurrency++
  }

  chunksizes := make([]chunk, concurrency)

  // calculate each chunk's size and offset; the last chunk holds the
  // remainder when the file size is not an exact multiple of BufferSize
  for i := 0; i < concurrency; i++ {
    chunksizes[i].offset = int64(i * BufferSize)
    chunksizes[i].bufsize = BufferSize
    if rem := filesize % BufferSize; i == concurrency-1 && rem != 0 {
      chunksizes[i].bufsize = rem
    }
  }

  var wg sync.WaitGroup
  wg.Add(concurrency)

  for i := 0; i < concurrency; i++ {
    go func(chunksizes []chunk, i int) {
      defer wg.Done()

      chunk := chunksizes[i]
      buffer := make([]byte, chunk.bufsize)
      bytesread, err := file.ReadAt(buffer, chunk.offset)

      if err != nil && err != io.EOF {
        fmt.Println(err)
        return
      }

      fmt.Println("bytes read, string(bytestream): ", bytesread)
      fmt.Println("bytestream to string: ", string(buffer[:bytesread]))
    }(chunksizes, i)
  }

  wg.Wait()
}

Output

bytes read, string(bytestream): 100
bytes read, string(bytestream): 100
bytes read, string(bytestream): 100
bytes read, string(bytestream): 100
bytestream to string: , RESELLER, 5d62403b186f2ace
rubiconproject.com, 22330, RESELLER, 0bfd66d529a55807
rubiconproject.co
bytes read, string(bytestream): 100
bytes read, string(bytestream): 100
bytes read, string(bytestream): 100
bytes read, string(bytestream): 100
bytes read, string(bytestream): 100
bytestream to string: 115, RESELLER, fafdf38b16bf6b2b
sovrn.com, 257429, RESELLER, fafdf38b16bf6b2b
sovrn.com, 249425, RES
bytestream to string: 9888, RESELLER, 6a698e2ec38604c6
appnexus.com, 3703, RESELLER, f5ab79cb980f11d1
loopme.com, 5679, RE

The main idea is to divide the file into chunks of BufferSize bytes (except for possibly the last chunk, which may be smaller), then read each chunk in a separate goroutine.

The “sync.WaitGroup” ensures all goroutines finish before the program exits.

This code includes calculating each chunk size and creating the chunk type to store the chunk size and offset.
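The chunk arithmetic can be isolated and checked on its own. A sketch (planChunks is our own helper name): every chunk is BufferSize bytes except possibly the last, which holds the remainder; when the file size is an exact multiple of the buffer size there is no remainder, and the last chunk must stay at the full BufferSize.

```go
package main

import "fmt"

type chunk struct {
	bufsize int
	offset  int64
}

// planChunks reproduces the chunk-planning step: it splits a file of
// filesize bytes into consecutive chunks of at most bufSize bytes,
// recording each chunk's size and offset.
func planChunks(filesize, bufSize int) []chunk {
	concurrency := filesize / bufSize
	remainder := filesize % bufSize
	if remainder != 0 {
		concurrency++
	}
	chunks := make([]chunk, concurrency)
	for i := range chunks {
		chunks[i].offset = int64(i * bufSize)
		chunks[i].bufsize = bufSize
	}
	// Only shrink the last chunk when the division is not exact.
	if remainder != 0 {
		chunks[concurrency-1].bufsize = remainder
	}
	return chunks
}

func main() {
	// A 250-byte file with 100-byte chunks: 100 + 100 + 50.
	for _, c := range planChunks(250, 100) {
		fmt.Printf("offset=%d size=%d\n", c.offset, c.bufsize)
	}
}
```

Because os.File.ReadAt is safe for concurrent use, each goroutine can read its planned chunk independently at its own offset.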
