CSV stands for “Comma-Separated Values”.
It’s a common data format that consists of rows with values separated by commas. It’s used for exporting & importing data.
For example:
You can export your Gmail contacts as a CSV file, and you can also import them using the same format.
This is what a CSV file looks like:
id,name
1,chocolate
2,bacon
3,apple
4,banana
5,almonds
Now you’re going to learn how to use the Ruby CSV library to read & write CSV files.
Ruby CSV Parsing
Ruby comes with a built-in CSV library.
You can read a file directly:
require 'csv'
CSV.read("favorite_foods.csv")
Or you can parse a string with CSV data:
require 'csv'
CSV.parse("1,chocolate\n2,bacon\n3,apple")
The result?
You get a two-dimensional array where every entry is one row in the table.
It looks like this:
[
  ["id", "name"],
  ["1", "chocolate"],
  ["2", "bacon"],
  ["3", "apple"],
  ["4", "banana"],
  ["5", "almonds"]
]
You can use array indices like data[1][1] to work with this data.
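Here is a minimal sketch of that index-based access. The data variable and the inline sample string are assumptions for illustration, so the example runs without a file on disk:

```ruby
require 'csv'

# Sample data inlined as a string (an assumption for this sketch).
data = CSV.parse("id,name\n1,chocolate\n2,bacon\n3,apple")

header     = data[0]     # the header row is just another array
second_row = data[1]     # first data row
name       = data[1][1]  # second field of the first data row

puts header.inspect
puts second_row.inspect
puts name
```

Notice that the header row sits at index 0, so every data row is shifted by one.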
But there is a better way!
CSV Options
If your file has headers you can tell the CSV parser to use them.
Like this:
table = CSV.parse(File.read("cats.csv"), headers: true)
Now instead of a multi-dimensional array you get a CSV Table object.
Here’s the description:
“A CSV::Table is a two-dimensional data structure for representing CSV documents. Tables allow you to work with the data by row or column, manipulate the data, and even convert the results back to CSV.”
Given one of these tables, you can get the data you need from any row.
Example:
table[0]["id"] # "1"
table[0]["name"] # "chocolate"
Here 0 is the first row, and id & name are the column names.
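To see header-based access in a self-contained form, here is a small sketch. The csv_text string stands in for the file contents and is an assumption, as are the variable names:

```ruby
require 'csv'

# Inline sample standing in for a file with a header row (an assumption).
csv_text = "id,name\n1,chocolate\n2,bacon\n3,apple"
table = CSV.parse(csv_text, headers: true)

headers = table.headers                    # column names taken from the first line
first   = table[0].to_h                    # one row as a plain Hash
names   = table.map { |row| row["name"] }  # iterating a table yields one row at a time

puts headers.inspect
puts first.inspect
puts names.inspect
```

Because the table is enumerable, methods like map, select & each work on it row by row.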
There are two table modes:
- by_col
- by_row
By changing the table mode you can look at the data from different angles. The default is a mixed mode, where an integer index selects a row and a header name selects a column.
For example:
table.by_col[0] # ["1", "2", "3", "4", "5"]
table.by_col[1] # ["chocolate", "bacon", "apple", "banana", "almonds"]
Here 0 is the first column, and 1 is the second column.
These two methods return a copy of the table.
If you want to switch the mode of the original table instead, you can use the by_col! & by_row! methods.
This is going to be more memory-efficient because no copy of the table is created.
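The difference between the two families of methods can be sketched like this. The sample data and variable names are assumptions; the mode method is used only to show which access mode each table is in:

```ruby
require 'csv'

table = CSV.parse("id,name\n1,chocolate\n2,bacon", headers: true)

# by_col returns a new table object in column mode; the original is untouched.
cols = table.by_col
copy_mode     = cols.mode
original_mode = table.mode

# by_col! switches the original table itself to column mode.
table.by_col!
first_column = table[0]

# by_row! switches it to row mode, so an integer index selects a row again.
table.by_row!
first_name = table[0]["name"]

puts copy_mode.inspect
puts original_mode.inspect
puts first_column.inspect
puts first_name
```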
How to Use CSV Converters
You may have noticed that we got our id column as an array of strings.
What if we need Integers?
You can get them by calling to_i on each string…
But there is a shortcut!
The Ruby CSV library implements something called converters.
A converter will automatically transform values for you.
For example:
CSV.parse("1,2,3,4,5") # [["1", "2", "3", "4", "5"]]
CSV.parse("1,2,3,4,5", converters: :numeric) # [[1, 2, 3, 4, 5]]
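There are 6 built-in converters. Each one is selected with the symbol shown in parentheses:
- Integer (:integer)
- Float (:float)
- Numeric (:numeric, Float + Integer)
- Date (:date)
- DateTime (:date_time)
- All (:all)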
But you can also create your own custom converters.
Here’s how:
CSV::Converters[:symbol] = ->(value) { value.to_sym rescue value }
You can use your new converter like this:
CSV.parse("a,b,c", headers: false, converters: :symbol)
# [[:a, :b, :c]]
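Converters also combine with the headers option. The following sketch, with assumed sample data and variable names, applies the built-in :numeric converter to a table. Fields that don't look like numbers are left as strings, and the header names are not touched by field converters:

```ruby
require 'csv'

# Sample data with an integer column, a text column & a float column (assumed).
csv_text = "id,name,price\n1,chocolate,2.5\n2,bacon,4\n"
table = CSV.parse(csv_text, headers: true, converters: :numeric)

ids    = table["id"]     # integers instead of strings
names  = table["name"]   # unchanged, because these are not numeric
prices = table["price"]  # "2.5" becomes a Float, "4" becomes an Integer

puts ids.inspect
puts names.inspect
puts prices.inspect
```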
How to Create a New CSV File
On top of being able to parse & read CSV files in different ways you can also create a CSV from scratch.
This is the easy way:
cats = [
  [:blue, 1],
  [:white, 2],
  [:black_and_white, 3]
]
cats.map { |c| c.join(",") }.join("\n")
You can also use the generate method:
CSV.generate do |csv|
  csv << [:blue, 1]
  csv << [:white, 2]
  csv << [:black_and_white, 3]
end
This returns a string with the data in CSV format.
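One practical difference between joining by hand & using generate shows up when a value contains a comma. This sketch uses made-up sample rows to compare the two approaches by parsing each result back:

```ruby
require 'csv'

# The second row has a comma inside a value (sample data, assumed).
rows = [["id", "name"], [1, "sea salt, coarse"]]

# Joining by hand does not quote the value.
naive = rows.map { |r| r.join(",") }.join("\n")

# CSV.generate quotes fields that need it.
generated = CSV.generate { |csv| rows.each { |r| csv << r } }

naive_fields     = CSV.parse(naive)[1]      # the comma splits the value in two
generated_fields = CSV.parse(generated)[1]  # the value survives intact

puts naive_fields.inspect
puts generated_fields.inspect
```

So the manual join is fine for simple values, but generate is the safer choice when you don't control the data.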
If you want to write to a file you'll have to use something like File.write("cats.csv", data), or instead of generate you can use open with a file name & write mode enabled.
Like this:
CSV.open("cats.csv", "w") do |csv|
  csv << [:white, 2]
end
Now you have a new CSV file!
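As a quick check, you can write a file & read it straight back. This sketch uses a temporary directory so it doesn't leave files behind. The column names & variable names are assumptions for illustration:

```ruby
require 'csv'
require 'tmpdir'

rows_read = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, "cats.csv")

  # Write a header row plus two data rows.
  CSV.open(path, "w") do |csv|
    csv << ["color", "count"]
    csv << [:blue, 1]
    csv << [:white, 2]
  end

  # Read the file back. Without converters every value comes back as a string.
  rows_read = CSV.read(path)
end

puts rows_read.inspect
```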
CSV Gems & Performance
The built-in library is fine & it will get the job done.
But you can also find a few CSV parsing gems with different features.
For example, the smarter_csv gem will convert your CSV data into an array of hashes.
Example:
require 'smarter_csv'

IntegerConverter = Object.new

def IntegerConverter.convert(value)
  Integer(value)
end

SmarterCSV.process('testing.csv', value_converters: { id: IntegerConverter })
# [{:id=>1, :name=>"a"}, {:id=>2, :name=>"b"}, {:id=>3, :name=>"c"}]
Here's a performance comparison:
Comparison:
CSV: 112.9 i/s
Smarter CSV: 21.7 i/s - 5.21x slower
Tabular: 17.3 i/s - 6.52x slower
Summary
You've learned how to read & write CSV files in Ruby! You've also learned about converters & alternative Ruby gems to process your CSV data.
If you want to process big CSV files (> 10MB) you may want to use the CSV.foreach(file_name) method with a block. This will read one row at a time & use a lot less memory.
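Here is a minimal sketch of foreach. It writes a tiny temporary file to stand in for a big one (the file contents & variable names are assumptions), then sums the id column without loading every row into memory at once:

```ruby
require 'csv'
require 'tempfile'

# A small temporary file standing in for a large CSV file.
file = Tempfile.new(["foods", ".csv"])
file.write("id,name\n1,chocolate\n2,bacon\n3,apple\n")
file.close

total = 0
names = []

# Each row is yielded to the block as soon as it is parsed.
CSV.foreach(file.path, headers: true, converters: :numeric) do |row|
  total += row["id"]
  names << row["name"]
end

file.unlink

puts total
puts names.inspect
```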