Working with text is a lot easier than working with binary data…
…with text you can use regular expressions & the familiar String methods.
Let me show you some examples, starting with just a plain string & then moving on to more interesting stuff.
This will convert every character in the string into a decimal value:
str = "AABBCC"
str.unpack("c*") # [65, 65, 66, 66, 67, 67]
Notice the "c*" argument for unpack.
This is a “format string” which tells unpack what to do with the data. In this case,
c means take one byte & convert it into an integer value (for single-byte characters the String#ord method gives the same result).
* just says “repeat this format for all the input data”.
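As a small sketch of how the count part of a format string behaves (the variable names are only for illustration), you can compare "*" with an explicit count. It also shows that "c" is a signed directive, while "C" is its unsigned counterpart:

```ruby
str = "AABBCC"

all_bytes = str.unpack("c*")          # "*" repeats the directive for all remaining input
first_two = str.unpack("c2")          # an explicit count stops after two bytes
via_ord   = str.each_char.map(&:ord)  # same values for single-byte characters

# "c" reads a signed 8-bit integer, "C" an unsigned one.
signed   = "\xff".unpack("c")
unsigned = "\xff".unpack("C")

p all_bytes  # [65, 65, 66, 66, 67, 67]
p first_two  # [65, 65]
p via_ord    # [65, 65, 66, 66, 67, 67]
p signed     # [-1]
p unsigned   # [255]
```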
The "l" format string takes 4 bytes of data & returns a signed 32-bit integer. One thing to notice is that "l" uses the machine’s native byte order, so on typical x86 hardware these bytes are read in “little-endian” format.
"\xff\x00\x00\x00".unpack("l").first # 255
"\x90\xC0\xDD\x08".unpack("l").first # 148750480
I used first here because unpack returns an array.
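If you don’t want the result to depend on the byte order of the machine running the code, you can spell the byte order out in the format string. Here is a short sketch, assuming Ruby 2.4 or later for unpack1 (which returns the first value directly instead of an array):

```ruby
data = "\xff\x00\x00\x00"

little   = data.unpack1("l<")  # signed 32-bit, forced little-endian
big      = data.unpack1("l>")  # signed 32-bit, forced big-endian
unsigned = data.unpack1("N")   # unsigned 32-bit, big-endian ("network" order)

p little    # 255
p big       # -16777216
p unsigned  # 4278190080
```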
How do you read a binary file like an EXE, PNG or GZIP?
If you treat these like strings you will just see something that looks like random data…
…but there is a documented structure for most of these file formats & the
unpack method is what you would use to read that data and convert it into something useful.
Here is an example:
binary_data = "\x05\x00\x68\x65\x6c\x6c\x6f"
length, message = binary_data.unpack("Sa*") # [5, "hello"]
In this example, the binary data (represented in hexadecimal, which is way more compact than 1s & 0s) has a two-byte (16 bit) length field that contains the length of the following string. Then there is the string itself.
It is very common for binary files & binary network protocols to have a “length” field.
This tells the parser exactly how many bytes should be read (and yes, I know in this example I read both the length & the data in one step, that’s just to keep things simple).
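To make the role of the length field more concrete, here is a minimal sketch that reads the length first & then reads exactly that many bytes. The read_record helper and the sample stream are made up for illustration, and "v" is used to force a little-endian 16-bit length:

```ruby
require "stringio"

# Hypothetical helper: reads one length-prefixed record from an IO-like object.
def read_record(io)
  header = io.read(2)                 # the two-byte length field
  return nil if header.nil? || header.bytesize < 2

  length = header.unpack1("v")        # unsigned 16-bit, little-endian
  io.read(length)                     # read exactly `length` bytes
end

# Two records back to back: "hello" (length 5) and "abc" (length 3).
stream = StringIO.new("\x05\x00hello\x03\x00abc".b)

first  = read_record(stream)
second = read_record(stream)
third  = read_record(stream)

p first   # "hello"
p second  # "abc"
p third   # nil
```

Because the length is read before the payload, the parser always knows where one record ends and the next one begins.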
There is also the bindata gem, which is built specifically to help you parse binary structures.
Here is an example:
require "bindata"

class BinaryString < BinData::Record
  endian :little
  uint16 :len
  string :name, :read_length => :len
end
Notice the read_length parameter. This tells bindata to take the length of the string from the len field, which will save you a lot of work 🙂
So if you want to write a parser for any binary format, these are the steps:
If you want to see a full example of bindata in action, take a look at my PNG parser on GitHub.
There is a type of encoding called “Base64”. You may have seen it before in a URL.
Looks something like this:
The double equals at the end is usually the tell-tale sign that you are dealing with Base64, although some inputs can result in the equals signs not being there (they are used as padding).
So why am I telling you this, besides it being a useful thing to know in itself?
Well, it turns out that you can convert a string into Base64 using the pack method.
As you can see here:
def encode64(bin)
  [bin].pack("m")
end

encode64 "abcd" # "YWJjZA==\n"
In fact, this is the exact method used in the
Base64 module from the standard library 🙂
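Going the other way works too. As a rough sketch using only pack & unpack1 (the "m0" form, which leaves out the trailing newline, and the sample inputs are chosen just for illustration), you can decode Base64 and see how the padding depends on the length of the input:

```ruby
encoded = ["abcd"].pack("m0")    # "m0" encodes without adding line feeds
decoded = encoded.unpack1("m")   # "m" decodes Base64 back into the original bytes

# The number of "=" padding characters depends on the input length modulo 3.
one_byte    = ["a"].pack("m0")
two_bytes   = ["ab"].pack("m0")
three_bytes = ["abc"].pack("m0")

p encoded      # "YWJjZA=="
p decoded      # "abcd"
p one_byte     # "YQ=="
p two_bytes    # "YWI="
p three_bytes  # "YWJj"
```

This is why the equals signs are not always there: an input whose length is a multiple of three needs no padding at all.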
In this post you learned about the pack & unpack methods, which help you work with binary data. They can be used to parse binary files, convert a string into ASCII values & encode data as Base64.
Don’t forget to share & subscribe so you can enjoy more blog posts like this! 🙂