# Importing and Exporting Data

My typical advice is that, if you're doing your analysis in Matlab, you can focus on importing data and forget about exporting it. But in reality, it's helpful to know how to do both, and the commands are quite similar.

Unfortunately, Matlab is not nearly as versatile when it comes to data import/export as a language like R, which was designed for statisticians who deal with data in lots of forms. In this lesson, we'll focus on a simple example, and I'll point you to more complicated functions at the end.

## Contents

## Pick Your Poison

This is going to be a complicated discussion for a few reasons:

- The most common data format out in the world is Microsoft's .xls spreadsheet format. Unfortunately, this is a proprietary format, which means that, in order to open .xls files in Matlab, you will need a copy of Excel, and you will need to be running Windows. Sorry, but them's the breaks.
- Even if you are using Microsoft's semi-open .xlsx format, which is supposed to open on Macs and Unix/Linux under conditions below, you may run into problems. Your data need to be very cleanly formatted.
- Fine, you say, Excel will convert my data to csv (comma-separated value), which is nice and portable and easy to open up in a text editor to read. And yes, Matlab will read csv. But only numbers. Too bad if you have a column of strings. This will be okay with Excel files, but the outcome will be a little funky.

Sorry.

Thus, in general, you will need to do one of the following:

- Recode everything in your spreadsheet as numbers, save as csv, and import.
- Use a Windows machine with Excel installed to import your data.
- Succeed in getting Matlab to load xls or xlsx files in "basic" mode with
`xlsread`. (I had to clean the data a lot to get this working.)

As a result, I'm going to list several options below. You'll have to play around to figure out which is best for you.

## Getting Started

First you'll need to download the data (xls xlsx csv). These are data extracted from the Multnomah County Jury Project, which tabulated jury decisions in Multnomah County, Oregon, in the period 1973-1976. The data are currently stored in a comma-separated value file (csv), which is a pretty typical and portable data format. (Excel will read and save files in this format, and it's actually human-readable, since each line is just the entries in each row, separated by a delimeter: commas, spaces, tabs, etc.)

Now move to the directory where the file is located. Those of you with a spreadsheet program installed may want to take a look at the data.

## Pulling data

In its simplest form, we want to use one of the following commands to pull data

[num,txt,raw] = xlsread('jury_data.xlsx'); num = csvread('jury_data.csv',1,0);

Comments:

`xlsread`returns three arguments, the numerical data,`num`(in an array), the text data,`txt`(in a cell array the same size as the spreadsheet, with all numerical data removed), and the "raw" data in a cell array (`raw`), where each cell array is a cell in the spreadsheet. In`raw`, depending on whether or not you have stray spaces, Matlab may convert your numbers to strings inside the cell array.`csvread`always returns a numerical matrix. The two extra arguments supplied are the row and column at which to start. Both are indexed from 0, so that the arguments`1`and`0`above mean to start at the second row, first column.

If we used the `xlsread` option, we can do some data cleaning (advanced):

dat=nan(size(raw)); %make an empty cell array strtype = cellfun(@ischar,raw); %apply the ischar function to every element of raw dat(~strtype) = cell2mat(raw(~strtype)); %copy over cells that are already numbers dat(strtype) = str2double(raw(strtype)); %convert non-numeric cells to numbers, then copy

Why do we solve the problem like this? Or better yet: what should our thought process be when we approach such a problem? Some thoughts:

- This is not simple. It took me several minutes to work out how it had to be done.
- This would be really painful to do with a loop, but if you
*know*how to do it with a loop, and that's the*only*way you know how to do it, then use a loop! Go for a solution that works before you waste to much time on being elegant. - Our basic plan is to copy all the data over to a new array in two pieces. First, we copy over everything that's already a number, then we convert the rest to numbers and copy them the same way.
- To do this, we need to know which cells contain numbers. As you will find, functions like
`ischar`that test whether a single variable is a string, don't work so well on cell array.`cellfun`is a function that applies a function to every element of a cell array. To do that, we need to supply the name of a function to use on each cell. For reasons we'll cover in the intermediate class, you need to do that using`@`. The return from`cellfun`is then a matrix the same size as the cell array, with a 1 if the corresponding cell contains a string and 0 otherwise. - Next, we want to copy all the numbers. We use logical indexing, with
`~strtype`as our logical (it's 0 for strings and 1 for numbers). On the right-hand side, we need to get cells from`raw`using`~strtype`, but that returns*cells*. We can convert those to an array using the`cell2mat`function. - Finally, we need to convert the string cells to number and copy them over. Here, we do have a nice, vectorized function,
`str2double`that converts all our string-valued cells to double-precision numbers (this won't be quite right for your integers, since it will add decimal points, but you can fix that later). Again, we use logical indexing with`strtype`to select the string values in raw, convert them to numbers, and assign these to the correct subset of`dat`.

Two things to take away from this:

- There is a lot of Matlab left to learn, and unfortunately, it's sometimes needed for very basic things. But there aren't that many new concepts, just more functions to learn.
- Matlab can be really powerful. It only took four lines to solve a data cleaning problem that would be absurd to do by hand-editing.

## Writing Data to Spreadsheet

Thankfully, writing data out is basically the reverse process of reading data in. For more information, see the help files listed below. The process should be self-explanatory, with the caveat that you need to be aware of the limits of what type of data can be written out using each command.

## Loading and Saving Matlab Data

This one's easy. To load data simply use the `load` command:

load practice.mat % you have to be in the directory containing practice.mat load('practice.mat') %also works

Now, if you're inside a function, `load` won't put variables directly into the workspace, since it might accidentally overwrite something existing. In that case (or anytime), you can get an output from `load`

myvars = load('practice.mat'); %returns a structure; each field is a variable

As for saving data, you do that with `save`, which takes as its argument an output file name and list of variables:

a=2; b=1; save myoutput.mat a b %save variables a and b to myoutput.mat save('myoutput.mat','a','b') %does the same thing