Organise your Data

Planning how you will organise your data from the start of your project can save time and prevent errors.

Naming files

Develop a file naming convention for your project. Effective file names are consistent, concise, meaningful and findable:

  • be consistent with your file and folder naming
  • number files and versions with enough leading zeros for the scale of the project so that they sort correctly. For example, if you are going to collect 15 samples, number the first one 01 rather than 1
  • include a version control table in important documents, recording the version number, author, purpose/changes and date
  • if required, put the date at the beginning of the file or folder name in the format YYYY_MM_DD
  • some software can’t read file names that contain spaces, so it may be easier to avoid them. Alternatives include underscores, hyphens and camel case

Examples

2016_10_31_Eye_Tracking_012
ActiveDataStorage_05
2016-10-31-Interview-audio-V1
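
A naming convention is easier to keep consistent when file names are generated rather than typed by hand. The following is a minimal Python sketch, not a prescribed tool: the function name build_file_name and its parameters are hypothetical, and it assumes a date-first pattern with zero-padded numbers as described above.

```python
from datetime import date


def build_file_name(collected_on, description, number, total, sep="_"):
    """Build a date-first file name with a zero-padded sequence number.

    Hypothetical helper: 'total' is the expected number of items and
    decides how many digits the number is padded to.
    """
    width = len(str(total))                       # 15 items -> 2 digits, 150 -> 3
    date_part = collected_on.strftime(f"%Y{sep}%m{sep}%d")
    label = sep.join(description.split())         # replace spaces with the separator
    return f"{date_part}{sep}{label}{sep}{number:0{width}d}"


print(build_file_name(date(2016, 10, 31), "Eye Tracking", 12, 150))
# 2016_10_31_Eye_Tracking_012
print(build_file_name(date(2016, 10, 31), "Interview audio", 1, 15, sep="-"))
# 2016-10-31-Interview-audio-01
```

Passing the expected total up front means the padding is decided once, so sample 1 and sample 15 always sort in the right order.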

File structure

Accessing files, avoiding duplication and backing files up adequately all require a little planning. As with naming files, it is important to be consistent:

  • most operating systems default to a hierarchical file structure: files inside folders, which may themselves be stored in other folders. Aim for a balance between breadth and depth so there isn’t endless clicking to find a folder. It might be easier to start with a limited number of broad topics and create folders within them
  • make sure your folders are backed up: this is automatic when using the University filestore
  • name folders after the work rather than individuals
  • review your folders and files regularly to make sure they aren’t kept needlessly, and to separate current and completed activity. Completed files and folders can be moved to an archive folder within the hierarchy to keep the workspace uncluttered
  • depending on your work, it may be helpful to tag your files and folders to support their discoverability across overlapping folders
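
The points above can be sketched in Python as a small script that creates a shallow folder hierarchy and moves completed work into an archive folder. This is only an illustration under stated assumptions: the folder names in LAYOUT and the helper names create_project_folders and archive_item are hypothetical, and the demonstration runs in a temporary directory rather than on the University filestore.

```python
import shutil
import tempfile
from pathlib import Path

# Hypothetical layout: a few broad topics, each with a shallow set of subfolders.
LAYOUT = {
    "data": ["raw", "processed"],
    "documentation": [],
    "outputs": ["figures", "reports"],
    "archive": [],  # completed work is moved here
}


def create_project_folders(root, layout=LAYOUT):
    """Create the folder hierarchy under root and return the folders in order."""
    root = Path(root)
    folders = []
    for topic, subfolders in layout.items():
        folders.append(root / topic)
        folders.extend(root / topic / name for name in subfolders)
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return folders


def archive_item(item, root):
    """Move a completed file or folder into root/archive, refusing to overwrite."""
    item = Path(item)
    target = Path(root) / "archive" / item.name
    if target.exists():
        raise FileExistsError(f"{target} is already in the archive")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(item), str(target))
    return target


# Demonstration in a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    folders = create_project_folders(tmp)
    report = Path(tmp) / "outputs" / "reports" / "2016_10_31_Report_V1.txt"
    report.write_text("completed report")
    archived = archive_item(report, tmp)
    still_in_reports = report.exists()
    archived_rel = archived.relative_to(tmp).as_posix()

print(len(folders))   # 8
print(archived_rel)   # archive/2016_10_31_Report_V1.txt
```

Keeping the layout in one dictionary makes it easy to reuse the same structure across projects, which supports the consistency the list above asks for.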

File formats

How data are collected and analysed will determine the file formats used during a research project. For long-term storage, consider using formats that don't have restrictions on their use, as they are more likely to remain accessible in the future. The table below lists common types of data and the file formats suited to preserving each.

Format | Great for preservation | Okay for preservation
Textual data | .rtf; .txt; .xml | .doc; .docx; .html
Tabular data | .por; .csv; .tab | .xls; .xlsx; .sav; .mdb; .txt; .dbf; .dta; .ods
Image | .tif (version 6) | .jpeg; .jpg; .pdf; .raw; .psd
Video | .mj2; .mp4 |
Audio | .flac; .wav | .mp3; .aif
Geospatial data | .shp; .shx; .dbf; .tiff; .tfw | .mdb; .mif; .kml; .dxf; .svg
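
A table like the one above can be turned into a quick check of the files in a project. The Python sketch below is an illustration only: the names FORMATS and preservation_rating are hypothetical, it covers just the textual, tabular, image, video and audio rows, and it looks only at the file extension, so it cannot confirm details such as the TIFF version.

```python
from pathlib import Path

# Extensions copied from the table above (geospatial row omitted).
FORMATS = {
    "textual": {"great": {".rtf", ".txt", ".xml"},
                "okay": {".doc", ".docx", ".html"}},
    "tabular": {"great": {".por", ".csv", ".tab"},
                "okay": {".xls", ".xlsx", ".sav", ".mdb", ".txt",
                         ".dbf", ".dta", ".ods"}},
    "image": {"great": {".tif"},
              "okay": {".jpeg", ".jpg", ".pdf", ".raw", ".psd"}},
    "video": {"great": {".mj2", ".mp4"}, "okay": set()},
    "audio": {"great": {".flac", ".wav"}, "okay": {".mp3", ".aif"}},
}


def preservation_rating(data_type, file_name):
    """Return 'great', 'okay' or 'not listed' for a file of the given data type."""
    suffix = Path(file_name).suffix.lower()   # compare extensions case-insensitively
    ratings = FORMATS[data_type]
    for rating in ("great", "okay"):
        if suffix in ratings[rating]:
            return rating
    return "not listed"


print(preservation_rating("tabular", "2016_10_31_Survey_01.CSV"))  # great
print(preservation_rating("audio", "interview_01.mp3"))            # okay
print(preservation_rating("video", "session_01.avi"))              # not listed
```

The data type is passed explicitly because some extensions, such as .txt, are rated differently depending on the kind of data they hold.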

Documentation and metadata

Metadata (data about data) and supporting documentation provide the context for research data. This context makes the data easier to retrieve and, importantly, to understand in the future.

Metadata will be required to describe the data when you deposit them in a repository. Supporting documentation outlining the data collection method will be required too. This information is often easier to collect during the research project than afterwards.

Examples

Metadata provides an overview of the data: what they are, where they are held, how they can be accessed and how they can be reused. Metadata requirements differ, but the following standard information is required:

  • title – how the data are known
  • description – a brief methodological overview of the data. It is similar to an abstract for a paper and can contain information on what the data are, how and why they were collected and how they have been processed
  • keywords – related to the content of the data
  • creators – the main researchers involved in creating the data
  • funders – the source of financial support for the collection of the data
  • access conditions – how the data can be accessed and whether there are any restrictions in place
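
One way to keep track of these fields during a project is to store them in a small machine-readable file next to the data. The Python sketch below is a minimal illustration, not a repository schema: the field keys, the helper missing_fields and every value in the record are hypothetical placeholders.

```python
import json

# The six fields listed above, written as hypothetical dictionary keys.
REQUIRED_FIELDS = (
    "title", "description", "keywords",
    "creators", "funders", "access_conditions",
)


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Placeholder record: every value here is invented for illustration.
record = {
    "title": "Example eye tracking study data",
    "description": "What the data are, how and why they were collected, "
                   "and how they have been processed.",
    "keywords": ["eye tracking", "example"],
    "creators": ["A. Researcher"],
    "funders": ["Example Funder"],
    "access_conditions": "Available on request",
}

print(missing_fields(record))             # []
print(missing_fields({"title": "Only a title"}))
# ['description', 'keywords', 'creators', 'funders', 'access_conditions']

# Serialise the record so it can be saved alongside the data files.
metadata_json = json.dumps(record, indent=2)
```

Checking for missing fields early avoids having to reconstruct this information at the point of deposit.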

Supporting documentation describes the data and includes:

  • code, field and label descriptions
  • software used
  • methodology
  • dates of collection
  • geographic location

Some software will automatically create metadata during the data collection process. Subject-specific metadata and documentation standards may exist within disciplines, data repositories and funding agencies; the Digital Curation Centre (DCC) provides an external overview.