Matlab - Help using text scan, how to ignore comments and header columns?
I need help using text scan. I am trying to read data that has the
following format:
# ---------------------------------- WARNING
----------------------------------------
# The data you have obtained from this automated U.S. Geological Survey
database
# have not received Director's approval and as such are provisional and
subject to
# revision. The data are released on the condition that neither the USGS
nor the
# United States Government may be held liable for any damages resulting
from its use.
# Additional info: http://nwis.waterdata.usgs.gov/nwis/help/?provisional
#
# File-format description:
http://nwis.waterdata.usgs.gov/nwis/?tab_delimited_format_info
# Automated-retrieval info:
http://nwis.waterdata.usgs.gov/nwis/?automated_retrieval_info
#
# Contact: gs-w_support_nwisweb@usgs.gov
# retrieved: 2013-09-13 13:10:29 EDT (nadww01)
#
# Data for the following 1 site(s) are contained in this file
# USGS 08067074 CWA Canal at Thompson Rd nr Baytown, TX
#
-----------------------------------------------------------------------------------
#
# Data provided for site 08067074
# DD parameter Description
# 01 00010 Temperature, water, degrees Celsius
# 02 00095 Specific conductance, water, unfiltered, microsiemens
per centimeter at 25 degrees Celsius
#
# Data-value qualification codes included in this output:
# A Approved for publication -- Processing and review completed.
# P Provisional data subject to revision.
#
agency_cd site_no datetime tz_cd 01_00010 01_00010_cd 02_00095
02_00095_cd
5s 15s 20d 6s 14n 10s 14n 10s
USGS 08067074 2013-01-05 00:00 CST 10.3 A 391 A
USGS 08067074 2013-01-05 00:15 CST 10.3 A 391 A
USGS 08067074 2013-01-05 00:30 CST 10.3 A 391 A
USGS 08067074 2013-01-05 00:45 CST 10.3 A 391 A
USGS 08067074 2013-01-05 01:00 CST 10.3 A 391 A
USGS 08067074 2013-01-05 01:15 CST 10.3 A 391 A
USGS 08067074 2013-01-05 01:30 CST 10.3 A 391 A
USGS 08067074 2013-01-05 01:45 CST 10.3 A 391 A
USGS 08067074 2013-01-05 02:00 CST 10.3 A 391 A
USGS 08067074 2013-01-05 02:15 CST 10.3 A 391 A
USGS 08067074 2013-01-05 02:30 CST 10.3 A 391 A
USGS 08067074 2013-01-05 02:45 CST 10.2 A 391 A
USGS 08067074 2013-01-05 03:00 CST 10.2 A 391 A
USGS 08067074 2013-01-05 03:15 CST 10.2 A 391 A
USGS 08067074 2013-01-05 03:30 CST 10.2 A 391 A
USGS 08067074 2013-01-05 03:45 CST 10.2 A 391 A
USGS 08067074 2013-01-05 04:00 CST 10.2 A 391 A
USGS 08067074 2013-01-05 04:15 CST 10.2 A 392 A
USGS 08067074 2013-01-05 04:30 CST 10.2 A 391 A
USGS 08067074 2013-01-05 04:45 CST 10.2 A 391 A
USGS 08067074 2013-01-05 05:00 CST 10.2 A 391 A
USGS 08067074 2013-01-05 05:15 CST 10.2 A 391 A
USGS 08067074 2013-01-05 05:30 CST 10.2 A 391 A
USGS 08067074 2013-01-05 05:45 CST 10.2 A 391 A
USGS 08067074 2013-01-05 06:00 CST 10.2 A 391 A
USGS 08067074 2013-01-05 06:15 CST 10.1 A 391 A
USGS 08067074 2013-01-05 06:30 CST 10.1 A 391 A
USGS 08067074 2013-01-05 06:45 CST 10.1 A 391 A
USGS 08067074 2013-01-05 07:00 CST 10.1 A 391 A
USGS 08067074 2013-01-05 07:15 CST 10.1 A 391 A
USGS 08067074 2013-01-05 07:30 CST 10.1 A 390 A
USGS 08067074 2013-01-05 07:45 CST 10.0 A 391 A
USGS 08067074 2013-01-05 08:00 CST 10.0 A 390 A
USGS 08067074 2013-01-05 08:15 CST 10.0 A 391 A
USGS 08067074 2013-01-05 08:30 CST 10.0 A 391 A
USGS 08067074 2013-01-05 08:45 CST 10.0 A 390 A
USGS 08067074 2013-01-05 09:00 CST 10.0 A 390 A
USGS 08067074 2013-01-05 09:15 CST 10 A 390 A
USGS 08067074 2013-01-05 09:30 CST 10 A 390 A
USGS 08067074 2013-01-05 09:45 CST 10 A 390 A
USGS 08067074 2013-01-05 10:00 CST 10 A 390 A
USGS 08067074 2013-01-05 10:15 CST 10 A 390 A
USGS 08067074 2013-01-05 10:30 CST 10 A 390 A
USGS 08067074 2013-01-05 10:45 CST 10 A 390 A
USGS 08067074 2013-01-05 11:00 CST 10 A 390 A
USGS 08067074 2013-01-05 11:15 CST 10 A 390 A
USGS 08067074 2013-01-05 11:30 CST 10 A 390 A
USGS 08067074 2013-01-05 11:45 CST 10 A 389 A
USGS 08067074 2013-01-05 12:00 CST 10 A 389 A
USGS 08067074 2013-01-05 12:15 CST 10 A 389 A
USGS 08067074 2013-01-05 12:30 CST 10 A 389 A
USGS 08067074 2013-01-05 12:45 CST 10 A 389 A
USGS 08067074 2013-01-05 13:00 CST 10 A 389 A
USGS 08067074 2013-01-05 13:15 CST 10 A 389 A
USGS 08067074 2013-01-05 13:30 CST 10 A 389 A
The only two data entries I am concerned with are "Specific conductance",
and "date". (columns 3 and 7 respectively)
I was able to do this on a consistant basis using the following code:
%%
% Collect conductance data
filename = 'conductivityData_Temp_File';
%%
% Determine length of data file
fid = fopen('conductivityData_Temp_File','r');
fseek(fid, 0, 'eof');
chunksize = ftell(fid);
fseek(fid, 0, 'bof');
ch = fread(fid, chunksize, '*uchar');
N = sum(ch == sprintf('\n')); % number of lines
fclose(fid)
%%
% Read conductivity data
fileconductID = fopen(filename);
waterConductivityData = textscan(fileconductID, '%s %d %s %s %f %s %f %s',
N, 'delimiter', '\t', 'EmptyValue', 0, 'headerlines', 27);
fclose(fileconductID);
However, I found out that you can simply use 'commentstyle' to ignore the
comments. This is important because I am reading multiple files and
occasionally I will encounter a file that does not have exactly 27 comment
rows. That will make my program throw an error.
Can someone tell me how I can adjust my textscan code to ignore the
comment rows and skip the two header rows?
(if you want to downlad an example tab deliminated file to work with use
this link: here
Thank you!
No comments:
Post a Comment