File I/O Commands

Core Tcl supports file open, close, puts, gets, and read. TclX, as usual, adds a collection of convenience features and C-like functions to make file handling easier and more sophisticated.

I use for_file a lot. This proc does something simple and useful:

tcl>for_file line /etc/hosts {echo [lindex $line 0]}

reads the file /etc/hosts one line at a time, putting each line into the variable line and then executing the contents of the third (code) argument for each line. for_file is simpler and easier than using open, gets, etc. Obviously you can use for_file as an alternative to awk and grep, but TclX also offers other, more powerful, file scanning commands (to be described later).
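To show what for_file saves you, here is a rough core-Tcl stand-in built from open, gets, and close. It is a sketch, not TclX's implementation. It assumes Tcl 8.6 (for try/finally), and my_for_file is a hypothetical name. The line variable and the body both live in the caller's scope, which is why upvar and uplevel appear.

```tcl
# Hypothetical core-Tcl stand-in for TclX's for_file:
# run body once per line of fileName, with the line stored in varName
# in the caller's scope.
proc my_for_file {varName fileName body} {
    upvar 1 $varName line
    set fd [open $fileName r]
    try {
        # gets returns -1 at end of file
        while {[gets $fd line] >= 0} {
            # Evaluate the body where the caller wrote it; break and
            # continue in the body propagate to this while loop
            uplevel 1 $body
        }
    } finally {
        # Close the file even if the body raises an error
        close $fd
    }
}

# Usage, mirroring the for_file example:
#   my_for_file line /etc/hosts {puts [lindex $line 0]}
```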

You don't always want to read files one line at a time, so

	read_file ?-nonewline? fileName
	read_file fileName numBytes

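opens the named file for you, reads its contents (or, in the second form, at most numBytes bytes), and returns them as a single string. As with the core read command, -nonewline discards a final newline.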

	write_file fileName string ?string ...?

likewise opens the named file for you and writes the specified string(s) to it in one command.
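A minimal core-Tcl sketch of the same pair, assuming Tcl 8.6. my_read_file and my_write_file are hypothetical names, and writing each string as its own line (via puts) is an assumption of this sketch rather than a statement about TclX.

```tcl
# Hypothetical stand-in for read_file: ?-nonewline? fileName, or fileName numBytes
proc my_read_file {args} {
    set nonewline 0
    if {[lindex $args 0] eq "-nonewline"} {
        set nonewline 1
        set args [lrange $args 1 end]
    }
    set fd [open [lindex $args 0] r]
    try {
        if {[llength $args] > 1} {
            # Second form: read at most numBytes
            return [read $fd [lindex $args 1]]
        } elseif {$nonewline} {
            return [read -nonewline $fd]
        }
        return [read $fd]
    } finally {
        close $fd
    }
}

# Hypothetical stand-in for write_file; assumption: one line per string
proc my_write_file {fileName args} {
    set fd [open $fileName w]
    try {
        foreach str $args {
            puts $fd $str
        }
    } finally {
        close $fd
    }
}
```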

tcl>cat .logout
stty rows 24
stty cols 80
clear
tcl>set res [read_file .logout]
tcl>echo $res
stty rows 24
stty cols 80
clear

lgets is a special variant of the core Tcl gets command.

	lgets fileID ?varName?

reads data from a file as a series of Tcl lists. lgets reads the next syntactically correct Tcl list (by parsing matching braces and newlines) from the open file, rather than simply the next line.

If a newline is present within a brace-delimited list, it does not terminate the input; only a newline outside curly braces terminates a single read operation. If varName is specified, the newly-read list is put into varName and the return value is a count of characters read; if no varName is specified, then the return value is the list. Let's say I create a file test.list in which I try to ring all the changes on braces and newlines:

{This is the first line of the file}
Now we will type a carriage
return but no braces
{and then a list with braces and a
carriage return}
and {this is another list} with a list in it, no return
{This is a list {with a {yow a subsublist} sublist} in it, no return}
{Another nested list: {This is a nested list with a 
carriage return} in it {and another sublist} ...OK, enough}
This is the last line

Now I read this file with lgets:

tcl>set fd [open test.list r]
tcl>lgets $fd listvar
36
tcl>echo $listvar
{This is the first line of the file}
tcl>lgets $fd
Now we will type a carriage
tcl>lgets $fd
return but no braces
tcl>lgets $fd
{and then a list with braces and a
carriage return}
tcl>lgets $fd
and {this is another list} with a list in it, no return
tcl>lgets $fd
{This is a list {with a {yow a subsublist} sublist} in it, no return}
tcl>lgets $fd
{Another nested list: {This is a nested list with a 
carriage return} in it {and another sublist} ...OK, enough}
tcl>lgets $fd listvar
19
tcl>echo $listvar
This is the last line
tcl>lgets $fd listvar
-1
tcl>

This is quite powerful; a list can be a highly structured object. As we saw in the discussion of keyed lists, a structured list makes a good package in which to pass a data set to another process (for example), with minimal parsing effort at the receiving end. (See the pipe command later in this section, which opens file IDs for interprocess communications.) A complex list stored to a file could be a checkpoint dump for an application. Some Tcl code itself can be parsed as syntactically valid lists (beware, though, since any exceptional use of braces, such as in a comment line, could break the list syntax).
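To make the brace-and-newline rule concrete, here is a rough core-Tcl approximation of lgets. It keeps appending physical lines until info complete reports that the accumulated text is balanced. This is not TclX's implementation: info complete tests Tcl command completeness, which also considers quotes, brackets, and trailing backslashes, so it only approximates the list rule. my_lgets is a hypothetical name; the sketch assumes Tcl 8.5 or later.

```tcl
# Approximate lgets: read physical lines until the text is "complete"
proc my_lgets {fd args} {
    set text ""
    set got 0
    while {[gets $fd line] >= 0} {
        if {$got} {
            # Keep the newline that sits inside the braces
            append text \n
        }
        append text $line
        set got 1
        if {[info complete $text]} {
            break
        }
    }
    if {[llength $args]} {
        upvar 1 [lindex $args 0] var
        set var $text
        # Mirror lgets: -1 at end of file, otherwise a character count
        return [expr {$got ? [string length $text] : -1}]
    }
    return $text
}
```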

To change the name of a file,

	frename oldPath newPath

uses the rename system call and spares you an exec mv oldPath newPath. ftruncate truncates a file to a specified size:

	ftruncate fileName newSize

and on some systems can be used against open file IDs:

	ftruncate -fileid fileID newSize
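For comparison, later versions of core Tcl (8.5 and up) cover the same ground with file rename and chan truncate. The sketch below assumes Tcl 8.6; truncate_file is a hypothetical helper that opens the file in r+ mode so its contents are not cleared before truncation.

```tcl
# Core Tcl counterparts, for comparison:
#   frename oldPath newPath        ~  file rename oldPath newPath
#   ftruncate -fileid fd newSize   ~  chan truncate $fd newSize
#   ftruncate fileName newSize     ~  truncate_file fileName newSize (below)
proc truncate_file {fileName newSize} {
    # r+ opens for update without emptying the file first
    set fd [open $fileName r+]
    try {
        chan truncate $fd $newSize
    } finally {
        close $fd
    }
}
```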

The core Tcl commands open, close, puts, and gets permit you to use file IDs for simple file I/O. TclX can do more with file IDs.

The copyfile command copies data from one open file ID, starting at its current position, to another open file ID, starting at its current position. By default it copies through to the end of the input file. With the -bytes flag you specify exactly how many bytes to copy, and an error is returned if fewer than that many bytes could be read from the input file. With the -maxbytes flag the count is only an upper limit, so no error is returned if the input runs out before the byte count is reached. The syntax is

	copyfile ?-bytes byteCount|-maxbytes byteCount? inputFileID outputFileID

You could use this command to intersperse binary data from some input file into an output stream of mostly ASCII strings (Tcl, in which "everything is a string," can't really handle binary data).
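Core Tcl's fcopy can imitate the -bytes behavior, since in blocking mode it returns the number of bytes it copied. In this sketch (Tcl 8.6 assumed; copy_bytes is a hypothetical name), a short copy is turned into an error; dropping that check would give the -maxbytes behavior instead.

```tcl
# Copy exactly byteCount bytes from inFd's current position to outFd,
# failing if the input runs out first (like copyfile -bytes)
proc copy_bytes {inFd outFd byteCount} {
    # Binary translation keeps newline and encoding conversion
    # from distorting the byte count
    fconfigure $inFd -translation binary
    fconfigure $outFd -translation binary
    set copied [fcopy $inFd $outFd -size $byteCount]
    if {$copied < $byteCount} {
        error "copy_bytes: only $copied of $byteCount bytes were available"
    }
    return $copied
}
```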

bsearch uses read and seek functions to perform a binary search on an open file ID, similar to the search performed on an array by the C library function bsearch.

	bsearch fileID keyString ?returnVar? ?compareProc?

searches the open file identified by fileID. This file must contain lines of text sorted into ascending order according to the criteria used for the search. In other words, if you expect to find a certain string in the second field of the target record, the file needs to be presorted by the second field of all records. bsearch tries to find a match for keyString. By default it returns either the text of the line where a match was found, or an empty string if no match was found; but if returnVar is specified, the return value is 1 or 0 for match success or failure, and the found line is placed in returnVar. By default, fields are separated by whitespace, and the match is made against the first field of each record.

Rather than implement a complicated set of flags to control field selection and the like, the TclX designers let you specify compareProc, the name of a Tcl procedure used to compare each line read from the file against the key. compareProc is called with two arguments, keyString and the line just read, and must return 0 if they match, or a number less than or greater than 0 if keyString is less than or greater than the relevant part of the line (the same convention as the comparison function passed to the C bsearch). This gives you considerable flexibility in turning bsearch to your own purposes.
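The seek-and-read technique can be sketched in core Tcl. This is not TclX's code: file_bsearch and first_field_cmp are hypothetical names, and the sketch assumes a plain ASCII file with newline-terminated lines. It bisects byte offsets, skips forward to the next line start, and compares that line with the key using the key-first sign convention described above.

```tcl
# Default comparison: key versus the first whitespace-separated field.
# Returns <0, 0 or >0 as the key sorts before, equal to, or after it.
proc first_field_cmp {key line} {
    regexp {^\s*(\S*)} $line -> field
    return [string compare $key $field]
}

# Binary search over a file whose lines are sorted by the compared field.
# Returns the matching line, or an empty string if there is none.
proc file_bsearch {fd key {cmpProc first_field_cmp}} {
    seek $fd 0 end
    set lo 0                ;# always the start of a line
    set hi [tell $fd]       ;# any match starts in [lo, hi)
    while {$lo < $hi} {
        set mid [expr {($lo + $hi) / 2}]
        # Move to the first line that starts at or after mid
        if {$mid == 0} {
            seek $fd 0
        } else {
            seek $fd [expr {$mid - 1}]
            gets $fd        ;# discard the rest of the line holding mid-1
        }
        set pos [tell $fd]
        if {$pos >= $hi} {
            # No line starts in [mid, hi); search the lower part
            set hi $mid
            continue
        }
        gets $fd line
        set c [$cmpProc $key $line]
        if {$c == 0} {
            return $line
        } elseif {$c < 0} {
            set hi $pos
        } else {
            set lo [tell $fd]
        }
    }
    return ""
}
```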

TclX offers its own version of the dup system call:

	dup fileID ?targetFileID?

If targetFileID is omitted, fileID is duplicated and the new file ID is returned; otherwise targetFileID is reopened to address the same file as fileID. targetFileID would normally be stdin, stdout, or stderr, and the dup command takes care of the flush and close for you. (We should perhaps note here that stdin, stdout, and stderr are not Tcl file IDs, but magic keywords that puts, gets, and other commands interpret as the standard input, output, and error streams. You could use file0, file1, and file2 instead, but the keywords are easier to remember.)

The fcntl call likewise is implemented as a simple command:

	fcntl fileID flagName ?valueInt?

If no valueInt is specified, then fcntl returns the current setting of the fcntl flag flagName for fileID; if a valueInt is specified then TclX tries to set the fcntl flag to that value.

tcl>fcntl stdin READ
1
tcl>fcntl stdin WRITE
1
tcl>fcntl stdin WRONLY
0
tcl>set fd [open test.file r]
tcl>echo $fd
file3
tcl>fcntl $fd WRITE
0
tcl>fcntl $fd LINEBUF
0
tcl>fcntl $fd LINEBUF 1
tcl>fcntl $fd LINEBUF
1
tcl>fcntl $fd NOBUF 1

The NOBUF option is probably the most generally useful: it turns off buffering on the target file ID, so each write goes out immediately.
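In later versions of core Tcl the same buffering modes are set with fconfigure. This short sketch assumes Tcl 8.6 (for file tempfile); the mapping in the comments is a rough correspondence, not a claim that the two commands are implemented alike.

```tcl
#   fcntl $fd NOBUF 1    ~  fconfigure $fd -buffering none
#   fcntl $fd LINEBUF 1  ~  fconfigure $fd -buffering line
set fd [file tempfile path]
fconfigure $fd -buffering none
puts -nonewline $fd "written immediately"
# With buffering off, the bytes reach the file without an explicit flush
puts [file size $path]
```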

flock and funlock are supported, to lock both entire files and ranges of bytes within files:

	flock -read|-write ?-nowait? fileID ?startByte? \
		?lengthBytes? ?originKeyWord?

The -nowait option prevents blocking when the lock cannot be obtained. With -nowait, the return value is 1 for success and 0 for failure (the file is already locked). Without it, flock blocks until the lock becomes available. startByte and lengthBytes are two optional integers describing the section to be locked, startByte being the offset from an origin specified by originKeyWord, which can be start (the default), current, or end.

funlock takes the same arguments but without the options:

	funlock fileID ?startByte? ?lengthBytes? ?originKeyWord?

Here's a simple example:

tcl>set fd [open test.file w]
tcl>echo $fd
file3
tcl>flock -write $fd
tcl>

Now, from another process on the same system:

tcl> set fd [open test.file a]
tcl> if {![flock -write -nowait $fd]} {echo File is locked, oops.}
File is locked, oops.
tcl>

fstat uses the fstat system call and has two forms: one returns the value of a single status item, the other fills an array with the values of all the supported items. The syntax is

	fstat fileID statusFlag
	fstat fileID stat arrayName

For example:

tcl>set fd [open /dev/null r]
tcl>fstat $fd type
characterSpecial
tcl>fstat $fd nlink
1
tcl>fstat $fd size
0
tcl>set fd [open /etc/hosts r]
tcl>fstat $fd uid
3
tcl>fstat $fd gid
4
tcl>fstat $fd size
2327
tcl>fmtclock [fstat $fd mtime]
Tue May 09 09:45:43 PDT 1995
tcl>fstat $fd type
file
tcl>fstat $fd stat hostat
tcl>array names hostat
tty type size mtime ino dev atime uid ctime nlink gid mode
tcl>echo $hostat(ino)
3078
tcl>echo $hostat(nlink)
1
tcl>
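Core Tcl offers a path-based counterpart to the array form: file stat fills an array with fields of the same names (type, size, nlink, mtime, ino, and so on). Note that it takes a file name rather than an open file ID. The sketch assumes Tcl 8.6 for file tempfile.

```tcl
# Create a small file, then query its status through a path
set fd [file tempfile path]
puts -nonewline $fd "hello"
close $fd
file stat $path st
puts "type=$st(type) size=$st(size) nlink=$st(nlink)"
# mtime is in seconds, like fstat's, so clock format can render it
puts [clock format $st(mtime)]
```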

The pipe command uses the pipe system call:

	pipe ?readVar writeVar?

It creates a pipe and, if the arguments are omitted, returns a list containing the read and write file IDs of the pipe. If the variable names are supplied, readVar is set to the file ID for the read side of the pipe and writeVar to the write side. We can now continue our discussion of communication with child processes with a more elaborate example using pipes (code courtesy of R. Stover, UCO/Lick). The parent process creates some pipes:

# Run the command given as an argument.  The return value is a
# list containing: 1) The pid of the command, 2) The handle to read
# from (this is connected to the command's standard output), and 3) The
# handle to write to (this is connected to the command's standard input).
#@package: RunChild RunProcess
proc RunProcess {cmd} {
#	First create pipes for communications
	pipe MyInPipe ChildOutPipe
	pipe ChildInPipe MyOutPipe
#	Make them all non-buffered
	fcntl $MyOutPipe NOBUF 1
	fcntl $MyInPipe NOBUF 1
	fcntl $ChildOutPipe NOBUF 1
	fcntl $ChildInPipe NOBUF 1
#	Go spawn the program we will talk to
	set childPid [ChildProcess $cmd $ChildInPipe $ChildOutPipe]
#	Close the unused sides of the pipes
	close $ChildInPipe
	close $ChildOutPipe
#	Return the pid of the child and our input and output pipes
	return [list $childPid $MyInPipe $MyOutPipe]
}

The code which creates the child process looks like this:

#  This process executes a command "cmd" with the standard input and
#  standard output connected to pipes.
proc ChildProcess {cmd InPipe OutPipe} {
    if {[set childPid [fork]] == 0} {
#       The child does these things
        upvar MyInPipe ParentInPipe MyOutPipe ParentOutPipe
#       Make the input pipe the standard input
        dup $InPipe stdin
        close $InPipe
#       Make the output pipe the standard output
        dup $OutPipe stdout
        close $OutPipe
#       Close the other ends of the pipes
        close $ParentInPipe
        close $ParentOutPipe
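#       Overlay ourself with the desired command (program name
#       followed by its arguments)
        if {[catch {execl [lindex $cmd 0] [lrange $cmd 1 end]} msg]} {
#           execl only returns on failure; never fall back into the parent's code
            puts stderr "ChildProcess: $msg"
            exit 1
        }
    }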
    return $childPid
}

Here we set up a child process to execute the Unix command cmd, connecting the child's stdin and stdout to a pair of pipes set up in the calling routine before ChildProcess was called. The fork command creates the new process; if its return value is 0 we are the child, so we attach our stdin and stdout to the child's ends of the pipes, close the parent's ends, and execl the command that was passed to us as cmd. If we are the parent, we return to the calling routine with the PID of the child.
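For comparison, core Tcl 8.6 has a built-in counterpart to the pipe command: chan pipe returns a two-element list, read side first. This minimal round trip within one process assumes Tcl 8.6; line buffering on the write side stands in for the NOBUF settings used in RunProcess.

```tcl
# Create a pipe; the first element is the read side
lassign [chan pipe] readSide writeSide
# Flush after every line so the reader sees data promptly
fconfigure $writeSide -buffering line
puts $writeSide "hello through the pipe"
gets $readSide msg
puts $msg
close $writeSide
close $readSide
```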

To round out your file ID tools, TclX provides a select command built on the select system call (so you can tell, among other things, whether your pipes are readable). It is much simpler to use than the select function in C:

	select readList ?writeList? ?exceptList? ?timeoutSec?

You can wait on zero or more file handles (IDs) to be "ready" in each of three categories: ready for read, ready for write, and having an exceptional condition pending. You provide a set of file IDs in the form of three lists corresponding to these categories; these file IDs are checked, with a timeout wait determined by the floating point timeoutSec argument. The return value is a three-item list, each item being the list of files found to be "ready" in one category. (This is another good example of the use of lists for passing structured data!) If you skip the "write" and "exception" lists, select will just check for files ready for reading.

Here's a possible invocation of this command:

tcl> select {file5 file6 file7 file8} {file4 file9} {file10} 30.5

and the return value might be something like:

{file6 file8} {} {file10}

meaning that file6 and file8 are ready for reading, nothing is ready for writing, and file10 has an exception condition pending. If no file IDs had been ready for any operation, this select command could have taken up to 30.5 seconds to return. However, if we assume that file6 and file8 were readable at the time we issued the select command, then it would have returned immediately as shown.
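Core Tcl has no select command; the usual substitute is the event loop. The sketch below approximates the "ready for reading" part of select with a timeout. It assumes Tcl 8.6, and wait_readable, ::wr_ready, and ::wr_done are hypothetical names. Each readable handler unregisters itself after firing, so an unread channel cannot keep the event loop busy.

```tcl
# Wait up to timeoutMs milliseconds for any channel in chans to become
# readable; return the list of ready channels (empty on timeout)
proc wait_readable {chans timeoutMs} {
    set ::wr_ready {}
    set ::wr_done 0
    foreach ch $chans {
        # Record this channel once, then stop watching it
        chan event $ch readable [list apply {{ch} {
            chan event $ch readable {}
            lappend ::wr_ready $ch
            set ::wr_done 1
        }} $ch]
    }
    set timer [after $timeoutMs {set ::wr_done 1}]
    vwait ::wr_done
    # Let any other channels that were already ready report in
    update
    after cancel $timer
    foreach ch $chans {
        chan event $ch readable {}
    }
    return $::wr_ready
}
```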