Guru's notes on Unix, SCM, PHP and Perl: 2010

Friday, December 31, 2010

How to extract metadata from audio files via the command line using Perl?

$ cat test.pl
use strict;
use warnings;
use MP3::Tag;

# File name of the MP3 track (first command-line argument, default test.mp3)
my $filename = shift(@ARGV) // "test.mp3";
# Create a new MP3::Tag object
my $mp3 = MP3::Tag->new($filename) or die "Cannot open $filename\n";
# autoinfo() returns the tag fields in this order
my ($title, $track, $artist, $album, $comment, $year, $genre) = $mp3->autoinfo();
print "$title, $track, $artist, $album, $comment, $year, $genre\n";

$ perl test.pl 

Nakka Mukka, 6/9, Chinnaponnu & Nakulan, Kadhalil Vizhundhen, Digitised by Guru Ashok, 2008, Tamil
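MP3::Tag reads both ID3v1 and ID3v2 tags. To show where the simplest kind of this metadata lives, here is a rough Python sketch (not the Perl module and not a replacement for it) that decodes only the ID3v1/ID3v1.1 layout: the final 128 bytes of the file, starting with `TAG`. `read_id3v1` is a hypothetical helper name, and it returns the genre as a raw index rather than a genre name.

```python
def read_id3v1(data: bytes):
    """Parse an ID3v1/ID3v1.1 tag from the last 128 bytes of MP3 data.

    Returns a dict, or None when the data carries no ID3v1 tag.
    """
    if len(data) < 128:
        return None
    tag = data[-128:]
    if tag[:3] != b"TAG":
        return None

    def text(raw: bytes) -> str:
        # Fixed-width fields, padded with NUL bytes or spaces
        return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()

    comment = tag[97:127]
    track = None
    # ID3v1.1: a zero byte at comment[28] means comment[29] holds the track number
    if comment[28] == 0 and comment[29] != 0:
        track = comment[29]
        comment = comment[:28]

    return {
        "title": text(tag[3:33]),
        "artist": text(tag[33:63]),
        "album": text(tag[63:93]),
        "year": text(tag[93:97]),
        "comment": text(comment),
        "track": track,
        "genre_id": tag[127],  # index into the standard ID3v1 genre list
    }
```

Usage, assuming a local test.mp3: `read_id3v1(open("test.mp3", "rb").read())`. Files that only carry ID3v2 tags (stored at the start of the file) return None here.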

How to install a module from CPAN?

The easiest way is to let the CPAN module, which comes with Perl 5.004 and later, do it for you:

$ perl -MCPAN -e shell
  cpan shell -- CPAN exploration and modules installation(v1.59_54)        
  ReadLine support enabled      
cpan> install Some::Module

To manually install the CPAN module, or any well-behaved CPAN module for that matter, follow these steps:

Unpack the source into a temporary area, change into the unpacked directory, and run:

perl Makefile.PL     
make     
make test     
make install

Wednesday, December 29, 2010

PHP and XML - Create, Add, Edit, Modify using DOM and SimpleXML

Over the last few working days, I spent quite a bit of time playing around with XML. While searching the net, I found few comprehensive PHP XML guides; there was no one-stop guide covering all the operations.

So I decided to collect examples of every kind of XML operation I have performed into a single post. I hope it helps others who want to learn more about XML manipulation.

Note: since the post got quite large, I decided to use only the tree-based parsers: DOM and SimpleXML.

Operations Performed:

(1) Create XML OR Array to XML Conversion OR CDATA Element Eg

(2) Edit XML – Edit/Modify Element Data (accessed serially)

(3) Edit XML – Edit specific Elements (accessed conditionally)

(4) Edit XML – Element Addition (to queue end)

(5) Edit XML – Element Addition (to queue start)

(6) Edit XML – Element Addition (before a specific node)

(7) Delete Elements (accessed serially)

(8) Delete Elements (accessed conditionally)

(9) Rearrange / Reorder Elements

(10) Display Required data in XML Form itself OR Remove all children nodes save one OR Copy/Clone Node Eg OR Compare/Search non numeric data (like date or time) to get result.

library.xml will be used in all operations.

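PS: I have added indentation and whitespace between the tags in the XML below to make it presentable.

Remove them before saving your XML file; otherwise DOM treats that whitespace as text nodes, and the examples below that index childNodes directly will pick the wrong nodes. (Setting $dom->preserveWhiteSpace = false before load() is another way to drop them.)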

<?xml version="1.0"?>
<library>
    <book isbn="1001" pubdate="1943-01-01">
        <title><![CDATA[The Fountainhead]]></title>
        <author>Ayn Rand</author>
        <price>300</price>
    </book>
    <book isbn="1002" pubdate="1954-01-01">
        <title><![CDATA[The Lord of the Rings]]></title>
        <author>J.R.R.Tolkein</author>
        <price>500</price>
    </book>
    <book isbn="1003" pubdate="1982-01-01">
        <title><![CDATA[The Dark Tower]]></title>
        <author>Stephen King</author>
        <price>200</price>
    </book>
</library>

#######################################

// (1) Create XML OR Array to XML Conversion OR CDATA Element Eg

#######################################

// (i) SimpleXML :

// SimpleXML cannot create a CDATA section, so the title is written as plain text.
function fnSimpleXMLCreate()
{
    $arr = array(
        array('isbn' => '1001', 'pubdate' => '1943-01-01', 'title' => 'The Fountainhead',
              'author' => 'Ayn Rand', 'price' => '300'),
        array('isbn' => '1002', 'pubdate' => '1954-01-01', 'title' => 'The Lord of the Rings',
              'author' => 'J.R.R.Tolkein', 'price' => '500'),
        array('isbn' => '1003', 'pubdate' => '1982-01-01', 'title' => 'The Dark Tower',
              'author' => 'Stephen King', 'price' => '200'),
    );

    $library = new SimpleXMLElement('<library />');

    foreach ($arr as $row)
    {
        $book = $library->addChild('book');
        $book->addAttribute('isbn', $row['isbn']);
        $book->addAttribute('pubdate', $row['pubdate']);
        $book->addChild('title', $row['title']); // plain text, not CDATA
        $book->addChild('author', $row['author']);
        $book->addChild('price', $row['price']);
    }

    // Write the document to library.xml
    $library->asXML('library.xml');
}

// (ii) DOM :

function fnDomCreate()
{
    $arr = array(
        array('isbn' => '1001', 'pubdate' => '1943-01-01', 'title' => 'The Fountainhead',
              'author' => 'Ayn Rand', 'price' => '300'),
        array('isbn' => '1002', 'pubdate' => '1954-01-01', 'title' => 'The Lord of the Rings',
              'author' => 'J.R.R.Tolkein', 'price' => '500'),
        array('isbn' => '1003', 'pubdate' => '1982-01-01', 'title' => 'The Dark Tower',
              'author' => 'Stephen King', 'price' => '200'),
    );

    $dom = new DOMDocument();
    $library = $dom->createElement('library');
    $dom->appendChild($library);

    foreach ($arr as $row)
    {
        $book = $dom->createElement('book');
        $book->setAttribute('isbn', $row['isbn']);
        $book->setAttribute('pubdate', $row['pubdate']);

        // Wrap the title in a CDATA section
        $prop = $dom->createElement('title');
        $prop->appendChild($dom->createCDATASection($row['title']));
        $book->appendChild($prop);

        $book->appendChild($dom->createElement('author', $row['author']));
        $book->appendChild($dom->createElement('price', $row['price']));
        $library->appendChild($book);
    }

    // Write the document to library.xml
    $dom->save('library.xml');
}
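For comparison, Python's standard `xml.dom.minidom` module offers a DOM-style API that, like PHP's DOM (and unlike SimpleXML), can create CDATA sections. A minimal sketch that builds the same library document from a list of dicts; `BOOKS` and `create_library` are illustrative names, not part of the original post.

```python
from xml.dom.minidom import Document

BOOKS = [
    {"isbn": "1001", "pubdate": "1943-01-01", "title": "The Fountainhead",
     "author": "Ayn Rand", "price": "300"},
    {"isbn": "1002", "pubdate": "1954-01-01", "title": "The Lord of the Rings",
     "author": "J.R.R.Tolkein", "price": "500"},
    {"isbn": "1003", "pubdate": "1982-01-01", "title": "The Dark Tower",
     "author": "Stephen King", "price": "200"},
]


def create_library(books):
    """Build <library> with one <book> per dict; titles go into CDATA sections."""
    doc = Document()
    library = doc.createElement("library")
    doc.appendChild(library)
    for row in books:
        book = doc.createElement("book")
        book.setAttribute("isbn", row["isbn"])
        book.setAttribute("pubdate", row["pubdate"])

        title = doc.createElement("title")
        title.appendChild(doc.createCDATASection(row["title"]))
        book.appendChild(title)

        for tag in ("author", "price"):
            element = doc.createElement(tag)
            element.appendChild(doc.createTextNode(row[tag]))
            book.appendChild(element)
        library.appendChild(book)
    return doc


if __name__ == "__main__":
    print(create_library(BOOKS).toxml())
```

Writing the `toxml()` string to library.xml would mirror `$dom->save('library.xml')`.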

#######################################

// (2) Edit XML – Edit/Modify Element Data (accessed serially)

#######################################

// (i) SimpleXML :

// Edit the last book's title
function fnSimpleXMLEditElementSeq()
{
    $library = new SimpleXMLElement('library.xml', 0, true);
    $num = count($library); // number of child elements (books)
    $library->book[$num - 1]->title .= ' - The Gunslinger';
    header("Content-type: text/xml");
    echo $library->asXML();
}

// (ii) DOM :

// Edit the last book's title
function fnDOMEditElementSeq()
{
    $dom = new DOMDocument();
    $dom->load('library.xml');
    $library = $dom->documentElement;
    $cnt = $library->childNodes->length; // assumes no whitespace text nodes (see note above)

    $library->childNodes->item($cnt - 1)->getElementsByTagName('title')->item(0)->nodeValue .= ' Series';

    // 2nd way (counts only <book> elements, so whitespace does not matter):
    // $books = $library->getElementsByTagName('book');
    // $books->item($books->length - 1)->getElementsByTagName('title')->item(0)->nodeValue .= ' Series';

    // 3rd way (title is the first child of each book):
    // $library->childNodes->item($cnt - 1)->childNodes->item(0)->nodeValue .= ' Series';
    header("Content-type: text/xml");
    echo $dom->saveXML();
}
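A comparable serial edit in Python, sketched with the standard `xml.etree.ElementTree` module (its element-as-list style is closer to SimpleXML than to DOM). `LIBRARY` is an inlined copy of library.xml and `edit_last_title` is a hypothetical name. Like SimpleXML and nodeValue, ElementTree writes the edited title back as plain text, so the CDATA wrapper is lost.

```python
import xml.etree.ElementTree as ET

LIBRARY = (
    '<library>'
    '<book isbn="1001" pubdate="1943-01-01"><title><![CDATA[The Fountainhead]]></title>'
    '<author>Ayn Rand</author><price>300</price></book>'
    '<book isbn="1002" pubdate="1954-01-01"><title><![CDATA[The Lord of the Rings]]></title>'
    '<author>J.R.R.Tolkein</author><price>500</price></book>'
    '<book isbn="1003" pubdate="1982-01-01"><title><![CDATA[The Dark Tower]]></title>'
    '<author>Stephen King</author><price>200</price></book>'
    '</library>'
)


def edit_last_title(xml_text, suffix):
    """Append suffix to the title of the last <book>, located by position."""
    root = ET.fromstring(xml_text)
    last_book = root.findall("book")[-1]
    last_book.find("title").text += suffix
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(edit_last_title(LIBRARY, " - The Gunslinger"))
```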

#######################################

// (3) Edit XML – Edit specific Elements (accessed conditionally)

#######################################

// (i) SimpleXML :

// Edit the title of the book whose author is J.R.R.Tolkein
function fnSimpleXMLEditElementCond()
{
    $library = new SimpleXMLElement('library.xml', 0, true);
    $book = $library->xpath('/library/book[author="J.R.R.Tolkein"]');
    $book[0]->title .= ' Series';
    header("Content-type: text/xml");
    echo $library->asXML();
}

// (ii) DOM (with XPath):

// Edit the title of the book whose author is J.R.R.Tolkein
function fnDOMEditElementCond()
{
    $dom = new DOMDocument();
    $dom->load('library.xml');
    $xpath = new DOMXPath($dom);
    $result = $xpath->query('/library/book[author="J.R.R.Tolkein"]/title');
    // Assigning nodeValue replaces the CDATA section with a plain text node.
    $result->item(0)->nodeValue .= ' Series';
    // To keep the CDATA section, append to the CDATA node instead:
    // $result->item(0)->firstChild->appendData(' Series');
    // (or delete the element and recreate it with CDATA, as in the create XML example).

    // 2nd way
    // $result = $xpath->query('/library/book[author="J.R.R.Tolkein"]');
    // $result->item(0)->getElementsByTagName('title')->item(0)->nodeValue .= ' Series';
    header("Content-type: text/xml");
    echo $dom->saveXML();
}
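Python's `xml.etree.ElementTree` understands a small XPath subset, including `[child='text']` predicates, which is enough for this conditional lookup. A sketch, not a PHP equivalent: `LIBRARY` is an inlined copy of library.xml and `edit_title_by_author` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

LIBRARY = (
    '<library>'
    '<book isbn="1001" pubdate="1943-01-01"><title><![CDATA[The Fountainhead]]></title>'
    '<author>Ayn Rand</author><price>300</price></book>'
    '<book isbn="1002" pubdate="1954-01-01"><title><![CDATA[The Lord of the Rings]]></title>'
    '<author>J.R.R.Tolkein</author><price>500</price></book>'
    '<book isbn="1003" pubdate="1982-01-01"><title><![CDATA[The Dark Tower]]></title>'
    '<author>Stephen King</author><price>200</price></book>'
    '</library>'
)


def edit_title_by_author(xml_text, author, suffix):
    """Append suffix to the title of the first <book> whose <author> equals author."""
    root = ET.fromstring(xml_text)
    # Assumes author contains no single quote, since it is pasted into the path
    book = root.find(f"book[author='{author}']")
    if book is not None:
        book.find("title").text += suffix
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(edit_title_by_author(LIBRARY, "J.R.R.Tolkein", " Series"))
```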

#######################################

// (4) Edit XML – Element Addition (to queue end)

#######################################

// (i) SimpleXML :

// Add another book to the end
function fnSimpleXMLAddElement2End()
{
    $library = new SimpleXMLElement('library.xml', 0, true);
    $book = $library->addChild('book');
    $book->addAttribute('isbn', '1004');
    $book->addAttribute('pubdate', '1960-07-11');
    $book->addChild('title', 'To Kill a Mockingbird');
    $book->addChild('author', 'Harper Lee');
    $book->addChild('price', '100');
    header("Content-type: text/xml");
    echo $library->asXML();
}

// (ii) DOM :

// Add another book to the end
function fnDOMAddElement2End()
{
    $dom = new DOMDocument();
    $dom->load('library.xml');
    $library = $dom->documentElement;

    $book = $dom->createElement('book');
    $book->setAttribute('isbn', '1000');
    $book->setAttribute('pubdate', '1960-07-11');

    $prop = $dom->createElement('title');
    $prop->appendChild($dom->createTextNode('To Kill a Mockingbird'));
    $book->appendChild($prop);

    $book->appendChild($dom->createElement('author', 'Harper Lee'));
    $book->appendChild($dom->createElement('price', '100'));

    $library->appendChild($book);
    header("Content-type: text/xml");
    echo $dom->saveXML();
}
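Appending works the same way with Python's `xml.etree.ElementTree`: `ET.SubElement` creates a child at the end of its parent, much like `addChild()` or `appendChild()`. A sketch in which `LIBRARY` (an inlined library.xml) and `add_book_to_end` are assumed names.

```python
import xml.etree.ElementTree as ET

LIBRARY = (
    '<library>'
    '<book isbn="1001" pubdate="1943-01-01"><title><![CDATA[The Fountainhead]]></title>'
    '<author>Ayn Rand</author><price>300</price></book>'
    '<book isbn="1002" pubdate="1954-01-01"><title><![CDATA[The Lord of the Rings]]></title>'
    '<author>J.R.R.Tolkein</author><price>500</price></book>'
    '<book isbn="1003" pubdate="1982-01-01"><title><![CDATA[The Dark Tower]]></title>'
    '<author>Stephen King</author><price>200</price></book>'
    '</library>'
)


def add_book_to_end(xml_text):
    """Parse the library and append a new <book> as its last child."""
    root = ET.fromstring(xml_text)
    book = ET.SubElement(root, "book", isbn="1004", pubdate="1960-07-11")
    for tag, text in (("title", "To Kill a Mockingbird"),
                      ("author", "Harper Lee"), ("price", "100")):
        ET.SubElement(book, tag).text = text
    return root


if __name__ == "__main__":
    print(ET.tostring(add_book_to_end(LIBRARY), encoding="unicode"))
```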

#######################################

//(5) Edit XML – Element Addition (to queue start)

#######################################

// (i) SimpleXML :

// Add a book to the start of the list.
// SimpleXML has no insert-before functionality,
// but we can combine DOM with SimpleXML to do it.
function fnSimpleXMLAddElement2Start()
{
    $libSimple = new SimpleXMLElement('library.xml', 0, true);
    $libDom = dom_import_simplexml($libSimple);

    $dom = new DOMDocument();
    // importNode() returns a copy of the node that belongs to $dom
    $libDom = $dom->importNode($libDom, true);
    // Attach the copy to the new document as its root element
    $dom->appendChild($libDom);

    fnDOMAddElement2Start($dom); // see the DOM function below
}

// (ii) DOM :

function fnDOMAddElement2Start($dom = null)
{
    if (!$dom)
    {
        $dom = new DOMDocument();
        $dom->load('library.xml');
    }
    $library = $dom->documentElement;

    $book = $dom->createElement('book');
    $book->setAttribute('isbn', '1000');
    $book->setAttribute('pubdate', '1960-07-11');
    $book->appendChild($dom->createElement('title', 'To Kill a Mockingbird'));
    $book->appendChild($dom->createElement('author', 'Harper Lee'));
    $book->appendChild($dom->createElement('price', '100'));

    // Insert the new book before the current first child of <library>
    $library->insertBefore($book, $library->firstChild);
    header("Content-type: text/xml");
    echo $dom->saveXML();
}
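Python's `xml.etree.ElementTree` treats an element as a list of children, so "insert at the start" is simply `insert(0, ...)`, with no DOM round trip needed. A sketch; `LIBRARY` (an inlined library.xml), `make_book` and `add_book_to_start` are hypothetical names.

```python
import xml.etree.ElementTree as ET

LIBRARY = (
    '<library>'
    '<book isbn="1001" pubdate="1943-01-01"><title><![CDATA[The Fountainhead]]></title>'
    '<author>Ayn Rand</author><price>300</price></book>'
    '<book isbn="1002" pubdate="1954-01-01"><title><![CDATA[The Lord of the Rings]]></title>'
    '<author>J.R.R.Tolkein</author><price>500</price></book>'
    '<book isbn="1003" pubdate="1982-01-01"><title><![CDATA[The Dark Tower]]></title>'
    '<author>Stephen King</author><price>200</price></book>'
    '</library>'
)


def make_book(isbn="1000"):
    """Build the To Kill a Mockingbird <book> element used in these examples."""
    book = ET.Element("book", isbn=isbn, pubdate="1960-07-11")
    for tag, text in (("title", "To Kill a Mockingbird"),
                      ("author", "Harper Lee"), ("price", "100")):
        ET.SubElement(book, tag).text = text
    return book


def add_book_to_start(xml_text):
    """Insert the new book as the first child of <library>."""
    root = ET.fromstring(xml_text)
    root.insert(0, make_book())
    return root


if __name__ == "__main__":
    print(ET.tostring(add_book_to_start(LIBRARY), encoding="unicode"))
```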

#######################################

// (6) Edit XML – Element Addition (before a specific node)

#######################################

// (i) SimpleXML :

// Add a book before the one with attribute isbn 1002.
// SimpleXML has no insert-before functionality,
// but we can combine DOM with SimpleXML to do it.
function fnSimpleXMLAddElementCond()
{
    $libSimple = new SimpleXMLElement('library.xml', 0, true);
    $libDom = dom_import_simplexml($libSimple);

    $dom = new DOMDocument();
    // importNode() returns a copy of the node that belongs to $dom
    $libDom = $dom->importNode($libDom, true);
    // Attach the copy to the new document as its root element
    $dom->appendChild($libDom);

    fnDOMAddElementCond($dom); // see the DOM example below
}

// (ii) DOM :

// Add a book before the one with isbn 1002
function fnDOMAddElementCond($dom = null)
{
    if (!$dom)
    {
        $dom = new DOMDocument();
        $dom->load('library.xml');
    }
    $library = $dom->documentElement;

    $book = $dom->createElement('book');
    $book->setAttribute('isbn', '1000');
    $book->setAttribute('pubdate', '1960-07-11');
    $book->appendChild($dom->createElement('title', 'To Kill a Mockingbird'));
    $book->appendChild($dom->createElement('author', 'Harper Lee'));
    $book->appendChild($dom->createElement('price', '100'));

    // Find the reference node with XPath, then insert the new book before it
    $xpath = new DOMXPath($dom);
    $result = $xpath->query('/library/book[@isbn="1002"]');
    $library->insertBefore($book, $result->item(0));
    header("Content-type: text/xml");
    echo $dom->saveXML();
}
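Python's `xml.etree.ElementTree` has no `insertBefore()`, so one way to mimic it is to find the reference node with an `[@attr='value']` path, look up its position, and insert at that index. A sketch under those assumptions; `LIBRARY` (an inlined library.xml), `make_book` and `insert_before_isbn` are hypothetical names.

```python
import xml.etree.ElementTree as ET

LIBRARY = (
    '<library>'
    '<book isbn="1001" pubdate="1943-01-01"><title><![CDATA[The Fountainhead]]></title>'
    '<author>Ayn Rand</author><price>300</price></book>'
    '<book isbn="1002" pubdate="1954-01-01"><title><![CDATA[The Lord of the Rings]]></title>'
    '<author>J.R.R.Tolkein</author><price>500</price></book>'
    '<book isbn="1003" pubdate="1982-01-01"><title><![CDATA[The Dark Tower]]></title>'
    '<author>Stephen King</author><price>200</price></book>'
    '</library>'
)


def make_book(isbn="1000"):
    """Build the To Kill a Mockingbird <book> element used in these examples."""
    book = ET.Element("book", isbn=isbn, pubdate="1960-07-11")
    for tag, text in (("title", "To Kill a Mockingbird"),
                      ("author", "Harper Lee"), ("price", "100")):
        ET.SubElement(book, tag).text = text
    return book


def insert_before_isbn(xml_text, isbn):
    """Insert the new book directly before the <book> with the given isbn."""
    root = ET.fromstring(xml_text)
    target = root.find(f"book[@isbn='{isbn}']")
    if target is None:
        raise ValueError(f"no book with isbn {isbn}")
    root.insert(list(root).index(target), make_book())
    return root


if __name__ == "__main__":
    print(ET.tostring(insert_before_isbn(LIBRARY, "1002"), encoding="unicode"))
```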

#######################################

// (7) Delete Elements (accessed serially)

#######################################

// (i) SimpleXML :

// Delete the 2nd book
function fnSimpleXMLDeleteSeq()
{
    $library = new SimpleXMLElement('library.xml', 0, true);
    // $library->book[1] = null; // this only empties the content
    unset($library->book[1]);
    header("Content-type: text/xml");
    echo $library->asXML();
}

// (ii) DOM :

// Delete the 2nd book
function fnDOMDeleteSeq()
{
    $dom = new DOMDocument();
    $dom->load('library.xml');
    $library = $dom->documentElement;

    $library->removeChild($library->childNodes->item(1));

    header("Content-type: text/xml");
    echo $dom->saveXML();
}
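In Python's `xml.etree.ElementTree`, positional deletion reads much like `unset($library->book[1])`: index the parent and pass the child to `remove()`. A sketch; `LIBRARY` (an inlined library.xml) and `delete_second_book` are assumed names.

```python
import xml.etree.ElementTree as ET

LIBRARY = (
    '<library>'
    '<book isbn="1001" pubdate="1943-01-01"><title><![CDATA[The Fountainhead]]></title>'
    '<author>Ayn Rand</author><price>300</price></book>'
    '<book isbn="1002" pubdate="1954-01-01"><title><![CDATA[The Lord of the Rings]]></title>'
    '<author>J.R.R.Tolkein</author><price>500</price></book>'
    '<book isbn="1003" pubdate="1982-01-01"><title><![CDATA[The Dark Tower]]></title>'
    '<author>Stephen King</author><price>200</price></book>'
    '</library>'
)


def delete_second_book(xml_text):
    """Remove the 2nd <book>, addressed purely by position."""
    root = ET.fromstring(xml_text)
    root.remove(root[1])
    return root


if __name__ == "__main__":
    print(ET.tostring(delete_second_book(LIBRARY), encoding="unicode"))
```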

#######################################

// (8) Delete Elements (accessed conditionally)

#######################################

// (i) SimpleXML :

// Delete the book with 200 < price < 500.
// unset($book[0]) on an XPath result only removes it from the result array,
// not from the document. Importing the node into DOM (which shares the same
// underlying document) lets us remove it.
function fnSimpleXMLDeleteCond()
{
    $library = new SimpleXMLElement('library.xml', 0, true);
    $book = $library->xpath('/library/book[price>200 and price<500]');

    if ($book)
    {
        $node = dom_import_simplexml($book[0]);
        $node->parentNode->removeChild($node);
    }

    header("Content-type: text/xml");
    echo $library->asXML();
}

// (ii) DOM :

// Delete the book with 200 < price < 500
function fnDOMDeleteCond()
{
    $dom = new DOMDocument();
    $dom->load('library.xml');
    $xpath = new DOMXPath($dom);
    $result = $xpath->query('/library/book[price>200 and price<500]');
    $result->item(0)->parentNode->removeChild($result->item(0));
    header("Content-type: text/xml");
    echo $dom->saveXML();
}
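Python's `xml.etree.ElementTree` path language has no numeric comparisons, so in this sketch the price test is done in Python instead of XPath. Unlike the PHP DOM example, which removes only the first match, it removes every matching book. `LIBRARY` (an inlined library.xml) and `delete_books_priced_between` are hypothetical names.

```python
import xml.etree.ElementTree as ET

LIBRARY = (
    '<library>'
    '<book isbn="1001" pubdate="1943-01-01"><title><![CDATA[The Fountainhead]]></title>'
    '<author>Ayn Rand</author><price>300</price></book>'
    '<book isbn="1002" pubdate="1954-01-01"><title><![CDATA[The Lord of the Rings]]></title>'
    '<author>J.R.R.Tolkein</author><price>500</price></book>'
    '<book isbn="1003" pubdate="1982-01-01"><title><![CDATA[The Dark Tower]]></title>'
    '<author>Stephen King</author><price>200</price></book>'
    '</library>'
)


def delete_books_priced_between(xml_text, low, high):
    """Remove every <book> whose price p satisfies low < p < high."""
    root = ET.fromstring(xml_text)
    # findall() returns a list, so removing elements while looping over it is safe
    for book in root.findall("book"):
        if low < float(book.findtext("price")) < high:
            root.remove(book)
    return root


if __name__ == "__main__":
    print(ET.tostring(delete_books_priced_between(LIBRARY, 200, 500), encoding="unicode"))
```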

#######################################

// (9) Rearrange / Reorder Elements

#######################################

// (i) SimpleXML :

// Exchange the position of the 2nd book with the 3rd.
// SimpleXML has no built-in function for this (DOM has one), so we have to write our own. Better to use DOM.
function fnSimpleXMLRearrange()
{
    $library = new SimpleXMLElement('library.xml', 0, true);
    // $library->book[3] = $library->book[0]; // this doesn't work

    // Custom function which uses a temporary 3rd container to exchange the nodes' data.
    fnNodeExchange($library, 2, 1);
    header("Content-type: text/xml");
    echo $library->asXML();
}

function fnNodeExchange($library, $node1, $node2)
{
    $cnt = count($library);

    // 1. Copy book[$node1] (children, then attributes) into a temporary book[$cnt].
    foreach ($library->book[$node1]->children() as $book)
    {
        $name = $book->getName();
        $library->book[$cnt]->$name = $book[0];
    }
    foreach ($library->book[$node1]->attributes() as $book)
    {
        $name = $book->getName();
        $library->book[$cnt][$name] = $book[0];
    }
    // 2. Overwrite book[$node1] with the data of book[$node2].
    foreach ($library->book[$node2]->children() as $book)
    {
        $name = $book->getName();
        $library->book[$node1]->$name = $book[0];
    }
    foreach ($library->book[$node2]->attributes() as $book)
    {
        $name = $book->getName();
        $library->book[$node1][$name] = $book[0];
    }
    // 3. Move the temporary copy into place.
    if ($node2 != ($cnt - 1))
    {
        foreach ($library->book[$cnt]->children() as $book)
        {
            $name = $book->getName();
            $library->book[$node2]->$name = $book[0];
        }
        foreach ($library->book[$cnt]->attributes() as $book)
        {
            $name = $book->getName();
            $library->book[$node2][$name] = $book[0];
        }
        unset($library->book[$cnt]);
    }
    else
    {
        // $node2 is the last book: drop it, and the temporary copy takes its place.
        unset($library->book[$cnt - 1]);
    }
}

// (ii) DOM :

// Exchange the position of the 2nd book with the 3rd.
function fnDOMRearrange()
{
    $dom = new DOMDocument();
    $dom->load('library.xml');
    $library = $dom->documentElement;
    // Moving the 3rd book in front of the 2nd swaps them
    $library->insertBefore($library->childNodes->item(2), $library->childNodes->item(1));
    header("Content-type: text/xml");
    echo $dom->saveXML();
}
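Because Python's `xml.etree.ElementTree` elements support list-style item assignment, swapping two children needs no temporary container, in contrast to the custom SimpleXML function. A sketch; `LIBRARY` (an inlined library.xml) and `swap_books` are hypothetical names.

```python
import xml.etree.ElementTree as ET

LIBRARY = (
    '<library>'
    '<book isbn="1001" pubdate="1943-01-01"><title><![CDATA[The Fountainhead]]></title>'
    '<author>Ayn Rand</author><price>300</price></book>'
    '<book isbn="1002" pubdate="1954-01-01"><title><![CDATA[The Lord of the Rings]]></title>'
    '<author>J.R.R.Tolkein</author><price>500</price></book>'
    '<book isbn="1003" pubdate="1982-01-01"><title><![CDATA[The Dark Tower]]></title>'
    '<author>Stephen King</author><price>200</price></book>'
    '</library>'
)


def swap_books(xml_text, i, j):
    """Exchange the positions of the i-th and j-th <book> (0-based)."""
    root = ET.fromstring(xml_text)
    # The right-hand side is evaluated first, so both elements are captured before assignment
    root[i], root[j] = root[j], root[i]
    return root


if __name__ == "__main__":
    print(ET.tostring(swap_books(LIBRARY, 1, 2), encoding="unicode"))
```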

#######################################

// (10) Display Required data in XML Form itself OR Remove all children nodes save one OR Copy/Clone Node Eg OR Compare/Search non numeric data (like date or time) to get result.

#######################################

// (i) SimpleXML :

// Display books published after 1980 in XML form itself.
// SimpleXML has no function to copy a node directly,
// so this is simpler to implement in DOM.
function fnSimpleXMLDisplayElementCond()
{
    $library = new SimpleXMLElement('library.xml', 0, true);
    $book = $library->xpath('/library/book[translate(@pubdate,"-","")>translate("1980-01-01","-","")]');
    // Manually create a new structure, then add the found data to it (see the create XML example).
}

// (ii) DOM :

// Display books published after 1980 in XML form itself.
function fnDOMDisplayElementCond()
{
    $dom = new DOMDocument();
    $dom->load('library.xml');
    $library = $dom->documentElement;
    $xpath = new DOMXPath($dom);

    // Compare non-numeric data: removing the dashes turns 1982-01-01 into the number 19820101
    $result = $xpath->query('/library/book[translate(@pubdate,"-","")>translate("1980-01-01","-","")]');
    // For a simpler search parameter use this:
    // $result = $xpath->query('/library/book[author="J.R.R.Tolkein"]');

    // Copy only the node and its attributes, not its contents.
    $copy = $library->cloneNode(false);
    // Add every matching book (deep copies, so the original tree is untouched).
    foreach ($result as $book)
    {
        $copy->appendChild($book->cloneNode(true));
    }

    header("Content-type: text/xml");
    echo $dom->saveXML($copy);
}
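The same filter-and-copy idea, sketched with Python's `xml.etree.ElementTree`. In place of the XPath `translate()` trick, this version relies on ISO dates (YYYY-MM-DD) sorting correctly as plain strings. `LIBRARY` (an inlined library.xml) and `books_published_after` are hypothetical names.

```python
import copy
import xml.etree.ElementTree as ET

LIBRARY = (
    '<library>'
    '<book isbn="1001" pubdate="1943-01-01"><title><![CDATA[The Fountainhead]]></title>'
    '<author>Ayn Rand</author><price>300</price></book>'
    '<book isbn="1002" pubdate="1954-01-01"><title><![CDATA[The Lord of the Rings]]></title>'
    '<author>J.R.R.Tolkein</author><price>500</price></book>'
    '<book isbn="1003" pubdate="1982-01-01"><title><![CDATA[The Dark Tower]]></title>'
    '<author>Stephen King</author><price>200</price></book>'
    '</library>'
)


def books_published_after(xml_text, date):
    """Return a new <library> holding copies of the books published after date."""
    root = ET.fromstring(xml_text)
    # Shallow copy of the root: tag and attributes only, like cloneNode(false)
    result = ET.Element(root.tag, dict(root.attrib))
    for book in root.findall("book"):
        if book.get("pubdate") > date:
            result.append(copy.deepcopy(book))
    return ET.tostring(result, encoding="unicode")


if __name__ == "__main__":
    print(books_published_after(LIBRARY, "1980-01-01"))
```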

Lessons Learnt:

SimpleXML is fantastic for those who will only briefly flirt with XML (or beginners) and perform simple operations on it.
DOM is an absolute necessity for performing complex operations on XML data. Its learning curve is steeper than SimpleXML's, of course, but once you get the hang of it, you will find it very logical.
Use XPath for conditional access to data. For serial access (like the last book) XPath is not needed (though you can use it), since normal DOM / SimpleXML node access is enough.

Tuesday, December 21, 2010

Unix Terminal Color Code

Unix Color Codes:

0 Normal text, foreground and background
1 Bold text
4 Underline
5 Blink
7 Inverse

30 Black foreground
31 Red foreground
32 Green foreground
33 Yellow foreground
34 Blue foreground
35 Magenta foreground
36 Cyan foreground
37 White foreground

40 Black background
41 Red background
42 Green background
43 Yellow background
44 Blue background
45 Magenta background
46 Cyan background
47 White background

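Format: ESC[<attribute>;<foreground>;<background>m   (ESC is the escape character, written \e, \033 or ^[)

echo -e "\033[1;33;44m Hello, world \033[0m"
printf "\e[1;37;41m Hello, world \e[m\n"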

Print all colors:

for j in 0 1 4 5 7
do
    for i in 30 31 32 33 34 35 36 37
    do
        echo -e "\033[$j;${i}m Hello \033[0m"
    done
    echo
done
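The numbers in the tables above are ANSI "Select Graphic Rendition" (SGR) parameters, joined with `;` between `ESC[` and `m`. A small Python sketch that builds the same sequences; `sgr` and `color_table` are illustrative names, and the colors only show up in a terminal that honors ANSI escapes. Unlike the shell loop, it prints one row per attribute.

```python
ESC = "\033"


def sgr(*codes: int) -> str:
    """Build an SGR escape sequence such as ESC[1;33;44m."""
    return f"{ESC}[{';'.join(str(code) for code in codes)}m"


RESET = sgr(0)


def color_table(attributes=(0, 1, 4, 5, 7), foregrounds=range(30, 38)):
    """Return every attribute/foreground combination, one row per attribute."""
    rows = []
    for attr in attributes:
        rows.append("".join(f"{sgr(attr, fg)} Hello {RESET}" for fg in foregrounds))
    return "\n".join(rows)


if __name__ == "__main__":
    print(f"{sgr(1, 33, 44)} Hello, world {RESET}")
    print(color_table())
```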


CR/LF Issues and Text Line-endings - Perforce

SUMMARY

How does Perforce handle CR/LF issues?
How does Perforce translate text file line-endings?

DETAILS

When editing text files in cross-platform environments, you must account for different line termination conventions. Perforce can be configured to automatically translate line-endings from one operating system's convention to another, or configured to ignore line-ending translation. These configurations apply only to text files.

Platform Conventions

The following are the various line termination conventions:

On UNIX, text file line-endings are terminated with a newline character (ASCII 0x0a, represented by the \n escape sequence in most languages), also referred to as a linefeed (LF).
On Windows, line-endings are terminated with a combination of a carriage return (ASCII 0x0d or \r) and a newline (\n), also referred to as CR/LF.
On the Mac Classic (Mac systems using any system prior to Mac OS X), line-endings are terminated with a single carriage return (\r or CR). (Mac OS X uses the UNIX convention.)

The following example files demonstrate the various line-end conventions. They are displayed using the UNIX tool od (octal dump) on a Windows machine:

D:\P4WORK\test>od -c line_end.pc
0000000000   P   C       l   i   n   e       e   n   d  \r  \n
0000000015
D:\P4WORK\test>od -c line_end.mac
0000000000   M   a   c       l   i   n   e       e   n   d  \r
0000000015
D:\P4WORK\test>od -c line_end.unix
0000000000   U   n   i   x       l   i   n   e       e   n   d  \n
0000000016

Current Versions of Perforce

On the server side, Perforce processes all text files using Unix-style LF line-endings. Although Perforce stores server archive files on disk in the operating system's native line termination convention (CR/LF on Windows, LF on Unix), all line-endings are normalized to Unix-style LF line-endings for internal Perforce Server operations such as p4 sync, p4 submit and p4 diff.

On the client workspace side, Perforce handling of line-endings is determined by a global option for each clientspec. When you sync text files to a client workspace with p4 sync, or submit them back to a Perforce Server with p4 submit, their line-endings are converted as specified in the clientspec LineEnd section.

Beginning with the 2001.1 version of Perforce, there are five possible workspace options for handling text file line-endings. These options for line-end treatment are:

local        The use mode native to the client (default).
unix         Linefeed: UNIX style.  
mac          Carriage return: Macintosh style.  
win          Carriage return-linefeed: Windows style.  
share        Hybrid: writes UNIX style but reads UNIX, Mac, or Windows style.

The default value for all Perforce client workspaces is local, meaning that files sync to the client workspace using the client platform's standard line-ending characters. So the default LineEnd section of the clientspec would show the following:

LineEnd:    local

On UNIX and Mac OS X client workspaces, the default local setting does not cause any line-end conversion. Perforce client workspaces on UNIX store text files with LF line-endings. Because the Perforce Server uses LF line-endings for operations involving text files, there is no need to do any line-end conversion in this case.

By contrast, syncing files to a Windows or Macintosh workspace requires line-end conversion, because those operating systems' native line termination formats differ from UNIX's. In these cases, using the local setting converts LF to CR/LF in the Windows workspace and LF to CR in the Macintosh workspace. When files are submitted back to the Perforce Server, the line-endings are converted back to LF.

The Perforce line-end options can be used to convert your text file line endings regardless of the platform where your client workspace resides. For example, a Mac Classic user can set their client workspace line-end option to win, to sync text files to their workspace and retain Windows-style CR/LF line-endings. UNIX users can create client workspace files with Macintosh CR line termination by choosing the mac line-end option and then syncing files into their workspace.

Using the unix client workspace option on a UNIX or Mac OS X client is equivalent to using the local setting. Similarly, the local setting for a Windows workspace is equivalent to win, and the local setting for a Mac Classic workspace is equivalent to mac. Again, the local setting is equivalent to the operating system's native line termination convention.

You might have files in your workspace that have mixed line termination conventions. For example, you might work in a cross-platform environment and use a text editor that can save files with multiple line-ending formats. In this case, it is possible to edit files and inadvertently save them with spurious line-end characters. For example, saving a text file with CRLF line-endings in a unix workspace and then submitting it results in the file being stored in the depot with an extra CR character at the end of each line. When this file is synced to other unix workspaces, it will have CRLF line-endings rather than the correct LF line-endings, and when it is synced to win workspaces, it will have CRCRLF line-endings (since each LF in the stored file is converted to CRLF).

Here, the share option is useful. The share option "normalizes" mixed line-endings into UNIX line-end format. It does not affect files that are synced into a client workspace; however, when files are submitted back to the Perforce Server, the share option converts all Windows-style CRLF line-endings and all Mac-style CR line-endings to the UNIX-style LF, leaving lone LFs untouched.
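To make these conversions concrete, here is a simplified Python model of them. It is not Perforce code: it assumes the depot copy always uses LF, treats `local` as already resolved to `unix`, `mac` or `win`, and the names `to_workspace` and `from_workspace` are hypothetical. The final assertion in the example reproduces the CRCRLF problem described above.

```python
# Native line ending of each LineEnd option when writing into a workspace
LINE_ENDS = {"unix": b"\n", "mac": b"\r", "win": b"\r\n", "share": b"\n"}


def to_workspace(depot_text: bytes, line_end: str) -> bytes:
    """Sync: convert depot text (LF line-endings) to the workspace convention."""
    return depot_text.replace(b"\n", LINE_ENDS[line_end])


def from_workspace(workspace_text: bytes, line_end: str) -> bytes:
    """Submit: convert workspace text back to LF line-endings for the depot."""
    if line_end == "share":
        # CRLF first, then any lone CR; lone LFs are left untouched
        return workspace_text.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    return workspace_text.replace(LINE_ENDS[line_end], b"\n")


if __name__ == "__main__":
    # A CRLF file saved in a unix workspace is submitted unchanged...
    stored = from_workspace(b"line\r\n", "unix")
    # ...so a win workspace later receives CR CR LF
    print(to_workspace(stored, "win"))
```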

For more information on the current LineEnd options see the p4 client section of the command reference.

Previous Versions of Perforce (99.1 to 2000.1)

Perforce clientspecs have a single client workspace option, [no]crlf, that toggles line-ending translation on and off for all files on Windows and Macintosh clients. On UNIX clients, this setting is ignored.

The default value on both Windows and Mac Classic clients is crlf. The crlf option enables line-end translation using the operating system's default line termination convention -- CR for Mac Classic text files, CR/LF for Windows text files.

To override the default CR/LF translation behavior you set the clientspec option to nocrlf. In this case, line-end translation is ignored when files are retrieved from, or submitted to, the Perforce Server. This setting is useful in instances where you want to preserve UNIX-style line-endings in a Windows client workspace. For example, if you were using UNIX shell tools on Windows or mounting NFS drives on a Windows based machine, preserving the UNIX-style line-endings would be preferable. In these cases, your text editor is a factor. Some Windows editors only save files with CR/LF endings, while others can save files in either PC, UNIX or Mac line-end format. As an example, if your client workspace is set to ignore line-end translation (nocrlf), and your text editor saved files in Windows format (CR/LF), then your files will contain extra carriage returns when submitted back to the server. When such files are then synced to a UNIX client workspace, they contain spurious ^M (Control-M) characters at the end of lines. To avoid this, you must save text files in the correct line-end format when using the old nocrlf clientspec option.

An alternative to setting the old nocrlf option is to treat a file as type binary. This preserves whatever line termination style the file is saved with, because line-end translation is ignored for binary files. However, this configuration also disables all other Perforce text-specific features for that file, including RCS reverse-delta storage and three-way merging capability.

Previous Versions of Perforce (98.2 and earlier)

The crlf or local translation option is implicit and cannot be altered.

Note: It is possible to add text files to the Perforce repository as type binary or binary+D files in order to bypass line-ending translation for those files. If you do add text files as Perforce type binary, you will need to use the -t flag when diffing or merging in order to treat the files as text.

Monday, December 6, 2010

Bits to Bytes to Kilobytes to Megabytes to Gigabytes to Terabytes to Petabytes to Exabytes

The basic unit used in computer data storage is called a bit (binary digit). Each bit is either a one or a zero, and computers use long sequences of bits to do things and talk to other computers. All your files, for instance, are kept in the computer as binary files and translated into words and pictures by the software (which is also ones and zeros). This two-digit system is called the "binary number system" since it has only two digits. The decimal number system, in contrast, has ten unique digits, zero through nine.

But although computer data and file sizes are normally measured in binary (counted by powers of two: 1, 2, 4, 8, 16, 32, 64, etc.), the prefixes for the multiples are borrowed from the metric system! The power of two nearest to 1,000 is 2^10, or 1,024; thus 1,024 bytes was named a Kilobyte. So, although a metric "kilo" equals 1,000 (e.g. one kilogram = 1,000 grams), a binary "Kilo" equals 1,024 (e.g. one Kilobyte = 1,024 bytes). Not surprisingly, this has led to a great deal of confusion.

In December 1998, the International Electrotechnical Commission (IEC) approved a new IEC International Standard. Instead of using the metric prefixes for binary multiples, the new standard defines specific binary prefixes, each made up of the first two letters of the metric prefix followed by the first two letters of the word "binary". Thus, for instance, instead of Kilobyte (KB) or Gigabyte (GB), the new terms are kibibyte (KiB) and gibibyte (GiB). The new IEC terms, which are not commonly used yet, are included below.

Here are a few more details to consider:

Although data storage capacity is generally expressed in binary code, many hard drive manufacturers (and some newer BIOSs) use a decimal system to express capacity.
- For example, a 30 gigabyte drive is usually 30,000,000,000 bytes (decimal) not the 32,212,254,720 binary bytes you would expect.
Another trivial point is that in the metric system the "k" or "kilo" prefix is always lowercase (i.e. kilogram = kg not Kg) but since these binary uses for data storage capacity are not properly metric, it has become standard to use an uppercase "K" for the binary form.
When used to describe data transfer rates, bits/bytes are calculated as in the metric system.
- Kilobits per second is usually shortened to kbps or Kbps. Although technically speaking, the term kilobit should have a lowercase initial letter, it has become common to capitalize it in abbreviation (e.g. "56 Kbps" or "56K"). The simple "K" might seem ambiguous but, in the context of data transfer, it can be assumed that the measurement is in bits rather than bytes unless indicated otherwise.
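The decimal-versus-binary gap from the 30 gigabyte example can be checked with a few lines of Python. A sketch, with `format_size` as a hypothetical helper: it divides by 1000 for the decimal (kB, MB, GB) units and by 1024 for the IEC binary (KiB, MiB, GiB) units.

```python
def format_size(num_bytes: int, binary: bool = True) -> str:
    """Express a byte count in IEC binary units or in decimal (metric) units."""
    if binary:
        base, units = 1024, ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    else:
        base, units = 1000, ["B", "kB", "MB", "GB", "TB", "PB", "EB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below the base (or at the largest unit)
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= base


if __name__ == "__main__":
    print(format_size(30_000_000_000, binary=False))  # 30.00 GB
    print(format_size(30_000_000_000))                # 27.94 GiB
    print(format_size(30 * 2**30))                    # 30.00 GiB (32,212,254,720 bytes)
```

So a drive sold as 30 GB (decimal) holds about 27.94 GiB.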

File Storage Capacity by Bits and Bytes
	bit	byte	Kilobyte	Megabyte	Gigabyte
bit	1	8	8,192	8,388,608	8,589,934,592
byte	8	1	1,024	1,048,576	1,073,741,824
Kilobyte	8,192	1,024	1	1,024	1,048,576
Megabyte	8,388,608	1,048,576	1,024	1	1,024
Gigabyte	8,589,934,592	1,073,741,824	1,048,576	1,024	1
Terabyte	8,796,093,022,208	1,099,511,627,776	1,073,741,824	1,048,576	1,024
Petabyte	9,007,199,254,740,992	1,125,899,906,842,624	1,099,511,627,776	1,073,741,824	1,048,576
Exabyte	9,223,372,036,854,775,808	1,152,921,504,606,846,976	1,125,899,906,842,624	1,099,511,627,776	1,073,741,824
Zettabyte	9,444,732,965,739,290,427,392	1,180,591,620,717,411,303,424	1,152,921,504,606,846,976	1,125,899,906,842,624	1,099,511,627,776

File Storage Capacity by Powers of Two (Base 2)
Unit	bit	byte	Kilobyte	Megabyte	Gigabyte	Terabyte	Petabyte	Exabyte	Zettabyte	Yottabyte
bit	2^0	2^3	2^13	2^23	2^33	2^43	2^53	2^63	2^73	2^83
byte	2^3	2^0	2^10	2^20	2^30	2^40	2^50	2^60	2^70	2^80
Kilobyte	2^13	2^10	2^0	2^10	2^20	2^30	2^40	2^50	2^60	2^70
Megabyte	2^23	2^20	2^10	2^0	2^10	2^20	2^30	2^40	2^50	2^60
Gigabyte	2^33	2^30	2^20	2^10	2^0	2^10	2^20	2^30	2^40	2^50
Terabyte	2^43	2^40	2^30	2^20	2^10	2^0	2^10	2^20	2^30	2^40
Petabyte	2^53	2^50	2^40	2^30	2^20	2^10	2^0	2^10	2^20	2^30
Exabyte	2^63	2^60	2^50	2^40	2^30	2^20	2^10	2^0	2^10	2^20
Zettabyte	2^73	2^70	2^60	2^50	2^40	2^30	2^20	2^10	2^0	2^10
Yottabyte	2^83	2^80	2^70	2^60	2^50	2^40	2^30	2^20	2^10	2^0
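Every cell in the powers-of-two table is just the difference between two exponents, counting a Kilobyte as 2^10 bytes, a Megabyte as 2^20 bytes and so on, with a bit as 2^-3 bytes. A minimal bash sketch with hypothetical names; note that bash arithmetic is 64-bit signed, so `1 << n` is only safe for n up to 62.

```shell
#!/bin/bash
# Power-of-two exponent of each unit, measured in bytes; a bit is 2^-3 bytes.
unit_exp() {
  case "$1" in
    bit)       echo -3 ;;
    byte)      echo 0 ;;
    Kilobyte)  echo 10 ;;
    Megabyte)  echo 20 ;;
    Gigabyte)  echo 30 ;;
    Terabyte)  echo 40 ;;
    Petabyte)  echo 50 ;;
    Exabyte)   echo 60 ;;
    Zettabyte) echo 70 ;;
    Yottabyte) echo 80 ;;
    *) echo "unknown unit: $1" >&2; return 1 ;;
  esac
}

# table_exp ROW COLUMN: the n in the table cell 2^n, i.e. the absolute
# difference between the two unit exponents (the table is symmetric).
table_exp() {
  local a b d
  a=$(unit_exp "$1") || return 1
  b=$(unit_exp "$2") || return 1
  d=$(( a - b ))
  echo $(( d < 0 ? -d : d ))
}

table_exp Gigabyte bit                        # 33
echo $(( 1 << $(table_exp Megabyte bit) ))    # 8388608, as in the first table
```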

New IEC Standard
Name	Symbol	Value
bit	bit	0 or 1
byte	B	8 bits
kibibit	Kibit	1024 bits
kilobit	kbit	1000 bits
kibibyte (binary)	KiB	1024 bytes
kilobyte (decimal)	kB	1000 bytes
megabit	Mbit	1000 kilobits
mebibyte (binary)	MiB	1024 kibibytes
megabyte (decimal)	MB	1000 kilobytes
gigabit	Gbit	1000 megabits
gibibyte (binary)	GiB	1024 mebibytes
gigabyte (decimal)	GB	1000 megabytes
terabit	Tbit	1000 gigabits
tebibyte (binary)	TiB	1024 gibibytes
terabyte (decimal)	TB	1000 gigabytes
petabit	Pbit	1000 terabits
pebibyte (binary)	PiB	1024 tebibytes
petabyte (decimal)	PB	1000 terabytes
exabit	Ebit	1000 petabits
exbibyte (binary)	EiB	1024 pebibytes
exabyte (decimal)	EB	1000 petabytes
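A byte count can be formatted with either set of prefixes from the table above. This bash-plus-awk sketch is illustrative only: `format_size` is a hypothetical name, it stops at exa, and it rounds to one decimal place.

```shell
#!/bin/bash
# format_size BYTES [BASE]: BASE 1024 (default) gives IEC binary units,
# BASE 1000 gives decimal units with the symbols from the table above.
format_size() {
  local bytes=$1 base=${2:-1024}
  awk -v b="$bytes" -v base="$base" 'BEGIN {
    if (base == 1024) n = split("B KiB MiB GiB TiB PiB EiB", u, " ")
    else              n = split("B kB MB GB TB PB EB", u, " ")
    i = 1
    # Keep dividing while the value is at least one unit of the next prefix.
    while (b >= base && i < n) { b /= base; i++ }
    if (i == 1) printf "%d %s\n", b, u[i]
    else        printf "%.1f %s\n", b, u[i]
  }'
}

format_size 1536                # 1.5 KiB
format_size 30000000000         # 27.9 GiB
format_size 30000000000 1000    # 30.0 GB
```

Passing 1000 as the second argument gives the decimal figure a manufacturer would print, which is why the same 30,000,000,000-byte drive appears as both 30.0 GB and 27.9 GiB.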

Thursday, November 25, 2010

Using Lame in Cygwin to convert mp3s recursively

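#!/bin/bash
# Re-encode the .mp3 files in each subdirectory of the current directory
# (one level deep) to 24 kbps joint stereo, resampled to 11.025 kHz.
# Each copy is written next to its source as "<name>.mp3.24", and files
# that already have a copy are skipped, so the script can be re-run.
shopt -s nullglob                 # no matches -> no iterations, not a literal "*.mp3"
for ddnm in */                    # trailing slash: directories only
do
  echo "$ddnm"
  (
    cd "$ddnm" || exit            # subshell, so no "cd .." is needed afterwards
    for mp3file in *.mp3
    do
      mp3file24="$mp3file".24
      if [[ ! -f "$mp3file24" ]]
      then
        # -m j: joint stereo, -b 24: 24 kbps, --resample 11.025: output kHz,
        # -q 0: slowest, highest-quality algorithms
        /cygdrive/c/lame/lame.exe -m j -b 24 --resample 11.025 -q 0 "$mp3file" "$mp3file24"
      fi
    done
  )
done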
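The loop above only descends one directory level, despite the post's title. A fully recursive variant could lean on find(1), as in this hypothetical bash sketch; `convert_tree` and the `LAME` variable are illustrative names, and the default lame.exe path is simply the one used above, assumed to match your install.

```shell
#!/bin/bash
# Hypothetical recursive variant: find visits every directory under ROOT.
LAME=${LAME:-/cygdrive/c/lame/lame.exe}

convert_tree() {
  local root=${1:-.}
  # -print0 with read -d '' keeps names containing spaces intact.
  find "$root" -type f -name '*.mp3' -print0 |
  while IFS= read -r -d '' mp3file; do
    mp3file24="$mp3file".24
    if [[ ! -f "$mp3file24" ]]; then
      "$LAME" -m j -b 24 --resample 11.025 -q 0 "$mp3file" "$mp3file24"
    fi
  done
}

# Example: convert_tree /cygdrive/d/music
```

Because the encoder is read from `$LAME` at call time, you can point it at a different lame binary without editing the function.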

MIME types

.ai - application/postscript
.aif - audio/x-aiff
.aifc - audio/x-aiff
.aiff - audio/x-aiff
.asc - text/plain
.au - audio/basic
.avi - video/x-msvideo
.bcpio - application/x-bcpio
.bin - application/octet-stream
.c - text/plain
.cc - text/plain
.ccad - application/clariscad
.cdf - application/x-netcdf
.class - application/octet-stream
.cpio - application/x-cpio
.cpt - application/mac-compactpro
.csh - application/x-csh
.css - text/css
.dcr - application/x-director
.dir - application/x-director
.dms - application/octet-stream
.doc - application/msword
.drw - application/drafting
.dvi - application/x-dvi
.dwg - application/acad
.dxf - application/dxf
.dxr - application/x-director
.eps - application/postscript
.etx - text/x-setext
.exe - application/octet-stream
.ez - application/andrew-inset
.f - text/plain
.f90 - text/plain
.fli - video/x-fli
.gif - image/gif
.gtar - application/x-gtar
.gz - application/x-gzip
.h - text/plain
.hdf - application/x-hdf
.hh - text/plain
.hqx - application/mac-binhex40
.htm - text/html
.html - text/html
.ice - x-conference/x-cooltalk
.ief - image/ief
.iges - model/iges
.igs - model/iges
.ips - application/x-ipscript
.ipx - application/x-ipix
.jpe - image/jpeg
.jpeg - image/jpeg
.jpg - image/jpeg
.js - application/x-javascript
.kar - audio/midi
.latex - application/x-latex
.lha - application/octet-stream
.lsp - application/x-lisp
.lzh - application/octet-stream
.m - text/plain
.man - application/x-troff-man
.me - application/x-troff-me
.mesh - model/mesh
.mid - audio/midi
.midi - audio/midi
.mif - application/vnd.mif
.mime - www/mime
.mov - video/quicktime
.movie - video/x-sgi-movie
.mp2 - audio/mpeg
.mp3 - audio/mpeg
.mpe - video/mpeg
.mpeg - video/mpeg
.mpg - video/mpeg
.mpga - audio/mpeg
.ms - application/x-troff-ms
.msh - model/mesh
.nc - application/x-netcdf
.oda - application/oda
.pbm - image/x-portable-bitmap
.pdb - chemical/x-pdb
.pdf - application/pdf
.pgm - image/x-portable-graymap
.pgn - application/x-chess-pgn
.png - image/png
.pnm - image/x-portable-anymap
.pot - application/mspowerpoint
.ppm - image/x-portable-pixmap
.pps - application/mspowerpoint
.ppt - application/mspowerpoint
.ppz - application/mspowerpoint
.pre - application/x-freelance
.prt - application/pro_eng
.ps - application/postscript
.qt - video/quicktime
.ra - audio/x-realaudio
.ram - audio/x-pn-realaudio
.ras - image/cmu-raster
.rgb - image/x-rgb
.rm - audio/x-pn-realaudio
.roff - application/x-troff
.rpm - audio/x-pn-realaudio-plugin
.rtf - text/rtf
.rtx - text/richtext
.scm - application/x-lotusscreencam
.set - application/set
.sgm - text/sgml
.sgml - text/sgml
.sh - application/x-sh
.shar - application/x-shar
.silo - model/mesh
.sit - application/x-stuffit
.skd - application/x-koan
.skm - application/x-koan
.skp - application/x-koan
.skt - application/x-koan
.smi - application/smil
.smil - application/smil
.snd - audio/basic
.sol - application/solids
.spl - application/x-futuresplash
.src - application/x-wais-source
.step - application/STEP
.stl - application/SLA
.stp - application/STEP
.sv4cpio - application/x-sv4cpio
.sv4crc - application/x-sv4crc
.swf - application/x-shockwave-flash
.t - application/x-troff
.tar - application/x-tar
.tcl - application/x-tcl
.tex - application/x-tex
.texi - application/x-texinfo
.texinfo - application/x-texinfo
.tif - image/tiff
.tiff - image/tiff
.tr - application/x-troff
.tsi - audio/TSP-audio
.tsp - application/dsptype
.tsv - text/tab-separated-values
.txt - text/plain
.unv - application/i-deas
.ustar - application/x-ustar
.vcd - application/x-cdlink
.vda - application/vda
.viv - video/vnd.vivo
.vivo - video/vnd.vivo
.vrml - model/vrml
.wav - audio/x-wav
.wrl - model/vrml
.xbm - image/x-xbitmap
.xlc - application/vnd.ms-excel
.xll - application/vnd.ms-excel
.xlm - application/vnd.ms-excel
.xls - application/vnd.ms-excel
.xlw - application/vnd.ms-excel
.xml - text/xml
.xpm - image/x-xpixmap
.xwd - image/x-xwindowdump
.xyz - chemical/x-pdb
.zip - application/zip
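One way to put the table above to work in a script is a `case` lookup keyed on the file extension. This bash sketch covers only a handful of entries; the name `mime_type` and the application/octet-stream fallback are assumptions, not part of the original list (the fallback is chosen because the table already uses it for generic binaries such as .bin and .exe).

```shell
#!/bin/bash
# mime_type FILE: map a file name's extension to a MIME type (small subset
# of the table above). Unknown or missing extensions fall back to
# application/octet-stream.
mime_type() {
  local ext
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    htm|html)          echo text/html ;;
    css)               echo text/css ;;
    txt|asc|c|cc|h|hh) echo text/plain ;;
    xml)               echo text/xml ;;
    gif)               echo image/gif ;;
    jpe|jpeg|jpg)      echo image/jpeg ;;
    png)               echo image/png ;;
    mp2|mp3|mpga)      echo audio/mpeg ;;
    pdf)               echo application/pdf ;;
    zip)               echo application/zip ;;
    gz)                echo application/x-gzip ;;
    *)                 echo application/octet-stream ;;
  esac
}

mime_type photo.JPG    # image/jpeg
```

On systems that ship file(1), `file --mime-type -b somefile` guesses the type from the file's contents rather than its name, which can be more reliable.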

Disabling right click on whole html

Disabling right click on whole html
===================================
<body oncontextmenu="return false">

Disabling right click wherever required
=======================================
<tr oncontextmenu="return false">

Perl One Liners

Just enough Perl to do almost everything! Tom Christiansen once posted a
canonical list of one-line Perl programs for many common command-line
tasks.
It included:
# run contents of "my_file" as a program
perl my_file

# run debugger "stand-alone"
perl -d -e 42

# run program, but with warnings
perl -w my_file

# run program under debugger
perl -d my_file

# just check syntax, with warnings
perl -wc my_file

# useful at end of "find foo -print"
perl -nle unlink

# simplest one-liner program
perl -e 'print "hello world!\n"'

# add first and penultimate columns
perl -lane 'print $F[0] + $F[-2]'

# just lines 15 to 17
perl -ne 'print if 15 .. 17' *.pod

# in-place edit of *.c files changing all foo to bar
perl -p -i.bak -e 's/\bfoo\b/bar/g' *.c

# command-line that prints the first 50 lines (cheaply)
perl -pe 'exit if $. > 50' f1 f2 f3 ...

# delete first 10 lines
perl -i.old -ne 'print unless 1 .. 10' foo.txt

# change all the isolated oldvar occurrences to newvar
perl -i.old -pe 's{\boldvar\b}{newvar}g' *.[chy]

# command-line that reverses the whole file by lines
perl -e 'print reverse <>' file1 file2 file3 ....

# find palindromes
perl -lne 'print if $_ eq reverse' /usr/dict/words

# command-line that reverse all the bytes in a file
perl -0777e 'print scalar reverse <>' f1 f2 f3 ...

# command-line that reverses the whole file by paragraphs
perl -00 -e 'print reverse <>' file1 file2 file3 ....

# increment all numbers found in these files
perl -i.tiny -pe 's/(\d+)/ 1 + $1 /ge' file1 file2 ....

# command-line that shows each line with its characters backwards
perl -nle 'print scalar reverse $_' file1 file2 file3 ....

# delete the lines from START through END (inclusive), keeping the rest
perl -i.old -ne 'print unless /^START$/ .. /^END$/' foo.txt

# binary edit (careful!)
perl -i.bak -pe 's/Mozilla/Slopoke/g' /usr/local/bin/netscape

# look for dup words
perl -0777 -ne 'print "$.: doubled $1\n" while /\b(\w+)\b\s+\b\1\b/gi'

# command-line that prints the last 50 lines (expensively)
perl -e '@lines = <>; print @lines[ ($#lines > 49 ? $#lines - 49 : 0) .. $#lines ]' f1 f2 f3 ...

AWK One Liners

HANDY ONE-LINE SCRIPTS FOR AWK 30 April 2008
Compiled by Eric Pement - eric [at] pement.org version 0.27

Latest version of this file (in English) is usually at:
http://www.pement.org/awk/awk1line.txt

This file will also be available in other languages:
Chinese - http://ximix.org/translation/awk1line_zh-CN.txt

USAGE:

Unix: awk '/pattern/ {print "$1"}' # standard Unix shells
DOS/Win: awk '/pattern/ {print "$1"}' # compiled with DJGPP, Cygwin
awk "/pattern/ {print \"$1\"}" # GnuWin32, UnxUtils, Mingw

Note that the DJGPP compilation (for DOS or Windows-32) permits an awk
script to follow Unix quoting syntax '/like/ {"this"}'. HOWEVER, if the
command interpreter is CMD.EXE or COMMAND.COM, single quotes will not
protect the redirection arrows (<, >) nor do they protect pipes (|).
These are special symbols which require "double quotes" to protect them
from interpretation as operating system directives. If the command
interpreter is bash, ksh or another Unix shell, then single and double
quotes will follow the standard Unix usage.
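The Unix quoting rules above are easiest to see side by side. In this small bash sketch (function names are hypothetical) the same awk program is passed three ways:

```shell
#!/bin/bash
# Single quotes: the shell leaves $1 alone, so awk sees its own field $1.
single_quoted() { printf 'alpha beta\n' | awk '{ print $1 }'; }

# Double quotes: $1 must be escaped as \$1 or the shell expands it first.
double_quoted() { printf 'alpha beta\n' | awk "{ print \$1 }"; }

# Unescaped: the shell substitutes this function's own (empty) $1, so awk
# receives "{ print  }" and prints the whole line instead of field 1.
unescaped() { printf 'alpha beta\n' | awk "{ print $1 }"; }

single_quoted   # alpha
unescaped       # alpha beta
```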

Users of MS-DOS or Microsoft Windows must remember that the percent
sign (%) is used to indicate environment variables, so this symbol must
be doubled (%%) to yield a single percent sign visible to awk.

If a script will not need to be quoted in Unix, DOS, or CMD, then I
normally omit the quote marks. If an example is peculiar to GNU awk,
the command 'gawk' will be used. Please notify me if you find errors or
new commands to add to this list (total length under 65 characters). I
usually try to put the shortest script first. To conserve space, I
normally use '1' instead of '{print}' to print each line. Either one
will work.
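The '1' shorthand works because an awk pattern with no action defaults to printing the current line, and the constant 1 is always true. A quick bash check, with hypothetical function names:

```shell
#!/bin/bash
# Pattern "1" is always true; with no action, awk prints the line.
print_with_one()   { awk '1'; }
# The explicit form: an action that prints $0 for every line.
print_with_block() { awk '{ print }'; }

printf 'x\ny\n' | print_with_one   # x, then y
```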

FILE SPACING:

# double space a file
awk '1;{print ""}'
awk 'BEGIN{ORS="\n\n"};1'

# double space a file which already has blank lines in it. Output file
# should contain no more than one blank line between lines of text.
# NOTE: On Unix systems, DOS lines which have only CRLF (\r\n) are
# often treated as non-blank, and thus 'NF' alone will return TRUE.
awk 'NF{print $0 "\n"}'

# triple space a file
awk '1;{print "\n"}'

NUMBERING AND CALCULATIONS:

# precede each line by its line number FOR THAT FILE (left alignment).
# Using a tab (\t) instead of space will preserve margins.
awk '{print FNR "\t" $0}' files*

# precede each line by its line number FOR ALL FILES TOGETHER, with tab.
awk '{print NR "\t" $0}' files*

# number each line of a file (number on left, right-aligned)
# Double the percent signs if typing from the DOS command prompt.
awk '{printf("%5d : %s\n", NR,$0)}'

# number each line of file, but only print numbers if line is not blank
# Remember caveats about Unix treatment of \r (mentioned above)
awk 'NF{$0=++a " :" $0};1'
awk '{print (NF? ++a " :" :"") $0}'

# count lines (emulates "wc -l")
awk 'END{print NR}'

# print the sums of the fields of every line
awk '{s=0; for (i=1; i<=NF; i++) s=s+$i; print s}'

# add all fields in all lines and print the sum
awk '{for (i=1; i<=NF; i++) s=s+$i}; END{print s}'

# print every line after replacing each field with its absolute value
awk '{for (i=1; i<=NF; i++) if ($i < 0) $i = -$i; print }'
awk '{for (i=1; i<=NF; i++) $i = ($i < 0) ? -$i : $i; print }'

# print the total number of fields ("words") in all lines
awk '{ total = total + NF }; END {print total}' file

# print the total number of lines that contain "Beth"
awk '/Beth/{n++}; END {print n+0}' file

# print the largest first field and the line that contains it
# Compares numerically if field #1 is a number; text compares lexically, not by length
awk '$1 > max {max=$1; maxline=$0}; END{ print max, maxline}'

# print the number of fields in each line, followed by the line
awk '{ print NF ":" $0 } '

# print the last field of each line
awk '{ print $NF }'

# print the last field of the last line
awk '{ field = $NF }; END{ print field }'

# print every line with more than 4 fields
awk 'NF > 4'

# print every line where the value of the last field is > 4
awk '$NF > 4'

STRING CREATION:

# create a string of a specific length (e.g., generate 513 spaces)
awk 'BEGIN{while (a++<513) s=s " "; print s}'

# insert a string of specific length at a certain character position
# Example: insert 49 spaces after column #6 of each input line.
gawk --re-interval 'BEGIN{while(a++<49)s=s " "};{sub(/^.{6}/,"&" s)};1'

ARRAY CREATION:

# These next 2 entries are not one-line scripts, but the technique
# is so handy that it merits inclusion here.

# create an array named "month", indexed by numbers, so that month[1]
# is 'Jan', month[2] is 'Feb', month[3] is 'Mar' and so on.
split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")

# create an array named "mdigit", indexed by strings, so that
# mdigit["Jan"] is 1, mdigit["Feb"] is 2, etc. Requires "month" array
for (i=1; i<=12; i++) mdigit[month[i]] = i
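Because the two array lines above are fragments, here is one way they might be embedded in a complete awk program, wrapped in a hypothetical bash function that turns a month abbreviation in field 1 into its number:

```shell
#!/bin/bash
# Build both arrays once in BEGIN, then look up field 1 on every line.
month_to_number() {
  awk 'BEGIN {
         split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")
         for (i = 1; i <= 12; i++) mdigit[month[i]] = i
       }
       { print mdigit[$1] }'
}

echo Mar | month_to_number   # 3
```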

TEXT CONVERSION AND SUBSTITUTION:

# IN UNIX ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format
awk '{sub(/\r$/,"")};1' # assumes EACH line ends with Ctrl-M

# IN UNIX ENVIRONMENT: convert Unix newlines (LF) to DOS format
awk '{sub(/$/,"\r")};1'

# IN DOS ENVIRONMENT: convert Unix newlines (LF) to DOS format
awk 1

# IN DOS ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format
# Cannot be done with DOS versions of awk, other than gawk:
gawk -v BINMODE="w" '1' infile >outfile

# Use "tr" instead.
tr -d \r <infile >outfile # GNU tr version 1.22 or higher

# delete leading whitespace (spaces, tabs) from front of each line
# aligns all text flush left
awk '{sub(/^[ \t]+/, "")};1'

# delete trailing whitespace (spaces, tabs) from end of each line
awk '{sub(/[ \t]+$/, "")};1'

# delete BOTH leading and trailing whitespace from each line
awk '{gsub(/^[ \t]+|[ \t]+$/,"")};1'
awk '{$1=$1};1' # also removes extra space between fields

# insert 5 blank spaces at beginning of each line (make page offset)
awk '{sub(/^/, "     ")};1'

# align all text flush right on a 79-column width
awk '{printf "%79s\n", $0}' file*

# center all text on a 79-character width
awk '{l=length();s=int((79-l)/2); printf "%"(s+l)"s\n",$0}' file*

# substitute (find and replace) "foo" with "bar" on each line
awk '{sub(/foo/,"bar")}; 1' # replace only 1st instance
gawk '{$0=gensub(/foo/,"bar",4)}; 1' # replace only 4th instance
awk '{gsub(/foo/,"bar")}; 1' # replace ALL instances in a line

# substitute "foo" with "bar" ONLY for lines which contain "baz"
awk '/baz/{gsub(/foo/, "bar")}; 1'

# substitute "foo" with "bar" EXCEPT for lines which contain "baz"
awk '!/baz/{gsub(/foo/, "bar")}; 1'

# change "scarlet" or "ruby" or "puce" to "red"
awk '{gsub(/scarlet|ruby|puce/, "red")}; 1'

# reverse order of lines (emulates "tac")
awk '{a[i++]=$0} END {for (j=i-1; j>=0;) print a[j--] }' file*

# if a line ends with a backslash, append the next line to it (fails if
# there are multiple lines ending with backslash...)
awk '/\\$/ {sub(/\\$/,""); getline t; print $0 t; next}; 1' file*

# print and sort the login names of all users
awk -F ":" '{print $1 | "sort" }' /etc/passwd

# print the first 2 fields, in opposite order, of every line
awk '{print $2, $1}' file

# switch the first 2 fields of every line
awk '{temp = $1; $1 = $2; $2 = temp; print}' file

# print every line, deleting the second field of that line
awk '{ $2 = ""; print }'

# print in reverse order the fields of every line
awk '{for (i=NF; i>0; i--) printf("%s ",$i);print ""}' file

# concatenate every 5 lines of input, using a comma separator
# between fields
awk 'ORS=NR%5?",":"\n"' file

SELECTIVE PRINTING OF CERTAIN LINES:

# print first 10 lines of file (emulates behavior of "head")
awk 'NR < 11'

# print first line of file (emulates "head -1")
awk 'NR>1{exit};1'

# print the last 2 lines of a file (emulates "tail -2")
awk '{y=x "\n" $0; x=$0};END{print y}'

# print the last line of a file (emulates "tail -1")
awk 'END{print}'

# print only lines which match regular expression (emulates "grep")
awk '/regex/'

# print only lines which do NOT match regex (emulates "grep -v")
awk '!/regex/'

# print any line where field #5 is equal to "abc123"
awk '$5 == "abc123"'

# print only those lines where field #5 is NOT equal to "abc123"
# This will also print lines which have fewer than 5 fields.
awk '$5 != "abc123"'
awk '!($5 == "abc123")'

# matching a field against a regular expression
awk '$7 ~ /^[a-f]/' # print line if field #7 matches regex
awk '$7 !~ /^[a-f]/' # print line if field #7 does NOT match regex

# print the line immediately before a regex, but not the line
# containing the regex
awk '/regex/{print x};{x=$0}'
awk '/regex/{print (NR==1 ? "match on line 1" : x)};{x=$0}'

# print the line immediately after a regex, but not the line
# containing the regex
awk '/regex/{getline;print}'

# grep for AAA and BBB and CCC (in any order on the same line)
awk '/AAA/ && /BBB/ && /CCC/'

# grep for AAA and BBB and CCC (in that order)
awk '/AAA.*BBB.*CCC/'

# print only lines of 65 characters or longer
awk 'length > 64'

# print only lines of less than 65 characters
awk 'length < 65'

# print section of file from regular expression to end of file
awk '/regex/,0'
awk '/regex/,EOF'

# print section of file based on line numbers (lines 8-12, inclusive)
awk 'NR==8,NR==12'

# print line number 52
awk 'NR==52'
awk 'NR==52 {print;exit}' # more efficient on large files

# print section of file between two regular expressions (inclusive)
awk '/Iowa/,/Montana/' # case sensitive

SELECTIVE DELETION OF CERTAIN LINES:

# delete ALL blank lines from a file (same as "grep '.' ")
awk NF
awk '/./'

# remove duplicate, consecutive lines (emulates "uniq")
awk 'NR == 1 || $0 != prev; { prev = $0 "" }'

# remove duplicate, nonconsecutive lines
awk '!a[$0]++' # most concise script
awk '!($0 in a){a[$0];print}' # most efficient script
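The concise `!a[$0]++` idiom relies on post-increment: `a[$0]++` yields 0 the first time a line is seen, so `!` makes the pattern true and the line prints; every later copy yields 1 or more and is skipped. This bash sketch (hypothetical function names) contrasts it with `uniq`, which only drops adjacent repeats:

```shell
#!/bin/bash
# Keep the first occurrence of every line, wherever the repeats appear.
dedupe_all()         { awk '!a[$0]++'; }
# Drop only repeats that sit directly next to each other.
dedupe_consecutive() { uniq; }

printf 'b\na\nb\nc\na\n' | dedupe_all          # b, a, c
printf 'b\nb\na\nb\n'    | dedupe_consecutive  # b, a, b
```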

CREDITS AND THANKS:

Special thanks to the late Peter S. Tillier (U.K.) for helping me with
the first release of this FAQ file, and to Daniel Jana, Yisu Dong, and
others for their suggestions and corrections.

For additional syntax instructions, including the way to apply editing
commands from a disk file instead of the command line, consult:

"sed & awk, 2nd Edition," by Dale Dougherty and Arnold Robbins
(O'Reilly, 1997)

"UNIX Text Processing," by Dale Dougherty and Tim O'Reilly (Hayden
Books, 1987)

"GAWK: Effective awk Programming," 3d edition, by Arnold D. Robbins
(O'Reilly, 2003) or at http://www.gnu.org/software/gawk/manual/

To fully exploit the power of awk, one must understand "regular
expressions." For detailed discussion of regular expressions, see
"Mastering Regular Expressions, 3d edition" by Jeffrey Friedl (O'Reilly,
2006).

The info and manual ("man") pages on Unix systems may be helpful (try
"man awk", "man nawk", "man gawk", "man regexp", or the section on
regular expressions in "man ed").

USE OF '\t' IN awk SCRIPTS: For clarity in documentation, I have used
'\t' to indicate a tab character (0x09) in the scripts. All versions of
awk should recognize this abbreviation.

#---end of file---

SED One Liners

-------------------------------------------------------------------------
USEFUL ONE-LINE SCRIPTS FOR SED (Unix stream editor) Dec. 29, 2005
Compiled by Eric Pement - pemente[at]northpark[dot]edu version 5.5

Latest version of this file (in English) is usually at:
http://sed.sourceforge.net/sed1line.txt
http://www.pement.org/sed/sed1line.txt

This file will also be available in other languages:
Chinese - http://sed.sourceforge.net/sed1line_zh-CN.html
Czech - http://sed.sourceforge.net/sed1line_cz.html
Dutch - http://sed.sourceforge.net/sed1line_nl.html
French - http://sed.sourceforge.net/sed1line_fr.html
German - http://sed.sourceforge.net/sed1line_de.html
Italian - (pending)
Portuguese - http://sed.sourceforge.net/sed1line_pt-BR.html
Spanish - (pending)

FILE SPACING:

# double space a file
sed G

# double space a file which already has blank lines in it. Output file
# should contain no more than one blank line between lines of text.
sed '/^$/d;G'

# triple space a file
sed 'G;G'

# undo double-spacing (assumes even-numbered lines are always blank)
sed 'n;d'

# insert a blank line above every line which matches "regex"
sed '/regex/{x;p;x;}'

# insert a blank line below every line which matches "regex"
sed '/regex/G'

# insert a blank line above and below every line which matches "regex"
sed '/regex/{x;p;x;G;}'

NUMBERING:

# number each line of a file (simple left alignment). Using a tab (see
# note on '\t' at end of file) instead of space will preserve margins.
sed = filename | sed 'N;s/\n/\t/'

# number each line of a file (number on left, right-aligned)
sed = filename | sed 'N; s/^/     /; s/ *\(.\{6,\}\)\n/\1  /'

# number each line of file, but only print numbers if line is not blank
sed '/./=' filename | sed '/./N; s/\n/ /'

# count lines (emulates "wc -l")
sed -n '$='

TEXT CONVERSION AND SUBSTITUTION:

# IN UNIX ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format.
sed 's/.$//' # assumes that all lines end with CR/LF
sed 's/^M$//' # in bash/tcsh, press Ctrl-V then Ctrl-M
sed 's/\x0D$//' # works on ssed, gsed 3.02.80 or higher

# IN UNIX ENVIRONMENT: convert Unix newlines (LF) to DOS format.
sed "s/$/`echo -e \\\r`/" # command line under ksh
sed 's/$'"/`echo \\\r`/" # command line under bash
sed "s/$/`echo \\\r`/" # command line under zsh
sed 's/$/\r/' # gsed 3.02.80 or higher

# IN DOS ENVIRONMENT: convert Unix newlines (LF) to DOS format.
sed "s/$//" # method 1
sed -n p # method 2

# IN DOS ENVIRONMENT: convert DOS newlines (CR/LF) to Unix format.
# Can only be done with UnxUtils sed, version 4.0.7 or higher. The
# UnxUtils version can be identified by the custom "--text" switch
# which appears when you use the "--help" switch. Otherwise, changing
# DOS newlines to Unix newlines cannot be done with sed in a DOS
# environment. Use "tr" instead.
sed "s/\r//" infile >outfile # UnxUtils sed v4.0.7 or higher
tr -d \r <infile >outfile # GNU tr version 1.22 or higher

# delete leading whitespace (spaces, tabs) from front of each line
# aligns all text flush left
sed 's/^[ \t]*//' # see note on '\t' at end of file

# delete trailing whitespace (spaces, tabs) from end of each line
sed 's/[ \t]*$//' # see note on '\t' at end of file

# delete BOTH leading and trailing whitespace from each line
sed 's/^[ \t]*//;s/[ \t]*$//'

# insert 5 blank spaces at beginning of each line (make page offset)
sed 's/^/     /'

# align all text flush right on a 79-column width
sed -e :a -e 's/^.\{1,78\}$/ &/;ta' # set at 78 plus 1 space

# center all text in the middle of 79-column width. In method 1,
# spaces at the beginning of the line are significant, and trailing
# spaces are appended at the end of the line. In method 2, spaces at
# the beginning of the line are discarded in centering the line, and
# no trailing spaces appear at the end of lines.
sed -e :a -e 's/^.\{1,77\}$/ & /;ta' # method 1
sed -e :a -e 's/^.\{1,77\}$/ &/;ta' -e 's/\( *\)\1/\1/' # method 2

# substitute (find and replace) "foo" with "bar" on each line
sed 's/foo/bar/' # replaces only 1st instance in a line
sed 's/foo/bar/4' # replaces only 4th instance in a line
sed 's/foo/bar/g' # replaces ALL instances in a line
sed 's/\(.*\)foo\(.*foo\)/\1bar\2/' # replace the next-to-last case
sed 's/\(.*\)foo/\1bar/' # replace only the last case

# substitute "foo" with "bar" ONLY for lines which contain "baz"
sed '/baz/s/foo/bar/g'

# substitute "foo" with "bar" EXCEPT for lines which contain "baz"
sed '/baz/!s/foo/bar/g'

# change "scarlet" or "ruby" or "puce" to "red"
sed 's/scarlet/red/g;s/ruby/red/g;s/puce/red/g' # most seds
gsed 's/scarlet\|ruby\|puce/red/g' # GNU sed only

# reverse order of lines (emulates "tac")
# bug/feature in HHsed v1.5 causes blank lines to be deleted
sed '1!G;h;$!d' # method 1
sed -n '1!G;h;$p' # method 2

# reverse each character on the line (emulates "rev")
sed '/\n/!G;s/\(.\)\(.*\n\)/&\2\1/;//D;s/.//'

# join pairs of lines side-by-side (like "paste")
sed '$!N;s/\n/ /'

# if a line ends with a backslash, append the next line to it
sed -e :a -e '/\\$/N; s/\\\n//; ta'

# if a line begins with an equal sign, append it to the previous line
# and replace the "=" with a single space
sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D'

# add commas to numeric strings, changing "1234567" to "1,234,567"
gsed ':a;s/\B[0-9]\{3\}\>/,&/;ta' # GNU sed
sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta' # other seds

# add commas to numbers with decimal points and minus signs (GNU sed)
gsed -r ':a;s/(^|[^0-9.])([0-9]+)([0-9]{3})/\1\2,\3/g;ta'

# add a blank line every 5 lines (after lines 5, 10, 15, 20, etc.)
gsed '0~5G' # GNU sed only
sed 'n;n;n;n;G;' # other seds
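The portable comma-inserting script works by looping: each pass of the `s` command puts one comma before the rightmost group of three digits that is still preceded by a digit, and `ta` jumps back to label `a` for as long as a substitution succeeds. A hypothetical bash wrapper (integers only; decimals need the GNU sed version):

```shell
#!/bin/bash
# Insert thousands separators into integers, one comma per loop iteration.
add_commas() { sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'; }

echo 1234567 | add_commas   # 1,234,567
```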

SELECTIVE PRINTING OF CERTAIN LINES:

# print first 10 lines of file (emulates behavior of "head")
sed 10q

# print first line of file (emulates "head -1")
sed q

# print the last 10 lines of a file (emulates "tail")
sed -e :a -e '$q;N;11,$D;ba'

# print the last 2 lines of a file (emulates "tail -2")
sed '$!N;$!D'

# print the last line of a file (emulates "tail -1")
sed '$!d' # method 1
sed -n '$p' # method 2

# print the next-to-the-last line of a file
sed -e '$!{h;d;}' -e x # for 1-line files, print blank line
sed -e '1{$q;}' -e '$!{h;d;}' -e x # for 1-line files, print the line
sed -e '1{$d;}' -e '$!{h;d;}' -e x # for 1-line files, print nothing

# print only lines which match regular expression (emulates "grep")
sed -n '/regexp/p' # method 1
sed '/regexp/!d' # method 2

# print only lines which do NOT match regexp (emulates "grep -v")
sed -n '/regexp/!p' # method 1, corresponds to above
sed '/regexp/d' # method 2, simpler syntax

# print the line immediately before a regexp, but not the line
# containing the regexp
sed -n '/regexp/{g;1!p;};h'

# print the line immediately after a regexp, but not the line
# containing the regexp
sed -n '/regexp/{n;p;}'

# print 1 line of context before and after regexp, with line number
# indicating where the regexp occurred (similar to "grep -A1 -B1")
sed -n -e '/regexp/{=;x;1!p;g;$!N;p;D;}' -e h

# grep for AAA and BBB and CCC (in any order)
sed '/AAA/!d; /BBB/!d; /CCC/!d'

# grep for AAA and BBB and CCC (in that order)
sed '/AAA.*BBB.*CCC/!d'

# grep for AAA or BBB or CCC (emulates "egrep")
sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d # most seds
gsed '/AAA\|BBB\|CCC/!d' # GNU sed only

# print paragraph if it contains AAA (blank lines separate paragraphs)
# HHsed v1.5 must insert a 'G;' after 'x;' in the next 3 scripts below
sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;'

# print paragraph if it contains AAA and BBB and CCC (in any order)
sed -e '/./{H;$!d;}' -e 'x;/AAA/!d;/BBB/!d;/CCC/!d'

# print paragraph if it contains AAA or BBB or CCC
sed -e '/./{H;$!d;}' -e 'x;/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d
gsed '/./{H;$!d;};x;/AAA\|BBB\|CCC/b;d' # GNU sed only

# print only lines of 65 characters or longer
sed -n '/^.\{65\}/p'

# print only lines of less than 65 characters
sed -n '/^.\{65\}/!p' # method 1, corresponds to above
sed '/^.\{65\}/d' # method 2, simpler syntax

# print section of file from regular expression to end of file
sed -n '/regexp/,$p'

# print section of file based on line numbers (lines 8-12, inclusive)
sed -n '8,12p' # method 1
sed '8,12!d' # method 2

# print line number 52
sed -n '52p' # method 1
sed '52!d' # method 2
sed '52q;d' # method 3, efficient on large files

# beginning at line 3, print every 7th line
gsed -n '3~7p' # GNU sed only
sed -n '3,${p;n;n;n;n;n;n;}' # other seds

# print section of file between two regular expressions (inclusive)
sed -n '/Iowa/,/Montana/p' # case sensitive

SELECTIVE DELETION OF CERTAIN LINES:

# print all of file EXCEPT section between 2 regular expressions
sed '/Iowa/,/Montana/d'

# delete duplicate, consecutive lines from a file (emulates "uniq").
# First line in a set of duplicate lines is kept, rest are deleted.
sed '$!N; /^\(.*\)\n\1$/!P; D'

# delete duplicate, nonconsecutive lines from a file. Beware not to
# overflow the buffer size of the hold space, or else use GNU sed.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'

# delete all lines except duplicate lines (emulates "uniq -d").
sed '$!N; s/^\(.*\)\n\1$/\1/; t; D'

# delete the first 10 lines of a file
sed '1,10d'

# delete the last line of a file
sed '$d'

# delete the last 2 lines of a file
sed 'N;$!P;$!D;$d'

# delete the last 10 lines of a file
sed -e :a -e '$d;N;2,10ba' -e 'P;D' # method 1
sed -n -e :a -e '1,10!{P;N;D;};N;ba' # method 2

# delete every 8th line
gsed '0~8d' # GNU sed only
sed 'n;n;n;n;n;n;n;d;' # other seds

# delete lines matching pattern
sed '/pattern/d'

# delete ALL blank lines from a file (same as "grep '.' ")
sed '/^$/d' # method 1
sed '/./!d' # method 2

# delete all CONSECUTIVE blank lines from file except the first; also
# deletes all blank lines from top and end of file (emulates "cat -s")
sed '/./,/^$/!d' # method 1, allows 0 blanks at top, 1 at EOF
sed '/^$/N;/\n$/D' # method 2, allows 1 blank at top, 0 at EOF

# delete all CONSECUTIVE blank lines from file except the first 2:
sed '/^$/N;/\n$/N;//D'

# delete all leading blank lines at top of file
sed '/./,$!d'

# delete all trailing blank lines at end of file
sed -e :a -e '/^\n*$/{$d;N;ba' -e '}' # works on all seds
sed -e :a -e '/^\n*$/N;/\n$/ba' # ditto, except for gsed 3.02.*

# delete the last line of each paragraph
sed -n '/^$/{p;h;};/./{x;/./p;}'

SPECIAL APPLICATIONS:

# remove nroff overstrikes (char, backspace) from man pages. The 'echo'
# command may need an -e switch if you use Unix System V or bash shell.
sed "s/.`echo \\\b`//g" # double quotes required for Unix environment
sed 's/.^H//g' # in bash/tcsh, press Ctrl-V and then Ctrl-H
sed 's/.\x08//g' # hex expression for sed 1.5, GNU sed, ssed

# get Usenet/e-mail message header
sed '/^$/q' # deletes everything after first blank line

# get Usenet/e-mail message body
sed '1,/^$/d' # deletes everything up to first blank line

# get Subject header, but remove initial "Subject: " portion
sed '/^Subject: */!d; s///;q'

# get return address header
sed '/^Reply-To:/q; /^From:/h; /./d;g;q'

# parse out the address proper. Pulls out the e-mail address by itself
# from the 1-line return address header (see preceding script)
sed 's/ *(.*)//; s/>.*//; s/.*[:<] *//'

# add a leading angle bracket and space to each line (quote a message)
sed 's/^/> /'

# delete leading angle bracket & space from each line (unquote a message)
sed 's/^> //'

# remove most HTML tags (accommodates multiple-line tags)
sed -e :a -e 's/<[^>]*>//g;/</N;//ba'

# extract multi-part uuencoded binaries, removing extraneous header
# info, so that only the uuencoded portion remains. Files passed to
# sed must be passed in the proper order. Version 1 can be entered
# from the command line; version 2 can be made into an executable
# Unix shell script. (Modified from a script by Rahul Dhesi.)
sed '/^end/,/^begin/d' file1 file2 ... fileX | uudecode # vers. 1
sed '/^end/,/^begin/d' "$@" | uudecode # vers. 2

# sort paragraphs of file alphabetically. Paragraphs are separated by blank
# lines. GNU sed uses \v for vertical tab, or any unique char will do.
sed '/./{H;d;};x;s/\n/={NL}=/g' file | sort | sed '1s/={NL}=//;s/={NL}=/\n/g'
gsed '/./{H;d};x;y/\n/\v/' file | sort | sed '1s/\v//;y/\v/\n/'

# zip up each .TXT file individually, deleting the source file and
# setting the name of each .ZIP file to the basename of the .TXT file
# (under DOS: the "dir /b" switch returns bare filenames in all caps).
echo @echo off >zipup.bat
dir /b *.txt | sed "s/^\(.*\)\.TXT/pkzip -mo \1 \1.TXT/" >>zipup.bat
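
The sed part of the zipup recipe can be previewed on Unix. In this sketch, two hypothetical upper-case names stand in for the output of `dir /b *.txt`, and the script uses single quotes as the QUOTING SYNTAX notes below recommend for Unix. Nothing is actually zipped; it only prints the lines zipup.bat would receive.

```shell
# Hypothetical stand-in for the bare, all-caps names "dir /b *.txt" prints.
# \(.*\) captures the basename, which \1 reuses twice in the replacement.
cmds=$(printf 'NOTES.TXT\nTODO.TXT\n' | sed 's/^\(.*\)\.TXT/pkzip -mo \1 \1.TXT/')
printf '%s\n' "$cmds"
# pkzip -mo NOTES NOTES.TXT
# pkzip -mo TODO TODO.TXT
```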

TYPICAL USE: Sed takes one or more editing commands and applies all of
them, in sequence, to each line of input. After all the commands have
been applied to the first input line, that line is output and a second
input line is taken for processing, and the cycle repeats. The
preceding examples assume that input comes from the standard input
device (i.e., the console; normally this will be piped input). One or
more filenames can be appended to the command line if the input does
not come from stdin. Output is sent to stdout (the screen). Thus:

cat filename | sed '10q' # uses piped input
sed '10q' filename # same effect, avoids a useless "cat"
sed '10q' filename > newfile # redirects output to disk
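
The "all commands, in sequence, on each line" cycle is easy to see with two substitutions where the second acts on the result of the first:

```shell
# Both -e commands run on "cat" ("cat" -> "cot" -> "cut") before "dog" is read;
# "dog" has no "a", so only the second command changes it ("dog" -> "dug").
out=$(printf 'cat\ndog\n' | sed -e 's/a/o/' -e 's/o/u/')
printf '%s\n' "$out"   # prints "cut" and "dug" on separate lines
```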

For additional syntax instructions, including the way to apply editing
commands from a disk file instead of the command line, consult "sed &
awk, 2nd Edition," by Dale Dougherty and Arnold Robbins (O'Reilly,
1997; http://www.ora.com), "UNIX Text Processing," by Dale Dougherty
and Tim O'Reilly (Hayden Books, 1987) or the tutorials by Mike Arst
distributed in U-SEDIT2.ZIP (many sites). To fully exploit the power
of sed, one must understand "regular expressions." For this, see
"Mastering Regular Expressions" by Jeffrey Friedl (O'Reilly, 1997).
The manual ("man") pages on Unix systems may be helpful (try "man
sed", "man regexp", or the subsection on regular expressions in "man
ed"), but man pages are notoriously difficult. They are not written to
teach sed use or regexps to first-time users, but as a reference text
for those already acquainted with these tools.

QUOTING SYNTAX: The preceding examples use single quotes ('...')
instead of double quotes ("...") to enclose editing commands, since
sed is typically used on a Unix platform. Single quotes prevent the
Unix shell from interpreting the dollar sign ($) and backquotes
(`...`), which are expanded by the shell if they are enclosed in
double quotes. Users of the "csh" shell and derivatives will also need
to quote the exclamation mark (!) with the backslash (i.e., \!) to
properly run the examples listed above, even within single quotes.
Versions of sed written for DOS invariably require double quotes
("...") instead of single quotes to enclose editing commands.
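
The difference matters whenever the script contains `$`. The sketch below contrasts the two quoting styles with shell variables. Double quotes are useful when you deliberately want the shell to insert a value into the sed script.

```shell
old=apple new=pear
# Double quotes: the shell expands $old and $new first,
# so sed actually receives the script s/apple/pear/.
r1=$(printf 'apple pie\n' | sed "s/$old/$new/")
# Single quotes: sed receives s/$old/$new/ verbatim. A "$" that is not the
# last character of a regex is an ordinary character, so sed looks for the
# literal text "$old", finds nothing, and leaves the line unchanged.
r2=$(printf 'apple pie\n' | sed 's/$old/$new/')
printf '%s\n%s\n' "$r1" "$r2"   # prints "pear pie", then "apple pie"
```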

USE OF '\t' IN SED SCRIPTS: For clarity in documentation, we have used
the expression '\t' to indicate a tab character (0x09) in the scripts.
However, most versions of sed do not recognize the '\t' abbreviation,
so when typing these scripts from the command line, you should press
the TAB key instead. '\t' is supported as a regular expression
metacharacter in awk, perl, HHsed, sedmod, and GNU sed v3.02.80.
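
In a shell script, an alternative to typing a literal TAB is to let printf produce one and store it in a variable. A sketch, assuming a POSIX shell:

```shell
# A real tab character, so the script works even with seds that lack "\t".
tab=$(printf '\t')
csv=$(printf 'name\tage\nann\t42\n' | sed "s/$tab/,/g")
printf '%s\n' "$csv"   # prints "name,age" and "ann,42"
```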

VERSIONS OF SED: Versions of sed do differ, and some slight syntax
variation is to be expected. In particular, most do not support the
use of labels (:name) or branch instructions (b,t) within editing
commands, except at the end of those commands. We have used the syntax
which will be portable to most users of sed, even though the popular
GNU versions of sed allow a more succinct syntax. When the reader sees
a fairly long command such as this:

sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d

it is heartening to know that GNU sed will let you reduce it to:

sed '/AAA/b;/BBB/b;/CCC/b;d' # or even
sed '/AAA\|BBB\|CCC/b;d'
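
All three forms print only the lines containing AAA, BBB, or CCC. An unlabeled `b` jumps to the end of the script, so the line is printed; otherwise `d` deletes it. This quick equivalence check assumes GNU sed, since the last two forms are GNU syntax:

```shell
input=$(printf 'xAAA\nfoo\nBBBy\nbar\nCCC\n')
# Portable form: one command per -e.
p1=$(printf '%s\n' "$input" | sed -e '/AAA/b' -e '/BBB/b' -e '/CCC/b' -e d)
# GNU sed: the same commands separated by semicolons...
p2=$(printf '%s\n' "$input" | sed '/AAA/b;/BBB/b;/CCC/b;d')
# ...or a single \| alternation.
p3=$(printf '%s\n' "$input" | sed '/AAA\|BBB\|CCC/b;d')
printf '%s\n' "$p1"   # xAAA, BBBy, CCC
```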

In addition, remember that while many versions of sed accept a command
like "/one/ s/RE1/RE2/", some do NOT allow "/one/! s/RE1/RE2/", which
contains space before the 's'. Omit the space when typing the command.
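
Written without the space, the negated address applies the substitution only to lines that do not match:

```shell
# "/one/!s/a/b/" substitutes only on lines NOT containing "one".
r=$(printf 'one a\ntwo a\n' | sed '/one/!s/a/b/')
printf '%s\n' "$r"   # prints "one a", then "two b"
```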

OPTIMIZING FOR SPEED: If execution speed needs to be increased (due to
large input files or slow processors or hard disks), substitution will
be executed more quickly if the "find" expression is specified before
giving the "s/.../.../" instruction. Thus:

sed 's/foo/bar/g' filename # standard replace command
sed '/foo/ s/foo/bar/g' filename # executes more quickly
sed '/foo/ s//bar/g' filename # shorthand sed syntax
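
The three commands are interchangeable in output; only their speed differs. A small sample input confirms they agree:

```shell
input=$(printf 'foo one\nbar two\nfoo foo\n')
s1=$(printf '%s\n' "$input" | sed 's/foo/bar/g')
s2=$(printf '%s\n' "$input" | sed '/foo/ s/foo/bar/g')
# The empty regex in s//bar/ reuses /foo/ from the address.
s3=$(printf '%s\n' "$input" | sed '/foo/ s//bar/g')
printf '%s\n' "$s1"   # bar one, bar two, bar bar
```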

On line selection or deletion in which you only need to output lines
from the first part of the file, a "quit" command (q) in the script
will drastically reduce processing time for large files. Thus:

sed -n '45,50p' filename # print line nos. 45-50 of a file
sed -n '51q;45,50p' filename # same, but executes much faster
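
Both commands print the same lines; the second simply stops reading at line 51. In this sketch, 100 generated lines stand in for a large file:

```shell
# 100 numbered lines stand in for a large file.
nums=$(awk 'BEGIN { for (i = 1; i <= 100; i++) print i }')
slow=$(printf '%s\n' "$nums" | sed -n '45,50p')
# 51q stops reading at line 51; with -n, q does not print that line.
fast=$(printf '%s\n' "$nums" | sed -n '51q;45,50p')
printf '%s\n' "$fast"   # prints 45 through 50
```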

If you have any additional scripts to contribute or if you find errors
in this document, please send e-mail to the compiler. Indicate the
version of sed you used, the operating system it was compiled for, and
the nature of the problem. To qualify as a one-liner, the command line
must be 65 characters or less. Various scripts in this file have been
written or contributed by:

Al Aab # founder of "seders" list
Edgar Allen # various
Yiorgos Adamopoulos # various
Dale Dougherty # author of "sed & awk"
Carlos Duarte # author of "do it with sed"
Eric Pement # author of this document
Ken Pizzini # author of GNU sed v3.02
S.G. Ravenhall # great de-html script
Greg Ubben # many contributions & much help
-------------------------------------------------------------------------