Tuesday 9 June 2015

Top 10 Myths about Hadoop

Apache Hadoop is widely considered one of the newest and best technologies designed to extract meaning out of "Big Data".
Hadoop was created by Doug Cutting, who named it after his son's toy elephant. It is a library of open source software used to create a distributed computing environment.
Although Hadoop has been around for a long time, many people still have misconceptions about it that need to be corrected.

Let's look at some of the most common myths about Hadoop and big data that companies should know about before committing to a Hadoop project.

Myth-1: Hadoop is a single product.

Fact: Hadoop consists of multiple products.


People often assume that Hadoop is a single product, but it is actually made up of multiple products.
"Hadoop is the brand name of a family of open source products; those products are incubated and administered by Apache software."

The Apache Hadoop library includes:


  • The Hadoop Distributed File System (HDFS)
  • MapReduce
  • Pig
  • Hive
  • HBase
  • HCatalog
  • Mahout and so on.

When people think of Hadoop, they typically think of the Hadoop Distributed File System (HDFS), which serves as the foundation for the other products, such as MapReduce.

Myth-2: Hadoop is only about data volume. 
Fact: Hadoop is also about data diversity, not just data volume. 


Some people think of Hadoop as a technology designed for high volumes of data, but Hadoop's real value is its power to handle diverse data.

Theoretically, HDFS can manage the storage of and access to any data type, as long as you can put the data in a file and copy that file into HDFS. But the main advantage of Hadoop is its ability to analyze and extract meaningful information from huge volumes of data.


Myth-3: All the components of Hadoop are open source only. 
Fact: Hadoop is open source but available from proprietary vendors too.


Apache Hadoop's open-source software library is available from Apache Software Foundation and can be downloaded for free from apache.org.
But vendors like IBM, Cloudera and EMC Greenplum have also made Hadoop available through their own distributions.

Those distributions tend to come with added features such as administrative tools not offered by Apache Hadoop as well as support and maintenance.
A handful of vendors also offer their own non-Hadoop-based implementations of MapReduce.


Myth-4: Hadoop is the only answer to "Big Data". 
Fact: Big data does not always require Hadoop. 


Big Data and Hadoop have become synonyms, but Hadoop is not the only answer to Big Data.
In fact, some companies were working on Big Data even before Hadoop existed. There are other technologies for Big Data too, such as Teradata and Vertica.


Myth-5: HDFS is the database management system of Hadoop. 
Fact: HDFS is a file system, not a database management system (DBMS). 


HDFS is a distributed file system, and it does not have the capabilities of a database management system (DBMS), such as indexing, random access to data, support for standard SQL, and query optimization.
To get minimal DBMS functionality, we can use HBase and Hive on top of HDFS.


Myth-6: Hive is nothing but a SQL-based tool.
Fact: Hive resembles SQL but is not standard SQL. 


Hive is a SQL-like tool: its query language resembles SQL, so people who are proficient in SQL can learn Hive quickly. But this does not solve the compatibility issues with SQL-based tools.
In the future, Hadoop might support standard SQL and SQL-based tools, but currently it does not.


Myth-7: MapReduce is an integral part of HDFS. 
Fact: HDFS and MapReduce are related but don't require each other. 


MapReduce was developed by Google before HDFS existed. Although HDFS and MapReduce are a good combination, each can be used independently.
Some vendors such as MapR are creating variations of MapReduce that do not need HDFS. Some users deploy HDFS with Hive or HBase, but not MapReduce.


Myth-8: Hadoop is mainly used by Internet companies for analyzing Web logs and other Web data.
Fact: Hadoop enables many types of analytics, not just Web analytics. 


We hear a lot about how Internet companies use Hadoop to analyze Web logs and other Web data, but other use cases exist.

For example, railroad companies are using sensors to detect unusually high temperatures on rail cars, which can signal an impending failure.
Older analytic applications that need large data samples - such as customer base segmentation, fraud detection, and risk analysis - can benefit from the additional big data managed by Hadoop.


Myth-9: Hadoop is totally free.
Fact: Hadoop is open source, but there are also deployment costs involved.


People think that Hadoop is open source so there is no cost involved at all. This is not true.

The lack of features such as administrative tools and support in the free Apache distribution can create additional costs.
There is also the hardware cost of a Hadoop cluster or the real estate and the power it takes to make that cluster operational.


Myth-10: Hadoop is an alternative to, or replacement for, the Data Warehouse.
Fact: Hadoop helps a Data Warehouse. It is not a replacement. 


Data warehouses still do the work, and Hadoop actually complements the data warehouse by becoming "an edge system".
Many data warehouses were designed only for structured, relational data, which makes it difficult for them to handle unstructured data. Hadoop becomes a helping hand in this case.

Happy Learning :) ...

MapReduce - The Heart of Hadoop

In this article, we will learn:

  1. What is MapReduce
  2. A few interesting facts about MapReduce
  3. MapReduce components and architecture
  4. How MapReduce works in Hadoop


MapReduce:

MapReduce is a programming model which is used to process large data sets in a batch processing manner.
A MapReduce program is composed of

  • a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name)
  • and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies).
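To make the two procedures concrete, here is a minimal plain-Java sketch of the student-name example that simulates the model in memory, with no Hadoop involved. The class and method names (NameFrequency, map, shuffle, reduce, run) are illustrative assumptions, not Hadoop's API; the grouping step stands in for the shuffle/sort that the framework performs between the two procedures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of the MapReduce model:
// map -> shuffle/sort (group by key) -> reduce. Not Hadoop's API.
final class NameFrequency {

    // Map(): emit a (firstName, 1) pair for every student.
    static List<Map.Entry<String, Integer>> map(List<String> firstNames) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String name : firstNames) {
            pairs.add(Map.entry(name, 1));
        }
        return pairs;
    }

    // Shuffle/sort: put the emitted values into one "queue" per name.
    // TreeMap keeps the keys sorted, as the framework does for reducers.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> queues = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            queues.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return queues;
    }

    // Reduce(): summarize each queue, here by summing the emitted 1s.
    static Map<String, Integer> reduce(Map<String, List<Integer>> queues) {
        Map<String, Integer> frequencies = new TreeMap<>();
        queues.forEach((name, ones) ->
                frequencies.put(name, ones.stream().mapToInt(Integer::intValue).sum()));
        return frequencies;
    }

    // Runs the three steps in order and returns name frequencies.
    static Map<String, Integer> run(List<String> firstNames) {
        return reduce(shuffle(map(firstNames)));
    }
}
```

For example, NameFrequency.run(List.of("Ann", "Bob", "Ann")) returns {Ann=2, Bob=1}. In real Hadoop the map and reduce calls run on different machines, but the data flow is the same.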


A Few Important Facts about MapReduce:


  • Apache Hadoop MapReduce is an open source implementation of Google's MapReduce framework.
  • Although many MapReduce implementations have been developed for distributed systems, such as Dryad from Microsoft and Disco from Nokia, Hadoop is the most popular among them and offers an open source implementation of the MapReduce framework.
  • The Hadoop MapReduce framework works on a Master/Slave architecture.


MapReduce Architecture:



Hadoop 1.x MapReduce is composed of two components.

  1. Job Tracker, which plays the role of master and runs on the master node (Namenode)
  2. Task Tracker, which plays the role of slave and runs on each Datanode

Job Tracker:



  1. The Job Tracker is the daemon to which client applications submit MapReduce programs (jobs).
  2. The Job Tracker schedules clients' jobs and allocates tasks to the slave Task Trackers running on individual worker machines (data nodes).
  3. The Job Tracker manages the overall execution of a MapReduce job.
  4. The Job Tracker manages the resources of the cluster, for example:
    • Managing the data nodes, i.e. the Task Trackers.
    • Keeping track of consumed and available resources.
    • Keeping track of already-running tasks and providing fault tolerance for tasks, etc.

Task Tracker:


  1. Each Task Tracker is responsible for executing and managing the individual tasks assigned by the Job Tracker.
  2. The Task Tracker also handles the data movement between the map and reduce phases.
  3. One prime responsibility of the Task Tracker is to constantly communicate the status of its tasks to the Job Tracker.
  4. If the JobTracker fails to receive a heartbeat from a TaskTracker within a specified amount of time, it will assume the TaskTracker has crashed and will resubmit the corresponding tasks to other nodes in the cluster.
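The heartbeat rule in point 4 can be sketched as follows. This is a hypothetical in-memory model, not the JobTracker's actual code: the class name, the tracker and task identifiers, and the expiry interval passed to the constructor are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: remember when each TaskTracker last sent a
// heartbeat and treat any tracker silent for longer than the expiry
// interval as crashed, handing its tasks back for resubmission.
final class HeartbeatMonitor {
    private final long expiryMillis;                          // assumed timeout
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, List<String>> runningTasks = new HashMap<>();

    HeartbeatMonitor(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    // Called whenever a TaskTracker reports in.
    void heartbeat(String tracker, long nowMillis) {
        lastHeartbeat.put(tracker, nowMillis);
        runningTasks.putIfAbsent(tracker, new ArrayList<>());
    }

    // Records that a task is running on a tracker.
    void assign(String tracker, String task) {
        runningTasks.computeIfAbsent(tracker, t -> new ArrayList<>()).add(task);
    }

    // Drops every expired tracker and returns its tasks so that they
    // can be resubmitted to other nodes in the cluster.
    List<String> expireLostTrackers(long nowMillis) {
        List<String> lost = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() > expiryMillis) {
                lost.add(e.getKey());
            }
        }
        List<String> toResubmit = new ArrayList<>();
        for (String tracker : lost) {
            lastHeartbeat.remove(tracker);
            List<String> tasks = runningTasks.remove(tracker);
            if (tasks != null) {
                toResubmit.addAll(tasks);
            }
        }
        return toResubmit;
    }

    boolean isAlive(String tracker) {
        return lastHeartbeat.containsKey(tracker);
    }
}
```

Passing timestamps in explicitly, instead of reading the system clock, keeps the sketch deterministic and easy to test.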

How the MapReduce Engine Works:

Let us understand exactly how a MapReduce program gets executed in Hadoop, and what the relationships are between the different entities involved in this process.

The entire process can be listed as follows:

  1. Client applications submit jobs to the JobTracker.
  2. The JobTracker talks to the NameNode to determine the location of the data.
  3. The JobTracker locates TaskTracker nodes with available slots at or near the data.
  4. The JobTracker submits the work to the chosen TaskTracker nodes.
  5. The TaskTracker nodes are monitored. If they do not submit heartbeat signals often enough, they are deemed to have failed and the work is scheduled on a different TaskTracker.
  6. A TaskTracker will notify the JobTracker when a task fails. The JobTracker then decides what to do: it may resubmit the job elsewhere, it may mark that specific record as something to avoid, and it may even blacklist the TaskTracker as unreliable.
  7. When the work is completed, the JobTracker updates its status.
  8. Client applications can poll the JobTracker for information.

Let us see these steps in more detail.

1. Client submits MapReduce job to Job Tracker: 

Whenever a client/user submits a MapReduce job, it goes straight to the Job Tracker. The client program contains all the necessary information, such as the map, combine and reduce functions and the input and output paths of the data.



2. Job Tracker Manage and Control Job: 

  • The JobTracker puts the job in a queue of pending jobs and then executes the jobs on an FCFS (first come, first served) basis.
  • The Job Tracker first determines the number of splits from the input path and assigns different map and reduce tasks to the TaskTrackers in the cluster. There is one map task for each split.
  • The Job Tracker talks to the NameNode to determine the location of the data, i.e. to find the datanodes that contain the data.
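The FCFS queue and the "one map task per split" rule can be illustrated with a small sketch. It assumes, for simplicity, that every file is cut into fixed-size splits (64 MB was the default HDFS block size in Hadoop 1.x), which ignores the details of Hadoop's real input-format logic; FifoSplitScheduler, Job and the task-naming scheme are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: pending jobs wait in a FIFO queue, and each job
// is turned into one map task per fixed-size input split.
final class FifoSplitScheduler {
    // A pending job: a name plus the sizes (in bytes) of its input files.
    static final class Job {
        final String name;
        final List<Long> fileSizes;
        Job(String name, List<Long> fileSizes) {
            this.name = name;
            this.fileSizes = fileSizes;
        }
    }

    private final Queue<Job> pending = new ArrayDeque<>();
    private final long splitSize;

    FifoSplitScheduler(long splitSize) {
        this.splitSize = splitSize;
    }

    // First come: jobs join the back of the queue.
    void submit(Job job) {
        pending.add(job);
    }

    // ceil(fileSize / splitSize); an empty file yields no splits.
    long splitsFor(long fileSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    // First served: take the oldest job and create one map task per split.
    List<String> nextJobMapTasks() {
        List<String> tasks = new ArrayList<>();
        Job job = pending.poll();
        if (job == null) {
            return tasks;                     // nothing pending
        }
        int taskId = 0;
        for (long size : job.fileSizes) {
            long splits = splitsFor(size);
            for (long s = 0; s < splits; s++) {
                tasks.add(job.name + "_m_" + taskId++);
            }
        }
        return tasks;
    }
}
```

With a 64 MB split size, a job reading a 200 MB file and a 10 MB file gets 4 + 1 = 5 map tasks.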




3. Task Assignment to Task Tracker by Job Tracker: 

  • Each Task Tracker is preconfigured with a number of slots, which indicates how many tasks the Task Tracker can accept. For example, a TaskTracker may be able to run two map tasks and two reduce tasks simultaneously.
  • When the Job Tracker tries to schedule a task, it looks for an empty slot in the TaskTracker running on the same server that hosts the datanode where the data for that task resides. If none is found, it looks for a machine in the same rack. System load is not considered during this allocation.
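The slot-selection order described above (same node, then same rack) can be sketched like this. The Tracker class and the host and rack names are hypothetical; the final fallback to any free slot is an assumption added so that a task can still be placed when neither local option exists. As in the text, system load is not considered.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of locality-aware slot selection.
final class LocalityScheduler {
    static final class Tracker {
        final String host;
        final String rack;
        final int freeMapSlots;
        Tracker(String host, String rack, int freeMapSlots) {
            this.host = host;
            this.rack = rack;
            this.freeMapSlots = freeMapSlots;
        }
    }

    // replicaHosts: the datanodes that store the task's input block.
    static Optional<Tracker> pick(List<Tracker> trackers, Set<String> replicaHosts) {
        // Racks holding a replica, looked up from the tracker list (an assumption).
        Set<String> replicaRacks = new HashSet<>();
        for (Tracker t : trackers) {
            if (replicaHosts.contains(t.host)) {
                replicaRacks.add(t.rack);
            }
        }
        // 1. Data-local: a free slot on a host that stores the block.
        for (Tracker t : trackers) {
            if (t.freeMapSlots > 0 && replicaHosts.contains(t.host)) {
                return Optional.of(t);
            }
        }
        // 2. Rack-local: a free slot in a rack that holds a replica.
        for (Tracker t : trackers) {
            if (t.freeMapSlots > 0 && replicaRacks.contains(t.rack)) {
                return Optional.of(t);
            }
        }
        // 3. Fallback: the first tracker with any free slot; load is ignored.
        for (Tracker t : trackers) {
            if (t.freeMapSlots > 0) {
                return Optional.of(t);
            }
        }
        return Optional.empty();
    }
}
```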



4. Task Execution by Task Tracker: 

  • When a task is assigned to a Task Tracker, the Task Tracker creates a local environment to run the task.
  • The Task Tracker needs resources to run the task. Hence it copies any files the application needs from the distributed cache to the local disk, and localizes the job JARs by copying them from the shared file system to the Task Tracker's file system.
  • Task Tracker can also spawn multiple JVMs to handle many map or reduce tasks in parallel.
  • TaskTracker actually initiates the Map or Reduce tasks and reports progress back to the JobTracker.




5. Send notification to Job Tracker: 

  • When the Task Trackers have finished all the map tasks, they notify the Job Tracker. The Job Tracker then asks the selected Task Trackers to run the reduce phase.
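Before the reduce phase can start, each map output record must be routed to exactly one reducer, so that all values for a key end up in the same place. The sketch below uses the same formula as Hadoop's default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks; the surrounding class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of routing map outputs to reducers. Only the
// partition formula mirrors Hadoop's default HashPartitioner.
final class ShuffleSketch {

    // Masking with Integer.MAX_VALUE clears the sign bit, so keys with
    // negative hash codes still get a partition in [0, numReducers).
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Groups map-output keys by the reducer that will receive them.
    static Map<Integer, List<String>> route(List<String> mapOutputKeys, int numReducers) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : mapOutputKeys) {
            byReducer.computeIfAbsent(partition(key, numReducers), r -> new ArrayList<>())
                     .add(key);
        }
        return byReducer;
    }
}
```

Because the partition depends only on the key's hashCode(), equal keys always reach the same reducer, which is one practical reason for key classes to implement hashCode() consistently.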

6. Task recovery in failover situation: 

  • Although there is a single TaskTracker on each node, the TaskTracker spawns off a separate Java Virtual Machine (JVM) process to run the task, so that the TaskTracker itself does not fail if the running task crashes its JVM because of a bug in the user-written map or reduce function.

7. Monitor Task Tracker : 

  • The TaskTracker nodes are monitored. A heartbeat is sent from each TaskTracker to the JobTracker every few seconds to report its status.
  • If a Task Tracker does not submit heartbeat signals often enough, it is deemed to have failed and the work is scheduled on a different TaskTracker.
  • A TaskTracker will notify the JobTracker when a task fails. The JobTracker decides what to do then: it may resubmit the job elsewhere, it may mark that specific record as something to avoid, and it may even blacklist the TaskTracker as unreliable.

8. Job Completion: 

  • When the work is completed, the JobTracker updates its status.
  • Client applications can poll the JobTracker for information.

Wednesday 11 March 2015

Components of Hadoop

The components of a running Hadoop cluster consist of a set of daemons. Some of these run on a single server, whereas others run across multiple servers. These daemons include:
  1. Namenode
  2. Secondary Namenode
  3. Datanode
  4. Jobtracker
  5. Tasktracker
Namenode:
The Namenode is responsible for managing the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the HDFS cluster. This information is stored on the local disk in two files: the namespace image and the edit log. The Namenode also knows the datanodes on which all the blocks of a given file are located; however, it does not store block locations persistently, since this information is reconstructed from the datanodes when the system starts. A client accesses the filesystem on behalf of the user by communicating with the Namenode and the datanodes.
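The split between persistent and reconstructed metadata can be modelled with two maps. This is a greatly simplified, hypothetical sketch (NamenodeMetadata and its methods are not HDFS classes): the file-to-blocks map stands for what the namespace image and edit log preserve, while block locations are kept only in memory and refilled from datanode block reports.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical model of Namenode metadata, not the HDFS implementation.
final class NamenodeMetadata {
    // Persistent part: file path -> ordered block IDs.
    private final Map<String, List<Long>> fileBlocks = new TreeMap<>();
    // In-memory only: block ID -> datanodes currently holding it.
    private final Map<Long, Set<String>> blockLocations = new HashMap<>();

    void addFile(String path, List<Long> blockIds) {
        fileBlocks.put(path, new ArrayList<>(blockIds));
    }

    // A datanode reports the list of blocks stored on it.
    void blockReport(String datanode, List<Long> blockIds) {
        for (long id : blockIds) {
            blockLocations.computeIfAbsent(id, k -> new TreeSet<>()).add(datanode);
        }
    }

    // Simulates a restart: only the persistent part survives.
    void restart() {
        blockLocations.clear();
    }

    // Datanodes holding each block of a file (empty until reports arrive).
    List<Set<String>> locate(String path) {
        List<Set<String>> result = new ArrayList<>();
        for (long id : fileBlocks.getOrDefault(path, List.of())) {
            result.add(blockLocations.getOrDefault(id, Set.of()));
        }
        return result;
    }
}
```

After restart(), locate() still knows which blocks make up each file but reports no locations until the datanodes send their block reports again.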

The Namenode is a single point of failure in a Hadoop cluster, so it is necessary to make the Namenode fault tolerant. There are two ways of doing this. The first way is to configure Hadoop to back up the persistent state of the filesystem metadata to multiple filesystems. The second way is to use a Secondary Namenode.

Secondary Namenode:
The Secondary Namenode periodically merges the namespace image with the edit log and maintains a copy of the merged namespace image. It usually runs on a separate machine. However, the Secondary Namenode's state lags behind that of the primary Namenode, so if the primary Namenode fails, some data loss is almost certain.
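A checkpoint can be pictured as replaying the edit log onto a copy of the namespace image. The sketch below is a hypothetical simplification in which the image is just a set of paths and only two operations exist (CREATE and DELETE). It also shows why the copy lags: edits logged after a merge are not in the image until the next checkpoint.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical checkpoint sketch: namespace image = set of paths,
// edit log = operations recorded since the last merge.
final class Checkpoint {
    static final class Edit {
        final String op;     // "CREATE" or "DELETE" in this simplified model
        final String path;
        Edit(String op, String path) {
            this.op = op;
            this.path = path;
        }
    }

    TreeSet<String> image = new TreeSet<>();
    List<Edit> editLog = new ArrayList<>();

    void log(String op, String path) {
        editLog.add(new Edit(op, path));
    }

    // Replays the edit log onto a copy of the image, then starts a new, empty log.
    void merge() {
        TreeSet<String> merged = new TreeSet<>(image);
        for (Edit e : editLog) {
            switch (e.op) {
                case "CREATE": merged.add(e.path); break;
                case "DELETE": merged.remove(e.path); break;
                default: throw new IllegalArgumentException("unknown op " + e.op);
            }
        }
        image = merged;
        editLog = new ArrayList<>();
    }
}
```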

Datanode:
The Datanodes act as the workhorses of the filesystem. They store and retrieve blocks when requested by clients or the Namenode. Each Datanode periodically reports to the Namenode the list of blocks stored on it.

All the above daemons are called storage daemons, since they handle operations related to the storage of files on HDFS. The storage daemons follow a master-slave architecture, with the Namenode acting as master and the Datanodes acting as slaves. Now let's look at the compute daemons. They also follow a master-slave architecture, with the Jobtracker acting as master and the Tasktrackers acting as slaves.

Jobtracker:
A Jobtracker coordinates all the jobs that are run on the system by scheduling each task to run on tasktrackers. It is the responsibility of Jobtracker to reschedule a failed task on a different Tasktracker.

Tasktracker:
Tasktrackers run the tasks allocated to them and send progress reports to the Jobtracker, which keeps a record of the overall progress of each job.

The diagram below shows the topology of a Hadoop cluster:




Monday 2 March 2015

Usage of hashCode() and equals() methods in Java

In this post, we will try to understand the hashCode() and equals() methods in Java.
These methods are defined in the Object class and are hence available to all Java classes. Using these two methods, an object can be stored in or retrieved from a Hashtable, HashMap or HashSet.
  • hashCode()
  • equals()

Usage of hashCode() and equals()

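hashCode():
This method returns an integer hash code for the given object; the value is not guaranteed to be unique. This integer is used to find the bucket when storing the object in a HashMap or HashSet. By default, the integer is typically derived from the object's identity (for example, the memory address where the object is stored), so two distinct objects usually get different hash codes.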

equals():
This method is used to check the equality of two objects. By default, it checks whether two references refer to the same object (==).

Let's override the default implementations of hashCode() and equals():

If you don't override these methods, everything will still work, but sometimes you need to override them; for example, you may want two employee objects to be considered equal if both have the same email ID.
Let's look at an example. We have a class called Emp:
1. Emp.java
 
package com.csamples;

public class Emp {
    int eid;
    String ename, email;
    public int getEid() {
        return eid;
    }
    public void setEid(int eid) {
        this.eid = eid;
    }
    public String getEname() {
        return ename;
    }
    public void setEname(String ename) {
        this.ename = ename;
    }
    public String getEmail() {
        return email;
    }
    public void setEmail(String email) {
        this.email = email;
    }
  
}

This Emp class has three basic attributes: eid, ename and email.
Now create a class called "EqualityCheckMain.java":
package com.csamples;
/**
 * @author skummitha
 *
 */
public class EqualityCheckMain {

    public static void main(String[] args) {
        
        Emp emp1=new Emp();
        emp1.setEid(101);
        emp1.setEname("Sreenu");
        emp1.setEmail("sreenu.vas2004@gmail.com");
        Emp emp2=new Emp();
        emp2.setEid(102);
        emp2.setEname("SRK");
        emp2.setEmail("sreenu.vas2004@gmail.com");
        System.out.println("Is emp1 is equal to emp2:" +emp1.equals(emp2));
       }
 }
When you run the above program, you will get the following output:
                  Is emp1 is equal to emp2:false
 
In the above program, we created two different objects and set their email attribute to "sreenu.vas2004@gmail.com".
Because emp1 and emp2 point to different objects, and the default implementation of equals checks ==, the equals method returns false. In real life, it should return true, because no two employees can have the same email.
Now let's override equals to return true if two employees' email addresses are the same.
Add this method to the Emp class above:
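    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;               // same reference
        }
        if (!(obj instanceof Emp)) {
            return false;              // also false when obj is null
        }
        Emp e = (Emp) obj;
        // Two employees are equal when their email addresses match.
        // The null check avoids a NullPointerException when email is unset.
        return (this.email == null) ? e.email == null : this.email.equals(e.email);
    }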
Now run EqualityCheckMain.java again.
You will get the following output:
                 Is emp1 is equal to emp2:true
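This is because the overridden equals method returns true if two employees have the same email.
One thing to remember here: the signature of the equals method must be exactly as above, equals(Object). A method such as equals(Emp) would overload equals instead of overriding it, and HashMap would never call it.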

Let's put these Emp objects in a HashMap:

Here we use Emp objects as keys and their full names (strings) as values in a HashMap.
package com.csamples;

import java.util.HashMap; 
import java.util.Iterator; 
 
/**
 * @author skummitha
 *
 */
public class HashMapEqualityCheckMain { 
 

    public static void main(String[] args) { 
        HashMap<Emp,String> empNamelMap=new HashMap<Emp,String>();  
        Emp emp1=new Emp(); 
        emp1.setEid(101);
        emp1.setEname("Sreenu");
        emp1.setEmail("sreenu.vas2004@gmail.com");
        Emp emp2=new Emp(); 
        emp2.setEid(102);
        emp2.setEname("SRK");
        emp2.setEmail("sreenu.vas2004@gmail.com");
 
        empNamelMap.put(emp1, "Sree Reddy"); 
        empNamelMap.put(emp2, "SRK Reddy"); 
 
        Iterator<Emp> empFullNameIter=empNamelMap.keySet().iterator(); 
        while(empFullNameIter.hasNext()) 
        { 
            Emp empObj=empFullNameIter.next(); 
            String empFullName=empNamelMap.get(empObj); 
            System.out.println("Full Name of "+ empObj.getEname()+"----"+empFullName); 
 
        } 
    }  
}
When you run the above program, you will see the following output:

Full Name of SRK----SRK Reddy
Full Name of Sreenu----Sree Reddy
Now you may be wondering why, even though the two objects are equal, the HashMap contains two key-value pairs instead of one. This is because HashMap first uses the hashcode to find the bucket for the key object, and only if the hashcodes are the same does it check the equals method. Because the above two Emp objects use the default hashCode method, they have different identities (memory addresses) and hence different hashcodes.
Now let's override the hashCode method. Add the following method to the Emp class:
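    @Override
    public int hashCode() {
        // Use the same field as equals(); hashing other fields (e.g. eid or ename)
        // would give equal employees different hash codes.
        return (email == null) ? 0 : email.hashCode();
    }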
Now run HashMapEqualityCheckMain.java again
You will see following output:
 
           Full Name of Sreenu----SRK Reddy
Now the hashcodes of emp1 and emp2 are the same, so both keys map to the same bucket, and the equals method, which returns true, is then used to compare them. The HashMap therefore keeps the original key (emp1, "Sreenu") and replaces its value with "SRK Reddy".
This is why the Java documentation says that if you override the equals() method, you should also override the hashCode() method.

hashCode() and equals() contracts:

equals():

The equals method implements an equivalence relation on non-null object references:
  • It is reflexive: for any non-null reference value x, x.equals(x) should return true.
  • It is symmetric: for any non-null reference values x and y, x.equals(y) should return true if and only if y.equals(x) returns true.
  • It is transitive: for any non-null reference values x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) should return true.
  • It is consistent: for any non-null reference values x and y, multiple invocations of x.equals(y) consistently return true or consistently return false, provided no information used in equals comparisons on the objects is modified.
  • For any non-null reference value x, x.equals(null) should return false.

hashCode():

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

Key points to remember:

  1. If you override the equals() method, you should also override hashCode().
  2. If two objects are equal, they must have the same hashcode.
  3. If two objects have the same hashcode, they may or may not be equal.
  4. Always use the same attributes in equals() and hashCode(); in our case, we used email.
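Putting the key points together, here is a compact sketch of an email-keyed employee class that satisfies both contracts. It is a variant of Emp rather than the post's exact code: the name Employee, the constructor and the final fields are choices made for this example, and java.util.Objects (Java 7+) supplies the null-safe equals and hashCode helpers. The fields are final because changing email while an object is used as a HashMap key would leave it in the wrong bucket.

```java
import java.util.Objects;

// Hypothetical compact variant of Emp: equality and hashing are both
// based on email only, so equal objects always have equal hash codes.
final class Employee {
    private final int eid;
    private final String ename;
    private final String email;

    Employee(int eid, String ename, String email) {
        this.eid = eid;
        this.ename = ename;
        this.email = email;
    }

    int getEid() { return eid; }

    String getEname() { return ename; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;                    // reflexive
        if (!(obj instanceof Employee)) return false;    // also covers null
        Employee other = (Employee) obj;
        return Objects.equals(email, other.email);       // null-safe comparison
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(email);                  // same field as equals()
    }
}
```

Putting two Employee objects with the same email into a HashMap, as in HashMapEqualityCheckMain, leaves a single entry whose key is the first object and whose value is the one stored last.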

Wednesday 25 February 2015

MS DOS Commands



How to hide a file in MS-DOS.

Issue:
How to hide a file in MS-DOS.
Solution:
There are two methods of hiding files in MS-DOS. The first method is by using the attrib command as seen below.
attrib +h c:\autoexec.bat
This command will hide your autoexec.bat file so that a standard user browsing your hard disk drive would not be able to see the file. To unhide the file, use -h instead of +h, so the line would look like "attrib -h c:\autoexec.bat".
Although the file is hidden, a user could still type edit c:\autoexec.bat and edit the file, and typing attrib would list all files, including hidden ones, with their attributes.
The other method uses extended ASCII characters when creating or renaming a directory. Below are the steps required for creating a directory with these characters.
Type md, then hold down ALT and type 987 on the numeric keypad; when you release ALT, you should get a solid block. Press Enter to create the directory. To get into this directory, a user would have to type cd and hold ALT while typing 987 to get the block, and then press Enter.
Note: Windows 3.x and Windows 95 cannot access these directories; they must be accessed through DOS. However, users running Windows 98 and above can open these directories from Windows; therefore, if you are using this method for privacy or security, your procedure could be easily bypassed.
I can't remember what characters I typed. How do I delete the directory now?
See our ASCII dictionary definition for a complete listing of ASCII characters. Or, if you typed normal characters plus the ASCII character, you can use the "?" wildcard in place of the ASCII character.





What is an external / internal command?

Question:
What is an external / internal command?
Answer:
In MS-DOS there are two types of commands: an internal command, which is embedded into the command.com file, and an external command, which is not embedded into command.com and therefore requires a separate file to be used.
For example, if your computer does not have fdisk.exe and you try using the fdisk command, you would receive an error "Bad command or file name." Fdisk is an external command which will only work if fdisk.exe, or in some cases, fdisk.com, is present.
However, as long as MS-DOS is running on your computer, internal commands such as the cd command are always available and do not require any other files to run.
Computer Hope's MS-DOS page lists which commands are external and which are internal. In addition, you can see our dictionary internal command page and/or external command page for a complete listing of each of the internal and external commands available.

Deleting files in MS-DOS without a prompt.

Issue:
Deleting files in MS-DOS without a prompt.
Cause:
To help prevent files from becoming accidentally deleted Microsoft will warn you before deleting files or folders.
Solution:
By default, Microsoft Windows does not prompt or warn a user when deleting files with the del command; however, when you attempt to delete a directory that is not empty with the deltree or rmdir command, you will receive a warning and/or error message about deleting the directory.
To suppress the prompt, use the deltree command with the /y switch. For example: deltree c:\windows\temp\*.* /y. However, this does not work in all versions of Windows and/or DOS.
If this command does not work, we recommend that you create a batch file containing the command below.
echo y | del %1\*.*
Once created, you can type the name of the batch file then the name of the directory that you wish to delete.
Microsoft Windows 2000 and Windows XP users
Users who wish to delete a directory containing files in an MS-DOS session running under Microsoft Windows 2000 or Windows XP can also use the rmdir or rd command with the /S option.
Please remember that when you delete files or directories from the computer they are permanently removed, so be careful! Users of Microsoft Windows 95, Windows 98, Windows ME, Windows NT, Windows 2000, Windows XP, and later versions of Windows who delete files through MS-DOS should realize that deleted files are not sent to the Recycle Bin.
Other users using MS-DOS through Windows
Users running MS-DOS through later versions of Microsoft Windows can also utilize the erase command to delete files without a prompt.







How do I scroll in MS-DOS?

Question:
How do I scroll in MS-DOS?
Answer:
Unfortunately, this is NOT possible in any version of MS-DOS. Also, if you are using Windows 95, Windows 98, Windows NT, or Windows 2000 and resize the MS-DOS window, there is no option for a scroll bar.
To list files in MS-DOS one page at a time, use the pipe (|) in conjunction with the more command. For example:
dir | more
or
attrib *.* | more
In addition, some commands, such as the dir command, also accept the /p switch, which displays one page at a time. For example:
dir /p



How to resize a MS-DOS maximized window.

Issue:
How to resize a MS-DOS maximized window.
Cause:
Once maximized, unable to re-size the window because it is full screen.
Solution:
Use the shortcut key ALT + ENTER: pressing these two keys simultaneously toggles the window between full screen and window mode.
Additional Information:
  • Additional information about switching between MS-DOS windows and other windows can be found on document CH000562.
  • If you're looking for information about the command to exit the MS-DOS window see document CH000736.


Can you type more than one command at one command prompt?

Question:
Can you type more than one command at one command prompt?
Answer:
Yes, use the pipe or the ampersand to separate your commands. The shell and version of Windows you are using decide which character to use. Below are examples for each version of Windows and shell. In both examples, the command would first go to the root of the current drive and then run the dir command to list the contents of the root.
Microsoft Windows 95, Windows 98 and Windows ME users
cd\ | dir
Microsoft Windows 2000 and Windows XP users using the command shell (typing command in the run line).
Instead of the pipe, Microsoft changed the separator to the ampersand (&), for reasons unknown to us. Therefore, you would need to type the command below to get the same results as in earlier versions of Windows.
cd\ & dir


Receive error 'Too many parameters'.

Issue:
Receive error 'Too many parameters'.
Cause:
There are one or more extra spaces in the command you are typing.
Solution:
Verify that the command you are typing is correct and that you have no additional spaces in the command line.


How to find a file in MS-DOS.

Question:
How to find a file in MS-DOS
Answer:
With MS-DOS it is possible to find any file on your computer, provided you know the name of the file or the program that the file was created in.
If you are unsure where the file may be on the computer, start at the root directory, i.e. the C:\> prompt. To get to this prompt, type cd\
Once at the root directory or the directory you believe the file to be in, type any of the below commands.
If, for example, you knew that the file had bob somewhere in its name, you would type:
dir *bob*.* /s
In the above example, you use the wildcard character, which in MS-DOS is the asterisk (*). In addition to the asterisk, we use the /s switch, which tells the dir command to search the current directory and all of its subdirectories.
If you cannot recall any of the names of the files but recall that they were created in Microsoft Excel you could, for example, type:
dir *.xls /s
In the above example, knowing that Excel files generally end with .xls, we again use the wildcard character, telling the computer to search for any file ending with .xls. If you do not know the extension of your file, you can find a listing of most extensions and their associated programs on our MS-DOS Extension page.
Finally, once you have found the file, you must interpret the output of your search and be able to then change directories to get to that file. Below is an example of the results found when typing dir *bob*.* /s in one of the above examples:


How to shut down / restart the computer with a batch file.

Question:
How to shut down / restart the computer with a batch file.
Reasoning:
It may be necessary to restart the computer after a batch file has completed its copying or installing process, to complete that installation. Below are steps that can be used to restart a computer through a batch file.
Solution:

MS-DOS Users

If the computer needs to be restarted from MS-DOS please see our debug page for additional information on how to do this.

Windows 95, Windows 98 and Windows ME Users

Restarting the computer
START C:\Windows\RUNDLL.EXE user.exe,exitwindowsexec
exit

Shut down the computer
C:\Windows\RUNDLL32.EXE user,exitwindows
exit

NOTE: When typing the above two lines, spacing is important. It is also very important to put the exit line in the batch file, because Windows is often unable to restart the computer while the MS-DOS window is open.
Microsoft Windows 98 and Windows ME users can also use the command below to perform different types of reboot or shutdown.
rundll32.exe shell32.dll,SHExitWindowsEx n
Where n is equal to one of the below numbers for the proper action.
  • 0 - LOGOFF
  • 1 - SHUTDOWN
  • 2 - REBOOT
  • 4 - FORCE
  • 8 - POWEROFF
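These values look like bit flags, and the Win32 ExitWindowsEx function uses the same numbers (EWX_LOGOFF = 0, EWX_SHUTDOWN = 1, EWX_REBOOT = 2, EWX_FORCE = 4, EWX_POWEROFF = 8). Assuming SHExitWindowsEx likewise accepts the sum of several values, for example 6 for a forced reboot, the number to type can be worked out as in this Python sketch. The ExitFlag class and the rundll_command helper are hypothetical names used only for illustration.

```python
from enum import IntFlag

class ExitFlag(IntFlag):
    """Action values from the list above; LOGOFF is 0, i.e. no other flag set."""
    LOGOFF = 0
    SHUTDOWN = 1
    REBOOT = 2
    FORCE = 4     # meant to be combined with another action, e.g. REBOOT | FORCE
    POWEROFF = 8

def rundll_command(flags: ExitFlag) -> str:
    """Build the rundll32 command line for one action or a sum of actions."""
    return f"rundll32.exe shell32.dll,SHExitWindowsEx {int(flags)}"
```

For instance, rundll_command(ExitFlag.REBOOT | ExitFlag.FORCE) builds the command line ending in 6.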

Windows XP users

Microsoft Windows XP includes a new shutdown command that enables a user to shut down the computer through the command line and/or batch files. Additional information about this command can be found on our shutdown command page.
Additional Information:
  • Additional information about the rundll and rundll32 files can be found on document CH000570.


How to restart or shut down the computer in DOS.

Issue:
How to restart or shut down the computer in DOS.
Cause:
The computer may need to be restarted for changes to take effect.
Solutions:
Unlike Windows, MS-DOS does not require any steps or programs to be run before you restart or shut down the computer.
If you are currently in DOS and need to restart the computer, press the CTRL + ALT + DEL keys.
If you are currently in DOS and need to turn off the computer, press the power button. Note: some newer computers may not allow the computer to be shut down unless the power button is held in for a few seconds.
Warning: If you are running a DOS or MS-DOS shell from Windows, do not follow the above instructions. Instead, type exit to return to Windows and then restart or shut down the computer from Windows.


COMMAND.COM vs. CMD.EXE.

Question:
COMMAND.COM vs. CMD.EXE.
Answer:
Not to be confused with the OS/2 Warp CMD.EXE, the file CMD.EXE is the Microsoft Windows NT command line shell. It is more compatible and portable between different hardware platforms than the original COMMAND.COM, which served as the DOS command interpreter for many years. COMMAND.COM is still included for backwards compatibility and is recommended for old MS-DOS programs that may not run properly in Windows NT.

What versions of Windows have support for the CMD command?

Microsoft Windows NT, Windows 2000, Windows XP, and above all have support for the CMD command.

What are some advantages of using CMD instead of COMMAND?

Apart from what was listed in the opening paragraph, one of the most noticeable differences when using CMD to access MS-DOS is the ability to use long file names. When using COMMAND, a user needs to use the short 8.3 name. For example, to access "My Documents" in COMMAND, a user would need to type "cd mydocu~1", whereas in CMD the same user could type "cd my documents".
Additional information about long file names can also be found on document CH000209.
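Short names such as mydocu~1 come from a shortening rule. The following Python sketch approximates the basic rule for the first alias in a directory: remove spaces and extra periods, uppercase the name, keep the first six characters followed by ~1, and keep at most three characters of the extension. It is a simplification under stated assumptions: real systems also replace other invalid characters, increase the number (~2, ~3, ...) when two names would collide, and Windows NT-based systems can switch to a hashed form after several collisions. The function name short_name is hypothetical.

```python
import re

def short_name(long_name: str, index: int = 1) -> str:
    """Approximate the 8.3 alias generated for long_name (simplified rule)."""
    # Split at the last period; names without a period have no extension
    if "." in long_name:
        base, ext = long_name.rsplit(".", 1)
    else:
        base, ext = long_name, ""
    # Names that already fit the 8.3 format are only uppercased
    if " " not in long_name and "." not in base and len(base) <= 8 and len(ext) <= 3:
        return base.upper() + ("." + ext.upper() if ext else "")
    # Otherwise drop spaces and periods, then shorten and add the ~N suffix
    clean_base = re.sub(r"[ .]", "", base).upper()
    clean_ext = ext.replace(" ", "").upper()[:3]
    suffix = f"~{index}"
    alias = clean_base[: 8 - len(suffix)] + suffix
    return alias + ("." + clean_ext if clean_ext else "")
```

Because the number depends on which other names already exist in the same directory, the alias computed this way is only a prediction; the name the system actually assigned is the one to type.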


How to create a file in MS-DOS.

Question:
How to create a file in MS-DOS.
Answer:
A file can be created using the edit command or the copy con command. Below are examples with each command on how to create a file called myfile.txt.
To create a file with the edit command, type the below command at the prompt.
edit myfile.txt <press enter>
If available, this should open the edit editor (generally a blue screen). Once you have typed the information for the file myfile.txt, click File and choose Exit. If you do not have a mouse, please see our edit page.
When you click Exit, the computer asks whether you wish to save the file; click Yes and the file will be created.
To create a file with the copy con command, type the below command at the prompt.
copy con myfile.txt <press enter>
After you press Enter, the cursor moves down to a blank line. Type the file line by line. When you are finished, press Enter to get to a blank line, then press and hold CTRL, press Z, and let go of both keys. This displays ^Z. Press Enter once more to save the file and return to the prompt.
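copy con simply copies whatever is typed at the console (CON) into the file until it sees the end-of-input marker ^Z. The same idea is shown in this short Python sketch, where copy_con is a hypothetical helper and the input source can be swapped out for testing.

```python
import sys
from typing import TextIO

def copy_con(path: str, source: TextIO = sys.stdin) -> int:
    """Copy lines from source (the keyboard by default) into path until end of input.

    At a Windows console, end of input is CTRL+Z followed by Enter, just as with
    copy con; on Linux / Unix it is CTRL+D. Returns the number of lines written.
    """
    lines_written = 0
    with open(path, "w") as out:
        for line in source:
            out.write(line)
            lines_written += 1
    return lines_written
```

Calling copy_con("myfile.txt") then waits for typed lines in the same way copy con myfile.txt does.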


How does MS-DOS interpret commands?

Question:
How does MS-DOS interpret commands?
Answer:
Each time a command is entered into MS-DOS the computer will go through the below steps.
  1. The computer checks COMMAND.COM for a matching internal command. If the command entered is not found, it continues to the next step.
  2. The computer looks in the current directory for an executable file that matches the command entered. If no matching file exists, it continues to the next step.
  3. The computer looks in each of the directories listed in the PATH environment variable for an executable file that matches the command entered.
Below are some different scenarios of how MS-DOS may interpret a command that a user enters.
User enters the "dir" command.
The computer looks at the command.com and notices that dir is a valid internal command and executes the instructions for that command.
User enters the "format" command.
The computer is unable to find this command in the command.com or the local directory but finds it in the path and executes the command as an external command.
User enters the name of a game he or she wishes to run.
The computer is unable to locate the command in the command.com but notices the executable is in the current directory and runs that file.
User enters a name of a non-executable file or an executable file that does not exist in any of the paths.
Computer is unable to locate the command or executable file in the command.com, current directory, or in any of the paths and generates the error "Bad command or file name".
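The three lookup steps can be modeled in a few lines of Python. This is a sketch, not COMMAND.COM's actual code: the set of internal commands is deliberately incomplete, the .COM, .EXE, .BAT order reflects the usual MS-DOS rule for which file runs when several share a name, and the exists argument (hypothetical, like resolve itself) lets the lookup run against an imaginary disk instead of a real one.

```python
import ntpath
from typing import Callable, Iterable, Tuple

# A few COMMAND.COM internal commands; the real list is longer
INTERNAL_COMMANDS = {"CD", "CLS", "COPY", "DEL", "DIR", "ECHO", "EXIT", "PATH", "PROMPT", "SET", "TYPE"}
# Order in which executable extensions are tried within each directory
EXTENSIONS = (".COM", ".EXE", ".BAT")

def resolve(command: str, current_dir: str, path_dirs: Iterable[str],
            exists: Callable[[str], bool]) -> Tuple[str, str]:
    """Follow the three lookup steps for command and report what would run."""
    name = command.upper()
    # Step 1: internal commands built into COMMAND.COM
    if name in INTERNAL_COMMANDS:
        return ("internal", name)
    # A typed extension must itself be executable; otherwise try each one in turn
    ext = ntpath.splitext(name)[1]
    if ext:
        candidates = [name] if ext in EXTENSIONS else []
    else:
        candidates = [name + e for e in EXTENSIONS]
    # Steps 2 and 3: the current directory first, then every PATH directory in order
    for directory in [current_dir, *path_dirs]:
        for candidate in candidates:
            full_path = ntpath.join(directory, candidate)
            if exists(full_path):
                return ("external", full_path)
    return ("error", "Bad command or file name")
```

With C:\DOS on the path and a game in C:\GAMES, resolve("format", "C:\\GAMES", ["C:\\DOS"], ...) reaches step 3, while resolve("doom", ...) stops at step 2, mirroring the scenarios above.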


How to change the prompt.

Issue:
How to change the prompt.
Reasoning:
A user may want to change his or her prompt to get additional information listed while navigating through their session.
Solution:

Microsoft DOS users

To change the prompt in MS-DOS you must utilize the prompt command followed by special codes used in conjunction with the prompt command. Below are some commonly used prompts.
prompt $p$g
Changes the prompt to the standard used prompt in MS-DOS listing the drive with the current path, similar to what is seen below.
C:\>

prompt $t $d$_$p$g
Changes the prompt to list the time and date above the standard prompt, similar to what is seen below.
13:38:49.78 Mon 02/17/2003
C:\>

If you wish to make these changes permanent, edit the autoexec.bat file and add the same prompt line you used at the prompt.
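Each $ code in a prompt string is replaced when the prompt is displayed. The following Python sketch expands a handful of the codes ($p, $n, $g, $l, $b, $q, $$, $_, $t and $d) to show how a string such as $t $d$_$p$g becomes the two-line prompt shown earlier. The expand_prompt function is hypothetical, the date format assumes the US style used in the example above, and codes it does not recognize are simply dropped.

```python
from datetime import datetime

def expand_prompt(spec: str, cwd: str, now: datetime) -> str:
    """Replace a subset of the PROMPT $ codes in spec (codes are case-insensitive)."""
    codes = {
        "P": cwd,                 # current drive and path
        "N": cwd[:1],             # current drive letter
        "G": ">",
        "L": "<",
        "B": "|",
        "Q": "=",
        "$": "$",
        "_": "\n",                # start a new line
        "T": f"{now:%H:%M:%S}.{now.microsecond // 10000:02d}",  # time with hundredths
        "D": f"{now:%a %m/%d/%Y}",  # US-style date, as in the example above
    }
    out = []
    i = 0
    while i < len(spec):
        if spec[i] == "$" and i + 1 < len(spec):
            out.append(codes.get(spec[i + 1].upper(), ""))  # unknown codes are dropped
            i += 2
        else:
            out.append(spec[i])
            i += 1
    return "".join(out)
```

For 1:38:49.78 PM on Monday, February 17, 2003, expand_prompt("$t $d$_$p$g", "C:\\", ...) produces the same two lines as the example above.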

Unix / Linux users

Changing the prompt in Unix / Linux varies depending on what shell you are using. 
If you are using the C Shell, type:
set prompt="`hostname`>"
Displays the hostname in the prompt, similar to the below prompt:
ComputerHope>

set prompt="`pwd`>"
Displays the working directory with the prompt:
/root>

set prompt="`hostname`(`pwd`)>"
Displays the hostname along with the working directory:
ComputerHope(/root)>

set prompt=\[`id -nu`@`hostname -s`\]\#\
Displays the user who is logged in along with the hostname, similar to the below prompt:
[root@computerhope]#

If you wish to make the prompt permanent in the C Shell, edit the .cshrc file and add the same line you used at the prompt.



How do I change drives in MS-DOS?

Question:
How do I change drives in MS-DOS?
Answer:
To change the drive letter in MS-DOS, type the drive letter followed by a colon. For example, if you wanted to switch to the floppy disk drive you would type: "a:" at the prompt. Below is a listing of common drive letters and their corresponding devices.
 

  • A: Floppy disk drive (commonly the 3.5" floppy drive).
  • B: Secondary floppy disk drive, if present (commonly the 5.25" floppy drive).
  • C: Always the computer hard disk drive's primary partition (unless the hard drive is not available or is bad).
  • D: Commonly the CD-ROM drive or other drive unless the computer hard disk drive has multiple partitions. If multiple partitions exist, your CD-ROM drive will be the last letter; for example, if one extended partition exists, your CD-ROM drive will be the E: drive.
After pressing enter your computer should change the MS-DOS prompt to reflect the new drive letter. If the drive does not exist, you will receive an error similar to the below error.
The system cannot find the drive specified.
If this occurs and you know the drive exists, it is likely that your drive is having problems. See our help section for the drive you are having problems with. If the drive exists but does not have media that can be read you will receive an error similar to the below error.
The device is not ready.
For example, the above error will be received if you attempt to switch to the floppy disk drive with no diskette in the drive.
Tip: If you just want to list the files on an alternate drive you can also type dir followed by the drive letter. For example, "dir a:" will list the files on the floppy drive but will not switch your prompt to the floppy drive.
Additional information:
Additional information about viewing available drives on the computer can be found on document CH000854.



How do I run a file from MS-DOS?

Question:
How do I run a file from MS-DOS?
Solution:
To execute or run a file from MS-DOS, the file must be an executable file, that is, a .exe, .bat, or .com file. If you are uncertain which files are in the current directory, the dir command lists all of them. To view only executable files, type the below command at the MS-DOS prompt to list .exe files, or replace .exe with .bat or .com to see those files in the current directory.
dir *.exe
Once you have determined the name of the executable file you wish to run, type the name of the executable file at the MS-DOS prompt. For example, if the file were game.exe you would type "game" at the MS-DOS prompt.
If you do not see the file you wish to execute, or are receiving an error such as "bad command or file name", the file you're attempting to execute is likely not in the directory you're currently in. Move to the directory of the executable file and attempt to execute the command again. For example, let's assume you downloaded the executable file game.exe and it is on the Windows XP desktop. Using the cd command, you can switch to the desktop directory by typing a command similar to the below example.
cd\docume~1\hope\desktop
or in some cases if you're already in the username directory of documents and settings you can simply type the below command.
cd desktop
Finally, it is important to realize that when you run an executable file from an MS-DOS shell (MS-DOS running within Windows), the program still uses Windows to run. To open any other type of file, you can use the MS-DOS start command and type "start name_of_file", where name_of_file is the file's name.
Additional Information:
  • See our cd command and dir command pages for additional information about each of these commands.
  • Additional information about installing a software program can be found on document CH000561.





What is the MS-DOS command to get back into Windows?

Question:
What is the MS-DOS command to get back into Windows?
Additional information:
Users who have entered a MS-DOS command prompt may wish to get back into Windows.
  • If you are looking for information on how to switch between MS-DOS and Windows without closing either of them, see document CH000562.
  • If you are looking for information on maximizing or resizing your MS-DOS window, see document CH000020.
Answer:

Microsoft Windows 98, Windows ME, Windows 2000, Windows XP users

Type "exit" and press enter to close the MS-DOS window and get back to Windows. See our exit command page for additional information about this command.

Microsoft Windows 95 and Windows 3.x users

  1. Type "exit" and press enter to close the MS-DOS window and get back to Windows. See our exit command page for additional information about this command.
  2. If you've completely exited Windows and are only in a MS-DOS prompt, it may be necessary to re-execute Windows. If the exit command does not work, try typing in "win" and pressing enter.



Command line vs. GUI.

Question:
Command line vs. GUI.
Additional Information:
Users not completely familiar with a command line interface and/or a graphical user interface may want to know the advantages and disadvantages of each, to determine which one is best for them and why. Below is a chart that illustrates the major advantages and disadvantages of each interface.
Answers:
Ease
  • Command line: Because of the memorization and familiarity needed to operate a command line interface, new users find it much more difficult to navigate and operate successfully.
  • GUI: Although new users may have a difficult time learning to use the mouse to operate a GUI, most users pick up this interface much more easily than a command line interface.
Control
  • Command line: Users have much more control of their file system and operating system in a command line interface. For example, users can easily copy a specific type of file from one location to another with a one-line command.
  • GUI: Although a GUI offers plenty of control of a file system and operating system, advanced users or users who need to do specific tasks may often need to resort to a command line to complete those tasks.
Multitasking
  • Command line: Although many command line environments are capable of multitasking, they do not offer the same ease and ability to view multiple things at once on one screen.
  • GUI: GUI users have windows that enable them to easily view, control, and manipulate multiple things at once, which is commonly much faster than in a command line.
Speed
  • Command line: Because command line users only need their keyboards to navigate a command line interface, and often only need to execute a few lines to perform a task, an advanced command line user can get something done faster than an advanced GUI user.
  • GUI: A GUI may be easier to use because of the mouse; however, for many tasks, using a mouse and/or keyboard to navigate and control the operating system is much slower than working in a command line environment.
Low resources
  • Command line: A computer that only uses the command line uses far fewer of the computer's resources.
  • GUI: A GUI requires many more system resources because of the elements that need to be loaded, such as icons and fonts. In addition, video drivers, mouse drivers, and other drivers that need to be loaded also take additional resources.
Scripting
  • Command line: A command line interface enables a user to easily script a sequence of commands to perform a task or execute a program.
  • GUI: Although a GUI enables a user to create shortcuts, tasks, or similar actions to complete a task or run a program, it does not come close to what is available through a command line.
Remote access
  • Command line: When accessing another computer or networking device over a network, a user can often only manipulate the device and/or its files using a command line interface (CLI) or other text-only tools.
  • GUI: Although remote graphical access is possible and becoming popular, not all computers, and especially not all network equipment, have this ability.
 
 


How do I copy files?

Question:
How do I copy files?
Answer:
Below are the steps required to copy computer files from one location to another in each of the major operating systems.
Microsoft Windows 95, 98, ME, NT, 2000, XP, and 2003 users
MS-DOS users
Linux / Unix users

Microsoft Windows 95, 98, ME, NT, 2000, XP, 2003

Below are the simple steps on how to copy a file or multiple files in Microsoft Windows from one location to another.
  1. Go to the files or folders you wish to copy. If you need help locating the file you wish to copy, see document CHFIND.
  2. Highlight the file or files you wish to copy. If you need to highlight more than one file, you can hold down the CTRL or Shift keys on your keyboard or drag a box around the files you wish to copy. Additional information about selecting multiple files to copy can also be found on document CH000771.
  3. Once highlighted, you can either right-click one of the highlighted files and select copy, or if you're in My Computer or Windows Explorer you can click Edit at the top of the window and choose Copy.
  4. Move to the location you wish to copy the files to and either right-click in the folder and choose paste, or click Edit and click Paste.
In addition to copying files through Windows, you can also use MS-DOS to copy files. In some situations, such as copying multiple files of a certain extension or with a certain name, it can be a lot easier to copy the files through MS-DOS than in Windows.

MS-DOS users

Below are steps on how to copy a single file from one directory to another directory as well as how to copy multiple files from one directory to another directory.
Copying a single file from one location to another.
  1. Using the cd command, move to the directory that contains the file you wish to copy.
  2. Type a command similar to the below command.

    copy myfile.txt c:\my\location

    In the above example, you would substitute "myfile.txt" with the name of the file you wish to copy, and "c:\my\location" with the directory you're copying to.
Copying multiple files to another location
  1. Using the cd command, move to the directory that contains the files you wish to copy.
  2. Once in the directory that contains the files you wish to copy, type a command similar to one of the below commands.

    copy *.* c:\mydir

    In the above example, the command would copy every file in the current directory to the "mydir" directory.

    copy *.txt c:\mydir

    In the above example, the command would copy every txt, or text file, in the current directory into the "mydir" directory.
Additional examples of wildcard characters can be found on our wildcard dictionary definition.
See our cd command, dir command, and/or our copy command pages for additional information about each of these MS-DOS commands.

Linux / Unix users

Below are steps on how to copy a single file from one directory to another directory as well as how to copy multiple files from one directory to another directory.
Copying a single file from one location to another.
  1. Using the cd command, move to the directory that contains the file you wish to copy.
  2. Type a command similar to the below command.

    cp myfile.txt /usr/bin

    In the above example, you would substitute "myfile.txt" with the name of the file you wish to copy, and "/usr/bin" with the directory you're copying to.
Copying multiple files to another location
  1. Using the cd command, move to the directory that contains the files you wish to copy.
  2. Once in the directory that contains the files you wish to copy, type a command similar to one of the below commands.

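    cp * /usr/bin

    In the above example, the command would copy every file in the current directory to the "/usr/bin" directory; subdirectories are skipped. Unlike in MS-DOS, *.* in Linux / Unix matches only names that contain a period, so * is used to select every file.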

    cp *.txt /usr/bin

    In the above example, the command would copy every txt, or text file, in the current directory into the "/usr/bin" directory.
Additional examples of wildcard characters can be found on our wildcard dictionary definition.
See our cd command, cp command, and ls command pages for additional information about each of these commands.
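The wildcard copies above can also be sketched in Python for comparison. copy_matching is a hypothetical helper built on the standard glob and shutil modules; like cp, it skips subdirectories. Note that on Linux / Unix, glob is case-sensitive and its * does not match names beginning with a period.

```python
import glob
import os
import shutil

def copy_matching(pattern: str, dest_dir: str) -> list:
    """Copy every file matching pattern (e.g. "*.txt") into dest_dir.

    Returns the destination paths. Directories that match are skipped.
    """
    copied = []
    for path in sorted(glob.glob(pattern)):
        if os.path.isfile(path):
            # copy2 also keeps the modification time, as copy and cp -p do
            copied.append(shutil.copy2(path, dest_dir))
    return copied
```

Here copy_matching("*.txt", r"c:\mydir") corresponds to copy *.txt c:\mydir, and copy_matching("*.txt", "/usr/bin") to cp *.txt /usr/bin.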



How do I determine the size of a file?

Question:
How do I determine the size of a file?
Answer:
Microsoft Windows users
MS-DOS users
Linux / Unix users

Microsoft Windows users

Below are the steps required for determining the size of a file or multiple files on computers that are running Microsoft Windows.
  1. Locate and highlight the file or files you wish to determine the size of.
  2. Right-click the file and click Properties.
  3. Within the file Properties you will be able to determine the size of the file or files you have highlighted.
or
  1. Open My Computer or Windows Explorer
  2. Make Windows display the file properties by clicking on View at the top of the Window and selecting Details. This will make your Explorer / My Computer display all your files, their sizes, type, and modified date. If you wish to keep this view for all folders every time you open My Computer or Windows Explorer, see document CH000770 for additional information.
or
  1. Open My Computer or Windows Explorer
  2. Move to the directory containing your file.
  3. If you wish to see the total space of the current directory, view the size of the directory in the right side of the status bar; otherwise, highlight the file you wish to view the size of and view the status bar.
In addition to the above steps, Windows users can also see the file sizes through MS-DOS using the steps shown below.

MS-DOS users

Below are the different methods a user can use to view the size of a file or files in MS-DOS.
  1. Move to the directory of the file you wish to view the size of.
  2. Once in the directory, perform one of the below commands.

    dir myfile.txt

    In the above example, if you wanted to see the size of the file "myfile.txt", you would type this command to see the size of that single file.

    dir *.txt

    If you need to see how much space multiple files of a certain extension take up, type the above command. In this case, it would display the space all .txt files in the current directory are taking. Additional wildcard examples like the one above can be found on our wildcard definition.
See our cd command and dir command pages for additional information about each of these commands. 

Linux / Unix users

Below are some of the different methods a *nix user can use to determine the size of a file on their computer.
  1. Move to the directory of the file you wish to view the size of.
  2. Once in the directory, perform one of the below commands.

    ls -l log.txt

    Performing the above command would list output similar to the below information.

    -rw-r----- 1 comphope www 11567230 Nov 24 01:12 log.txt

    In the above output example, the 11567230 is the size of the file. For a more user friendly output, use the du command as shown below.

    du -h log.txt

    This command would display the output "12M log.txt"

    If you wish to see the total size of multiple files, type a command similar to the below command.

    du -ch *.txt

    In the above example, the command would list every .txt file in the current directory, display the size of each of those files as it was listing the files, and the total of all the files combined. 
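The same figures can be gathered from Python. This minimal sketch uses hypothetical helper names: total_size adds up os.path.getsize for every file matching a wildcard, like the total line from du -ch, and human_size approximates the rounded-up K/M/G style of du -h. One caveat: du reports the disk space a file occupies (whole blocks) rather than its exact byte count, so its figures can be slightly larger.

```python
import glob
import math
import os

UNITS = ("K", "M", "G", "T", "P")

def human_size(num_bytes: int) -> str:
    """Format a byte count in the rounded-up style of du -h (approximation)."""
    if num_bytes < 1024:
        return str(num_bytes)
    value = float(num_bytes)
    for unit in UNITS:
        value /= 1024
        if value < 10:
            # Below 10, show one decimal place, rounded up (e.g. 1.5K)
            rounded = math.ceil(value * 10) / 10
            if rounded < 10:
                return f"{rounded:.1f}{unit}"
            return f"{math.ceil(value)}{unit}"  # e.g. 9.97 rounds up to 10
        # From 10 up, show a whole number, rounded up, unless it reaches 1024
        if math.ceil(value) < 1024 or unit == UNITS[-1]:
            return f"{math.ceil(value)}{unit}"

def total_size(pattern: str) -> int:
    """Add up the sizes in bytes of all files matching a wildcard such as *.txt."""
    return sum(os.path.getsize(p) for p in glob.glob(pattern) if os.path.isfile(p))
```

For example, human_size(11567230) gives "12M", matching the du -h output described above for log.txt.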


How to copy a directory / folder.

Question:
How to copy a directory / folder.
Answer:
Microsoft Windows 95, 98, ME, NT, 2000, XP, and 2003 users
MS-DOS users
Linux / Unix users

Microsoft Windows 95, 98, ME, NT, 2000, XP, 2003

To copy a folder in Windows follow the below steps. When copying a folder in Microsoft Windows everything in the folder including all files and subdirectories will be copied.
  1. Locate and highlight the folder you wish to copy.
  2. Right-click the folder and click Copy or click Edit and then Copy.
  3. Move to the location you wish to place the folder and all its contents and click Edit and then Paste or right-click and then click Paste.

MS-DOS users

To copy a directory in MS-DOS, you will need to use the MS-DOS xcopy command. Below is a basic example with each of the steps to do this in MS-DOS. If you need additional information about the xcopy command or additional examples, see our xcopy command page.
1. Move to the directory you wish to copy the directories and subdirectories to. In the below example we will be moving to the temp2 directory using the cd command.
cd\temp2
2. Once in the directory, use the xcopy command to copy another directory's subdirectories and contents. In the below example we're copying the contents of temp3 into the temp2 directory. Keep in mind that this will not copy the actual directory "temp3", just the files and directories in that directory.
xcopy c:\temp3 /e
Once the above steps have been completed everything should be copied into the temp2 folder.

Linux / Unix users

To copy a directory with all subdirectories and files, use the Linux / Unix cp command. Below is an example of how you would use the cp command to copy files. Additional information about this command and other examples can be found on our cp command page.
cp -r /home/hope/files/* /home/hope/backup
In the above example the cp command would copy all files, directories, and subdirectories in the /home/hope/files directory to the /home/hope/backup directory.
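For comparison, the standard shutil.copytree function in Python does the same job as xcopy /e or cp -r. In this minimal sketch, copy_tree is a hypothetical wrapper, and the dirs_exist_ok argument requires Python 3.8 or later; it merges the contents of the source directory into an existing destination, including empty subdirectories. Unlike xcopy without the /h switch, and unlike cp with a * wildcard, copytree also copies hidden files.

```python
import shutil
from pathlib import Path

def copy_tree(src: str, dst: str) -> Path:
    """Copy every file and subdirectory (including empty ones) from src into dst.

    dst may already exist, as temp2 does in the xcopy example.
    """
    return Path(shutil.copytree(src, dst, dirs_exist_ok=True))
```

Calling copy_tree(r"c:\temp3", r"c:\temp2") mirrors the xcopy example: the contents of temp3 end up in temp2, but no temp3 folder is created inside it.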



Information and help with the io.sys file.

Question:
Information and help with the io.sys file.
Answer:

What is the io.sys file?

The io.sys file is an MS-DOS and Windows 9x hidden system file that is used to load the operating system each time the computer boots. Computers running MS-DOS, with or without Windows 3.x, required io.sys and msdos.sys, as well as other system files, to load MS-DOS.
With the introduction of Microsoft Windows 95, the executable msdos.sys code was merged into the io.sys file, which was still required to boot the computer into Windows 95 or Windows 98. The msdos.sys file can still exist on computers running either of these versions of Windows; however, it is now a configuration text file and not an executable file.
Later versions of Windows no longer require this file to boot and do not have the option to boot into MS-DOS. However, the file can still be found on Windows ME computers for users who need to create a bootable diskette.

How can I edit the io.sys file?

The io.sys file is an executable file and cannot be edited by a standard text editor. If you attempt to edit the file it will only appear as a bunch of symbols and unreadable characters.
Any configuration settings that may need to be done in the io.sys are handled through the config.sys file. See our autoexec.bat and config.sys page for additional help and information with these files.
Microsoft Windows 9x users can also configure additional settings through the msdos.sys file, which as mentioned above is a configurable text file in these versions of Windows.

I need the io.sys file for a bootable diskette.

See our bootable diskette page for additional information about creating boot diskettes and sites where you can download files required for bootable diskettes.

Can you send me the io.sys file or can I download it?

Because of copyright restrictions we cannot send users the io.sys or make it available for download.



What is the MS-DOS path for the Windows desktop?

Question:
What is the MS-DOS path for the Windows desktop?
Answer:

Windows 2000, 2003, and XP users

The desktop is located in the below directory path. In the below examples you would replace (username) with the name of the profile you use to log into Windows. Windows Vista and later versions instead keep the desktop in c:\users\(username)\desktop.
c:\documents and settings\(username)\desktop
or
c:\docume~1\(username)\desktop
Often, when entering the command prompt by clicking Start, Run and typing either cmd or command, you'll automatically be placed in the (username) directory, so you'll only need to type cd desktop to get to the desktop.
If you're in any other directory, you would need to type cd\docume~1\(username)\desktop to get to the desktop.

Windows 98, 95, and ME users

When entering the MS-DOS prompt you should automatically be placed into the desktop directory. However, for those users who are not, or simply need to know where the desktop directory is for these versions of Windows it is the below path.
c:\windows\desktop



How do I install MS-DOS?

Question:
How do I install MS-DOS?
Answer:
Before getting into the steps of installing MS-DOS, it is important to realize that today almost no users need to install MS-DOS on their computers. Computers with any version of Microsoft Windows installed can run an MS-DOS command line shell, which can perform almost all of the same tasks earlier versions of MS-DOS could, including running most older MS-DOS programs. Additional information about getting into MS-DOS from Windows can be found on document CHDOS. If you want to install MS-DOS to get an older program or game to work, see document CH000587.

Installing MS-DOS 6.22

Although there are earlier versions of MS-DOS, we suggest that users who want a stand-alone version of MS-DOS install the last version: MS-DOS 6.22.
Finally, keep in mind that MS-DOS is an operating system. If your computer already has an operating system such as Windows XP, performing the below steps will erase it as well as any other programs currently installed on the computer.
  1. Insert the first MS-DOS installation diskette into the computer and reboot it or turn it on. If you do not have the MS-DOS diskettes you will be unable to install MS-DOS. Because this is an older program we suggest reviewing document CH000614 for additional information about locating this program if you do not have it.
  2. If the MS-DOS setup screen appears when the computer starts press the F3 key two or more times to exit from the setup.
  3. Once at the A:\> MS-DOS prompt type fdisk and press enter.
  4. In the fdisk screen delete any current partitions existing on the computer and then recreate the partitions. Additional information about fdisk, including an fdisk simulation can be found on our fdisk command page.
  5. Once a new partition has been created exit out of fdisk and get back to the A:\> prompt.
  6. At the prompt type format c: and press enter.
  7. Once the hard disk drive has been formatted reboot the computer with the diskette still in the drive and once back at the setup screen run through the setup of MS-DOS on the computer.
Following the above steps will install MS-DOS on the computer. If successfully installed, the computer should be able to get to an MS-DOS prompt with no diskette in the drive.