About Me

Saturday, May 11, 2013

HIGMEM and Memory Zones from newbie perspective


HIGHMEM

Earlier Linux kernels(about 10 years ago) where unable to support more than 1GB of  physical memory in 32 bit x86 systems. This was attributed to the way in which the VM was arranged. In earlier Linux 32bit systems with 3:1 split, the first 3GB of the VM was the user address space and the remaining 1GB was kernel address space. The 1GB allocated to the kernel can be mapped to  any part of physical memory. If the Kernel has to access any pages in physical memory then that has to be mapped to the its address space. Earlier kernel used static mapping and hence with 1GB address space, the kernel can only map 1GB of physical memory. Thus any physical memory beyond the size of 1GB can never be accessed by the kernel. Thus the earlier Linux kernels where restricted to using 1GB of physical memory in 32bit x86 systems.

The solution that was devised to solve this problem was high memory or simply HIGHMEM. In this strategy we divide the physical memory into high memory and low memory. The low memory still uses the permanent mapping. That is certain pages in the physical memory is always mapped to the kernel address space and that is called the low memory. The remaining part of kernel address space makes use of temporary mappings and that is called the high memory. That is the pages are mapped as when required. This enables the kernel to access any pages within 4GB range. The kernel data structures like linked lists must live in the low memory itself. This helps in maintaining pointer consistency. 



MEMORY ZONES

In Linux kernel the page frames are grouped logically in to three.They are
1.ZONE_DMA
2.ZONE_NORMAL
3.ZONE_HIGHMEM
There are some reasons in the Linux kernel to divide the physical memory as explained next.In some architectures there is a constraint on which a range of memory can be used. An example can be given in x86 architecture. In x86 architecture , the ISA devices are only capable of addressing the first 16MB of RAM. It implies that the for DMA operations, the ISA devices can only use the first 16MB of RAM. Some architectures does not have this constraint(Example :- PPC ). Besides this we have to accommodate the HIGHMEM solutions. With the division of memory, managing the memory becomes easy with each zone having a struct to store its details.
ZONE_DMA:
Above I mentioned that ISA devices in x86 has a constraint in memory addressing. So in x86 architecture ZONE_DMA is the group of pages that belongs to first 16MB of RAM. Since PPC does not have this constraint and hence in PPC ZONE_DMA is empty.

ZONE_NORMAL:
In x86 architecture , the first 896MB of page frames are permanently mapped to the Kernel Address space. Which can be other wise stated as , the first 896MB of Kernel address space is mapped to the first 896MB of physical memory.So that leaves 128MB of unmapped addresses in the 1GB kernel address space. This unmapped addresses can be used to make the temporary mappings from the high memory. The union of ZONE_DMA pages and ZONE_NORMAL pages gives us the low memory pages. If ZONE_NORMAL pages are not available for allocation, as a last resort, kernel can use ZONE_DMA but not vice versa.


ZONE_HIGHMEM:
ZONE_HIGHMEM is the collection of high memory pages. So in 32bit x86 that will be any memory above 896MB. They are not permanently mapped to the kernel address space as explained in HIGHMEM section.

Sources :
[1] Linux Device Drivers,3rd ed.,Corbet et al
[2]Linux Kernel Development by Robert Love 
[3] http://lwn.net/Articles/75174/

Wednesday, July 18, 2012

Need for Shell Built ins

Usually a command is executed by the shell by spawning a new process. Though this strategy is effective in dealing usual commands it may turn ineffective in dealing certain others, especially one that changes the behavior of the shell itself.Example for such a command is the cd command. As you know, cd is a command used to change the current working directory in a shell.


Consider that we decided to adopt the conventional method for implementing the cd command,i.e we will spawn a new process whenever the cd command is encountered. If you where to explore Linux System Call API, you will find that the only sys call available for changing the directory is the chdir(). The problem with the chdir() is that it can only change the current working directory of the current process. Thus if we are implementing cd as a separate process, it will have no effect on our main process-shell.That's the stage at which the shell built ins kicks in. Built ins are nothing but commands whose implementation is the task of the shell itself. So whenever a built in command is encountered , it is implemented by the bash itself rather than as a separate process. 


The below program is the source code for a minimal shell that implements cd as a built in
Note: The minimal shell given above accepts maximum of only one argument for any given command including switches. Also the commands supported are the ones only in /bin directory along with exit and cd

Monday, June 04, 2012

Kruskal's Algorithm, Simplified

Note : This article is intended to the readers  who can implement Kruskal's algorithm manually at the least.


Kruskal's algorithm is an algorithm for building Minimum Spanning Tree (MST) from an undirected graph.  The algorithm can be summarized as "Go on building the MST by adding the edges in their ascending order of their cost such that they do not create cycles".This can be explained in steps as:-
1. Identify next minimum edge
2. See whether that edge creates a cycle,if added to the MST
3. If it doesn't, add it to MST ,else discard it and repeat step 1.
4. Go to step 1 if MST is not complete.


Consider that we are using the Adjacency Matrix for  representing  the graph. Then Step 1 is the easiest to carry out and Step 2 is the hardest. It is the Step 2 that I want to discuss in detail. Step 1 can be carried out simply by constructing nested loops. At the end of step 1, we have the next minimum cost and the name of the nodes connected by the minimum cost edge.


Now comes the trickiest part. Given an edge,how will we find whether that edge creates a cycle or not when added to the MST?This requires a clearer understanding of what cycles is.


Identifying Cycles

Cycles occurs when there are multiple paths from any node to any other node. So before adding an edge (x,y) to the MST, we have to see whether there is already a path that exists between nodes x and y. A path between nodes x and y can take  number of forms. Examples are:-
1. (x,z) (z,y)
    x is connected to z and z is in turn  connected to y,thus creating a path between x and y
2. (x,z) (z,a)(a,y)
    This can be similarly be explained as  example 1.

Explanation with an Example

Consider the below graph
The adjacency matrix representation of the above graph is given as follows:-
Note: The blank spaces can be regarded as infinity which denotes that the corresponding nodes are not related.Also the above matrix is a symmetric matrix.

Now we start applying the Kruskal's algorithm manually.
The first minimum cost edge is (1,3) or (3,1). We will add it to MST since it won't create any cycles for obvious reasons. Similarly by manual evaluation (4,6) , (2,5) and (3,6) are added to the MST. Now our the adjacency Matrix of the graph would look like as below:-
Note: For each edge added to the MST, the corresponding column is made blank(or infinity)

Now our MST graph would look like as below:-
The adjacency matrix for the same is given by:-
The next minimum cost edge is (1,4) with a cost of 5. But by manual evaluation we know that this edge will result in cyclic graph when added to MST. Now we will see how to infer whether an edge will result in cyclic property, from the adjacency matrix of the MST. Now we need to see whether there exists a path between 1 and 4.We follow the below steps:-
A.Take the columns of the first row. Name them as (1,k) where k=1 to n (n is the number of nodes). 
B.Select the non blank columns among (1,k)and name them as(1,nb)  . If any of the nb=4,then we can straight forward say that there is a path between 1 and 4.If non among (1,nb) has nb=4,then we will see whether there exists, a path between (nb,4) by applying same procedure as Step A.Thus this goes on recursively.
Based on this we develop a noPath() procedure,which will take the nodes,and MST matrix as arguments and returns 1 if there is no path between them else it will return 0.

procedure noPath(int node1,int node2,int nodePrev,int n,int MST[][])

begin

         int k,ret=1;

        for k in range 1 to n do

        begin

                 if(MST[node1][k]!=blank)

                 then

                       if k equal node2

                       then

                              ret=0

                      else if k not equal node1 and k not equal to nodePrev 

                                              then    
                                                                     ret=noPath(k,node2,node1,n,MST)
                                                                 end
                                             end
                       end
       return ret
end {noPath}

Note:The variable nodePrev has been used to avoid the reverse direction flow. For example
after having a search through edge (1,6),we do not want to turn back and search (6,1).


With the above algorithm I hope I have covered the most important aspect of Kruskal algorithm .You may see the real C program implementing Kruskal algo here

Sunday, January 22, 2012

Command line tool for Internet Usage Monitoring in Linux

Yeah, a command line tool for monitoring the internet usage so that I can avoid over usage of the internet. This was something that I have been trying to develop for quite sometime and the breakthrough happened just yesterday.  Initially I was trying to develop it using  shell scripts. The whole script was centered at the ifconfig command. When I tried installing it to the crontab, it won't work for some reason which I don't know yet. Then I turned to Python. Here too the script was centered on the ifconfig. But to my disappointment that too didn't work when installed to crontab. it was at this point that I thought about a different strategy. In order to get the details of Internet usage in current session I decided to rely on the file, the ifconfig command is relying rather than the relying on ifconfig command itself.
                       So the next challenge was finding out the file ifconfig is relying. Then I decided to analyse the system calls used by ifconfig . Here I took the help of strace command and arrived at the conclusion /proc/net/dev is the file that is used by the ifconfig for the details of internet usage in current session. And with this file things become lot easier.
              As of now this application will monitor the usage for one month. Then it is reset to zero automatically. You can change the day of the month in which the monitoring begins. The application  also has a feature wherein  you can set free usage time, so that the internet usage is not monitored between the specified time.It can only monitor the internet communication taking place through eth0 (wired) eth1 (wireless) interfaces. 


            You may track the development here You can install it by running Installer.sh script (Don't change the file names and all files should be in the same directory as of Installer.sh ). After installing you can use the tool by using the command Eusage along with the switches.
Note : Scripts won't work without running Installer.sh. Don't try to run Eusage without the switches
Eusage -u :- used to give the details of usage so far, from the beginning of the month 
Eusage -r :- To clear the memory (i.e. everything        will be set to zero)
Eusage -h :- for help
Eusage -c :- for changing free usage time and to set the day of month to begin the monitoring