Thursday, April 3, 2008

My Quad Core

As usual, I was compiling the Linux kernel on my Intel Quad Core desktop when the power went off. The UPS started beeping to indicate that it was low on battery. Well, I decided to stop the compilation and power off the machine. To my surprise, the beeping went away! I was confused. I started the compile again, and lo, the UPS sang again.

During a compile, the power consumption increases so much that the UPS detects a higher rate of battery discharge and starts beeping! I was amazed!

Probably when we have much bigger machines, I can expect the light bulbs in my room to dim when I begin a kernel compilation. :-)

Saturday, February 9, 2008

git-bisect

Hello all,

For the uninitiated, git-bisect is a tool for identifying the patch that caused a particular bug. It works on the principle of binary search. It's a really cool thing to use, but it takes a long time to complete (not so much with make -j16 on my new Core 2 Quad!!). I identified that v2.6.24 was good and the current one was bad. (v2.6.24 was released around 15 days ago.) Within those 15 days some 4K-odd patches had accumulated, and I had to bisect the big pile! Thankfully it was successful: it pointed me to the right commit, and hence I was able to fix the problem. Here's the patch - http://lkml.org/lkml/2008/2/8/342.
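To see why a pile of 4K-odd patches is manageable, here is a minimal sketch of the binary search idea in Python. It is not how git itself is implemented (real history is a graph, not a straight line). The function name bisect_first_bad, the integer "commits" and the is_bad test are all made up for illustration; in real life is_bad means "build this kernel, boot it and look for the bug".

```python
def bisect_first_bad(commits, is_bad):
    """Find the first bad commit by binary search.

    Assumes commits are ordered oldest to newest, commits[0] is known
    good, commits[-1] is known bad, and every commit after the first
    bad one is bad as well.
    Returns (first_bad_commit, number_of_commits_tested).
    """
    lo, hi = 0, len(commits) - 1   # lo is known good, hi is known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(commits[mid]):
            hi = mid               # the culprit is at mid or earlier
        else:
            lo = mid               # the culprit is after mid
    return commits[hi], tests


# Hypothetical history: 4000 patches on top of a known-good release.
# Commit 2718 is arbitrarily chosen as the one that introduced the bug.
history = list(range(4001))
first_bad, tests = bisect_first_bad(history, lambda commit: commit >= 2718)
print("first bad commit:", first_bad, "found after", tests, "tests")
```

Each test halves the range, so 4000 patches need only about a dozen builds instead of thousands.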

Thursday, February 7, 2008

My new Intel DG33FB

Hello all!

I decided to go for an upgrade from my Core 2 Duo + 1G RAM to a Core 2 Quad + 4G RAM. My dealer gave me an original Intel DG33FB motherboard for it. The problems with the new motherboard began right at the dealer's place. I have listed them one by one, along with the way I was able to overcome them:

  1. Linux wouldn't boot, and neither would Windows!
    • The first thing I did on my new machine was to try to get Linux up and running. But it wouldn't work! Then I reluctantly tried booting Windows, which didn't work either. Thinking that this could be a problem with the BIOS settings, I wandered around in the BIOS setup utility and came across an option called 'SATA MODE', which had AHCI and IDE as its two choices, with AHCI selected by default. I changed it to IDE, and now Windows could boot.
    • I could boot only the default Fedora 8 kernel, but none of the vanilla kernels. Surprisingly all of them hung when they tried to scan the PCI root bus.
  2. Vanilla kernels wouldn't boot.
    • My next objective was to boot the vanilla kernels which, as I mentioned, were hanging when they tried to access the PCI bus. Since ACPI is famous for causing such failures, I tried booting with the acpi=off kernel parameter. The kernel booted normally, but sadly it used only one processor (out of the four in the Core 2 Quad), which is in no way acceptable!

    • I decided to debug this problem and started inserting printk()s in the source to find out the exact reason for the hang. During this process, I came across a kernel parameter called pci=conf1 and decided to give it a try. To my surprise, the kernel (2.6.24-rc5) booted perfectly fine. Later I found an option in menuconfig called 'PCI ACCESS' which allowed me to select the 'direct' method of probing PCI devices, and enabling this saved me from passing the extra 'pci=conf1' parameter to the kernel.
  3. linux-2.6.git does come up, but I see a lot of PCI noise in dmesg and X doesn't come up
    • Is this a regression? Yes, it is! Everything worked perfectly with 2.6.24-rc5, but not with linux-2.6.git i.e., 2.6.24. I decided to git-bisect to find the faulty commit. It was the first git-bisect of my life! I bisected it and found the commit that caused this. Deciding to debug it, I began looking at the code. It was fairly easy to fix, and I sent a patch to LKML hoping to see it included in linux-2.6.git soon.
      Incidentally, the bug had already been reported and a fix for it was committed yesterday! You can read the whole thread here on the LKML Archive (http://lkml.org/lkml/2008/2/7/55).

Thursday, January 10, 2008

More on HPET..

Welcome back!

If you are all waiting to hear what the reply from kernelnewbies was: there was no reply, in fact! This post is about the HPET research I carried out with the spec in hand while looking at the source code.

The line hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0 made me realize that something was fishy, so I decided to look into the area where the IRQs were actually assigned, and I found it in arch/x86/kernel/hpet.c.

In that piece of code, a register field named Tn_INT_ROUTE_CNF was being read from the per-timer Configuration and Capabilities register. I quickly turned the pages of the spec to see what it does, and it read:

"Default is 00h. Software writes to this field to select which interrupt in the I/O (x) will be used for this timer’s interrupt. If the value is not supported by this particular timer, then the value read back will not match what is written. The software must only write valid values."


So, I got the reason why IRQ 0 was being given to timer 2. Well, if it's zero by default, who changes it? That was the question. Ideally the BIOS assigns IRQs to the various PCI devices. So, could it be a BIOS bug? No, because many posts on LKML with dmesg output showed the same line that I got.

I decided to try one more thing. Why not mmap /dev/hpet and look at the registers? I did so, got a dump of its 1K register space and began examining it. I turned to the 'Timer N Configuration and Capabilities register' in the spec and began looking at the first field,

" Tn_INT_ROUTE_CAP - Timer n Interrupt Routing Capability: (where n is the timer number: 00 to 31) This 32-bit read-only field indicates to which interrupts in the I/O (x) APIC this timer’s interrupt can be routed. This is used in conjunction with the Tn_INT_ROUTE_CNF field. Each bit in this field corresponds to a particular interrupt. For example, if this timer’s interrupt can be mapped to interrupts 16, 18, 20, 22, or 24, then bits 16, 18, 20, 22, and 24 in this field will be set to 1. All other bits will be 0. "
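Here is a small Python sketch of how such a dump can be decoded. It assumes the register layout from the HPET spec: the Timer N Configuration and Capabilities register is a 64-bit little-endian value at offset 0x100 + 0x20 * N, with Tn_INT_ROUTE_CAP in bits 63:32 and Tn_INT_ROUTE_CNF in bits 13:9. The helper names are made up, and the dump below is fabricated from the spec's own example (interrupts 16, 18, 20, 22, 24) rather than taken from my machine.

```python
import struct

TIMER_BASE = 0x100      # offset of the Timer 0 config register (per the spec)
TIMER_STRIDE = 0x20     # each timer's register block is 0x20 bytes apart
ROUTE_CNF_SHIFT = 9     # Tn_INT_ROUTE_CNF occupies bits 13:9
ROUTE_CNF_MASK = 0x1F   # the field is 5 bits wide


def timer_config(dump, n):
    """Return the 64-bit config/capabilities register of timer n."""
    (value,) = struct.unpack_from("<Q", dump, TIMER_BASE + TIMER_STRIDE * n)
    return value


def routable_irqs(config):
    """List the interrupts whose bits are set in Tn_INT_ROUTE_CAP."""
    cap = config >> 32
    return [irq for irq in range(32) if cap & (1 << irq)]


def current_irq(config):
    """Return the interrupt currently selected in Tn_INT_ROUTE_CNF."""
    return (config >> ROUTE_CNF_SHIFT) & ROUTE_CNF_MASK


# Fabricated 1K dump: timer 2 advertises the interrupts from the spec example
# and still has the default routing value of 0.
dump = bytearray(1024)
example_cap = sum(1 << irq for irq in (16, 18, 20, 22, 24))
struct.pack_into("<Q", dump, TIMER_BASE + TIMER_STRIDE * 2, example_cap << 32)

cfg = timer_config(dump, 2)
print("timer 2 can use IRQs", routable_irqs(cfg), "and is routed to", current_irq(cfg))
```

With a real dump, the bytes read back from the mmap of /dev/hpet would take the place of the fabricated bytearray.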

Wow!! was my reaction. So using this bitmap field, I could assign interrupts to timers! I wrote a quick and dirty patch, ran the userspace program and, to my astonishment, it worked like a charm!

This is about making the userspace API work. The kernelspace API is a bit different, in that it does not use ioctls, but has a struct hpet_task object and stuff. It's simple to use. So my next job is to make the kernelspace API work.
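The idea, roughly, is to pick an interrupt that the capability bitmap allows and write it into the routing field. The Python sketch below only shows that bit manipulation on a plain integer. It is not my kernel patch; the helper names and the "lowest allowed interrupt" policy are assumptions for illustration, and the field positions (Tn_INT_ROUTE_CAP in bits 63:32, Tn_INT_ROUTE_CNF in bits 13:9) are taken from the spec.

```python
ROUTE_CNF_SHIFT = 9                        # Tn_INT_ROUTE_CNF occupies bits 13:9
ROUTE_CNF_MASK = 0x1F << ROUTE_CNF_SHIFT   # 5-bit field, already shifted


def pick_irq(config):
    """Return the lowest interrupt allowed by Tn_INT_ROUTE_CAP, or None."""
    cap = config >> 32
    for irq in range(32):
        if cap & (1 << irq):
            return irq
    return None


def with_route(config, irq):
    """Return config with Tn_INT_ROUTE_CNF set to irq.

    Refuses interrupts the capability bitmap does not advertise, since
    the spec says software must only write valid values.
    """
    cap = config >> 32
    if not cap & (1 << irq):
        raise ValueError("IRQ %d is not routable for this timer" % irq)
    return (config & ~ROUTE_CNF_MASK) | (irq << ROUTE_CNF_SHIFT)


# Made-up register value: the timer may use IRQ 20 or 23, routing still 0.
config = ((1 << 20) | (1 << 23)) << 32
irq = pick_irq(config)
new_config = with_route(config, irq)
print("chose IRQ", irq, "-> routing field now",
      (new_config >> ROUTE_CNF_SHIFT) & 0x1F)
```

On real hardware the new value would be written back to the timer's register and then read again, because the spec says an unsupported value will not read back as written.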

Will post again on that.. stay tuned folks!

Tuesday, January 8, 2008

HPET support for qemu

For those who don't know what HPET is, check out the specifications at the Intel site and continue reading.

It was only yesterday that I decided to work on adding HPET support to qemu. I talked to Amit Shah (the only employee of Qumranet in India) for ideas on this and how to go about it.

He told me to start off by making qemu access the HPET from userspace and then extend it to make it visible to the guest. Emulating an HPET in cases where there is no real hardware comes at a later stage. I am lucky to have a new Core 2 Duo machine with a 945 chipset that has an HPET, which fortunately became 'visible' only after I flashed my BIOS! But my excitement quickly died down when I tried to run the userspace program (Documentation/hpet.txt). It would not run. It was failing at the Interrupt Enable (HPET_IE_ON) ioctl command.

As usual, I decided to find a solution to this problem by googling for HPET. But even after searching for a long time, I could not find a single place where I could get help, so I decided to write this blog, so that I will be the first one to have a nice resource on HPET!

Deciding to track down this problem, I started looking at the kernel sources. I began by adding printks in drivers/char/hpet.c. Grepping through the kernel logs showed me that my HPET had 3 timer blocks which used interrupts 2, 8 and 0. My printks told me that when I opened /dev/hpet, I was assigned a timer block which had interrupt 0, and hence the routine hpet_ioctl_ieon refused to go ahead with it and failed.

I think this happens because of improper ACPI tables which report the IRQ of the third HPET timer block as 0. I have posted a query on kernelnewbies and am now awaiting a reply.

Will continue this once I get a reply! Stay tuned..