30,000 Foot View

I normally avoid shop words or “corporate speak” because I feel it dehumanizes us, but sometimes these phrases are necessary.  So here is the “30,000 foot” view.  And please pardon the appearance of my flow charts and diagrams, I am not a graphic designer…

All images open in a new tab, so just click on them with ease until I figure out how to make this WordPress theme wider.

Take a look at this:

  • There are four (4) 10 Gigabit ports available on this device, but I am using only one of them for this design.
  • The FPGA design contains a MicroBlaze soft-core Processor
  • This MicroBlaze soft-core Processor is connected to a LabVIEW for Windows Executable, which is the same thing as a standard .exe file.

Now some more details:

  • Any type of Ethernet Frame can enter the 10 Gigabit PHY, as shown in the diagram:
    • An Ethernet Frame with an ARP Request
    • An Ethernet Frame with an IPv4 Packet containing an ICMP packet, otherwise known as a “Ping” message
    • An Ethernet Frame with an IPv4 Packet containing a UDP packet.  You may know these from “multicast” or “broadcast” messages, which are commonly carried over UDP.

And finally, everything, the full shebang:

I have broken out the details of the hardware comprising the 10 Gigabit connection.  Technically it consists of 4 SFP+ connectors going to a Multi-Gigabit Transceiver.  Now normally I would think that there is some sort of chip in between the SFP+ and the FPGA, but I believe that National Instruments has used some Xilinx IP that handles this for us.  See, I don’t even know what hardware is being used, but I am able to use it and make an FPGA-based Network Card!

The data from the Multi-Gigabit Transceiver then goes to the 10 GE MAC Core, which sends all received packets

First, a Definition

CLIP – Stands for Component Level IP and is a method of bringing non-LabVIEW FPGA code in to LabVIEW.  Basically you take a synthesized design, wrap it up in some VHDL and import this VHDL file in to LabVIEW.  In this case an instance of the OpenCores 10 GE MAC is being brought in to the design.

See the top-level wrapper VHDL file here:


See the source code of this core here:


See the official project website for this core here:


Now the Descriptions

A – The reading of incoming frames is wrapped up in to a nice library by National Instruments, that even includes some IP to respond to ARP messages.  I have stripped all of this usage out for this design because I wanted simplicity for learning.  Anyway, on each clock cycle you have the following variables:

  • data valid [boolean]
  • data [64 bit WORD]
  • byte enables [array of 8 booleans]
  • End of Good Frame [boolean]
  • End of Bad Frame [boolean]

Here is a screenshot of the usage for this, it should be very easy to understand once you have an idea of what LabVIEW is and how it works.

B – Now the data coming from the 10 Gigabit PHY contains 64-bit WORDs, and 2 booleans, one for a good frame, and one for a bad frame.  Now I do not know how to configure and properly use a 64-bit AXI-Data Stream FIFO with a MicroBlaze processor, so I had to convert this data manually myself.  It did not take long, in fact I documented this in my log where it took me 1 hour and 15 minutes to do this following the LabVIEW FPGA State Machine paradigm.  Think of the LabVIEW FPGA State Machine paradigm or pattern as the absolute best of both worlds in terms of VHDL/Verilog and LabVIEW.

So, we have data coming in what I call “AXI-64bit format” and we have to convert it/write it to a LabVIEW FIFO.  Here is a close-up of this code:

The code above is running inside a loop clocked at 156.25 MHz, and on the left is how we get the data from the 10 GE MAC.  If “data valid”, or “End of Good Frame” or “End of Bad Frame” are true, we enter the self explanatory Case Structure, which is the same thing as an if statement.  Inside this case we package all the data in to a custom “Cluster type”, which is the same thing as a C structure and write it in to a LabVIEW FIFO.
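For readers who do not have LabVIEW handy, here is a rough C sketch of what the Cluster and the Case Structure condition amount to. The type and field names are made up for illustration, and packing the eight byte enables in to one byte is an assumption; the real design is a LabVIEW Cluster, not C.

```c
#include <stdbool.h>
#include <stdint.h>

/* One beat from the 10 GE MAC receive interface, mirroring the signals
 * listed in step A. Names are illustrative, not taken from the project. */
typedef struct {
    bool     data_valid;
    uint64_t data;               /* 64-bit WORD */
    uint8_t  byte_enables;       /* 8 booleans packed as bits (assumed packing) */
    bool     end_of_good_frame;
    bool     end_of_bad_frame;
} rx_beat_t;

/* Mirrors the Case Structure condition: the beat is written to the FIFO
 * when any one of the three flags is true. */
static bool rx_beat_should_enqueue(const rx_beat_t *beat)
{
    return beat->data_valid
        || beat->end_of_good_frame
        || beat->end_of_bad_frame;
}
```

An idle cycle, where all three flags are false, is simply skipped, which is why the FIFO only ever holds meaningful elements.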

C – Now we read one element on each clock cycle of the custom LabVIEW Cluster defined in step B, and convert this in to a 32-bit AXI Data Stream to be read by the MicroBlaze.

Here is a screenshot of the entire loop, which runs at 100MHz, because I clocked my MicroBlaze to that speed.  I could probably increase my MicroBlaze to 156.25MHz, but that will decrease my productivity in terms of longer synthesis times.

I zoomed out a bit further for this screenshot and included the clock specifier, which is 100 MHz.  Also notice how there is another “Case Structure” inside this loop, but the case is not “True”, but it is a State Machine with the “Read-Top” case showing as the default state.  This state checks if the incoming data is valid, and if so writes the upper 32 bits of the 64-bit data WORD in to the AXI-Data Stream FIFO that is connected to the MicroBlaze.

Here is a close-up of the “Read-Top” state:

Here is the other state:

The “Read-Bottom” state.  This state will write the lower half of the 64-bit WORD and will check if this is the final element in the Ethernet Frame.  If this is the final element, it will enter the “Append-Size” state, which is incorrectly named, will fix that later “TODO: Rename Append-Size”. haha.  Anyway, it will append some metadata indicating if this frame should be dropped or kept.

The final state – “Append-Size”:

This code is very simple.  I set TKEEP to all ones, or 0b1111, and I set the first 2 bits of the 32-bit WORD to contain “End of Good Frame” and “End of Bad Frame”.  Now why am I setting TKEEP to all ones? Well, simple, because I haven’t implemented this part yet.  However, by setting it to all ones my code will still work, because most TCP/IP stacks just ignore the padded zeros.
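The three states can be approximated in plain C as follows. This is only a software sketch of the idea, not the clocked LabVIEW implementation: the function and type names are hypothetical, and placing “End of Good Frame” in bit 0 and “End of Bad Frame” in bit 1 of the metadata word is an assumption about the bit layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One element read from the LabVIEW FIFO (illustrative layout). */
typedef struct {
    uint64_t data;
    bool     end_of_good_frame;
    bool     end_of_bad_frame;
} rx_elem_t;

/* Splits each 64-bit element in to two 32-bit words, upper half first,
 * and appends one metadata word after the final element of the frame.
 * 'out' must have room for 2 * n + 1 words. Returns the words written. */
static size_t split_frame(const rx_elem_t *in, size_t n, uint32_t *out)
{
    size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        out[w++] = (uint32_t)(in[i].data >> 32);          /* "Read-Top"    */
        out[w++] = (uint32_t)(in[i].data & 0xFFFFFFFFu);  /* "Read-Bottom" */
        if (in[i].end_of_good_frame || in[i].end_of_bad_frame) {
            /* "Append-Size": bit 0 = good, bit 1 = bad (assumed layout) */
            out[w++] = (in[i].end_of_good_frame ? 1u : 0u)
                     | (in[i].end_of_bad_frame  ? 2u : 0u);
            break;
        }
    }
    return w;
}
```

In hardware each of these writes takes one clock cycle, which is why the state machine has to stop reading from the LabVIEW FIFO while the lower half is being written.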

D – Now the MicroBlaze has been programmed with a function that reads an incoming frame from AXI-Data Stream FIFO #0 and writes its contents to AXI-Data Stream FIFO #1.  It also reads an incoming frame from AXI-Data Stream FIFO #1 and writes its contents to AXI-Data Stream FIFO #0.  This is a simple passthrough that exercises my implementation of the FIFOs.

A direct link to the source code of this C code that is running in the MicroBlaze:


And a screenshot:

E – Now what happens after the source code above executes? Well, data is read from the 10 Gigabit PHY and read on the first AXI-Data Stream FIFO and written out back to the rest of the FPGA via the second AXI-Data Stream FIFO.  So we want to read this ethernet frame and write it up to the Host application running on normal/regular Windows.  This is very simple, read data, write data to a Target-to-Host LabVIEW FIFO, and if tlast is equal to true, include this in the metadata, which for now is simply the upper half of the 64-bit WORD.

F – Now how do you read this data on the host? Well, if you are familiar with LabVIEW, the code would look like this:

The green box contains a reference to the running FPGA.  The first box on the left polls the FIFO to see if any elements are available, and if the number of elements available is greater than 0 it reads that number of elements.

If you wanted to do this from C++, you could use LabWindows CVI to read from the FPGA interface as such:

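/* read the DMA FIFO; the FIFO constant comes from the NiFpga_*.h header
   generated for your bitfile, so the name used here is only a placeholder */
NiFpga_MergeStatus(&status, NiFpga_ReadFifoI16(session,
    NiFpga_MyBitfile_TargetToHostFifoI16_FIFO,
    data, numSamples, timeout, &r));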

(See: http://www.ni.com/tutorial/8638/en/)

Please note that you can also link to the LabWindows CVI library and use it from your existing C++ applications.  Drivers for this specific board are only available for Windows, but if you are a big bank or financial firm with deep pockets, I’m sure you can set up some sort of agreement with National Instruments to port this code and drivers to <Operating System of your Choice>.

Okay, that is great, now what about writing data from the Host application back to the FPGA for sending out of the 10 Gigabit PHY? Well, you do the opposite, you enter the codes in reverse. (Spies Like Us).

Instead of a “Target-to-Host” FIFO, use a  “Host-to-Target” FIFO, and in my case, I prepend the size in WORDs to the packet to be sent.

Again, the green box is a reference to the running FPGA.  The square light green colored box is a function, also known as a “sub-VI” that I wrote that generates a UDP packet (or is it a UDP datagram? I forget).  The output of this UDP packet is converted in to 32-bit WORDS by the box with a white background, and then the size is prepended to this array and written in to the “HT_WRITE” LabVIEW Host-to-Target FIFO.
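The “convert to 32-bit WORDs and prepend the size” step could look roughly like this in C. The function name is hypothetical, and both the big-endian byte order inside each word and the zero padding of the last word are assumptions; check the sub-VI for the real ordering.

```c
#include <stddef.h>
#include <stdint.h>

/* Packs a byte array in to 32-bit words and prepends the word count.
 * Assumptions: bytes are packed most-significant first, and the final
 * word is zero padded. 'out' needs room for (len + 3) / 4 + 1 words.
 * Returns the total number of words written, including the size word. */
static size_t pack_frame(const uint8_t *bytes, size_t len, uint32_t *out)
{
    size_t words = (len + 3) / 4;
    out[0] = (uint32_t)words;                 /* size in WORDs goes first */
    for (size_t i = 0; i < words; i++) {
        uint32_t w = 0;
        for (size_t j = 0; j < 4; j++) {
            size_t k = 4 * i + j;
            uint8_t b = (k < len) ? bytes[k] : 0;   /* pad past the end */
            w = (w << 8) | b;
        }
        out[1 + i] = w;
    }
    return words + 1;
}
```

The resulting array is what would be handed to the Host-to-Target FIFO in one write.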

G – So we are receiving data in the following format from the host:





We read the size and then the rest of the elements from the FIFO and write them to the 2nd AXI-Data Stream FIFO that is connected to the MicroBlaze.  Again note that I have not yet fully implemented the proper usage of the TKEEP signals, so in this case the TKEEP signal can be dynamically set from the Host application for testing purposes.

H – Now that the MicroBlaze has read our outgoing ethernet frame on FIFO #1 and has written the same outgoing frame on FIFO #0, we have to convert this 32-bit AXI Data Stream in to 64-bit words that are suitable for our 10 Gigabit Ethernet PHY.

This time however, I used a proper state machine and named all of the states correctly.

The left-most box connects the signals from the MicroBlaze to LabVIEW and wires them in to the state machine.  If the data is valid and it is not the last element, the top half is stored in to a shift-register and the next state is “Read-Bottom”.  Here is a close-up of the “Read-Top” state:

And here is a close-up of the “Read-Bottom” state:
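If you cannot open the VI, the two states boil down to something like the C sketch below. The names are hypothetical, and what happens when the last word arrives in the “Read-Top” state (here: it is flushed with a zero lower half) is an assumption, since the text above only describes the case where the word is not the last element.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One 32-bit AXI Stream word coming from the MicroBlaze (illustrative). */
typedef struct {
    uint32_t data;
    bool     valid;
    bool     last;
} axis_word_t;

/* Joins pairs of 32-bit words in to 64-bit words, upper half first.
 * 'out' needs room for (n + 1) / 2 words. Returns the words written. */
static size_t join_words(const axis_word_t *in, size_t n, uint64_t *out)
{
    size_t   count = 0;
    uint32_t top = 0;          /* plays the role of the shift register */
    bool     have_top = false; /* false = "Read-Top", true = "Read-Bottom" */

    for (size_t i = 0; i < n; i++) {
        if (!in[i].valid)
            continue;
        if (!have_top) {
            top = in[i].data;
            have_top = true;
            if (in[i].last) {  /* odd word count: assumed zero padding */
                out[count++] = (uint64_t)top << 32;
                have_top = false;
            }
        } else {
            out[count++] = ((uint64_t)top << 32) | in[i].data;
            have_top = false;
        }
    }
    return count;
}
```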

I – Now we have written the 32-bit data coming from the MicroBlaze in to a LabVIEW FIFO with 64-bit data WORDS and we have to write this out using the CLIP, via the National Instruments provided wrapper.

Look at how simple and beautiful this code is!

You can look at the source code here: (you must clone the repository to your local machine to see it easier, just clone it and open the 2nd file – the html file)

If you do not know git, you can also download a zip file of the entire repository:


If you somewhat know git, you can:

git clone git@github.com:JohnStratoudakis/LabVIEW_Fpga.git

And finally, browse the documentation, which is probably outdated here:


Up Next

So what now? Do I continue cleaning up the code and updating documentation? Do I make a YouTube video demonstrating this?  Do I modify the MicroBlaze code to no longer just be a “passthrough” but instead to send all data through the lwIP TCP/IP stack? If I do this, I will have to modify the elf file (compiled binary) that is embedded in to my design, breaking this design.  I can make multiple Xilinx checkpoints and it will work, but that will confuse all of my readers… Man, decisions, decisions.

How about this, I finalize this project, make a new sub-directory in the source code and make a brand new LabVIEW FPGA project and this time I will use the lwIP version of the source code and I will make sure that everything is reproducible.  It is raining now anyway and I want to stay inside and code…


Coding Standards Matter…

I have wired up the components of my 10 Gigabit FPGA Accelerated Network card with great care, and I decided to have my “tester” application skip the lwIP stack and to pass the received packet directly to the host for testing/verification purposes.

Everything was checking out fine, the LabVIEW code looked flawless, the interface to the 10 Gigabit Transceiver was perfect.  All looked fine, but for some reason I was not receiving the packets on the host.

I analyzed the code, inserted probes and what not.  And finally, while reading through the actual C++ code (MicroBlaze C++ that is), I found the bug.

A very simple bug hidden in plain sight!

// Now echo the data back out
if (XLlFifo_iTxVacancy(fifo_1)) {
    XGpio_DiscreteWrite(&gpio_2, 2, 0xF001);
    for (i = 0; i < recv_len_bytes; i++) {
        XGpio_DiscreteWrite(&gpio_2, 2, buffer[i]);

        XLlFifo_Write(fifo_1, buffer, recv_len_bytes);
    }

    XLlFifo_iTxSetLen(fifo_1, recv_len_bytes);
}

Do you see the error?  Well, neither did I, until I read the documentation for XLlFifo_Write again, for the umpteenth time… I was writing the whole packet to the FIFO (length of packet) times, so (length of packet) squared bytes went out in total! Why? Because a single call to XLlFifo_Write writes the entire packet, and I was making that call once per loop iteration.
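To make the size of the mistake concrete, here is a small self-contained C sketch that swaps the real FIFO for a stand-in that only counts bytes. Everything in it (mock_fifo_t, mock_fifo_write, echo_buggy, echo_fixed) is hypothetical and exists only for illustration; it does not use the Xilinx driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a transmit FIFO that only counts the bytes pushed in. */
typedef struct {
    size_t bytes_written;
} mock_fifo_t;

/* Like a whole-buffer write call: every call pushes all 'len' bytes. */
static void mock_fifo_write(mock_fifo_t *fifo, const void *buf, size_t len)
{
    (void)buf;
    fifo->bytes_written += len;
}

/* The bug: the whole-buffer write sits inside the per-byte loop. */
static size_t echo_buggy(const uint8_t *buffer, size_t len)
{
    mock_fifo_t fifo = {0};
    for (size_t i = 0; i < len; i++)
        mock_fifo_write(&fifo, buffer, len);
    return fifo.bytes_written;
}

/* The fix: one write call for the whole packet, outside any loop. */
static size_t echo_fixed(const uint8_t *buffer, size_t len)
{
    mock_fifo_t fifo = {0};
    mock_fifo_write(&fifo, buffer, len);
    return fifo.bytes_written;
}
```

For a 64-byte packet the buggy version pushes 4096 bytes and the fixed version pushes 64, which is also why the transmit length set afterwards no longer matched what was sitting in the FIFO.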

Anyway, I am now re-synthesizing my code and we will see what happens when I run it in around 2 hours time.

Also, I added the TKEEP signal to my AXI Stream FIFO, and it worked exactly as expected, meaning that:

  • If I send 12 bytes from the LabVIEW FPGA FIFO in to the MicroBlaze, it detects 12 bytes
  • If I send 13 bytes, with the TKEEP signal being 0b0001 for the last word only, and 0xF for the rest, I get 13 bytes in the MicroBlaze code.
  • If I send 14 bytes… and so on and so forth, MicroBlaze recognized only that many bytes.

However, everything was aligned to 32 bit words.
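The byte counts above follow directly from TKEEP: each set bit marks one valid byte in a 32-bit word. A minimal C sketch of that arithmetic, with a hypothetical function name and assuming one 4-bit TKEEP value per word:

```c
#include <stddef.h>
#include <stdint.h>

/* Counts the valid bytes in a frame from its per-word TKEEP values.
 * Only the low 4 bits of each entry are used (32-bit data words). */
static size_t frame_bytes_from_tkeep(const uint8_t *tkeep, size_t words)
{
    size_t total = 0;
    for (size_t i = 0; i < words; i++) {
        uint8_t k = tkeep[i] & 0x0Fu;
        while (k) {            /* count the set bits */
            total += k & 1u;
            k >>= 1;
        }
    }
    return total;
}
```

Three words with TKEEP 0xF give 12 bytes; adding a fourth word with TKEEP 0b0001 gives 13, matching the behaviour listed above.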

Maybe I will work on cleaning up and pushing some of my code to github while I wait…

AXI4 + MicroBlaze != 64-bit

The 10 Gigabit MAC/transceiver gives me 64 bit data words.  I currently think I am giving and getting 64 bit data words, but I am really only using 32 bits.  I came to this conclusion after I tried reading a 64 bit word and saw the data was simply two repeated 32 bit words.  Additionally some random person on the internet said that the MicroBlaze data bus is 32-bit and you have to use some sort of data width converter IP.

Out of luck… I don’t know how to use the converter, but I am sure there is a way to properly convert this by using LabVIEW FPGA.  So for starters, this means I can remove my AXI4 Stream Data FIFOs and keep the two 32-bit versions.  I’ll also throw in support for TKEEP while I am at it.

So the “Receive Ethernet Frame” code from the 10 Gigabit transceiver/MAC looks like this:

I have to convert this 64-bit data stream in to a 32-bit data stream before I send it in to the MicroBlaze.  Here is the current/erroneous implementation:

So what do I have to do? I have to read one element from the LabVIEW FIFO – the FIFO on the left – and write the upper half of the 64 bit word in one cycle, then not read from the LabVIEW FIFO for the next clock cycle and write the lower half of the 64 bit word instead.  Want to see the power of LabVIEW? It is 7:22 AM right now… [elevator music/jeopardy music starts playing in the background]

Now it is 8:07 AM and I have finished re-factoring this loop.  I am writing the upper half of each 64 bit word in one cycle, and am writing the bottom half during the next clock cycle.  I am also keeping the logic that appends an extra word which contains the “EndOfGoodFrame”, and “EndOfBadFrame” boolean values.  Since I am writing 32-bit words now, I am only appending one word.

Here is the full loop:

And a close up of Case 0 of the innermost Case Structure:

And a close up of Case 1:

I now have to do this for the other direction – convert a LabVIEW FIFO packet to an AXI 32-bit stream. Here is the current implementation:

The signal on AXI_STR_TXD_data is a U32 and I have to collect 2 of these values and insert them in to the FIFO on the right side.  I am going to have to think about this for a bit, but I have to get ready and go to work.  So I may not finish this before leaving.

Thanks and have a nice day!

Update: Okay, this is not that pretty, but here is my first-cut “20 minutes” version:

Now I have to go and get ready! But I’ll be sure to set everything to synthesize before I leave…

IP Integration Node vs CLIP

I wired up the 10 gigabit ethernet MAC to my MicroBlaze instance to my host computer and compiled/synthesized everything.  I then turn on my “quiet” PXIe-1062Q and fire up my tester application and it did not work…  I open up an isolated tester – “Fpga-Mac-Top.vi”, and it worked.  I open up the isolated MicroBlaze tester – “Fpga-MicroBlaze-Top.vi”, and nothing.  Not even a read from the GPIO.

This is quite strange… why is it not working? I spend some time looking over everything, re-generating output products, synthesizing from Vivado, bringing the design back in to LabVIEW, and long story short I was not setting the MicroBlaze Reset to ACTIVE_LOW, whereas in all of my previous designs I was setting it to ACTIVE_HIGH.  Anyway, while I wait for it to compile, I have something to say.  Which do you prefer? Using an IP Integration Node or a CLIP (Component Level IP) for using a MicroBlaze Processor from LabVIEW?

Well, first off, let me link to some National Instruments documentation on both:

And now let me show you some screen shots.  Here is a close up of what using an IP Integration Node looks like: (right-click to open in a new window for a larger version until I figure out how to modify this wordpress theme to be wider)

Here is a zoomed out version of this same VI:

And finally, what it looks like without an IP Integration node, but with a CLIP (Component Level-IP):


Can you see the difference? I can… for starters, I can read the full name of each signal when using a CLIP.  Additionally, with a CLIP I can split up my nodes in to separate locations, so that I can organize my VI in a much cleaner way.  And finally, since I can read the full signal name when using a CLIP node, I no longer have to hover over each signal to get the signal name, thus removing any reason for having comments as in the IP Integration Node version.

Anyway.  CLIP node is my recommended method of using LabVIEW FPGA to import Xilinx Vivado IP.

Also, this code was from a project that I implemented in order to learn how to use the AXI Stream FIFO inside of LabVIEW via a MicroBlaze.  In other words, how to communicate with a MicroBlaze processor via an AXI Stream FIFO from LabVIEW FPGA.

See the source code here:


Pros and Cons of LabVIEW FPGA

Ever since I started developing this LabVIEW FPGA project that uses a MicroBlaze soft processor to process TCP streams, I have learned a lot and can comment on the pros and cons of using LabVIEW FPGA vs using a traditional Xilinx/Altera based FPGA development approach.

For starters, LabVIEW FPGA blows every single other FPGA development system out of the water when it comes to developing prototypes.  I made a prototype for implementing a Monero miner in record time.  I don’t remember how long it took, but you can see my commit history here: https://github.com/JohnStratoudakis/CryptoCurrencies

Then I was able to implement a UDP based orderbook proof of concept, again in record time, see my commit history here: https://github.com/JohnStratoudakis/LabVIEW_Fpga/tree/master/MarketData/MarketData_02/Fpga

Then I decided that I wanted to make my orderbook support TCP/IP, which is what most Market Data Feeds are using, so I embarked on learning how to make LabVIEW FPGA play well with Xilinx Vivado.  I did not realize it at the time, but the knowledge I have gained over the past year is enough to make one not have to live with any of the cons that LabVIEW FPGA comes with.

  • I have learned how to integrate basic VHDL/Verilog IP in to a LabVIEW FPGA project.
  • I have learned how to integrate more complex Xilinx IP such as Adder/Subtractors, Fast Fourier Transforms, and AXI Stream FIFOs.
  • I have learned how to integrate an entire soft-core processor based system in as well.  Including both the simplified MicroBlaze MCS, and the more complex MicroBlaze processors developed by Xilinx.
  • Furthermore, I have been able to communicate between LabVIEW FPGA and the MicroBlaze processor via AXI Stream FIFOs, General Purpose Input/Output registers, and have implemented Interrupt handlers.

Using all of this together, I can develop in a very efficient manner the perfect prototype that uses existing Xilinx IP, IP from opencores.org, or proprietary IP that can use a MicroBlaze soft-core processor all from within LabVIEW FPGA.  This serves a great risk-mitigating factor in that one can tell if an FPGA will be a viable solution for a particular type of problem.  Then, one can choose to keep the LabVIEW FPGA implementation and scale it out, or one can rewrite the portions written in LabVIEW in another language such as Verilog or VHDL.

Usually, the first product that works is what makes it to market and is successful, not because it is the best, but because it is the most adaptable to change. Think Evolution… think VHS, think DVDs, think about the iPod.  These products were market leading because they got the job done right now, not later when all of the features were fully implemented.  Additionally these products were easy to use.

Anyway, I have fully wired up the 10 Gigabit Transceiver in to my MicroBlaze, and have wired the MicroBlaze to my host application, and I am anxiously awaiting my FPGA synthesizer to complete so I can test it out…

10 Gigabit FPGA-based Network Card

So here is the simplest FPGA-based Network Interface Card that I know of.

This application will start Port 0 of the 10 Gigabit Network interface that is provided by the PXIe-6592R (http://www.ni.com/en-us/support/model.pxie-6592.html) board by National Instruments, and will allow you to do any of the following:

  • Check if any new ethernet frames have been received, and display the information, including the raw bytes of any such received frame
  • Send a raw ethernet frame out of Port 0

I have included the necessary code to parse and generate the following types of packets, enabling you to communicate with another computer on your network that supports:

  • Ethernet II
  • ARP
  • ICMP
  • IPv4
  • UDP

The VI’s to do this are located in the directory “Tests/MAC/Protocols”, simply wire the incoming frame data to the “Parse” VI’s, or write the parameters in to the “Create” VI’s.
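For a feel of what the “Parse” VIs do at the lowest layer, here is a minimal C sketch that pulls apart an Ethernet II header. It is not a translation of the VIs; the struct and function names are hypothetical. The layout itself (6-byte destination, 6-byte source, 2-byte big-endian EtherType, with 0x0800 for IPv4 and 0x0806 for ARP) is standard.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHERTYPE_IPV4 0x0800u
#define ETHERTYPE_ARP  0x0806u

typedef struct {
    uint8_t  dst[6];
    uint8_t  src[6];
    uint16_t ethertype;
} eth_header_t;

/* Parses the 14-byte Ethernet II header at the start of 'frame'.
 * Returns false if the frame is too short to contain a header. */
static bool eth_parse(const uint8_t *frame, size_t len, eth_header_t *out)
{
    if (len < 14)
        return false;
    memcpy(out->dst, frame, 6);
    memcpy(out->src, frame + 6, 6);
    /* EtherType is transmitted most-significant byte first */
    out->ethertype = (uint16_t)(((unsigned)frame[12] << 8) | frame[13]);
    return true;
}
```

The payload that follows the header (starting at byte 14) would then be handed to an ARP or IPv4 parser depending on the EtherType.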

How to Parse Incoming Ethernet Frames

For an example of how to parse an incoming frame see the “Poll RX” case inside the bottom While Loop of the “MAC-Tester” vi:

How to Create Ethernet Frames

For an example of how to create a valid outgoing ethernet frame with a valid CRC32 on the end, see the “Transmit Packet” case inside the bottom While Loop of the “MAC-Tester” vi:

This vi calls the “UDP-Create.vi” and wires the size – in bytes – and the frame data in 64-bit words to the transmit FIFO.
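The CRC32 mentioned above is the standard IEEE 802.3 one. As a reference point for checking a frame by hand, here is a small bitwise C sketch of that checksum; the function name is hypothetical, and this is a plain textbook implementation rather than what the “Create” VIs do internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 as used for the Ethernet frame check sequence:
 * reflected polynomial 0xEDB88320, initial value and final XOR of
 * 0xFFFFFFFF. Slow but easy to verify against a faster version. */
static uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* XOR in the polynomial only when the low bit is set */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

A quick sanity check is the well-known test vector: the nine ASCII bytes "123456789" give 0xCBF43926.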

Full Source Code

See the source code on GitHub here:


See the README.md for more documentation.


Now I have to take this code and wire it up to my MicroBlaze implementation that also sits inside the FPGA project.  Only problem right now is that I have only figured out how to configure a 32-bit FIFO, and not a 64-bit FIFO.  So I can either do some sort of translation inside the FPGA or hope and get lucky by configuring the FIFO to be 64 bits wide.  Note: by FIFO, I am referring to an AXI-Stream FIFO.

Screen Shot Generator for LabVIEW

I finished writing an application that exercises the first Port of the 10 Gigabit Ethernet Interface that is provided with the National Instruments PXIe-6592R board and as I started taking manual screenshots via the LabVIEW “File->Print” option I began to ponder, can this be done more easily? Or dare I say it “programmatically”?

The LabVIEW Report Generation Palette has a VI named “Easy Print VI Panel and Documentation”.  In addition to the plethora of options, this VI also is hard to use and proved to be unstable for my purposes.  If you want to try it in your application, see the documentation here:


I ended up finding a way to manually save a png file with the Front Panel and the Block Diagram of a VI.  I then wrote a program that will recursively generate both a front panel and block diagram screenshot for each vi it encounters.  This makes it easy for me to quickly create and update any vi images so that you can view the source code directly from github, without having to wait until you get home and open the code in LabVIEW.

See the github project here:


Here is a screenshot of the top level vi of the application:




10 Gigabit FPGA-based Network Code Coming Soon

I am getting real close to finishing my proof-of-concept FPGA-based network card that is based on the PXIe-6592 National Instruments Board which uses the Kintex-7 410T FPGA chip by Xilinx, and has 2GB of DDR3 RAM.

Using the Arty Artix-7 board, I was able to make sure that the MicroBlaze code running the lwIP TCP/IP stack works fine, and I was able to use an NI example to make the 10 Gigabit Ethernet MAC part.  The only issue is that the NI code is quite complex and uses features and ideas that I have never seen before.

Nevertheless, I am iterating over some modifications to the example to allow for a LabVIEW Host network stack that uses the FPGA only for the sending and receiving of ethernet frames.  Once I get that working, I will just switch the connection from LabVIEW Host to the on-board MicroBlaze.

How to Multiply 64 bit Numbers in LabVIEW

What is the product of 0x9D0BF6FDAC70AB52 and 0x6408F6540A1384CB?  Well, according to LabVIEW for Windows, the answer is 0x2D90DE07C0C42206.  According to C++ on OSX (without any optimizations or use of Intel intrinsic functions), the answer is also 0x2D90DE07C0C42206.  Both are only the lower 64 bits of the product; the upper 64 bits are silently dropped.

The real answer is…  0x3D5E2BF7DCBCA6622D90DE07C0C42206.

How do you get this number? You have to use compiler intrinsics, or calculate this value yourself.  LabVIEW does not make it easy to call an Intel Compiler intrinsic, so I took it upon myself to implement this myself.  Here is a screenshot of the implementation in LabVIEW for Windows:

To download and use this code in your project, see:


Note: FPGA version is coming soon, but I am busy working on something else right now
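For readers without LabVIEW, the “calculate this value yourself” route can be sketched in portable C by splitting each operand in to 32-bit halves and adding up the four partial products. This is a generic schoolbook version under a hypothetical name (mul64x64), not a transcription of the LabVIEW diagram.

```c
#include <stdint.h>

/* Full 64 x 64 -> 128 bit unsigned multiply without compiler intrinsics.
 * The result is returned as two 64-bit halves through 'hi' and 'lo'. */
static void mul64x64(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = a & 0xFFFFFFFFu, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFu, b_hi = b >> 32;

    uint64_t p0 = a_lo * b_lo;   /* contributes to bits   0..63  */
    uint64_t p1 = a_lo * b_hi;   /* contributes to bits  32..95  */
    uint64_t p2 = a_hi * b_lo;   /* contributes to bits  32..95  */
    uint64_t p3 = a_hi * b_hi;   /* contributes to bits  64..127 */

    /* Sum of the three terms that land in bits 32..63; it cannot
     * overflow 64 bits because each term is below 2^32. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFu) + (p2 & 0xFFFFFFFFu);

    *lo = (mid << 32) | (p0 & 0xFFFFFFFFu);
    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
}
```

Calling it with the two operands from the top of this post and printing hi followed by lo is an easy way to compare against the 128-bit value quoted above; the lo half is by construction the same truncated result that a plain 64-bit multiply returns.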


Some Time with the Arty Artix-7 35T Digilent Board

So I wanted to implement a simple, stripped down version of the open-source lightweight IP stack “lwIP” (https://savannah.nongnu.org/projects/lwip/) inside my LabVIEW FPGA project so that I can handle TCP and UDP data streams.

I do not have a lot of experience with this, and I found that building such a project inside Vivado would take around 3 hours to simulate with all of the source code of the lwIP project embedded in the elf file.

I ended up purchasing a $99 board from Digilent that uses an Artix-7 35T FPGA: https://www.xilinx.com/products/boards-and-kits/arty.html.

On this board I was able to run and debug the lwIP source code so that I could figure out how to use it with my configuration.  I created a public github repository with this source code, so if you happen to be trying to learn how to use the MicroBlaze processor with this board, check out:


Enjoy and I will be working on integrating this lwIP source code in to my LabVIEW FPGA project now.