A few months ago, we ran a benchmark comparing a traditional parser with Mistral 7B, an open-source LLM. The quality of the parsed results from Mistral 7B was quite impressive given that it has only 7B parameters. The one thing we weren't satisfied with was the processing time. Recently, I stumbled upon Groq, a company whose mission is to revolutionize inference speed. They developed a chip for inference, which they call the Language Processing Unit (LPU). I have tested it, and it is really impressive. I don't understand the chip technology in depth, but as with CPUs and GPUs, I believe it will keep getting faster, and hopefully we can get the inference time down to 1 second consistently.

Now that inference is this much faster, I am interested to see whether the quality of Mixtral 8x7B running on Groq is on par with OpenAI GPT-4 at parsing HTML. As in the previous article, we will run a similar benchmark.

The Comparison

Query: Dentist

Query: Dentist (1st Result)
GPT-4 (first block) | Mixtral 8x7B (second block)
        {
          "title": "Dental Resource Management",
          "type": "Dentist",
          "phone": "(512) 462-2959",
          "address": "Austin, TX",
          "years_in_business": null,
          "hours": "Closed ⋅ Opens 6 AM",
          "rating": null,
          "reviews": "No reviews",
          "price": null,
          "description": null,
          "service_options": {},
          "is_operating": false
        }
      
Processed in 6.28s
        {
          "title": "Dental Resource Management",
          "type": "Dentist",
          "phone": "(512) 462-2959",
          "address": "Austin, TX",
          "years_in_business": null,
          "hours": "Opens 6 AM",
          "rating": null,
          "reviews": "No reviews",
          "price": null,
          "description": null,
          "service_options": {
            "in store shopping": null,
            "delivery": null
          },
          "is_operating": "Closed"
        }
      
Processed in 0.91s

Mixtral almost nailed it: the main thing missing is "Closed" in the hours, and is_operating comes back as the string "Closed" instead of a boolean. Still, the inference time is only 0.91 seconds, which is impressive.
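To make a comparison like this less dependent on eyeballing, the two outputs can be diffed field by field. The snippet below is a minimal sketch, not the tooling used for this benchmark: `diff_fields` is a hypothetical helper, and the two dictionaries are the first Dentist results copied from above.

```python
def diff_fields(reference: dict, candidate: dict) -> dict:
    """Return {field: (reference_value, candidate_value)} for every field that differs.

    Note: a missing key and an explicit null (None) are treated as the same thing here.
    """
    differences = {}
    # Use the union of keys so a field present on only one side is still compared.
    for key in sorted(set(reference) | set(candidate)):
        ref_value = reference.get(key)
        cand_value = candidate.get(key)
        if ref_value != cand_value:
            differences[key] = (ref_value, cand_value)
    return differences


# First Dentist result, as returned by each model in the comparison above.
gpt4_result = {
    "title": "Dental Resource Management",
    "type": "Dentist",
    "phone": "(512) 462-2959",
    "address": "Austin, TX",
    "years_in_business": None,
    "hours": "Closed ⋅ Opens 6 AM",
    "rating": None,
    "reviews": "No reviews",
    "price": None,
    "description": None,
    "service_options": {},
    "is_operating": False,
}
mixtral_result = {
    "title": "Dental Resource Management",
    "type": "Dentist",
    "phone": "(512) 462-2959",
    "address": "Austin, TX",
    "years_in_business": None,
    "hours": "Opens 6 AM",
    "rating": None,
    "reviews": "No reviews",
    "price": None,
    "description": None,
    "service_options": {"in store shopping": None, "delivery": None},
    "is_operating": "Closed",
}

for field, (expected, actual) in diff_fields(gpt4_result, mixtral_result).items():
    print(f"{field}: GPT-4={expected!r} Mixtral={actual!r}")
```

A strict equality check like this also flags harmless differences, such as extra empty keys in service_options, so the output still needs a human read before calling something a mistake.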


Query: Dentist (2nd Result)
GPT-4 (first block) | Mixtral 8x7B (second block)
        {
          "title": "Dr. Brendan Loehr",
          "type": "Dentist",
          "phone": "(512) 713-1099",
          "address": "201 W 5th St",
          "years_in_business": null,
          "hours": null,
          "rating": null,
          "reviews": "0",
          "price": null,
          "description": null,
          "service_options": {},
          "is_operating": true
        }
      
Processed in 5.87s
        {
          "title": "Dr. Brendan Loehr",
          "type": "Dentist",
          "phone": "(512) 713-1099",
          "address": "201 W 5th St",
          "years_in_business": null,
          "hours": null,
          "rating": null,
          "reviews": "No reviews",
          "price": null,
          "description": null,
          "service_options": {
            "in store shopping": false,
            "delivery": false
          },
          "is_operating": true
        }
      
Processed in 0.86s

Mixtral scores perfectly and completes the task in only 0.86 seconds.


Query: Dentist (3rd Result)
GPT-4 (first block) | Mixtral 8x7B (second block)
        {
          "title": "Tejas Patel, DDS",
          "type": "Cosmetic dentist",
          "phone": "(512) 476-2336",
          "address": "221 W 6th St #940, Austin, TX 78701",
          "years_in_business": null,
          "hours": "Closed ⋅ Opens 8 AM Mon",
          "rating": 4.9,
          "reviews": 275,
          "price": null,
          "description": "Best cosmetic Dentist in Austin.",
          "service_options": null,
          "is_operating": false
        }
      
Processed in 9.82s
        {
          "title": "Tejas Patel, DDS",
          "type": "Cosmetic dentist",
          "phone": "(512) 476-2336",
          "address": "221 W 6th St #940, Austin, TX 78701",
          "years_in_business": null,
          "hours": "Closed",
          "rating": 4.9,
          "reviews": 275,
          "price": null,
          "description": "Best cosmetic Dentist in Austin.",
          "service_options": {
            "in store shopping": null,
            "delivery": null
          },
          "is_operating": false
        }
      
Processed in 1.06s

Mixtral made a mistake in the hours: this time the opening time is missing.


Query: Mexican Restaurant

Query: Mexican Restaurant (1st Result)
GPT-4 (first block) | Mixtral 8x7B (second block)
        {
          "title": "Tacos provincia",
          "type": "Mexican",
          "phone": "Not Available",
          "address": "658 Red River St",
          "years_in_business": "Not Available",
          "hours": "Not Available",
          "rating": "5.0",
          "reviews": "12",
          "price": "$1–10",
          "description": "Not Available",
          "service_options": {
            "Dine-in": true,
            "Curbside pickup": true
          },
          "is_operating": "Not Available"
        }
      
Processed in 8.75s
        {
          "title": "Tacos provincia",
          "type": "Mexican",
          "phone": "N/A",
          "address": "658 Red River St",
          "years_in_business": "N/A",
          "hours": "N/A",
          "rating": 5.0,
          "reviews": 12,
          "price": "$1–10",
          "description": "N/A",
          "service_options": {
            "Dine-in": true,
            "Curbside pickup": true
          },
          "is_operating": "N/A"
        }
      
Processed in 1.55s

👍


Query: Mexican Restaurant (2nd Result)
GPT-4 (first block) | Mixtral 8x7B (second block)
        {
          "title": "Velvet Taco",
          "type": "Mexican",
          "phone": "N/A",
          "address": "522 Congress Ave. Suite 100",
          "years_in_business": "N/A",
          "hours": "Closed ⋅ Opens 11 AM",
          "rating": 4.6,
          "reviews": 1500,
          "price": "$10–20",
          "description": "N/A",
          "service_options": {
            "Dine-in": true,
            "Takeout": true,
            "No-contact delivery": true
          },
          "is_operating": false
        }
      
Processed in 10.70s
        {
          "title": "Velvet Taco",
          "type": "Mexican",
          "phone": null,
          "address": "522 Congress Ave. Suite 100",
          "years_in_business": null,
          "hours": "Opens 11 AM",
          "rating": 4.6,
          "reviews": 1500,
          "price": "$10–20",
          "description": null,
          "service_options": {
            "Dine-in": true,
            "Takeout": true,
            "No-contact delivery": true
          },
          "is_operating": false
        }
      
Processed in 1.02s

The only difference is in the hours, where "Closed" is left out. However, Mixtral does interpret is_operating correctly.


Query: Mexican Restaurant (3rd Result)
GPT-4 (first block) | Mixtral 8x7B (second block)
        {
          "title": "Iron Cactus Mexican Restaurant and Margarita Bar",
          "type": "Mexican",
          "phone": "Not Available",
          "address": "606 Trinity St",
          "years_in_business": "Not Available",
          "hours": "Closed ⋅ Opens 11 AM",
          "rating": "4.1",
          "reviews": "3.7K",
          "price": "$20–30",
          "description": "Buzzy spot with happy hour & tequilas",
          "service_options": {
            "Not Available": true
          },
          "is_operating": false
        }
      
Processed in 8.36s
        {
          "title": "Iron Cactus Mexican Restaurant and Margarita Bar",
          "type": "Mexican Restaurant",
          "phone": "Not Found",
          "address": "606 Trinity St",
          "years_in_business": "Not Found",
          "hours": "Closed - Opens 11 AM",
          "rating": 4.1,
          "reviews": 3.7,
          "price": "$20–30",
          "description": "Buzzy spot with happy hour & tequilas",
          "service_options": {
            "in_store_shopping": false,
            "delivery": false,
            "curbside_pickup": false
          },
          "is_operating": false
        }
      
Processed in 1.22s

Mixtral returned 3.7 for the reviews, where it should be 3.7K or 3700. This is a critical mistake.
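One way to catch this kind of error automatically is to normalize the review count after parsing and to reject values that cannot be a whole number. The following is a rough sketch under my own assumptions: `parse_review_count` is a hypothetical helper, and the set of accepted suffixes (K and M) is a guess at what the page might show.

```python
import re
from decimal import Decimal

# Assumed suffixes; the source page may use others.
_SUFFIX_MULTIPLIERS = {"": 1, "K": 1_000, "M": 1_000_000}


def parse_review_count(raw):
    """Convert values such as "3.7K", "1,500" or 275 to an int.

    Returns None when the value cannot be a valid review count, for example
    the bare number 3.7, where the "K" suffix was lost by the model.
    """
    if isinstance(raw, bool):
        return None
    if isinstance(raw, int):
        return raw
    if isinstance(raw, float):
        # A fractional count means information was lost; flag it instead of guessing.
        return int(raw) if raw.is_integer() else None
    if not isinstance(raw, str):
        return None
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KkMm]?)\s*", raw.replace(",", ""))
    if match is None:
        return None
    number, suffix = match.groups()
    # Decimal avoids float rounding noise in products such as 3.7 * 1000.
    value = Decimal(number) * _SUFFIX_MULTIPLIERS[suffix.upper()]
    return int(value) if value == value.to_integral_value() else None


for sample in ["3.7K", 3.7, "1,500", 275, "No reviews"]:
    print(repr(sample), "->", parse_review_count(sample))
```

With a check like this, the 3.7 from Mixtral would come back as None and could be flagged or retried, instead of silently being stored as a review count.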


Query: Yoga Studio

Query: Yoga Studio (1st Result)
GPT-4 (first block) | Mixtral 8x7B (second block)
        {
          "title": "YTX Yoga, Strength, & HIIT Training",
          "type": "Yoga studio",
          "phone": "(512) 502-5183",
          "address": "206 E 4th St, Austin, TX 78701",
          "years_in_business": "10+ years",
          "hours": "Closed ⋅ Opens 6:30 AM",
          "rating": "4.7",
          "reviews": "297",
          "price": null,
          "description": null,
          "service_options": {
            "Onsite services": true,
            "Online classes": true
          },
          "is_operating": false
        }
      
Processed in 13.72s
        {
          "title": "YTX Yoga, Strength, & HIIT Training",
          "type": "Yoga studio",
          "phone": "(512) 502-5183",
          "address": "206 E 4th St",
          "years_in_business": "10+",
          "hours": "Closed ⋅ Opens 6:30 AM",
          "rating": "4.7",
          "reviews": "297",
          "description": null,
          "service_options": {
            "Onsite services": true,
            "Online classes": true
          },
          "is_operating": false
        }
      
Processed in 1.71s

👍


Query: Yoga Studio (2nd Result)
GPT-4 (first block) | Mixtral 8x7B (second block)
        {
          "title": "Practice Yoga Austin",
          "type": "Yoga studio",
          "phone": "(512) 730-1638",
          "address": "1103 E 6th St, Austin, TX",
          "years_in_business": "10+ years",
          "hours": "Closed ⋅ Opens 7 AM",
          "rating": "5.0",
          "reviews": "97",
          "price": null,
          "description": "\"Practice yoga studio is very unique.\"",
          "service_options": {},
          "is_operating": false
        }
      
Processed in 10.52s
        {
          "title": "Practice Yoga Austin",
          "type": "Yoga studio",
          "phone": "(512) 730-1638",
          "address": "1103 E 6th St",
          "years_in_business": "10+",
          "hours": "Closed - Opens 7 AM",
          "rating": "5.0",
          "reviews": "97",
          "description": "Practice yoga studio is very unique.",
          "service_options": {
            "in_store_shopping": false,
            "delivery": false,
            "curbside_pickup": false
          },
          "is_operating": false
        }
      
Processed in 1.34s

👍


Query: Yoga Studio (3rd Result)
GPT-4 (first block) | Mixtral 8x7B (second block)
        {
          "title": "Sanctuary Studio & Loft Rentals",
          "type": "Yoga studio",
          "phone": "(512) 337-2213",
          "address": "1006 S 8th St, Austin, TX 78704",
          "years_in_business": "10+ years",
          "hours": "Closed ⋅ Opens 9 AM",
          "rating": "4.8",
          "reviews": "55",
          "description": "Easygoing non-profit yoga center",
          "service_options": {},
          "is_operating": false
        }
      
Processed in 9.40s
        {
          "title": "Sanctuary Studio & Loft Rentals",
          "type": "Yoga studio",
          "phone": "(512) 337-2213",
          "address": "1006 S 8th St",
          "years_in_business": "10+ years",
          "hours": "Closed ⋅ Opens 9 AM",
          "rating": "4.8",
          "reviews": "55",
          "description": "Easygoing non-profit yoga center",
          "service_options": {
            "in_store_shopping": false,
            "delivery": false,
            "curbside_pickup": false
          },
          "is_operating": false
        }
      
Processed in 1.74s

👍

Thoughts

GPT-4 achieved a perfect score in parsing the HTML; however, its inference time isn't ideal, ranging from 5.87 to 13.72 seconds in these tests. Mixtral 8x7B running on Groq, on the other hand, is much faster, at 0.86 to 1.74 seconds per result, and for some results it even goes under 1 second. I think it could be considered for production use. It did make some mistakes in the results, but I believe these can be reduced by improving the prompt.
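For a rough sense of the overall gap, the nine "Processed in" timings reported above can be summarized in a few lines. This is only a sketch over a small sample: the numbers are copied from the results in this article, and `summarize` is a hypothetical helper.

```python
from statistics import mean

# "Processed in" times in seconds, in the order the results appear above.
gpt4_seconds = [6.28, 5.87, 9.82, 8.75, 10.70, 8.36, 13.72, 10.52, 9.40]
mixtral_seconds = [0.91, 0.86, 1.06, 1.55, 1.02, 1.22, 1.71, 1.34, 1.74]


def summarize(timings):
    """Return basic latency statistics for a list of timings in seconds."""
    return {
        "mean": mean(timings),
        "min": min(timings),
        "max": max(timings),
        "under_1s": sum(t < 1.0 for t in timings),  # how many runs finished in under 1 second
    }


gpt4_summary = summarize(gpt4_seconds)
mixtral_summary = summarize(mixtral_seconds)
speedup = gpt4_summary["mean"] / mixtral_summary["mean"]

print(f"GPT-4 mean:   {gpt4_summary['mean']:.2f}s")
print(f"Mixtral mean: {mixtral_summary['mean']:.2f}s")
print(f"Mean speedup: {speedup:.1f}x")
print(f"Mixtral runs under 1s: {mixtral_summary['under_1s']} of {len(mixtral_seconds)}")
```

With only nine samples per model, these figures are indicative rather than a rigorous latency benchmark.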

I am looking forward to experimenting more with open-source LLMs. Follow us for our latest updates.

