Tuesday, August 26, 2014

#Health Articles saved on Delicious by @ekivemark

It’s Tuesday, August 26, 2014 at 09:01AM
and time to bring you some Delicious #Health posts




via WordPress http://2.healthca.mp/1zy3iJO

Tuesday, August 19, 2014

#Health Articles saved on Delicious by @ekivemark

It’s Tuesday, August 19, 2014 at 09:01AM
and time to bring you some Delicious #Health posts




via WordPress http://2.healthca.mp/1mhacNp

Monday, August 18, 2014

Heading to /NYC to meet with @Medyears and planning HealthCa.mp/Boston w/ @healthblawg

After a successful weekend working on ingesting NPI data into Sqrrl/Accumulo, today is a series of meetings looking at workflow and UI/UX.

It has already been a productive morning sitting on the Megabus using the Wi-Fi. I have even had some time to think about preparations for a HealthCa.mp/Boston in early November. Mark your calendars for November 3rd in Cambridge, MA.

I am looking forward to working with friends in Boston led by David Harlow – Healthblawg.

[category News, Health, healthca.mp]

[tag health cloud, HealthCamp ]

Mark Scrimshire
Health & Cloud Technology Consultant

Mark is available for challenging assignments at the intersection of Health and Technology using Big Data, Mobile and Cloud Technologies. If you need help to move, or create, your health applications in the cloud let’s talk.
Blog: http://2.healthca.mp/1b61Q7M
email: mark
Stay up-to-date: Twitter @ekivemark
Disclosure: I began as a Patient Engagement Advisor and am now CTO to Personiform, Inc. and their Medyear.com platform. Medyear is a powerful free tool that helps you collect, organize and securely share health information, however you want. Manage your own health records today. Medyear: The Power Grid for your Health.



via WordPress http://2.healthca.mp/1o5puVc

Friday, August 15, 2014

Swimming in Healthy #BigData Lakes….

Ingesting CSV files.

I have recently been on a mission to import the NPI data file that is generated each month by the Centers for Medicare & Medicaid Services (part of the Department of Health and Human Services). The NPPES monthly release is a zip file that contains all the NPI records for doctors and facilities in the USA.

The zip file is about 455MB in size. The main NPI file within that zip file is a 330 column, 4.3 million line file that is over 5.2GB in size. This is so large it will choke Excel.

If you want to take a look at the file check out this link: http://2.healthca.mp/1BkNduM

If you are dealing with a file of this size you are going to want to get to grips with a bunch of unix commands such as:

  • head
  • tail
  • sed
  • split

You may even need to polish your bash scripting. That is what I had to resort to.
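As a minimal sketch of how those commands help with a file this big, the bash function below reports the line count and shows the top and bottom of a file without loading it into an editor. The function name peek_csv is made up for this illustration.

```shell
# Peek at a large file without opening it in an editor.
# peek_csv is a hypothetical helper name, not part of any toolkit.
# Usage: peek_csv FILE [N]   (N defaults to 5)
peek_csv() {
    local file="$1" n="${2:-5}"
    # wc -l counts newline characters, i.e. the number of lines.
    # tr strips the padding some versions of wc add.
    echo "lines: $(wc -l < "$file" | tr -d ' ')"
    echo "--- first $n lines ---"
    head -n "$n" "$file"
    echo "--- last $n lines ---"
    tail -n "$n" "$file"
}
```

Running peek_csv against the main NPI file with a small N is a quick way to confirm the header row and the shape of the data before doing anything heavier.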

Another useful toolset is csvkit. This is a library that provides a bunch of utilities for manipulating .csv files. Check out csvkit here: http://2.healthca.mp/1nX3ImD

Big data platform

My big data platform for this exercise was Sqrrl. I had already had fun importing this monster file in to MongoDB.

The first challenge was to build the single server environment. This involved configuring:

  • Hadoop
  • Zookeeper
  • Accumulo
  • Sqrrl

This is a non-trivial task. But with those services running and communicating with each other the next step was to master the bulk upload of the .csv file using the Sqrrl shell.

In theory this is a simple task. But that assumes that the data is clean. That can be a big assumption.

The first step in the upload is to create a Field Description File (FDF). This is easy to do. You can create this in a text file. The start of the NPI Field Description File looks like this:

0:NPI:INTEGER
1:Entity_Type_Code:INTEGER
2:Replacement_NPI:INTEGER
3:Employer_Identification_Number_(EIN):STRING
4:Provider_Organization_Name_(Legal_Business_Name):STRING
5:Provider_Last_Name_(Legal_Name):STRING
6:Provider_First_Name:STRING

Starting from the first column (column zero), each line gives the column number, the field name that you want in the Sqrrl dataset, and the field type (Integer, String, DateTime), separated by colons.

I imported the NPI Header file that comes in the monthly zip file and manipulated it with some transformations in Excel and then exported the end result to a text file.
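The same draft could also be produced on the command line instead of in Excel. The sketch below is one possible approach, not what I actually ran: it assumes the header names contain no embedded commas, and it marks every field as STRING, so the integer and date columns still need to be edited by hand afterwards. The function name make_fdf is made up for this example.

```shell
# Build a draft Field Description File from the first line of a CSV.
# Assumptions (not from the original workflow):
#   - header names contain no embedded commas
#   - every field starts out as STRING and is corrected by hand later
make_fdf() {
    head -n 1 "$1" \
      | tr -d '"\r' \
      | tr ',' '\n' \
      | sed 's/ /_/g' \
      | awk '{ printf "%d:%s:STRING\n", NR - 1, $0 }'
}
```

The pipeline strips quotes and carriage returns, puts one header name per line, swaps spaces for underscores, and numbers the result from zero.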

With the FDF file created for the 330 columns of data the next step was to deal with the main NPI .csv file itself.

The first step was to check the cleanliness of the file. I used csvkit’s csvclean and csvstat. csvclean didn’t report any errors so things were looking good.

The version of Sqrrl I was using didn’t appear to have a setting to ignore the header line in a CSV file when using the FDF file. So I used a sed command to remove the first line:

sed '1d' {input.file} > {new_output_file}

With the header line removed I now had 5.2GB and 4.3M+ lines of CSV data.

Before the file can be worked on by Sqrrl it needs to be uploaded to Hadoop’s HDFS file system. To do that you use Hadoop’s command-line utility:

hadoop fs -put {source_file} {folder_in_HDFS}

Now we are in a position to use the Sqrrl shell and process the file.

First use the Sqrrl shell to create the target dataset, if it doesn’t exist.

sqrrl shell -u {Sqrrl user} -p {sqrrl user password} -s {sqrrl_host:port} -e "startload --csv {folder_and_CSV_in_HDFS} --field-descriptions-file {FDF_file_in_local_file_system} --uuid-fields {Columns_to_use_for_UUID} --uuid-delimiter _ --do-updates -w -d {target_dataset}"

The theory goes that the data is now ingested from Hadoop into Sqrrl and deposited into Accumulo. But now the adventure really started.

When the ingestion fails…

The batch upload process would run for about 40,000 lines of the 4.3 million lines and fail. The MapReduce logs give very little information. The indication is that the data has some corrupt information in it. This seemed strange since CSVKit gave it a clean bill of health.

I therefore had to resort to a process of elimination to find the root cause. The difficulty is that scanning 330 columns across even just a few lines of data is hard.

The first step was to use the head command to split off a few lines at the top of the file. So I grabbed 100 lines and exported them to a new file.

I then uploaded the file to Hadoop and ran the startload command. It worked. This indicated that the system was running and the FDF file format was okay. Going into the Sqrrl shell I was able to use select statements to pull back information from the uploaded data.

I then retried the startload command on the full file, but still no dice. There is a problem with the data. When the startload command fails it tells you how many lines it successfully processed before failing. For some reason this number varied each time it ran. However, the failure always occurred at around 40,000 lines.

So my next step was to use the “wc -l” command to determine how many lines were in the file. The answer: 4336701. Breaking that file up into chunks that were smaller than the number of lines where the failure occurred would mean creating a couple of hundred files. It was time to resort to some bash scripting.

I therefore wrote a script that took the starting CSV file, split it into 30,000 line chunks and deposited those chunks in a temporary folder.
I then used “ls” to get the list of files and then iterated over each of them to:

  • hadoop fs -put {upload_file} {sqrrl_processing_folder}
  • sqrrl shell to run startload to process the upload file using the local FDF file.

This took a number of hours to run.
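The shape of that script can be sketched roughly as follows. This is an illustration rather than the script I ran: split_and_load is a made-up name, and the loader argument is a hypothetical command that would wrap the hadoop fs -put and sqrrl startload steps. The sketch also assumes the loader's exit status says whether the load succeeded, which is a simplification, since in practice the result came back later through the job numbers.

```shell
# Split a CSV into fixed-size chunks and hand each chunk to a loader.
# LOADER is a hypothetical command supplied by the caller; it is assumed
# to exit non-zero when a chunk fails to load.
# Usage: split_and_load SRC LINES_PER_CHUNK WORKDIR LOADER
split_and_load() {
    local src="$1" lines="$2" workdir="$3" loader="$4"
    mkdir -p "$workdir"
    # -l: lines per chunk; -a 4: four-letter suffixes (chunk_aaaa, chunk_aaab, ...)
    split -l "$lines" -a 4 "$src" "$workdir/chunk_"
    local chunk
    for chunk in "$workdir"/chunk_*; do
        # Report a failure but keep going, so every bad chunk shows up in one pass.
        if ! "$loader" "$chunk"; then
            echo "FAILED: $chunk"
        fi
    done
}
```

With 30,000 lines per chunk the 4.3 million line file yields roughly 145 chunk files, so a four-letter suffix leaves plenty of room.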

I then grabbed the job numbers for each sqrrl startload command and used sed to create a shell script that would run a checkload on each job.

When I scanned the files there were two of the upload files that failed. At least I had found the culprits but each file was still 30,000 lines in size.

I then ran a process to split each problem file into a series of sub-files and submit them through the same process of uploading to Hadoop and Sqrrl.

Each cycle there would be one file upload that would fail. The problem was obviously in the data but which line? When I eventually split the lines down until there was only 1 line in each file I was able to isolate the problem.
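A binary search is a variant of the same idea that needs fewer rounds: test half of the suspect lines, then keep whichever half still fails. The sketch below is a hypothetical illustration, not the process I ran. It assumes exactly one bad line in the file, and the check argument stands in for a command that exits 0 when a file loads cleanly.

```shell
# Narrow a failing file down to a single line by repeated halving.
# CHECK is a hypothetical command that exits 0 when a file loads cleanly.
# Assumes FILE contains exactly one bad line.
# Usage: find_bad_line FILE CHECK   (prints the 1-based line number)
find_bad_line() {
    local file="$1" check="$2"
    local lo=1 hi mid tmp
    hi=$(wc -l < "$file" | tr -d ' ')
    tmp=$(mktemp)
    while [ "$lo" -lt "$hi" ]; do
        mid=$(( (lo + hi) / 2 ))
        # Test only lines lo..mid; if they load, the bad line is in mid+1..hi.
        sed -n "${lo},${mid}p" "$file" > "$tmp"
        if "$check" "$tmp"; then
            lo=$(( mid + 1 ))
        else
            hi=$mid
        fi
    done
    rm -f "$tmp"
    echo "$lo"
}
```

For a 30,000 line chunk this takes about 15 load attempts to reach the single offending line.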

So what was the problem?

To unix experts this will probably be a “Dohh!!” moment.

When I examined the problem lines there was one field where there was a “\” as either the only character in a field, or it was the LAST character in the field. The effect of this was to escape the quotes character that closed the field. This is what was causing the import to choke.

I could easily go in and modify the line by hand and re-submit but I wanted to work out how to do it via a command. So I came up with a sed command that would substitute any occurrence of '\"' in the file and replace it with ' "'. This worked using sed 's/\\"/ "/g' (inside the sed pattern the backslash has to be escaped with a second backslash, and the g flag catches every occurrence on a line).
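With hindsight, a quick check for this pattern before loading would have saved hours. The sketch below shows one way to count the affected lines and to write out a repaired copy. The two function names are made up for this example. Note that in a grep or sed pattern a literal backslash is written as two backslashes.

```shell
# Count lines where a backslash sits directly before a double quote.
# In a basic regular expression, \\ matches one literal backslash.
count_backslash_quotes() {
    grep -c '\\"' "$1"
}

# Print a repaired copy in which every backslash-quote pair becomes
# space-quote, so the closing quote of the field is no longer escaped.
fix_backslash_quotes() {
    sed 's/\\"/ "/g' "$1"
}
```

For example, fix_backslash_quotes {input.file} > {new_output_file} leaves the original untouched and writes the cleaned data to a new file, which can then go through the usual upload.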

Problem solved! What an adventure!

[category News, Health, bigdata]
[tag health cloud Genomics, sqrrl, accumulo, hadoop, bigdata, nosql]




via WordPress http://2.healthca.mp/1qdegPX

Tuesday, August 12, 2014

#Health Articles saved on Delicious by @ekivemark

It’s Tuesday, August 12, 2014 at 09:03AM
and time to bring you some Delicious #Health posts




via WordPress http://2.healthca.mp/1nLsy8Y

Thursday, August 07, 2014

#health2stat Captain Dan - ArmyFit patient engagement

Social engagement is critical to behavior change

I am here at the Barking Dog in Bethesda for #health2stat. More info here:
http://2.healthca.mp/1pf0HDe

total health = physical health + behavior change

Army has built a web platform that uses social on-boarding to meet soldiers where they are.
Platform is built to provide personalized social engagement.

Weekly feedback on 5 areas: social, family, spiritual, physical …

Real age score
Competitive: positions warriors against others that they match, e.g. by age, specialty, location.

Everything is built from a social perspective.

Being military, the data is rolled up and de-identified to present to commanders. This helps to drive culture change.

[tag health, cloud]




via WordPress http://2.healthca.mp/1sClSz7

#health2stat NIH genetics medicine opportunity

I am here at the Barking Dog in Bethesda for #health2stat. More info here:
http://2.healthca.mp/1pf0HDe

Wendy Rubinstein – Genetic medicine opportunity

People that act on genomic information have a 50% better survival chance.
Genetic Testing Registry (GTR)

Physicians are concerned about the cost of testing, the validity of tests, and whether patients can understand test results.

National Center for Biotechnology Information (NCBI).

GTR / ClinVar / MedGen.

Note: MedYear is looking to sync to these data registries. Watch out for details on this.

The registry is approaching 20,000 tests recorded, having grown from 2,000 in 2 years.

GTR is an integrator of test information and aligns with PubMed, SNOMED etc.

80% of physicians use smartphones and tablets.
The challenge is to make all the genetics related information accessible to these handheld devices.

[tag health, cloud ,Genomics]




via WordPress http://2.healthca.mp/X5ltL2

#health2stat Expertscape find experts for any medical problem globally

I am here at the Barking Dog in Bethesda for #health2stat. More info here:
http://2.healthca.mp/1pf0HDe

Expertscape- find an expert for any medical problem – Brendan McAdams

A web site that allows drill down by medical problem and via region. Find an expert near you.

  • objective
  • highly specific
  • current
  • global

Find an expert in the specific condition you have. It goes deeper than a hospital center of expertise.

Uses the PubMed database.

It is not for routine medicine.
Based on knowledge and not outcomes.
If you are not published you are unknown to the system.

Published experts tend to indicate centers of excellence. Experts often teach and attract others with similar areas of focus.

[tag health cloud Genomics, iot]




via WordPress http://2.healthca.mp/1u3ghVB

#health2stat data into action. Consumers and clinicians want insight and action.

I am here at the Barking Dog in Bethesda for #health2stat. More info here:
http://2.healthca.mp/1pf0HDe

Data into Action – Juan Pablo Segura

From startup 1EQ

Based in Georgetown
Leukemia survival rate has risen from 14% in the 1960s to 60% today.

mHealth adoption is matching social media adoption. The Internet of Things (IoT) is also coming into play.

Of 75 mHealth trials, only 3 showed consistent success in managing disease.

Sustained engagement is a challenge.
Consumers want insight and action. So do clinicians.

1EQ is focused on pre-natal care.
The patient gets an app and a box with a wireless scale and blood pressure meter.

Give the consumer simple tasks.
95% noise elimination is key. Exception reporting and insight is essential.
Insight not just data.
Show relevance and exception reporting.

In the pre-natal situation, efficiency comes from reduced patient visits.

[tag health cloud Genomics, iot]




via WordPress http://2.healthca.mp/1u3ghVj

#health2stat at the Barking Dog in Bethesda

I am here at the Barking Dog in Bethesda for #health2stat. More info here:
http://2.healthca.mp/1pf0HDe

As always we have a great line up of speakers and intriguing topics in a series of 5 minute lightning talks.

  • Health STAT Speakers August 7, 2014

    Palladian Partners is pleased to announce the list of speakers for the upcoming Health 2.0 STAT event on Thursday, August 7, 2014, at The Barking Dog in Bethesda. Registration is now open! To register, visit the Health 2.0 STAT meetup site.

    Plan to join us this Thursday, August 7th, to hear a rapid-fire series of short presentations from five Health 2.0 leaders. The presentations will be followed by a panel style Q&A session, led by a moderator, who will facilitate and explore a range of topics. This evening’s event will feature talks about health informatics and patient engagement, leveraging technology to engage patients in their health.

    1. Juan Pablo Segura, 1eq
    Juan Pablo Segura is obsessed with how the Internet of Things can improve the way we interact with our healthcare. His company, 1eq, is focused on reimagining how technology can be used to manage prenatal care through its product – Babysteps. He has also been named a Healthcare Transformer by the Startup Health Academy in New York City and he writes frequently on different technology topics in the space through his company’s blog.

    2. Brendan McAdams, Expertscape
    While tremendous effort has been invested in making data accessible to the various constituents in the healthcare system, it remains a difficult undertaking to find a highly qualified medical expert or institution when a patient has been diagnosed with a serious condition, and those resources that do identify and rank medical expertise are based upon flawed and/or biased methodologies. Expertscape (http://2.healthca.mp/1pf0KyF) is the first comprehensive, objective tool to help patients and others identify and evaluate medical expertise by the very specific disease and geography that is important to them. The system uses its patented, big data methodology and algorithms to analyze and quantify the NIH’s PubMed database, enabling the healthcare consumer to research and select medical expertise and institutions for a second opinion or treatment.
    Brendan McAdams serves as the Managing Director of Expertscape, and is based in Baltimore. He is a 30-year marketing and sales executive focused on consumer-directed solutions for health plans, health systems and Accountable Care Organizations.

    3. Wendy Rubinstein, NLM/NCBI
    Medical professionals need accurate genomic medicine information at the point of care.
    NCBI has the data.
    You make the app.

    4. Daniel Johnston, MD, MPH, LTC, US Army
    In 2012, the Department of Defense charged Daniel T. Johnston, MD, MPH, with creating an online wellness platform that would enable the agency to assess, manage and improve the health of active duty Soldiers and Army civilians. Johnston led a project with Sharecare to build ArmyFit: a platform maximizing digital engagement and fostering behavioral change through tailored content and interactive tools. During this presentation, attendees will learn how the Army is utilizing data-driven applications to monitor and improve population health and program effectiveness at the installation level, as well as optimize development of health policy and resource allocation.

    Agenda
    6:00pm – 6:30pm Registration and Networking
    6:30pm – 7:15pm Presentation
    7:15pm – 7:30pm Q&A
    7:30pm – Additional Networking

    As always, we want to make sure to give thanks to our sponsors, Palladian Partners, Altarum, Aquilent, and WebMD. Because of their support, we have been able to continue creating some really informative events, as well as offer some great appetizers and beverages for all that attend the Meetup.

    You can follow us on twitter @DCHealth2_0

[tag health]




via WordPress http://2.healthca.mp/1ussYqg